- 11 Feb, 2007 16 commits
-
-
Christoph Lameter authored
This patchset follows up on the earlier work in Andrew's tree to reduce the number of zones. The patches allow to go to a minimum of 2 zones. This one allows also to make ZONE_DMA optional and therefore the number of zones can be reduced to one. ZONE_DMA is usually used for ISA DMA devices. There are a number of reasons why we would not want to have ZONE_DMA 1. Some arches do not need ZONE_DMA at all. 2. With the advent of IOMMUs DMA zones are no longer needed. The necessity of DMA zones may drastically be reduced in the future. This patchset allows a compilation of a kernel without that overhead. 3. Devices that require ISA DMA get rare these days. All my systems do not have any need for ISA DMA. 4. The presence of an additional zone unecessarily complicates VM operations because it must be scanned and balancing logic must operate on its. 5. With only ZONE_NORMAL one can reach the situation where we have only one zone. This will allow the unrolling of many loops in the VM and allows the optimization of varous code paths in the VM. 6. Having only a single zone in a NUMA system results in a 1-1 correspondence between nodes and zones. Various additional optimizations to critical VM paths become possible. Many systems today can operate just fine with a single zone. If you look at what is in ZONE_DMA then one usually sees that nothing uses it. The DMA slabs are empty (Some arches use ZONE_DMA instead of ZONE_NORMAL, then ZONE_NORMAL will be empty instead). On all of my systems (i386, x86_64, ia64) ZONE_DMA is completely empty. Why constantly look at an empty zone in /proc/zoneinfo and empty slab in /proc/slabinfo? Non i386 also frequently have no need for ZONE_DMA and zones stay empty. The patchset was tested on i386 (UP / SMP), x86_64 (UP, NUMA) and ia64 (NUMA). The RFC posted earlier (see http://marc.theaimsgroup.com/?l=linux-kernel&m=115231723513008&w=2) had lots of #ifdefs in them. An effort has been made to minize the number of #ifdefs and make this as compact as possible. The job was made much easier by the ongoing efforts of others to extract common arch specific functionality. I have been running this for awhile now on my desktop and finally Linux is using all my available RAM instead of leaving the 16MB in ZONE_DMA untouched: christoph@pentium940:~$ cat /proc/zoneinfo Node 0, zone Normal pages free 4435 min 1448 low 1810 high 2172 active 241786 inactive 210170 scanned 0 (a: 0 i: 0) spanned 524224 present 524224 nr_anon_pages 61680 nr_mapped 14271 nr_file_pages 390264 nr_slab_reclaimable 27564 nr_slab_unreclaimable 1793 nr_page_table_pages 449 nr_dirty 39 nr_writeback 0 nr_unstable 0 nr_bounce 0 cpu: 0 pcp: 0 count: 156 high: 186 batch: 31 cpu: 0 pcp: 1 count: 9 high: 62 batch: 15 vm stats threshold: 20 cpu: 1 pcp: 0 count: 177 high: 186 batch: 31 cpu: 1 pcp: 1 count: 12 high: 62 batch: 15 vm stats threshold: 20 all_unreclaimable: 0 prev_priority: 12 temp_priority: 12 start_pfn: 0 This patch: In two places in the VM we use ZONE_DMA to refer to the first zone. If ZONE_DMA is optional then other zones may be first. So simply replace ZONE_DMA with zone 0. This also fixes ZONETABLE_PGSHIFT. If we have only a single zone then ZONES_PGSHIFT may become 0 because there is no need anymore to encode the zone number related to a pgdat. However, we still need a zonetable to index all the zones for each node if this is a NUMA system. Therefore define ZONETABLE_SHIFT unconditionally as the offset of the ZONE field in page flags. [apw@shadowen.org: fix mismerge] Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Andi Kleen <ak@suse.de> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Matthew Wilcox <willy@debian.org> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Christoph Lameter authored
Values are available via ZVC sums. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Christoph Lameter authored
Values are readily available via ZVC per node and global sums. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Christoph Lameter authored
Function is unnecessary now. We can use the summing features of the ZVCs to get the values we need. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Christoph Lameter authored
nr_free_pages is now a simple access to a global variable. Make it a macro instead of a function. The nr_free_pages now requires vmstat.h to be included. There is one occurrence in power management where we need to add the include. Directly refrer to global_page_state() there to clarify why the #include was added. [akpm@osdl.org: arm build fix] [akpm@osdl.org: sparc64 build fix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Christoph Lameter authored
The global and per zone counter sums are in arrays of longs. Reorder the ZVCs so that the most frequently used ZVCs are put into the same cacheline. That way calculations of the global, node and per zone vm state touches only a single cacheline. This is mostly important for 64 bit systems were one 128 byte cacheline takes only 8 longs. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Christoph Lameter authored
This is again simplifies some of the VM counter calculations through the use of the ZVC consolidated counters. [michal.k.k.piotrowski@gmail.com: build fix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Michal Piotrowski <michal.k.k.piotrowski@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Christoph Lameter authored
The determination of the dirty ratio to determine writeback behavior is currently based on the number of total pages on the system. However, not all pages in the system may be dirtied. Thus the ratio is always too low and can never reach 100%. The ratio may be particularly skewed if large hugepage allocations, slab allocations or device driver buffers make large sections of memory not available anymore. In that case we may get into a situation in which f.e. the background writeback ratio of 40% cannot be reached anymore which leads to undesired writeback behavior. This patchset fixes that issue by determining the ratio based on the actual pages that may potentially be dirty. These are the pages on the active and the inactive list plus free pages. The problem with those counts has so far been that it is expensive to calculate these because counts from multiple nodes and multiple zones will have to be summed up. This patchset makes these counters ZVC counters. This means that a current sum per zone, per node and for the whole system is always available via global variables and not expensive anymore to calculate. The patchset results in some other good side effects: - Removal of the various functions that sum up free, active and inactive page counts - Cleanup of the functions that display information via the proc filesystem. This patch: The use of a ZVC for nr_inactive and nr_active allows a simplification of some counter operations. More ZVC functionality is used for sums etc in the following patches. [akpm@osdl.org: UP build fix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Hugh Dickins authored
After do_wp_page has tested page_mkwrite, it must release old_page after acquiring page table lock, not before: at some stage that ordering got reversed, leaving a (very unlikely) window in which old_page might be truncated, freed, and reused in the same position. Signed-off-by: Hugh Dickins <hugh@veritas.com> Acked-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Randy Dunlap authored
With CONFIG_SPARSEMEM=y: mm/rmap.c:579: warning: format '%lx' expects type 'long unsigned int', but argument 2 has type 'int' Make __page_to_pfn() return unsigned long. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Andrew Morton authored
This early break prevents us from displaying info for the vm stats thresholds if the zone doesn't have any pages in its per-cpu pagesets. So my 800MB i386 box says: Node 0, zone DMA pages free 2365 min 16 low 20 high 24 active 0 inactive 0 scanned 0 (a: 0 i: 0) spanned 4096 present 4044 nr_anon_pages 0 nr_mapped 1 nr_file_pages 0 nr_slab_reclaimable 0 nr_slab_unreclaimable 0 nr_page_table_pages 0 nr_dirty 0 nr_writeback 0 nr_unstable 0 nr_bounce 0 nr_vmscan_write 0 protection: (0, 868, 868) pagesets all_unreclaimable: 0 prev_priority: 12 start_pfn: 0 Node 0, zone Normal pages free 199713 min 934 low 1167 high 1401 active 10215 inactive 4507 scanned 0 (a: 0 i: 0) spanned 225280 present 222420 nr_anon_pages 2685 nr_mapped 1110 nr_file_pages 12055 nr_slab_reclaimable 2216 nr_slab_unreclaimable 1527 nr_page_table_pages 213 nr_dirty 0 nr_writeback 0 nr_unstable 0 nr_bounce 0 nr_vmscan_write 0 protection: (0, 0, 0) pagesets cpu: 0 pcp: 0 count: 152 high: 186 batch: 31 cpu: 0 pcp: 1 count: 13 high: 62 batch: 15 vm stats threshold: 16 cpu: 1 pcp: 0 count: 34 high: 186 batch: 31 cpu: 1 pcp: 1 count: 10 high: 62 batch: 15 vm stats threshold: 16 all_unreclaimable: 0 prev_priority: 12 start_pfn: 4096 Just nuke all that search-for-the-first-non-empty-pageset code. Dunno why it was there in the first place.. Cc: Christoph Lameter <clameter@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Mel Gorman authored
find_min_pfn_for_node() and find_min_pfn_with_active_regions() sort early_node_map[] on every call. This is an excessive amount of sorting and that can be avoided. This patch always searches the whole early_node_map[] in find_min_pfn_for_node() instead of returning the first value found. The map is then only sorted once when required. Successfully boot tested on a number of machines. [akpm@osdl.org: cleanup] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Robert P. J. Day authored
Remove the last vestiges of the long-deprecated "MAP_ANON" page protection flag: use "MAP_ANONYMOUS" instead. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Christoph Lameter authored
Use the pointer passed to cache_reap to determine the work pointer and consolidate exit paths. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Pekka Enberg authored
Clean up __cache_alloc and __cache_alloc_node functions a bit. We no longer need to do NUMA_BUILD tricks and the UMA allocation path is much simpler. No functional changes in this patch. Note: saves few kernel text bytes on x86 NUMA build due to using gotos in __cache_alloc_node() and moving __GFP_THISNODE check in to fallback_alloc(). Cc: Andy Whitcroft <apw@shadowen.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Manfred Spraul <manfred@colorfullife.com> Acked-by: Christoph Lameter <christoph@lameter.com> Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Pekka Enberg authored
The PageSlab debug check in kfree_debugcheck() is broken for compound pages. It is also redundant as we already do BUG_ON for non-slab pages in page_get_cache() and page_get_slab() which are always called before we free any actual objects. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 09 Feb, 2007 24 commits
-
-
Jeff Garzik authored
The ATA_ENABLE_PATA define was never meant to be permanent, and in recent kernels, it's already been unconditionally enabled. Remove. Signed-off-by: Jeff Garzik <jeff@garzik.org>
-
Jeff Garzik authored
Signed-off-by: Jeff Garzik <jeff@garzik.org>
-
Alan authored
If we are doing a PIO setup for a CFA card and it blows up with a device error then assume it is an older CFA card which doesn't support this rather than failing the device out of existance. Stands seperate to the quieting patch but that is obviously useful with this change. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
-
Robert Hancock authored
ata_pci_device_do_resume can fail if the PCI device couldn't be re-enabled. Update sata_nv to propagate the return value from this call and to not try to do any other resume activities if it fails. Fixes a compile warning. Signed-off-by: Robert Hancock <hancockr@shaw.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
-
Robert Hancock authored
Update sata_nv to wait for the controller to indicate via the status register that it has entered the requested state when switching between ADMA mode and register mode. This issue came up recently when debugging some problems with cache flush command timeouts and while it didn't appear to fix that problem, this is something we should likely be doing in any case. Signed-off-by: Robert Hancock <hancockr@shaw.ca> Cc: Tejun Heo <htejun@gmail.com> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
-
Robert Hancock authored
Some problems showed up recently with cache flush commands timing out on sata_nv. Previously these commands were always handled by transitioning to legacy mode from ADMA mode first. The timeout problem was worked around already by a change to the interrupt handling code for legacy mode, but for non-data commands like these it appears we can handle them in ADMA mode, so the switch to legacy mode is not needed. This patch changes the behavior so that we use ADMA mode to submit interrupt-driven commands with ATA_PROT_NODATA protocol. In addition to avoiding the problem mentioned above entirely, this avoids the overhead of switching to legacy mode and back to ADMA mode for handling cache flushes. When handling non-DMA-mapped commands, we leave the APRD blank and clear the NV_CPB_CTL_APRD_VALID field in the CPB so the controller does not attempt to read it. Signed-off-by: Robert Hancock <hancockr@shaw.ca> Cc: Jeff Garzik <jeff@garzik.org> Cc: Tejun Heo <htejun@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
-
Robert Hancock authored
This cleans up a few issues with the error handling in sata_nv in ADMA mode to make it more consistent with other NCQ-capable drivers like ahci and sata_sil24: - When a command failed, we would effectively set AC_ERR_DEV on the queued command always. In the case of NCQ commands this prevents libata from doing a log page query to determine the details of the failed command, since it thinks we've already analyzed. Just set flags in the port ehi->err_mask, then freeze or abort and let libata figure out what went wrong. - The code handled NV_ADMA_STAT_CPBERR as a "really bad error" which caused it to set error flags on every queued command. I don't know exactly what this flag means (no docs, grr!) but from what I can guess from the standard ADMA spec, it just means that one or more of the CPBs had an error, so we just need to go through and do our normal checks in this case. - In the error_handler function the code would always dump the state of all the CPBs. This output seems redundant at this point since libata already dumps the state of all active commands on errors (and it also triggers at times when it shouldn't, like when suspending). Take this out. [akpm@osdl.org: many coding-style fixes] Signed-off-by: Robert Hancock <hancockr@shaw.ca> Cc: Jeff Garzik <jeff@garzik.org> Cc: Tejun Heo <htejun@gmail.com> Cc: Allen Martin <AMartin@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
-
Sergei Shtylyov authored
MPIIX has only single channel IDE which can be configured for either primary or secondary legacy I/O ports and IRQ. So, get rid of the unneeded second probe entry in mpiix_init_one() and of the invalid (but unused anyway) enable bits in mpiix_pre_reset(). Warning: this cleanup has only been compile-tested... Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
-
Sergei Shtylyov authored
Fix clearing/setting the wrong TIME/IE/PPE bits for a slave drive caused by a wrong shift count. Fix the PIO mode 1 being overclocked by wrongly selecting the fast timing bank. Also, fix/rephrase some comments while at it. Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
-
Sergei Shtylyov authored
Fix the PIO mode 2 using mode 0 timings -- this driver should enable the fast timing bank starting with PIO2, just like the ata_piix driver does. Also, fix/rephrase some comments while at it. Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
-
Alan authored
IORDY and IORDY enable/disable flags. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
-
Alan authored
People are getting confused about which drivers to enable for PATA PIIX type devices. Change the ATA_PIIX line and help to make it clearer. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
-
Tejun Heo authored
* Hardreset must not exit without actually performing reset regardless of link status. We're resetting the link after all. * Minor message update. * 150ms delay is meaningful iff link is online after reset is complete. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
-
Tejun Heo authored
Follow the old SRST rule and delay 150ms between completion of hardreset and status checking. Debouncing delay should usually cover this but debounce duration could be shorter than 150ms under certain circumstances. Usefulness depends on host controller implementation but it can't hurt and serves as a reminder that 2s delay for GoVault should also be added here. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
-
Eric D. Mudama authored
Per Jeff's suggestion, this patch rearranges the info printed for ATA drives into dmesg to add the full ATA firmware revision and model information, while keeping the output to 2 lines. Signed-off-by: Eric D. Mudama <edmudama@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
-
Sergei Shtylyov authored
Fix the wrong "compatible" PIO mode choices: MWDMA0 has 480 ns cycle while PIO1 only has 383 ns cycle, and MWDMA2 timings matchs those of PIO4 exactly. Signed-off-by: Jeff Garzik <jeff@garzik.org>
-
Akira Iguchi authored
This patch is against each libata driver. Two IRQ calls are added in ata_port_operations. - irq_on() is used to enable interrupts. - irq_ack() is used to acknowledge a device interrupt. In most drivers, ata_irq_on() and ata_irq_ack() are used for irq_on and irq_ack respectively. In some drivers (ex: ahci, sata_sil24) which cannot use them as is, ata_dummy_irq_on() and ata_dummy_irq_ack() are used. Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp> Signed-off-by: Akira Iguchi <akira2.iguchi@toshiba.co.jp> Signed-off-by: Jeff Garzik <jeff@garzik.org>
-
Akira Iguchi authored
This patch is against the libata core and headers. Two IRQ calls are added in ata_port_operations. - irq_on() is used to enable interrupts. - irq_ack() is used to acknowledge a device interrupt. In most drivers, ata_irq_on() and ata_irq_ack() are used for irq_on and irq_ack respectively. In some drivers (ex: ahci, sata_sil24) which cannot use them as is, ata_dummy_irq_on() and ata_dummy_irq_ack() are used. Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp> Signed-off-by: Akira Iguchi <akira2.iguchi@toshiba.co.jp> Signed-off-by: Jeff Garzik <jeff@garzik.org>
-
Andrew Morton authored
In file included from drivers/infiniband/hw/ipath/ipath_diag.c:44: include/linux/io.h:35: warning: 'struct device' declared inside parameter list include/linux/io.h:35: warning: its scope is only this definition or declaration Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
-
Tejun Heo authored
Convert libata core layer and LLDs to use iomap. * managed iomap is used. Pointer to pcim_iomap_table() is cached at host->iomap and used through out LLDs. This basically replaces host->mmio_base. * if possible, pcim_iomap_regions() is used Most iomap operation conversions are taken from Jeff Garzik <jgarzik@pobox.com>'s iomap branch. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
-
Tejun Heo authored
devres updates for pata_platform were dropped while merging devres patches due to merge conflict. This is the updated version. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
-
Tejun Heo authored
devres change moved iomap.o from obj-$(CONFIG_GENERIC_IOMAP) to lib-y making it not linked if no in-kernel driver uses it. Fix it. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
-
Jeff Garzik authored
Signed-off-by: Jeff Garzik <jeff@garzik.org>
-
Tejun Heo authored
Implement pcim_iomap_regions(). This function takes mask of BARs to request and iomap. No BAR should have length of zero. BARs are iomapped using pcim_iomap_table(). Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
-