1. 27 Sep, 2006 40 commits
    • [PATCH] zone_statistics: Use hot node instead of cold zone_pgdat · 5d292343
      Christoph Lameter authored
      Now that we have the node in the hot zone of struct zone we can avoid
      accessing zone_pgdat in zone_statistics.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Do not allocate pagesets for unpopulated zones. · 66a55030
      Christoph Lameter authored
      We do not need to allocate pagesets for unpopulated zones.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Add node to zone for the NUMA case · d5f541ed
      Christoph Lameter authored
      Add the node in order to optimize zone_to_nid.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Acked-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] GFP_THISNODE for the slab allocator · 765c4507
      Christoph Lameter authored
      This patch ensures that the slab node lists in the NUMA case only contain
      slabs that belong to that specific node.  All slab allocations use
      GFP_THISNODE when calling into the page allocator.  If an allocation fails
      then we fall back in the slab allocator according to the zonelists appropriate
      for a certain context.
      
      This allows a replication of the behavior of alloc_pages and alloc_pages_node
      in the slab layer.
      
      Currently allocations requested from the page allocator may be redirected via
      cpusets to other nodes.  This results in remote pages on nodelists and that in
      turn results in interrupt latency issues during cache draining.  Plus the slab
      is handing out memory as local when it is really remote.
      
      Fallback for slab memory allocations will occur within the slab allocator and
      not in the page allocator.  This is necessary in order to be able to use the
      existing pools of objects on the nodes that we fall back to before adding more
      pages to a slab.
      
      The fallback function ensures that the nodes we fall back to obey cpuset
      restrictions of the current context.  We do not allocate objects from outside
      of the current cpuset context like before.
      
      Note that the implementation of locality constraints within the slab allocator
      requires importing logic from the page allocator.  This is a mishmash that is
      not that great.  Other allocators (uncached allocator, vmalloc, huge pages)
      face similar problems and have similar minimal reimplementations of the basic
      fallback logic of the page allocator.  There is another way of implementing a
      slab by avoiding per node lists (see modular slab) but this won't work within
      the existing slab.
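
The fallback described above can be pictured with a small userspace model. This is only a sketch under stated assumptions, not the kernel's slab code: the per-node pools, the `allowed` array standing in for a cpuset, and the names `alloc_this_node` and `alloc_with_fallback` are all hypothetical. It shows the two properties the message stresses: an allocation is first tried on exactly one node, and the fallback walks an ordered node list while skipping nodes the current context may not use.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 4

/* Toy per-node object pools; stand-ins for the slab's per-node lists. */
static int free_objects[MAX_NODES];

/* Toy cpuset: allowed[n] is true if the current context may use node n. */
static bool allowed[MAX_NODES];

/* Take one object from exactly this node; never redirect silently. */
static bool alloc_this_node(int node)
{
	if (free_objects[node] == 0)
		return false;
	free_objects[node]--;
	return true;
}

/*
 * Try the local node first, then walk the fallback order (a stand-in for
 * a zonelist), skipping nodes outside the allowed set.
 * Returns the node that satisfied the request, or -1 if none could.
 */
static int alloc_with_fallback(int local, const int *fallback_order, size_t n)
{
	if (alloc_this_node(local))
		return local;

	for (size_t i = 0; i < n; i++) {
		int node = fallback_order[i];

		if (!allowed[node])
			continue;	/* obey the (toy) cpuset restriction */
		if (alloc_this_node(node))
			return node;	/* reuse that node's existing pool */
	}
	return -1;
}
```

Because the caller learns which node actually supplied the object, nothing is handed out as local when it is really remote.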
      
      V1->V2:
      - Use NUMA_BUILD to avoid #ifdef CONFIG_NUMA
      - Exploit GFP_THISNODE being 0 in the NON_NUMA case to avoid another
        #ifdef
      
      [akpm@osdl.org: build fix]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Disable GFP_THISNODE in the non-NUMA case · 77f700da
      Christoph Lameter authored
      GFP_THISNODE must be set to 0 in the non-NUMA case; otherwise we disable retry
      and warnings for failing allocations in the SMP and UP case.
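
A minimal sketch of why a zero value matters, assuming (as the message implies) that the composite flag bundles "no retry" and "no warning" bits along with the node restriction. The bit values and the `DEMO_*` and `demo_*` names are illustrative only and are not the kernel's definitions.

```c
/* Illustrative flag bits; the kernel's real values differ. */
enum {
	DEMO_NOWARN   = 1u << 0,
	DEMO_NORETRY  = 1u << 1,
	DEMO_THISNODE = 1u << 2
};

/* Composite flag: the full bundle on NUMA builds, 0 everywhere else. */
static unsigned demo_gfp_thisnode(int numa_build)
{
	return numa_build ? (DEMO_THISNODE | DEMO_NOWARN | DEMO_NORETRY) : 0u;
}

/* The allocator retries unless the no-retry bit is set. */
static int retries_enabled(unsigned flags)
{
	return (flags & DEMO_NORETRY) == 0;
}

/* The allocator warns on failure unless the no-warn bit is set. */
static int warnings_enabled(unsigned flags)
{
	return (flags & DEMO_NOWARN) == 0;
}
```

With the composite equal to 0, OR-ing it into a request changes nothing, so SMP and UP builds keep their normal retry and warning behavior.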
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Add NUMA_BUILD definition in kernel.h to avoid #ifdef CONFIG_NUMA · 08e0f6a9
      Christoph Lameter authored
      The NUMA_BUILD constant is always available and will be set to 1 on
      NUMA builds.  That way, checks that are valid only under CONFIG_NUMA can
      easily be done without #ifdef CONFIG_NUMA.
      
      For example:
      
      if (NUMA_BUILD && <numa_condition>) {
      ...
      }
      
      [akpm: not a thing we'd normally do, but CONFIG_NUMA is special: it is
       causing ifdef explosion in core kernel, so let's see if this is a comfortable
       way in which to control that]
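
The pattern can be tried outside the kernel with a small sketch. Here `CONFIG_NUMA` is simply a macro that may or may not be defined at compile time, and `demo_is_remote` and `demo_remote_hit` are hypothetical helpers. The point is that the NUMA-only condition stays in ordinary C: it is always parsed and type-checked, and the compiler can discard it when the constant is 0.

```c
#include <stdbool.h>

/* Always-defined constant derived from the (optional) config macro. */
#ifdef CONFIG_NUMA
#define NUMA_BUILD 1
#else
#define NUMA_BUILD 0
#endif

/* Hypothetical NUMA-only condition. */
static bool demo_is_remote(int page_node, int local_node)
{
	return page_node != local_node;
}

/* Returns 1 only on NUMA builds when the page is on another node. */
static int demo_remote_hit(int page_node, int local_node)
{
	/* No #ifdef here: the test folds away when NUMA_BUILD is 0. */
	if (NUMA_BUILD && demo_is_remote(page_node, local_node))
		return 1;
	return 0;
}
```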
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Condense output of show_free_areas() · c7241913
      Jes Sorensen authored
      On larger systems, the amount of output dumped on the console when you do
      SysRq-M is beyond insane.  This patch is trying to reduce it somewhat as
      even with the smaller NUMA systems that have hit the desktop this seems to
      be a fair thing to do.
      
      The philosophy I have taken is as follows:
       1) If a zone is empty, don't report it; we don't need yet another line
          telling us so. The information is still available, since one can look
          up how many zones were initialized in the first place.
       2) Put as much information on a line as possible: if it can be done
          in one line rather than two, then do it in one. I tried to format
          the temperature stuff for easy reading.
      
      Change show_free_areas() to not print lines for empty zones.  If no zone
      output is printed, the zone is empty.  This reduces the number of lines
      dumped to the console in sysrq on a large system by several thousand lines.
      
      Change the zone temperature printouts to use one line per CPU instead of
      two lines (one hot, one cold).  On a 1024 CPU, 1024 node system, this
      reduces the console output by over a million lines of output.
      
      While this is a bigger problem on large NUMA systems, it is also applicable
      to smaller desktop sized and mid range NUMA systems.
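
The "over a million lines" figure can be checked with simple arithmetic. The sketch below assumes one populated zone per node (as in the sample output, where only DMA is populated) and counts only the per-CPU temperature lines; `percpu_lines` is a hypothetical helper, not a kernel function.

```c
/*
 * Per-CPU temperature lines printed: one block per populated zone,
 * with lines_per_cpu lines for every CPU in that block.
 */
static unsigned long percpu_lines(unsigned long cpus,
				  unsigned long populated_zones,
				  unsigned long lines_per_cpu)
{
	return cpus * populated_zones * lines_per_cpu;
}
```

For 1024 CPUs and 1024 populated zones, the old format gives `percpu_lines(1024, 1024, 2)` = 2,097,152 lines and the new format gives `percpu_lines(1024, 1024, 1)` = 1,048,576, a saving of 1,048,576 lines.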
      
      Old format:
      
      Mem-info:
      Node 0 DMA per-cpu:
      cpu 0 hot: high 42, batch 7 used:24
      cpu 0 cold: high 14, batch 3 used:1
      cpu 1 hot: high 42, batch 7 used:34
      cpu 1 cold: high 14, batch 3 used:0
      cpu 2 hot: high 42, batch 7 used:0
      cpu 2 cold: high 14, batch 3 used:0
      cpu 3 hot: high 42, batch 7 used:0
      cpu 3 cold: high 14, batch 3 used:0
      cpu 4 hot: high 42, batch 7 used:0
      cpu 4 cold: high 14, batch 3 used:0
      cpu 5 hot: high 42, batch 7 used:0
      cpu 5 cold: high 14, batch 3 used:0
      cpu 6 hot: high 42, batch 7 used:0
      cpu 6 cold: high 14, batch 3 used:0
      cpu 7 hot: high 42, batch 7 used:0
      cpu 7 cold: high 14, batch 3 used:0
      Node 0 DMA32 per-cpu: empty
      Node 0 Normal per-cpu: empty
      Node 0 HighMem per-cpu: empty
      Node 1 DMA per-cpu:
      [snip]
      Free pages:     5410688kB (0kB HighMem)
      Active:9536 inactive:4261 dirty:6 writeback:0 unstable:0 free:338168 slab:1931 mapped:1900 pagetables:208
      Node 0 DMA free:1676304kB min:3264kB low:4080kB high:4896kB active:128048kB inactive:61568kB present:1970880kB pages_scanned:0 all_unreclaimable? no
      lowmem_reserve[]: 0 0 0 0
      Node 0 DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
      lowmem_reserve[]: 0 0 0 0
      Node 0 Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
      lowmem_reserve[]: 0 0 0 0
      Node 0 HighMem free:0kB min:512kB low:512kB high:512kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
      lowmem_reserve[]: 0 0 0 0
      Node 1 DMA free:1951728kB min:3280kB low:4096kB high:4912kB active:5632kB inactive:1504kB present:1982464kB pages_scanned:0 all_unreclaimable? no
      lowmem_reserve[]: 0 0 0 0
      ....
      
      New format:
      
      Mem-info:
      Node 0 DMA per-cpu:
      CPU    0: Hot: hi:   42, btch:   7 usd:  41   Cold: hi:   14, btch:   3 usd:   2
      CPU    1: Hot: hi:   42, btch:   7 usd:  40   Cold: hi:   14, btch:   3 usd:   1
      CPU    2: Hot: hi:   42, btch:   7 usd:   0   Cold: hi:   14, btch:   3 usd:   0
      CPU    3: Hot: hi:   42, btch:   7 usd:   0   Cold: hi:   14, btch:   3 usd:   0
      CPU    4: Hot: hi:   42, btch:   7 usd:   0   Cold: hi:   14, btch:   3 usd:   0
      CPU    5: Hot: hi:   42, btch:   7 usd:   0   Cold: hi:   14, btch:   3 usd:   0
      CPU    6: Hot: hi:   42, btch:   7 usd:   0   Cold: hi:   14, btch:   3 usd:   0
      CPU    7: Hot: hi:   42, btch:   7 usd:   0   Cold: hi:   14, btch:   3 usd:   0
      Node 1 DMA per-cpu:
      [snip]
      Free pages:     5411088kB (0kB HighMem)
      Active:9558 inactive:4233 dirty:6 writeback:0 unstable:0 free:338193 slab:1942 mapped:1918 pagetables:208
      Node 0 DMA free:1677648kB min:3264kB low:4080kB high:4896kB active:129296kB inactive:58864kB present:1970880kB pages_scanned:0 all_unreclaimable? no
      lowmem_reserve[]: 0 0 0 0
      Node 1 DMA free:1948448kB min:3280kB low:4096kB high:4912kB active:6864kB inactive:3536kB present:1982464kB pages_scanned:0 all_unreclaimable? no
      lowmem_reserve[]: 0 0 0 0
      Signed-off-by: Jes Sorensen <jes@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] slab: fix kmalloc_node applying memory policies if nodeid == numa_node_id() · de3083ec
      Christoph Lameter authored
      kmalloc_node() falls back to ____cache_alloc() under certain conditions and
      at that point memory policies may be applied redirecting the allocation
      away from the current node.  Therefore kmalloc_node(...,numa_node_id()) or
      kmalloc_node(...,-1) may not return memory from the local node.
      
      Fix this by doing the policy check in __cache_alloc() instead of
      ____cache_alloc().
      
      This version here is a cleanup of Kiran's patch.
      
      - Tested on ia64.
      - Extra material removed.
      - Consolidate the exit path if alternate_node_alloc() returned an object.
      
      [akpm@osdl.org: warning fix]
      Signed-off-by: Alok N Kataria <alok.kataria@calsoftinc.com>
      Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
      Signed-off-by: Shai Fultheim <shai@scalex86.org>
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] page invalidation cleanup · 0fd0e6b0
      Nick Piggin authored
      Clean up the invalidate code, and use a common function to safely remove
      the page from pagecache.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] own header file for struct page · 5b99cd0e
      Heiko Carstens authored
      This moves the definition of struct page from mm.h to its own header file
      page-struct.h.  This is a prereq to fix SetPageUptodate which is broken on
      s390:
      
      #define SetPageUptodate(_page) \
             do { \
                     struct page *__page = (_page); \
                     if (!test_and_set_bit(PG_uptodate, &__page->flags)) \
                             page_test_and_clear_dirty(_page); \
             } while (0)
      
      _page gets used twice in this macro which can cause subtle bugs.  Using
      __page for the page_test_and_clear_dirty call doesn't work since it causes
      yet another problem with the page_test_and_clear_dirty macro as well.
      
      In order to avoid all these problems caused by macros it seems to be a good
      idea to get rid of them and convert them to static inline functions.
      Because of header file include order it's necessary to have a separate
      header file for the struct page definition.
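
The double-evaluation hazard mentioned above is generic to C macros and can be shown without any s390 code. In this sketch every name is hypothetical: `USE_TWICE` expands its argument twice, in the same way `_page` appears twice in the macro quoted above, while the static inline function evaluates its argument exactly once.

```c
/* Counts how often the argument expression is evaluated. */
static int calls;

/* An argument expression with a side effect. */
static int next_id(void)
{
	return ++calls;
}

/* Macro version: the argument text is pasted in twice. */
#define USE_TWICE(x) ((x) + (x))

/* Inline-function version: the argument is evaluated once by the caller. */
static inline int use_once(int x)
{
	return x + x;
}

/* How many times does the macro evaluate next_id()? */
static int macro_evaluations(void)
{
	calls = 0;
	(void)USE_TWICE(next_id());
	return calls;
}

/* How many times does the inline function evaluate next_id()? */
static int inline_evaluations(void)
{
	calls = 0;
	(void)use_once(next_id());
	return calls;
}
```

The macro runs the side effect twice and the function once, which is the class of subtle bug the conversion to static inline functions is meant to avoid.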
      
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] vm: add per-zone writeout counter · e129b5c2
      Andrew Morton authored
      The VM is supposed to minimise the number of pages which get written off the
      LRU (for IO scheduling efficiency, and for high reclaim-success rates).  But
      we don't actually have a clear way of showing how true this is.
      
      So add `nr_vmscan_write' to /proc/vmstat and /proc/zoneinfo - the number of
      pages which have been written by the vm scanner in this zone and globally.
      
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Allow an arch to expand node boundaries · fb01439c
      Mel Gorman authored
      Arch-independent zone-sizing determines the size of a node
      (pgdat->node_spanned_pages) based on the physical memory that was
      registered by the architecture.  However, when
      CONFIG_MEMORY_HOTPLUG_RESERVE is set, the architecture expects that the
      spanned_pages will be much larger and that mem_map will be allocated that
      is used later on memory hot-add.
      
      This patch allows an architecture that sets CONFIG_MEMORY_HOTPLUG_RESERVE
      to call push_node_boundaries() which will set the node beginning and end to
      at *least* the requested boundary.
      
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Keith Mannthey" <kmannth@gmail.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Account for holes that are outside the range of physical memory · 9c7cd687
      Mel Gorman authored
      absent_pages_in_range() made the assumption that users of the API would not
      care about holes beyond the end of physical memory.  This was not the
      case.  This patch will account for ranges outside of physical memory as
      holes correctly.
      
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Keith Mannthey" <kmannth@gmail.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Account for memmap and optionally the kernel image as holes · 0e0b864e
      Mel Gorman authored
      The x86_64 code accounted for memmap and some portions of the DMA zone as
      holes.  This was because those areas would never be reclaimed and accounting
      for them as memory affects min watermarks.  This patch will account for the
      memmap as a memory hole.  Architectures may optionally use set_dma_reserve()
      if they wish to account for a portion of memory in ZONE_DMA as a hole.
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Keith Mannthey" <kmannth@gmail.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Have ia64 use add_active_range() and free_area_init_nodes · 05e0caad
      Mel Gorman authored
      Size zones and holes in an architecture independent manner for ia64.
      
      [bob.picco@hp.com: fix ia64 FLATMEM+VIRTUAL_MEM_MAP]
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Bob Picco <bob.picco@hp.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Keith Mannthey" <kmannth@gmail.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Signed-off-by: Bob Picco <bob.picco@hp.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Have x86_64 use add_active_range() and free_area_init_nodes · 5cb248ab
      Mel Gorman authored
      Size zones and holes in an architecture independent manner for x86_64.
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Keith Mannthey" <kmannth@gmail.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Have x86 use add_active_range() and free_area_init_nodes · 4cfee88a
      Mel Gorman authored
      Size zones and holes in an architecture independent manner for x86.
      
      [akpm@osdl.org: build fix]
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Keith Mannthey" <kmannth@gmail.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Have Power use add_active_range() and free_area_init_nodes() · c67c3cb4
      Mel Gorman authored
      Size zones and holes in an architecture independent manner for Power.
      
      [judith@osdl.org: build fix]
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Keith Mannthey" <kmannth@gmail.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Introduce mechanism for registering active regions of memory · c713216d
      Mel Gorman authored
      At a basic level, architectures define structures to record where active
      ranges of page frames are located.  Once located, the code to calculate zone
      sizes and holes in each architecture is very similar.  Some of this zone and
      hole sizing code is difficult to read for no good reason.  This set of patches
      eliminates the similar-looking architecture-specific code.
      
      The patches introduce a mechanism where architectures register where the
      active ranges of page frames are with add_active_range().  When all areas have
      been discovered, free_area_init_nodes() is called to initialise the pgdat and
      zones.  The zone sizes and holes are then calculated in an architecture
      independent manner.
      
      Patch 1 introduces the mechanism for registering and initialising PFN ranges
      Patch 2 changes ppc to use the mechanism - 139 arch-specific LOC removed
      Patch 3 changes x86 to use the mechanism - 136 arch-specific LOC removed
      Patch 4 changes x86_64 to use the mechanism - 74 arch-specific LOC removed
      Patch 5 changes ia64 to use the mechanism - 52 arch-specific LOC removed
      Patch 6 accounts for mem_map as a memory hole as the pages are not reclaimable.
      	It adjusts the watermarks slightly
      
      Tony Luck has successfully tested for ia64 on Itanium with tiger_defconfig,
      gensparse_defconfig and defconfig.  Bob Picco has also tested and debugged on
      IA64.  Jack Steiner successfully boot tested on a mammoth SGI IA64-based
      machine.  These were on patches against 2.6.17-rc1 and release 3 of these
      patches but there have been no ia64-changes since release 3.
      
      There are differences in the zone sizes for x86_64 as the arch-specific code
      for x86_64 accounts the kernel image and the starting mem_maps as memory holes
      but the architecture-independent code accounts the memory as present.
      
      The big benefit of this set of patches is a sizable reduction of
      architecture-specific code, some of which is very hairy.  There should be a
      greater reduction when other architectures use the same mechanisms for zone
      and hole sizing but I lack the hardware to test on.
      
      Additional credit:
      	Dave Hansen for the initial suggestion and comments on early patches
      	Andy Whitcroft for reviewing early versions and catching numerous
      		errors
      	Tony Luck for testing and debugging on IA64
      	Bob Picco for fixing bugs related to pfn registration, reviewing a
      		number of patch revisions, providing a number of suggestions
      		on future direction and testing heavily
      	Jack Steiner and Robin Holt for testing on IA64 and clarifying
      		issues related to memory holes
      	Yasunori for testing on IA64
      	Andi Kleen for reviewing and feeding back about x86_64
      	Christian Kujau for providing valuable information related to ACPI
      		problems on x86_64 and testing potential fixes
      
      This patch:
      
      Define the structure to represent an active range of page frames within a node
      in an architecture independent manner.  Architectures are expected to register
      active ranges of PFNs using add_active_range(nid, start_pfn, end_pfn) and call
      free_area_init_nodes() passing the PFNs of the end of each zone.
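
The register-then-size idea can be modeled in a few lines of userspace C. This is a sketch under assumptions, not the kernel implementation: `register_range`, `present_pages` and `absent_pages` are hypothetical stand-ins, the table is a fixed-size array, and registered ranges are assumed not to overlap. Once ranges are registered, the present pages and holes in any PFN window follow from interval arithmetic, with no architecture-specific code involved.

```c
#include <stddef.h>

/* One registered range of page frames on a node: [start_pfn, end_pfn). */
struct pfn_range {
	int nid;
	unsigned long start_pfn;
	unsigned long end_pfn;
};

#define MAX_RANGES 16
static struct pfn_range ranges[MAX_RANGES];
static size_t nr_ranges;

/* Record an active range; returns 0 on success, -1 if rejected. */
static int register_range(int nid, unsigned long start_pfn,
			  unsigned long end_pfn)
{
	if (nr_ranges == MAX_RANGES || start_pfn >= end_pfn)
		return -1;
	ranges[nr_ranges++] = (struct pfn_range){ nid, start_pfn, end_pfn };
	return 0;
}

/* Pages of node nid that are present inside [zone_start, zone_end). */
static unsigned long present_pages(int nid, unsigned long zone_start,
				   unsigned long zone_end)
{
	unsigned long present = 0;

	for (size_t i = 0; i < nr_ranges; i++) {
		unsigned long s, e;

		if (ranges[i].nid != nid)
			continue;
		/* Clamp the registered range to the window. */
		s = ranges[i].start_pfn > zone_start ? ranges[i].start_pfn
						     : zone_start;
		e = ranges[i].end_pfn < zone_end ? ranges[i].end_pfn
						 : zone_end;
		if (s < e)
			present += e - s;
	}
	return present;
}

/* Holes are whatever part of the window is not present. */
static unsigned long absent_pages(int nid, unsigned long zone_start,
				  unsigned long zone_end)
{
	return (zone_end - zone_start) -
	       present_pages(nid, zone_start, zone_end);
}
```

In this model a window that extends past the last registered range counts the excess as a hole.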
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Bob Picco <bob.picco@hp.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Keith Mannthey" <kmannth@gmail.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] fix x86_64-mm-spinlock-cleanup · 2bd0cfbd
      Andrew Morton authored
      We need processor.h for cpu_relax().
      
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Make kmem_cache_destroy() return void · 133d205a
      Alexey Dobriyan authored
      un-, de-, -free, -destroy, -exit, etc. functions should in general return
      void.
      
      Also, there is very little that, say, filesystem driver code can do upon a
      failed kmem_cache_destroy().  If it is decided to BUG in this case, the BUG
      should be put in generic code instead.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Really ignore kmem_cache_destroy return value · 1a1d92c1
      Alexey Dobriyan authored
      * Roughly half of callers already do it by not checking return value
      * Code in drivers/acpi/osl.c does the following to be sure:
      
      	(void)kmem_cache_destroy(cache);
      
      * Those who check it printk something; however, slab_error has already
        printed the name of the failed cache.
      * XFS BUGs on a failed kmem_cache_destroy, which is not a decision a
        low-level filesystem driver should make. Converted to ignore.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] fs: Removing useless casts · f52720ca
      Panagiotis Issaris authored
      * Removing useless casts
      * Removing useless wrapper
      * Conversion from kmalloc+memset to kzalloc
      Signed-off-by: Panagiotis Issaris <takis@issaris.org>
      Acked-by: Dave Kleikamp <shaggy@austin.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] fs: Conversions from kmalloc+memset to k(z|c)alloc · f8314dc6
      Panagiotis Issaris authored
      Conversions from kmalloc+memset to kzalloc.
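
A userspace analogue of this conversion, for illustration only: `malloc` plus `memset` stands in for kmalloc plus memset, and `calloc` stands in for kzalloc. The struct and function names are hypothetical, and the kernel calls themselves are not used because they are not available outside the kernel.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical object type used only for this demonstration. */
struct demo_info {
	int flags;
	char name[16];
};

/* Old style: allocate, then zero in a second step. */
static struct demo_info *alloc_old_style(void)
{
	struct demo_info *p = malloc(sizeof(*p));

	if (p)
		memset(p, 0, sizeof(*p));
	return p;
}

/* New style: a single call that returns zeroed memory. */
static struct demo_info *alloc_new_style(void)
{
	return calloc(1, sizeof(struct demo_info));
}
```

Both functions return the same zero-filled object; the second form is shorter and cannot forget the zeroing step.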
      Signed-off-by: Panagiotis Issaris <takis@issaris.org>
      Jffs2-bit-acked-by: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] more ext3 16T overflow fixes · 32c2d2bc
      Eric Sandeen authored
      Some of the changes in balloc.c are just cosmetic, as Andreas pointed out -
      if they overflow they'll then underflow and things are fine.
      
      5th hunk actually fixes an overflow problem.
      
      Also check for potential overflows in inode & block counts when resizing.
      Signed-off-by: Eric Sandeen <esandeen@redhat.com>
      Cc: Mingming Cao <cmm@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] ext3: Fix sparse warnings · a4e4de36
      Dave Kleikamp authored
      Fixing up some endian-ness warnings in preparation to clone ext4 from ext3.
      Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] ext3: More whitespace cleanups · e9ad5620
      Dave Kleikamp authored
      More white space cleanups in preparation of cloning ext4 from ext3.
      Removing spaces that precede a tab.
      Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] ext3: wrong error behavior · 7543fc7b
      Vasily Averin authored
      SWsoft Virtuozzo/OpenVZ Linux kernel team has discovered that ext3 error
      behavior was broken in linux kernels since 2.5.x versions by the following
      patch:
      
      2002/10/31 02:15:26-05:00 tytso@snap.thunk.org
      Default mount options from superblock for ext2/3 filesystems
      http://linux.bkbits.net:8080/linux-2.6/gnupatch@3dc0d88eKbV9ivV4ptRNM8fBuA3JBQ
      
      When an ext3 file system is mounted with errors=continue
      (EXT3_ERRORS_CONTINUE), errors should be ignored when possible.  However, at
      present, on any error the kernel aborts the journal and remounts the
      filesystem read-only.  This behavior has been hit a number of times and was
      noted to differ from that of 2.4.x kernels.
      
      This patch fixes this:
      - do nothing in case of EXT3_ERRORS_CONTINUE,
      - set EXT3_MOUNT_ABORT and call journal_abort() in all other cases
      - panic() should be called after ext3_commit_super() to save
       sb marked as EXT3_ERROR_FS
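
The three rules above can be written down as a small decision function. This is a toy model of the stated policy only, not ext3 code: the enum names and `plan_error_steps` are hypothetical, and anything the message does not spell out (for example what else happens for a read-only remount policy) is deliberately left out.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the errors= mount policies. */
enum demo_policy {
	DEMO_ERRORS_CONTINUE,
	DEMO_ERRORS_RO,
	DEMO_ERRORS_PANIC
};

/* Steps the handler would take, in order. */
enum demo_step {
	STEP_ABORT_JOURNAL,	/* set the abort flag and abort the journal */
	STEP_COMMIT_SUPER,	/* save the superblock marked as having errors */
	STEP_PANIC
};

/*
 * Fill out[] (room for at least 3 entries) with the steps for a policy
 * and return how many steps were written.
 */
static size_t plan_error_steps(enum demo_policy policy, enum demo_step *out)
{
	size_t n = 0;

	if (policy == DEMO_ERRORS_CONTINUE)
		return 0;			/* rule 1: do nothing */

	out[n++] = STEP_ABORT_JOURNAL;		/* rule 2: all other cases */

	if (policy == DEMO_ERRORS_PANIC) {
		out[n++] = STEP_COMMIT_SUPER;	/* rule 3: commit first... */
		out[n++] = STEP_PANIC;		/* ...then panic */
	}
	return n;
}
```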
      Signed-off-by: Vasily Averin <vvs@sw.ru>
      Acked-by: Kirill Korotaev <dev@sw.ru>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: "Stephen C. Tweedie" <sct@redhat.com>
      Cc: Mingming Cao <cmm@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] ext3: turn on reservation dump on block allocation errors · 321fb9e8
      Mingming Cao authored
      In the past there were a few kernel panics related to failures of block
      reservation tree operations (insert/remove etc).  It would be very useful to
      get the block allocation reservation map info when such an error happens.
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] JBD: 16T fixes · 37ed3222
      Eric Sandeen authored
      These are a few places I've found in jbd that look like they may not be
      16T-safe, or consistent with the use of unsigned longs for block
      containers.  Problems here would be somewhat hard to hit, would require
      journal blocks past the 8T boundary, which would not be terribly common.
      Still, should fix.
      
      (some of these have come from the ext4 work on jbd as well).
      
      I think there's one more possibility that the wrap() function may not be
      safe IF your last block in the journal butts right up against the 2^32 block
      boundary, but that seems like a VERY remote possibility, and I'm not
      worrying about it at this point.
      Signed-off-by: Eric Sandeen <esandeen@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      37ed3222
    • Eric Sandeen's avatar
      [PATCH] ext3: inode numbers are unsigned long · eee194e7
      Eric Sandeen authored
      This is primarily format string fixes, with changes to ialloc.c where large
      inode counts could overflow, and also pass around journal_inum as an
      unsigned long, just to be pedantic about it....
      Signed-off-by: Eric Sandeen <esandeen@redhat.com>
      Cc: Mingming Cao <cmm@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      eee194e7
    • Eric Sandeen's avatar
      [PATCH] ext2: fix mounts at 16T · 41f04d85
      Eric Sandeen authored
      Signed-off-by: Eric Sandeen <esandeen@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      41f04d85
    • Eric Sandeen's avatar
      [PATCH] fix ext3 mounts at 16T · 855565e8
      Eric Sandeen authored
      I need to do some actual IO testing now, but this gets things mounting for
      a 16T ext3 filesystem.  (patched up e2fsprogs is needed too, I'll send that
      off the kernel list)
      
      This patch fixes these issues in the kernel:
      
      o sbi->s_groups_count overflows in ext3_fill_super()
      
      	sbi->s_groups_count = (le32_to_cpu(es->s_blocks_count) -
      			       le32_to_cpu(es->s_first_data_block) +
      			       EXT3_BLOCKS_PER_GROUP(sb) - 1) /
      			      EXT3_BLOCKS_PER_GROUP(sb);
      
        at 16T, s_blocks_count is already maxed out; adding
        EXT3_BLOCKS_PER_GROUP(sb) overflows it and groups_count comes out to 0.
        Not really what we want, and causes a failed mount.
      
        Feel free to check my math (actually, please do!), but changing it this
        way should work & avoid the overflow:
      
        (A + B - 1)/B changed to: ((A - 1)/B) + 1
      
      o ext3_check_descriptors() overflows range checks
      
        ext3_check_descriptors() iterates over all block groups making sure
        that various bits are within the right block ranges...  on the last pass
        through, it is checking the error case
      
         [item] >= block + EXT3_BLOCKS_PER_GROUP(sb)
      
        where "block" is the first block in the last block group.  The last
        block in this group (and the last one that will fit in 32 bits) is block
        + EXT3_BLOCKS_PER_GROUP(sb)- 1.  block + EXT3_BLOCKS_PER_GROUP(sb) wraps
        back around to 0.
      
        so, make things clearer with "first_block" and "last_block" where those
        are first and last, inclusive, and use <, > rather than <, >=.
      
        Finally, the last block group may be smaller than the rest, so account
        for this on the last pass through: last_block = sb->s_blocks_count - 1;
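
Both overflows described above can be reproduced in user space with fixed-width 32-bit arithmetic. The following is a minimal sketch, not the patched kernel code: the `demo_` functions and their parameters are hypothetical, and the figures in the usage note assume 4K blocks, 32768 blocks per group, a first data block of 0 and a block count of 0xFFFFFFFF.

```c
#include <stdint.h>

/* Original form, (A + B - 1) / B: the sum wraps when A is close to 2^32. */
static uint32_t demo_groups_count_wrapping(uint32_t blocks_count,
                                           uint32_t first_data_block,
                                           uint32_t blocks_per_group)
{
    uint32_t sum = (uint32_t)(blocks_count - first_data_block +
                              blocks_per_group - 1);
    return sum / blocks_per_group;
}

/* Rewritten form, ((A - 1) / B) + 1: no intermediate exceeds A.
 * Requires A >= 1, i.e. at least one block past first_data_block. */
static uint32_t demo_groups_count_safe(uint32_t blocks_count,
                                       uint32_t first_data_block,
                                       uint32_t blocks_per_group)
{
    uint32_t a = (uint32_t)(blocks_count - first_data_block);
    return ((a - 1) / blocks_per_group) + 1;
}

/* Exclusive upper bound: first_block + blocks_per_group wraps to 0 for
 * the last group, so every valid item in that group is rejected. */
static int demo_in_group_exclusive(uint32_t item, uint32_t first_block,
                                   uint32_t blocks_per_group)
{
    uint32_t end = (uint32_t)(first_block + blocks_per_group); /* may wrap */
    return item >= first_block && item < end;
}

/* Inclusive bounds: first_block and last_block both fit in 32 bits. */
static int demo_in_group_inclusive(uint32_t item, uint32_t first_block,
                                   uint32_t last_block)
{
    return item >= first_block && item <= last_block;
}
```

Under those assumptions, `demo_groups_count_wrapping(0xFFFFFFFF, 0, 32768)` evaluates to 0 while `demo_groups_count_safe` gives 131072. For the last group, which starts at block 0xFFFF8000, the exclusive check rejects block 0xFFFF8001 because the upper bound has wrapped to 0, whereas the inclusive check with `last_block = blocks_count - 1` accepts it.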
      
      (a similar patch could be done for ext2; does anyone in their right mind
      use ext2 at 16T?  I'll send an ext2 patch doing the same thing if that's
      warranted)
      Signed-off-by: Eric Sandeen <esandeen@redhat.com>
      Cc: Mingming Cao <cmm@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      855565e8
    • Mingming Cao's avatar
      [PATCH] ext3 and jbd cleanup: remove whitespace · ae6ddcc5
      Mingming Cao authored
      Remove whitespace from ext3 and jbd, before we clone ext4.
      
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      ae6ddcc5
    • Josh Triplett's avatar
      [PATCH] jbd: add lock annotation to jbd_sync_bh · e7ab8d65
      Josh Triplett authored
      jbd_sync_bh releases journal->j_list_lock.  Add a lock annotation to this
      function so that sparse can check callers for lock pairing, and so that
      sparse will not complain about this function since it intentionally uses
      the lock in this manner.
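
As a rough illustration of such an annotation, the sketch below marks a function that is entered with a lock held and returns with it released. It is a user-space toy, not the jbd code: the `demo_` types and functions are hypothetical, the lock is a plain flag, and the `__releases` definition only approximates the kernel headers (its exact form there is an assumption). Outside of sparse the macro expands to nothing.

```c
/* Approximation of the kernel's annotation macro; an assumption, not a copy.
 * sparse defines __CHECKER__, a normal compiler sees an empty macro. */
#ifdef __CHECKER__
# define __releases(x) __attribute__((context(x, 1, 0)))
#else
# define __releases(x)
#endif

/* Toy lock: a flag stands in for a real spinlock. */
struct demo_lock {
    int held;
};

struct demo_journal {
    struct demo_lock list_lock;
    int synced;
};

static void demo_lock_acquire(struct demo_lock *lock)
{
    lock->held = 1;
}

static void demo_lock_release(struct demo_lock *lock)
{
    lock->held = 0;
}

/* Entered with list_lock held, returns with it released. The annotation
 * documents that imbalance so a checker can pair it with the caller. */
static void demo_sync_and_unlock(struct demo_journal *journal)
    __releases(journal->list_lock);

static void demo_sync_and_unlock(struct demo_journal *journal)
{
    demo_lock_release(&journal->list_lock);
    journal->synced = 1; /* stands in for work done after dropping the lock */
}
```

A caller would take the lock with `demo_lock_acquire()` and then hand it off to `demo_sync_and_unlock()`, which is the pairing the annotation makes visible.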
      Signed-off-by: Josh Triplett <josh@freedesktop.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      e7ab8d65
    • KAMEZAWA Hiroyuki's avatar
      [PATCH] fix "cpu to node relationship fixup: map cpu to node" · bbf2bef9
      KAMEZAWA Hiroyuki authored
      Fix build error introduced by 3212fe15
      
      Non-NUMA case should be handled.
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      bbf2bef9
    • Linus Torvalds's avatar
      Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/i2c-2.6 · a5b08073
      Linus Torvalds authored
      * master.kernel.org:/pub/scm/linux/kernel/git/gregkh/i2c-2.6: (30 commits)
        i2c: Drop unimplemented slave functions
        i2c: Constify i2c_algorithm declarations, part 2
        i2c: Constify i2c_algorithm declarations, part 1
        i2c: Let drivers constify i2c_algorithm data
        i2c-isa: Restore driver owner
        i2c-viapro: Add support for the VT8237A and VT8251
        i2c: Warn on i2c client creation failure
        i2c-core: Drop useless bitmaskings
        i2c-algo-pcf: Discard the mdelay data struct member
        i2c-algo-bit: Cleanups
        i2c-isa: Fail adding driver on attach_adapter error
        i2c: __must_check fixes (chip drivers)
        i2c-dev: attach/detach_adapter cleanups
        i2c-stub: Chip address as a module parameter
        i2c: Plan i2c-isa for removal
        i2c: New bus driver for TI OMAP boards
        i2c-algo-bit: Discard the mdelay data struct member
        i2c-matroxfb: Struct init conversion
        i2c: Fix copy-n-paste in subsystem Kconfig
        i2c-au1550: Add I2C support for Au1200
        ...
      a5b08073
    • Linus Torvalds's avatar
      Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6 · ff0972c2
      Linus Torvalds authored
      * master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6: (28 commits)
        pciehp - fix wrong return value
        IA64: PCI: dont disable irq which is not enabled
        acpiphp: add support for ioapic hot-remove
        PCI: assign ioapic resource at hotplug
        acpiphp: disable bridges
        acpiphp: stop bus device before acpi_bus_trim
        PCI: add pci_stop_bus_device
        acpiphp: do not initialize existing ioapics
        acpiphp: initialize ioapics before starting devices
        acpiphp: set hpp values before starting devices
        PCI Hotplug: cleanup pcihp skeleton code.
        PCI: Restore PCI Express capability registers after PM event
        PCI: drivers/pci/hotplug/acpiphp_glue.c: make a function static
        PCI: Multiprobe sanitizer
        PCI: fix __must_check warnings
        PCI Hotplug: fix __must_check warnings
        SHPCHP: fix __must_check warnings
        PCI-Express AER implemetation: pcie_portdrv error handler
        PCI-Express AER implemetation: AER core and aerdriver
        PCI-Express AER implemetation: export pcie_port_bus_type
        ...
      ff0972c2