1. 30 Sep, 2009 1 commit
  2. 12 Oct, 2009 1 commit
  3. 13 Oct, 2009 1 commit
    • Arjan van de Ven authored · 8c262146
      gcc is not convinced that the floppy.c ioctl has sufficient bound checks:
      In function `copy_from_user',
          inlined from `fd_copyin' at drivers/block/floppy.c:3080,
          inlined from `fd_ioctl' at drivers/block/floppy.c:3503:
      /home/arjan/linux/arch/x86/include/asm/uaccess_32.h:211:
      warning: call to `copy_from_user_overflow' declared with attribute
      warning: copy_from_user buffer size is not provably correct
      
      And frankly, as a human I have a hard time proving the same more or less
      (the size comes from the ioctl argument.  humpf.  maybe.  the code isn't
      very nice)
      
      This patch adds an explicit check to make 100% sure it's safe, better than
      finding out later that there indeed was a gap.
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
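
      A minimal user-space sketch of the kind of explicit check described
      above, assuming the destination is a fixed-size kernel buffer and the
      length is derived from the ioctl argument. `sk_fd_copyin()` and
      `SK_PARAM_SIZE` are hypothetical names, `memcpy()` stands in for
      `copy_from_user()`, and this is not the code from the patch.

      ```c
      #include <assert.h>
      #include <errno.h>
      #include <stddef.h>
      #include <string.h>

      /* Hypothetical size of the kernel-side parameter buffer. */
      #define SK_PARAM_SIZE 32

      /*
       * Copy 'size' bytes of a caller-supplied argument into 'dst' only if it
       * fits. memcpy() stands in for copy_from_user(); the point is the check.
       */
      static int sk_fd_copyin(void *dst, size_t dst_size, const void *src, size_t size)
      {
          if (size > dst_size)
              return -EINVAL;  /* reject instead of trusting the ioctl-derived size */
          memcpy(dst, src, size);
          return 0;
      }
      ```

      With the check in place, an oversized length fails with -EINVAL before
      any byte is copied, which is also what lets the compiler prove the copy
      is in bounds.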
  4. 12 Oct, 2009 1 commit
  5. 23 Jul, 2009 1 commit
  6. 14 Feb, 2009 2 commits
  7. 28 Oct, 2009 1 commit
  8. 31 Oct, 2009 1 commit
    • KAMEZAWA Hiroyuki authored · cdd5ac71
      It's reported that the OOM killer kills Gnome/KDE first. And yes, we can
      reproduce it easily.
      
      Now, the oom-killer uses mm->total_vm as its base value.  But in recent
      applications, there is a big gap between VM size and RSS size, because:
      
        - Applications attach many dynamic libraries. (Gnome, KDE, etc...)
        - Applications may allocate a big VM area but use only a small part of it.
          (Java and multi-threaded applications have this tendency because
           of the default stack size.)
      
      I think using mm->total_vm as the score for oom-kill is not good.  For the
      same reason, memory overcommit can't work as expected.  (In other words, if
      we depend on total_vm, using overcommit more positively is a good choice.)
      
      This patch uses mm->anon_rss/file_rss as the base value for calculating badness.
      
      Following are the changes to the OOM score (badness) on an environment with
      1.6G memory plus memory eaters (500M & 1G).
      
      Top 10 badness scores (the highest one is the first candidate to be killed):
      Before
      badness program
      91228	gnome-settings-
      94210	clock-applet
      103202	mixer_applet2
      106563	tomboy
      112947	gnome-terminal
      128944	mmap              <----------- 500M malloc
      129332	nautilus
      215476	bash              <----------- parent of 2 mallocs.
      256944	mmap              <----------- 1G malloc
      423586	gnome-session
      
      After
      badness program
      1911	mixer_applet2
      1955	clock-applet
      1986	xinit
      1989	gnome-session
      2293	nautilus
      2955	gnome-terminal
      4113	tomboy
      104163	mmap             <----------- 500M malloc.
      168577	bash             <----------- parent of 2 mallocs
      232375	mmap             <----------- 1G malloc
      
      Seems good to me.  Maybe we can tweak this patch more, but this one will
      be a good starting point.
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
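
      A sketch of only the base-value change, assuming a hypothetical struct
      that holds the three counters named above (in pages). The kernel's
      badness() applies further adjustments on top of the base value; those
      are not modeled here, and the numbers in the usage are made up.

      ```c
      #include <assert.h>

      /* Hypothetical snapshot of the per-mm counters named in the message. */
      struct sk_mm_counters {
          unsigned long total_vm;  /* all mapped virtual memory */
          unsigned long anon_rss;  /* resident anonymous pages */
          unsigned long file_rss;  /* resident file-backed pages */
      };

      /* Base value before the patch: virtual size. */
      static unsigned long sk_badness_base_old(const struct sk_mm_counters *mm)
      {
          return mm->total_vm;
      }

      /* Base value after the patch: resident set size. */
      static unsigned long sk_badness_base_new(const struct sk_mm_counters *mm)
      {
          return mm->anon_rss + mm->file_rss;
      }
      ```

      A desktop process with many libraries mapped (large total_vm, small RSS)
      outranks a memory eater under the old base value and falls behind it
      under the new one, which is the reordering the tables show.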
  9. 16 Oct, 2009 2 commits
  10. 15 Oct, 2009 2 commits
  11. 16 Oct, 2009 1 commit
    • Hugh Dickins authored · 1f975884
      Swap is duplicated (reference count incremented by one) whenever the same
      swap page is inserted into another mm (when forking finds a swap entry in
      place of a pte, or when reclaim unmaps a pte to insert the swap entry).
      
      swap_info_struct's vmalloc'ed swap_map is the array of these reference
      counts: but what happens when the unsigned short (or unsigned char since
      the preceding patch) is full? (and its high bit is kept for a cache flag)
      
      We then lose track of it, never freeing, leaving it in use until swapoff:
      at which point we _hope_ that a single pass will have found all instances,
      assume there are no more, and will lose user data if we're wrong.
      
      Swapping of KSM pages has not yet been enabled; but it is implemented,
      and makes it very easy for a user to overflow the maximum swap count:
      possible with ordinary process pages, but unlikely, even when pid_max
      has been raised from PID_MAX_DEFAULT.
      
      This patch implements swap count continuations: when the count overflows,
      a continuation page is allocated and linked to the original vmalloc'ed
      map page, and this is used to hold the continuation counts for that entry
      and its neighbours.  These continuation pages are seldom referenced:
      the common paths all work on the original swap_map, only referring to
      a continuation page when the low "digit" of a count is incremented or
      decremented through SWAP_MAP_MAX.
      Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
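
      The continuation idea can be pictured as a mixed-radix counter: a low
      "digit" that lives in the original swap_map and higher digits that are
      only allocated and touched when the low digit carries or borrows past
      its maximum. This is a simplified user-space sketch under that reading;
      `SK_MAP_MAX`, `SK_CONT_MAX` and the struct are hypothetical and do not
      reproduce the kernel's actual swap_map encoding or flag bits.

      ```c
      #include <assert.h>

      #define SK_MAP_MAX  0x3e  /* hypothetical largest value of the low digit */
      #define SK_CONT_MAX 0x7f  /* hypothetical largest value of a continuation digit */
      #define SK_MAX_CONT 4     /* continuation digits we are willing to add */

      struct sk_swap_count {
          unsigned char low;               /* the only digit touched in the common case */
          unsigned char cont[SK_MAX_CONT]; /* higher digits, added on first overflow */
          int ncont;                       /* continuation digits in use */
      };

      /* The reference count the digits represent (low digit first). */
      static unsigned long long sk_count_value(const struct sk_swap_count *c)
      {
          unsigned long long v = 0;
          for (int i = c->ncont - 1; i >= 0; i--)
              v = v * (SK_CONT_MAX + 1) + c->cont[i];
          return v * (SK_MAP_MAX + 1) + c->low;
      }

      /* Increment; continuation digits are touched only when the low digit carries. */
      static int sk_count_inc(struct sk_swap_count *c)
      {
          int i = 0;

          if (c->low < SK_MAP_MAX) {
              c->low++;
              return 0;
          }
          while (i < c->ncont && c->cont[i] == SK_CONT_MAX)
              i++;
          if (i == c->ncont) {
              if (c->ncont == SK_MAX_CONT)
                  return -1;            /* truly full: count left unchanged */
              c->cont[c->ncont++] = 0;  /* "allocate" a new continuation digit */
          }
          c->cont[i]++;
          for (int j = 0; j < i; j++)
              c->cont[j] = 0;
          c->low = 0;
          return 0;
      }

      /* Decrement; continuation digits are touched only when the low digit borrows. */
      static int sk_count_dec(struct sk_swap_count *c)
      {
          int i = 0;

          if (c->low > 0) {
              c->low--;
              return 0;
          }
          while (i < c->ncont && c->cont[i] == 0)
              i++;
          if (i == c->ncont)
              return -1;                /* count is already zero */
          c->cont[i]--;
          for (int j = 0; j < i; j++)
              c->cont[j] = SK_CONT_MAX;
          c->low = SK_MAP_MAX;
          return 0;
      }
      ```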
  12. 15 Oct, 2009 2 commits
  13. 16 Oct, 2009 1 commit
  14. 15 Oct, 2009 3 commits
  15. 13 Oct, 2009 16 commits
    • Jan Beulich authored · cde4016d
      - avoid wasting more precious resources (DMA or DMA32 pools), when
        being called through vmalloc_32{,_user}()
      - explicitly allow using high memory here even if the outer allocation
        request doesn't allow it
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • David Rientjes authored · b763f773
      Objects passed to NODEMASK_ALLOC() are relatively small in size and are
      backed by slab caches that are not of large order, traditionally never
      greater than PAGE_ALLOC_COSTLY_ORDER.
      
      Thus, using GFP_KERNEL for these allocations on large machines when
      CONFIG_NODES_SHIFT > 8 will cause the page allocator to loop endlessly in
      the allocation attempt, each time invoking both direct reclaim and the oom
      killer.
      
      This is of particular interest when using NODEMASK_ALLOC() from a
      mempolicy context (either directly in mm/mempolicy.c or the mempolicy
      constrained hugetlb allocations) since the oom killer always kills current
      when allocations are constrained by mempolicies.  So for all present use
      cases in the kernel, current would end up being oom killed when direct
      reclaim fails.  That would allow the NODEMASK_ALLOC() to succeed but
      current would have sacrificed itself upon returning.
      
      This patch adds gfp flags to NODEMASK_ALLOC() to pass to kmalloc() on
      CONFIG_NODES_SHIFT > 8; this parameter is a nop on other configurations. 
      All current use cases either directly from hugetlb code or indirectly via
      NODEMASK_SCRATCH() union __GFP_NORETRY to avoid direct reclaim and the oom
      killer when the slab allocator needs to allocate additional pages.
      
      The side-effect of this change is that all current use cases of either
      NODEMASK_ALLOC() or NODEMASK_SCRATCH() need appropriate -ENOMEM handling
      when the allocation fails (never for CONFIG_NODES_SHIFT <= 8).  All
      current use cases were audited and do have appropriate error handling at
      this time.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Nishanth Aravamudan <nacc@us.ibm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: Andy Whitcroft <apw@canonical.com>
      Cc: Eric Whitney <eric.whitney@hp.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
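
      A user-space sketch of the allocation pattern described above, loosely
      modeled on NODEMASK_ALLOC(): large masks come from the heap and can
      fail, small ones live on the stack. Everything prefixed `SK_`/`sk_` is
      hypothetical; `malloc()` has no gfp flags, so the flag argument is
      accepted but dropped, much like in the stack branch. The caller shows
      the -ENOMEM handling the commit says every user now needs.

      ```c
      #include <assert.h>
      #include <errno.h>
      #include <stdlib.h>
      #include <string.h>

      #define SK_NODES_SHIFT 9
      #define SK_MAX_NUMNODES (1 << SK_NODES_SHIFT)
      #define SK_BITS_PER_LONG (8 * sizeof(unsigned long))
      typedef struct { unsigned long bits[SK_MAX_NUMNODES / SK_BITS_PER_LONG]; } sk_nodemask_t;
      #define SK_GFP_NORETRY 0x1u  /* hypothetical flag value, ignored below */

      #if SK_NODES_SHIFT > 8
      #define SK_NODEMASK_ALLOC(type, name, gfp) type *name = malloc(sizeof(*name))
      #define SK_NODEMASK_FREE(m) free(m)
      #else
      #define SK_NODEMASK_ALLOC(type, name, gfp) type _##name, *name = &_##name
      #define SK_NODEMASK_FREE(m) do { } while (0)
      #endif

      /* Set nodes [0, n) in a scratch mask and count them; -ENOMEM on failure. */
      static int sk_count_first_nodes(int n)
      {
          SK_NODEMASK_ALLOC(sk_nodemask_t, mask, SK_GFP_NORETRY);
          int count = 0;

          if (!mask)
              return -ENOMEM;  /* the heap branch can fail, so callers must check */
          memset(mask, 0, sizeof(*mask));
          for (int i = 0; i < n && i < SK_MAX_NUMNODES; i++)
              mask->bits[i / SK_BITS_PER_LONG] |= 1UL << (i % SK_BITS_PER_LONG);
          for (int i = 0; i < SK_MAX_NUMNODES; i++)
              if (mask->bits[i / SK_BITS_PER_LONG] & (1UL << (i % SK_BITS_PER_LONG)))
                  count++;
          SK_NODEMASK_FREE(mask);
          return count;
      }
      ```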
    • Lee Schermerhorn authored · 3f5cc391
      Offload the registration and unregistration of per node hstate sysfs
      attributes to a worker thread rather than attempt the
      allocation/attachment or detachment/freeing of the attributes in the
      context of the memory hotplug handler.
      
      I don't know that this is absolutely required, but the registration can
      sleep in allocations and other mem hot plug handlers do it this way.  If
      it turns out this is NOT required, we can drop this patch.
      
      N.B.,  Only tested build, boot, libhugetlbfs regression.
             i.e., no memory hotplug testing.
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Reviewed-by: Andi Kleen <andi@firstfloor.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Nishanth Aravamudan <nacc@us.ibm.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: Andy Whitcroft <apw@canonical.com>
      Cc: Eric Whitney <eric.whitney@hp.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Lee Schermerhorn authored · dfc42a8c
      Register per node hstate attributes only for nodes with memory. As
      suggested by David Rientjes.
      
      With Memory Hotplug, memory can be added to a memoryless node and a node
      with memory can become memoryless.  Therefore, add a memory on/off-line
      notifier callback to [un]register a node's attributes on transition
      to/from memoryless state.
      
      N.B.,  Only tested build, boot, libhugetlbfs regression.
             i.e., no memory hotplug testing.
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Reviewed-by: Andi Kleen <andi@firstfloor.org>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Nishanth Aravamudan <nacc@us.ibm.com>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: Andy Whitcroft <apw@canonical.com>
      Cc: Eric Whitney <eric.whitney@hp.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
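
      A user-space sketch of the on/off-line callback logic described above:
      attributes are registered on the transition out of the memoryless state
      and unregistered on the transition into it. The action names, the node
      struct and modeling "registered" as a flag are all hypothetical; the
      real code hooks the memory hotplug notifier chain.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Hypothetical stand-ins for the memory on/off-line notifier actions. */
      enum sk_mem_action { SK_MEM_ONLINE, SK_MEM_OFFLINE };

      struct sk_node {
          unsigned long present_pages;
          bool hstate_attrs_registered;  /* per node hugepages attributes present? */
      };

      /* Register on gaining memory, unregister on becoming memoryless. */
      static void sk_hugetlb_memory_callback(struct sk_node *node,
                                             enum sk_mem_action action,
                                             unsigned long nr_pages)
      {
          switch (action) {
          case SK_MEM_ONLINE:
              node->present_pages += nr_pages;
              if (node->present_pages > 0 && !node->hstate_attrs_registered)
                  node->hstate_attrs_registered = true;   /* "register" */
              break;
          case SK_MEM_OFFLINE: {
              /* never let the page count go below zero */
              unsigned long removed =
                  nr_pages < node->present_pages ? nr_pages : node->present_pages;
              node->present_pages -= removed;
              if (node->present_pages == 0 && node->hstate_attrs_registered)
                  node->hstate_attrs_registered = false;  /* "unregister" */
              break;
          }
          }
      }
      ```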
    • David Rientjes authored · a3855edb
      When memory is hot-removed, its node must be cleared in N_HIGH_MEMORY if
      there are no present pages left.
      
      In such a situation, kswapd must also be stopped since it has nothing left
      to do.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Nishanth Aravamudan <nacc@us.ibm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: Andy Whitcroft <apw@canonical.com>
      Cc: Eric Whitney <eric.whitney@hp.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Lee Schermerhorn authored · 52fee6c3
      Register per node hstate sysfs attributes only for nodes with memory.
      Global replacement of "all online nodes" with "all nodes with memory" in
      mm/hugetlb.c.  Suggested by David Rientjes.
      
      A subsequent patch will handle adding/removing of per node hstate sysfs
      attributes when nodes transition to/from memoryless state via memory
      hotplug.
      
      NOTE: this patch has not been tested with memoryless nodes.
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Reviewed-by: Andi Kleen <andi@firstfloor.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Nishanth Aravamudan <nacc@us.ibm.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: Andy Whitcroft <apw@canonical.com>
      Cc: Eric Whitney <eric.whitney@hp.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Lee Schermerhorn authored · c6a3de63
      Update the kernel huge tlb documentation to describe the numa memory
      policy based huge page management.  Additionally, the patch includes a fair
      amount of rework to improve consistency, eliminate duplication and set the
      context for documenting the memory policy interaction.
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Reviewed-by: Andi Kleen <andi@firstfloor.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Nishanth Aravamudan <nacc@us.ibm.com>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: Andy Whitcroft <apw@canonical.com>
      Cc: Eric Whitney <eric.whitney@hp.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Lee Schermerhorn authored · 53932d4c
      Add the per huge page size control/query attributes to the per node
      sysdevs:
      
      /sys/devices/system/node/node<ID>/hugepages/hugepages-<size>/
      	nr_hugepages       - r/w
      	free_hugepages     - r/o
      	surplus_hugepages  - r/o
      
      The patch attempts to re-use/share as much of the existing global hstate
      attribute initialization and handling, and the "nodes_allowed" constraint
      processing as possible.
      
      Calling set_max_huge_pages() with no node indicates a change to global
      hstate parameters.  In this case, any non-default task mempolicy will be
      used to generate the nodes_allowed mask.  A valid node id indicates an
      update to that node's hstate parameters, and the count argument specifies
      the target count for the specified node.  From this info, we compute the
      target global count for the hstate and construct a nodes_allowed node mask
      containing only the specified node.
      
      Setting the node specific nr_hugepages via the per node attribute
      effectively ignores any task mempolicy or cpuset constraints.
      
      With this patch:
      
      (me):ls /sys/devices/system/node/node0/hugepages/hugepages-2048kB
      ./  ../  free_hugepages  nr_hugepages  surplus_hugepages
      
      Starting from:
      Node 0 HugePages_Total:     0
      Node 0 HugePages_Free:      0
      Node 0 HugePages_Surp:      0
      Node 1 HugePages_Total:     0
      Node 1 HugePages_Free:      0
      Node 1 HugePages_Surp:      0
      Node 2 HugePages_Total:     0
      Node 2 HugePages_Free:      0
      Node 2 HugePages_Surp:      0
      Node 3 HugePages_Total:     0
      Node 3 HugePages_Free:      0
      Node 3 HugePages_Surp:      0
      vm.nr_hugepages = 0
      
      Allocate 16 persistent huge pages on node 2:
      (me):echo 16 >/sys/devices/system/node/node2/hugepages/hugepages-2048kB/nr_hugepages
      
      [Note that this is equivalent to:
      	numactl -m 2 hugeadm --pool-pages-min 2M:+16
      ]
      
      Yields:
      Node 0 HugePages_Total:     0
      Node 0 HugePages_Free:      0
      Node 0 HugePages_Surp:      0
      Node 1 HugePages_Total:     0
      Node 1 HugePages_Free:      0
      Node 1 HugePages_Surp:      0
      Node 2 HugePages_Total:    16
      Node 2 HugePages_Free:     16
      Node 2 HugePages_Surp:      0
      Node 3 HugePages_Total:     0
      Node 3 HugePages_Free:      0
      Node 3 HugePages_Surp:      0
      vm.nr_hugepages = 16
      
      Global controls work as expected--reduce pool to 8 persistent huge pages:
      (me):echo 8 >/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
      
      Node 0 HugePages_Total:     0
      Node 0 HugePages_Free:      0
      Node 0 HugePages_Surp:      0
      Node 1 HugePages_Total:     0
      Node 1 HugePages_Free:      0
      Node 1 HugePages_Surp:      0
      Node 2 HugePages_Total:     8
      Node 2 HugePages_Free:      8
      Node 2 HugePages_Surp:      0
      Node 3 HugePages_Total:     0
      Node 3 HugePages_Free:      0
      Node 3 HugePages_Surp:      0
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Reviewed-by: Andi Kleen <andi@firstfloor.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Nishanth Aravamudan <nacc@us.ibm.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: Andy Whitcroft <apw@canonical.com>
      Cc: Eric Whitney <eric.whitney@hp.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
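
      One reading of "compute the target global count" above: keep every
      other node's persistent pages and replace the specified node's share
      with the requested count; with no node given, the count already is the
      global target. A sketch under that assumption; `struct sk_hstate` and
      the helpers are hypothetical, and `SK_NUMA_NO_NODE` mirrors the -1 that
      NUMA_NO_NODE stands for.

      ```c
      #include <assert.h>

      #define SK_NUMA_NO_NODE (-1)  /* "no node id specified" */
      #define SK_NR_NODES 4

      /* Hypothetical per-hstate counters. */
      struct sk_hstate {
          unsigned long nr_huge_pages;                    /* global persistent pages */
          unsigned long nr_huge_pages_node[SK_NR_NODES];  /* persistent pages per node */
      };

      /* Turn a global or per node request into a global target count. */
      static unsigned long sk_global_target(const struct sk_hstate *h, int nid,
                                            unsigned long count)
      {
          if (nid == SK_NUMA_NO_NODE)
              return count;
          /* other nodes keep their pages; this node's share becomes 'count' */
          return h->nr_huge_pages - h->nr_huge_pages_node[nid] + count;
      }

      /* nodes_allowed for a per node request: only that node is set. */
      static unsigned long sk_single_node_mask(int nid)
      {
          return 1UL << nid;
      }
      ```

      Starting from an empty pool, writing 16 to node 2 gives a global target
      of 16, and a later global write of 8 is passed through unchanged.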
    • Lee Schermerhorn authored · 8bf6e8e5
      Move definition of NUMA_NO_NODE from ia64 and x86_64 arch specific headers
      to generic header 'linux/numa.h' for use in generic code.  NUMA_NO_NODE
      replaces bare '-1' where it's used in this series to indicate "no node id
      specified".  Ultimately, it can be used to replace the -1 elsewhere where
      it is used similarly.
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Reviewed-by: Andi Kleen <andi@firstfloor.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Nishanth Aravamudan <nacc@us.ibm.com>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: Andy Whitcroft <apw@canonical.com>
      Cc: Eric Whitney <eric.whitney@hp.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Lee Schermerhorn authored · 2b3f8ca0
      This patch derives a "nodes_allowed" node mask from the numa mempolicy of
      the task modifying the number of persistent huge pages to control the
      allocation, freeing and adjusting of surplus huge pages when the pool page
      count is modified via the new sysctl or sysfs attribute
      "nr_hugepages_mempolicy".  The nodes_allowed mask is derived as follows:
      
      * For "default" [NULL] task mempolicy, a NULL nodemask_t pointer
        is produced.  This will cause the hugetlb subsystem to use
        node_online_map as the "nodes_allowed".  This preserves the
        behavior before this patch.
      * For "preferred" mempolicy, including explicit local allocation,
        a nodemask with the single preferred node will be produced.
        "local" policy will NOT track any internode migrations of the
        task adjusting nr_hugepages.
      * For "bind" and "interleave" policy, the mempolicy's nodemask
        will be used.
      * Other than to inform the construction of the nodes_allowed node
        mask, the actual mempolicy mode is ignored.  That is, all modes
        behave like interleave over the resulting nodes_allowed mask
        with no "fallback".
      
      See the updated documentation [next patch] for more information
      about the implications of this patch.
      
      Examples:
      
      Starting with:
      
      	Node 0 HugePages_Total:     0
      	Node 1 HugePages_Total:     0
      	Node 2 HugePages_Total:     0
      	Node 3 HugePages_Total:     0
      
      Default behavior [with or without this patch] balances persistent
      hugepage allocation across nodes [with sufficient contiguous memory]:
      
      	sysctl vm.nr_hugepages[_mempolicy]=32
      
      yields:
      
      	Node 0 HugePages_Total:     8
      	Node 1 HugePages_Total:     8
      	Node 2 HugePages_Total:     8
      	Node 3 HugePages_Total:     8
      
      Of course, we only have nr_hugepages_mempolicy with the patch,
      but with default mempolicy, nr_hugepages_mempolicy behaves the
      same as nr_hugepages.
      
      Applying mempolicy--e.g., with numactl [using '-m' a.k.a.
      '--membind' because it allows multiple nodes to be specified
      and it's easy to type]--we can allocate huge pages on
      individual nodes or sets of nodes.  So, starting from the
      condition above, with 8 huge pages per node, add 8 more to
      node 2 using:
      
      	numactl -m 2 sysctl vm.nr_hugepages_mempolicy=40
      
      This yields:
      
      	Node 0 HugePages_Total:     8
      	Node 1 HugePages_Total:     8
      	Node 2 HugePages_Total:    16
      	Node 3 HugePages_Total:     8
      
      The incremental 8 huge pages were restricted to node 2 by the
      specified mempolicy.
      
      Similarly, we can use mempolicy to free persistent huge pages
      from specified nodes:
      
      	numactl -m 0,1 sysctl vm.nr_hugepages_mempolicy=32
      
      yields:
      
      	Node 0 HugePages_Total:     4
      	Node 1 HugePages_Total:     4
      	Node 2 HugePages_Total:    16
      	Node 3 HugePages_Total:     8
      
      The 8 huge pages freed were balanced over nodes 0 and 1.
      
      [rientjes@google.com: accommodate reworked NODEMASK_ALLOC]
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Reviewed-by: Andi Kleen <andi@firstfloor.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Nishanth Aravamudan <nacc@us.ibm.com>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: Andy Whitcroft <apw@canonical.com>
      Cc: Eric Whitney <eric.whitney@hp.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
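
      A simplified user-space model of this commit's rules: derive
      nodes_allowed from a pared-down mempolicy, then grow or shrink the pool
      by interleaving over that mask only. The policy struct, the 4-node
      machine, the unlimited per node memory and the round-robin cursors are
      all assumptions; also, where the commit produces a NULL mask for the
      default policy, this sketch returns an "all online nodes" mask directly.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>

      #define SK_NR_NODES 4
      #define SK_ALL_NODES ((1UL << SK_NR_NODES) - 1)  /* stands in for node_online_map */

      enum sk_mpol_mode { SK_MPOL_PREFERRED, SK_MPOL_BIND, SK_MPOL_INTERLEAVE };

      /* Hypothetical mempolicy; a NULL pointer plays the "default" policy. */
      struct sk_mempolicy {
          enum sk_mpol_mode mode;
          int preferred_node;   /* SK_MPOL_PREFERRED */
          unsigned long nodes;  /* SK_MPOL_BIND, SK_MPOL_INTERLEAVE */
      };

      static unsigned long sk_nodes_allowed(const struct sk_mempolicy *pol)
      {
          if (pol == NULL)
              return SK_ALL_NODES;                /* default: all online nodes */
          if (pol->mode == SK_MPOL_PREFERRED)
              return 1UL << pol->preferred_node;  /* single preferred node */
          return pol->nodes;                      /* bind/interleave: policy mask */
      }

      struct sk_pool {
          unsigned int per_node[SK_NR_NODES];
          int next_alloc;  /* round-robin cursor for allocation */
          int next_free;   /* round-robin cursor for freeing */
      };

      static unsigned int sk_pool_total(const struct sk_pool *p)
      {
          unsigned int total = 0;
          for (int nid = 0; nid < SK_NR_NODES; nid++)
              total += p->per_node[nid];
          return total;
      }

      /* First allowed node at or after 'start' (wrapping); if need_pages, it must hold one. */
      static int sk_pick_node(const struct sk_pool *p, int start, unsigned long allowed,
                              bool need_pages)
      {
          for (int k = 0; k < SK_NR_NODES; k++) {
              int nid = (start + k) % SK_NR_NODES;
              if (!(allowed & (1UL << nid)))
                  continue;
              if (need_pages && p->per_node[nid] == 0)
                  continue;
              return nid;
          }
          return -1;
      }

      /* Grow or shrink the pool to 'target', touching only nodes in 'allowed'. */
      static void sk_set_pool_size(struct sk_pool *p, unsigned int target, unsigned long allowed)
      {
          while (sk_pool_total(p) < target) {
              int nid = sk_pick_node(p, p->next_alloc, allowed, false);
              if (nid < 0)
                  break;
              p->per_node[nid]++;
              p->next_alloc = (nid + 1) % SK_NR_NODES;
          }
          while (sk_pool_total(p) > target) {
              int nid = sk_pick_node(p, p->next_free, allowed, true);
              if (nid < 0)
                  break;
              p->per_node[nid]--;
              p->next_free = (nid + 1) % SK_NR_NODES;
          }
      }
      ```

      Replaying the three example steps (32 with the default policy, 40 bound
      to node 2, 32 bound to nodes 0,1) through this model ends at 4/4/16/8,
      the same per node totals as the example output.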
    • Lee Schermerhorn authored · 3a2b699a
      Factor init_nodemask_of_node() out of the nodemask_of_node() macro.
      This will be used to populate the huge pages "nodes_allowed" nodemask for
      a single node when basing nodes_allowed on a preferred/local mempolicy or
      when a persistent huge page pool page count is modified via a per node
      sysfs attribute.
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Reviewed-by: Andi Kleen <andi@firstfloor.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Nishanth Aravamudan <nacc@us.ibm.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: Andy Whitcroft <apw@canonical.com>
      Cc: Eric Whitney <eric.whitney@hp.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
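
      The refactoring can be pictured with plain functions: a shared
      initializer that makes a caller-provided mask contain exactly one node,
      and the by-value helper rebuilt on top of it. In the kernel both are
      macros over nodemask_t; the types and names here are hypothetical
      user-space stand-ins.

      ```c
      #include <assert.h>
      #include <string.h>

      #define SK_MAX_NUMNODES 128
      #define SK_BITS_PER_LONG (8 * sizeof(unsigned long))
      #define SK_NODEMASK_LONGS ((SK_MAX_NUMNODES + SK_BITS_PER_LONG - 1) / SK_BITS_PER_LONG)

      typedef struct { unsigned long bits[SK_NODEMASK_LONGS]; } sk_nodemask_t;

      /* The factored-out piece: make *mask contain exactly one node. */
      static void sk_init_nodemask_of_node(sk_nodemask_t *mask, int node)
      {
          memset(mask, 0, sizeof(*mask));                                         /* clear */
          mask->bits[node / SK_BITS_PER_LONG] |= 1UL << (node % SK_BITS_PER_LONG);  /* set */
      }

      /* The by-value helper, now built on the shared initializer. */
      static sk_nodemask_t sk_nodemask_of_node(int node)
      {
          sk_nodemask_t m;
          sk_init_nodemask_of_node(&m, node);
          return m;
      }
      ```

      The initializer form is what a caller needs when it already owns the
      mask (for example one obtained from NODEMASK_ALLOC()) and only wants it
      filled with a single node.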
    • David Rientjes authored · e651309a
      On Thu, 8 Oct 2009, Lee Schermerhorn wrote:
      > @@ -1144,14 +1156,15 @@ static void __init report_hugepages(void
      >  }
      >
      >  #ifdef CONFIG_HIGHMEM
      > -static void try_to_free_low(struct hstate *h, unsigned long count)
      > +static void try_to_free_low(struct hstate *h, unsigned long count,
      > +						nodemask_t *nodes_allowed)
      >  {
      >  	int i;
      >
      >  	if (h->order >= MAX_ORDER)
      >  		return;
      >
      > -	for (i = 0; i < MAX_NUMNODES; ++i) {
      > +	for_each_node_mask(node, nodes_allowed_) {
      >  		struct page *page, *next;
      >  		struct list_head *freel = &h->hugepage_freelists[i];
      >  		list_for_each_entry_safe(page, next, freel, lru) {
      
      That's not looking good for i386, Andrew please fold the following into
      this patch when it's merged into -mm:
      
      [rientjes@google.com: fix HIGHMEM compile error]
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Nishanth Aravamudan <nacc@us.ibm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: Andy Whitcroft <apw@canonical.com>
      Cc: Eric Whitney <eric.whitney@hp.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
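
      The quoted hunk cannot compile with CONFIG_HIGHMEM: it iterates with an
      undeclared `node` over an undeclared `nodes_allowed_`, while the body
      still indexes `hugepage_freelists[i]`. A consistent shape iterates with
      `i` over `*nodes_allowed`. The user-space sketch below shows that shape
      with a hypothetical `sk_for_each_node_mask()` stand-in and a plain
      array; it is not the text of the folded-in fix.

      ```c
      #include <assert.h>

      #define SK_MAX_NUMNODES 8

      /* Hypothetical stand-in for for_each_node_mask(): visit each set bit. */
      #define sk_for_each_node_mask(node, mask)                  \
          for ((node) = 0; (node) < SK_MAX_NUMNODES; (node)++)   \
              if (!((mask) & (1UL << (node)))) { } else

      /* One iterator, used both to walk the mask and to index the free lists. */
      static unsigned long sk_sum_free_lists(const unsigned long freelist_len[SK_MAX_NUMNODES],
                                             const unsigned long *nodes_allowed)
      {
          int i;
          unsigned long total = 0;

          sk_for_each_node_mask(i, *nodes_allowed)
              total += freelist_len[i];
          return total;
      }
      ```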
    • Lee Schermerhorn authored · d1945f93
      In preparation for constraining huge page allocation and freeing by the
      controlling task's numa mempolicy, add a "nodes_allowed" nodemask pointer
      to the allocate, free and surplus adjustment functions.  For now, pass
      NULL to indicate default behavior--i.e., use node_online_map.  A
      subsequent patch will derive a non-default mask from the controlling
      task's numa mempolicy.
      
      Note that this method of updating the global hstate nr_hugepages under the
      constraint of a nodemask simplifies keeping the global state
      consistent--especially the number of persistent and surplus pages relative
      to reservations and overcommit limits.  There are undoubtedly other ways
      to do this, but this works for both interfaces: mempolicy and per node
      attributes.
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Reviewed-by: Mel Gorman <mel@csn.ul.ie>
      Acked-by: David Rientjes <rientjes@google.com>
      Reviewed-by: Andi Kleen <andi@firstfloor.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Nishanth Aravamudan <nacc@us.ibm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: Andy Whitcroft <apw@canonical.com>
      Cc: Eric Whitney <eric.whitney@hp.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Lee Schermerhorn authored · 993a43a5
      Modify the hstate_next_node* functions to allow them to be called to
      obtain the "start_nid".  Then, whereas prior to this patch we
      unconditionally called hstate_next_node_to_{alloc|free}(), whether or not
      we successfully allocated/freed a huge page on the node, now we only call
      these functions on failure to alloc/free to advance to next allowed node.
      
      Factor out the next_node_allowed() function to handle wrap at end of
      node_online_map.  In this version, the allowed nodes include all of the
      online nodes.
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Reviewed-by: Mel Gorman <mel@csn.ul.ie>
      Acked-by: David Rientjes <rientjes@google.com>
      Reviewed-by: Andi Kleen <andi@firstfloor.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Nishanth Aravamudan <nacc@us.ibm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: Andy Whitcroft <apw@canonical.com>
      Cc: Eric Whitney <eric.whitney@hp.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
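
      A sketch of the wrap-around behavior attributed to next_node_allowed()
      above: advance to the next allowed node and, past the end of the map,
      wrap to the first allowed node. The mask is a plain unsigned long and
      the helper names are hypothetical, loosely following the semantics of
      next_node()/first_node() (return the map size when no node is found).

      ```c
      #include <assert.h>

      #define SK_MAX_NUMNODES 8

      /* Next set bit strictly after 'nid', or SK_MAX_NUMNODES if none. */
      static int sk_next_node(int nid, unsigned long mask)
      {
          for (int n = nid + 1; n < SK_MAX_NUMNODES; n++)
              if (mask & (1UL << n))
                  return n;
          return SK_MAX_NUMNODES;
      }

      /* Lowest set bit, or SK_MAX_NUMNODES if the mask is empty. */
      static int sk_first_node(unsigned long mask)
      {
          return sk_next_node(-1, mask);
      }

      /* Advance to the next allowed node, wrapping at the end of the map. */
      static int sk_next_node_allowed(int nid, unsigned long allowed)
      {
          nid = sk_next_node(nid, allowed);
          if (nid == SK_MAX_NUMNODES)
              nid = sk_first_node(allowed);
          return nid;
      }
      ```

      With allowed nodes {0, 1, 3}, stepping from node 3 wraps back to node 0,
      so a caller that only advances after a failed alloc/free still cycles
      through every allowed node.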
    • David Rientjes authored · 9d9f5506
      This is a series of patches to provide control over the location of the
      allocation and freeing of persistent huge pages on a NUMA platform. 
      Please consider for merging into mmotm.
      
      This series uses two mechanisms to constrain the nodes from which
      persistent huge pages are allocated: 1) the task NUMA mempolicy of the
      task modifying a new sysctl "nr_hugepages_mempolicy", based on a
      suggestion by Mel Gorman; and 2) a subset of the hugepages hstate sysfs
      attributes have been added [in V4] to each node system device under:
      
      	/sys/devices/system/node/node[0-9]*/hugepages
      
      The per node attributes allow direct assignment of a huge page count on a
      specific node, regardless of the task's mempolicy or cpuset constraints.
      
      This patch:
      
      NODEMASK_ALLOC(x, m) assumes x is a struct tag, which is unnecessary.
      It's perfectly reasonable to use this macro to allocate a nodemask_t,
      which is anonymous, either dynamically or on the stack depending on
      NODES_SHIFT.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Nishanth Aravamudan <nacc@us.ibm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: Andy Whitcroft <apw@canonical.com>
      Cc: Eric Whitney <eric.whitney@hp.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • KOSAKI Motohiro authored · 4f50eaca
      Christoph pointed out inc_zone_page_state(NR_ISOLATED) should be placed
      right after isolate_page().
      
      This patch does it.
      Acked-by: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  16. 25 Sep, 2009 4 commits