    hugetlb: derive huge pages nodes allowed from task mempolicy · 06808b08
    Lee Schermerhorn authored
    This patch derives a "nodes_allowed" node mask from the NUMA
    mempolicy of the task modifying the number of persistent huge pages.
    That mask controls the allocation, freeing, and adjusting of surplus
    huge pages when the pool page count is modified via the new sysctl or
    sysfs attribute "nr_hugepages_mempolicy".  The nodes_allowed mask is
    derived as follows:
    
    * For "default" [NULL] task mempolicy, a NULL nodemask_t pointer
      is produced.  This will cause the hugetlb subsystem to use
      node_online_map as the "nodes_allowed".  This preserves the
      behavior before this patch.
    * For "preferred" mempolicy, including explicit local allocation,
      a nodemask with the single preferred node will be produced.
      "local" policy will NOT track any internode migrations of the
      task adjusting nr_hugepages.
    * For "bind" and "interleave" policy, the mempolicy's nodemask
      will be used.
    * Other than informing the construction of the nodes_allowed node
      mask, the actual mempolicy mode is ignored.  That is, all modes
      behave like interleave over the resulting nodes_allowed mask,
      with no "fallback" [see the sketch below].
    
    See the updated documentation [next patch] for more information
    about the implications of this patch.
    
    Examples:
    
    Starting with:
    
    	Node 0 HugePages_Total:     0
    	Node 1 HugePages_Total:     0
    	Node 2 HugePages_Total:     0
    	Node 3 HugePages_Total:     0
    
    Default behavior [with or without this patch] balances persistent
    hugepage allocation across nodes [with sufficient contiguous memory]:
    
    	sysctl vm.nr_hugepages[_mempolicy]=32
    
    yields:
    
    	Node 0 HugePages_Total:     8
    	Node 1 HugePages_Total:     8
    	Node 2 HugePages_Total:     8
    	Node 3 HugePages_Total:     8
    
    Of course, nr_hugepages_mempolicy exists only with this patch
    applied, but with default mempolicy it behaves the same as
    nr_hugepages.
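
    Internally, both sysctls can funnel into one path, with only the
    mempolicy variant consulting the calling task's policy.  Below is a
    minimal sketch reusing the hypothetical derive_nodes_allowed() above;
    the wrapper name set_pool_size() is made up, though
    set_max_huge_pages() did gain a nodes_allowed argument with this
    patch:

    	/* Illustrative sketch: obey_mempolicy selects the new behavior. */
    	static void set_pool_size(struct hstate *h, unsigned long count,
    				  bool obey_mempolicy)
    	{
    		nodemask_t *nodes_allowed = NULL;

    		if (obey_mempolicy)	/* nr_hugepages_mempolicy */
    			nodes_allowed = derive_nodes_allowed(current->mempolicy);
    		if (!nodes_allowed)	/* nr_hugepages, or default policy */
    			nodes_allowed = &node_online_map;

    		h->max_huge_pages = set_max_huge_pages(h, count, nodes_allowed);

    		if (nodes_allowed != &node_online_map)
    			kfree(nodes_allowed);
    	}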
    
    Applying a mempolicy, e.g., with numactl [using '-m' a.k.a.
    '--membind' because it allows multiple nodes to be specified
    and it's easy to type], we can allocate huge pages on
    individual nodes or sets of nodes.  So, starting from the
    condition above, with 8 huge pages per node, add 8 more to
    node 2 using:
    
    	numactl -m 2 sysctl vm.nr_hugepages_mempolicy=40
    
    This yields:
    
    	Node 0 HugePages_Total:     8
    	Node 1 HugePages_Total:     8
    	Node 2 HugePages_Total:    16
    	Node 3 HugePages_Total:     8
    
    The incremental 8 huge pages were restricted to node 2 by the
    specified mempolicy.
    
    Similarly, we can use mempolicy to free persistent huge pages
    from specified nodes:
    
    	numactl -m 0,1 sysctl vm.nr_hugepages_mempolicy=32
    
    yields:
    
    	Node 0 HugePages_Total:     4
    	Node 1 HugePages_Total:     4
    	Node 2 HugePages_Total:    16
    	Node 3 HugePages_Total:     8
    
    The 8 huge pages freed were balanced over nodes 0 and 1.
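
    That balance is the "interleave over nodes_allowed with no fallback"
    behavior noted earlier: allocation and freeing both advance
    round-robin through the mask, wrapping at the end.  A sketch of the
    traversal step [close to, though not guaranteed identical to, the
    patched helper]:

    	/* Illustrative: advance round-robin to the next allowed node. */
    	static int next_node_allowed(int nid, nodemask_t *nodes_allowed)
    	{
    		nid = next_node(nid, *nodes_allowed);
    		if (nid == MAX_NUMNODES)
    			nid = first_node(*nodes_allowed);
    		VM_BUG_ON(nid >= MAX_NUMNODES);
    		return nid;
    	}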
    
    [rientjes@google.com: accommodate reworked NODEMASK_ALLOC]
    Signed-off-by: David Rientjes <rientjes@google.com>
    Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
    Acked-by: Mel Gorman <mel@csn.ul.ie>
    Reviewed-by: Andi Kleen <andi@firstfloor.org>
    Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Cc: Randy Dunlap <randy.dunlap@oracle.com>
    Cc: Nishanth Aravamudan <nacc@us.ibm.com>
    Cc: Adam Litke <agl@us.ibm.com>
    Cc: Andy Whitcroft <apw@canonical.com>
    Cc: Eric Whitney <eric.whitney@hp.com>
    Cc: Christoph Lameter <cl@linux-foundation.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>