- 15 Oct, 2009 1 commit
-
-
Hugh Dickins authored
use of its cachelines: it's good for swap_duplicate() in particular if unsigned int max and swap_map are near the start. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 09 Nov, 2009 5 commits
-
-
Hugh Dickins authored
value to shmem/tmpfs swap pages: their swap counts are never incremented, and it helps swapoff's try_to_unuse() a little if it can immediately distinguish those pages from process pages. Since we've no use for SWAP_MAP_BAD | COUNT_CONTINUED, we might as well use that 0xbf value for SWAP_MAP_SHMEM. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Hugh Dickins authored
swap page is inserted into another mm (when forking finds a swap entry in place of a pte, or when reclaim unmaps a pte to insert the swap entry). swap_info_struct's vmalloc'ed swap_map is the array of these reference counts: but what happens when the unsigned short (or unsigned char since the preceding patch) is full? (and its high bit is kept for a cache flag) We then lose track of it, never freeing, leaving it in use until swapoff: at which point we _hope_ that a single pass will have found all instances, assume there are no more, and will lose user data if we're wrong. Swapping of KSM pages has not yet been enabled; but it is implemented, and makes it very easy for a user to overflow the maximum swap count: possible with ordinary process pages, but unlikely, even when pid_max has been raised from PID_MAX_DEFAULT. This patch implements swap count continuations: when the count overflows, a continuation page is allocated and linked to the original vmalloc'ed map page, and this used to hold the continuation counts for that entry and its neighbours. These continuation pages are seldom referenced: the common paths all work on the original swap_map, only referring to a continuation page when the low "digit" of a count is incremented or decremented through SWAP_MAP_MAX. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Hugh Dickins authored
chars: it's still very unusual to reach a swap count of 126, and the next patch allows it to be extended indefinitely. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Hugh Dickins authored
encode_swapmap() obscure what happens in the swap_map entry, just at those points where I need to understand it. Remove them, and pass more usable "usage" values to scan_swap_map(), swap_entry_free() and __swap_duplicate(), instead of the SWAP_MAP and SWAP_CACHE enum. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Hugh Dickins authored
block, remove extraneous whitespace and return, fix typo in a comment. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 03 Nov, 2009 2 commits
-
-
Hugh Dickins authored
initializing p->first_swap_extent.list at a point before p has been decided - we may kfree that newly allocated p and go on to reuse an existing free entry for p. Now, the patch is not actually wrong: an existing free entry will have a good empty first_swap_extent.list; but it looks suspicious, it seems strange to initialize a field in something we're about to kfree, and I'd rather we put that initialization back to where it was in 2.6.32. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Jiri Slaby <jirislaby@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Jiri Slaby authored
BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff810af160>] sys_swapon+0x1f0/0xc60 PGD 1dc0b067 PUD 1dc09067 PMD 0 Oops: 0000 [#1] SMP last sysfs file: CPU 1 Modules linked in: Pid: 562, comm: swapon Tainted: G W 2.6.32-rc5-mm1_64 #867 RIP: 0010:[<ffffffff810af160>] [<ffffffff810af160>] sys_swapon+0x1f0/0xc60 ... It is due to swap_info_struct->first_swap_extent.list not being initialized. ->next is NULL in such a situation and destroy_swap_extents fails to iterate over the list with the BUG above. Introduced by swap_info-include-first_swap_extent.patch. Revert the INIT_LIST_HEAD move. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 09 Nov, 2009 3 commits
-
-
Hugh Dickins authored
swap_info_struct, instead of just the list_head: swap partitions need only that one, and for others it's used as a circular list anyway. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Hugh Dickins authored
to reserve an array of about 30 of them in bss, when most people will want only one. Change swap_info[] to an array of pointers. That does need a "type" field in the structure: pack it as a char with next type and short prio (aha, char is unsigned by default on PowerPC). Use the (admittedly peculiar) name "type" throughout for this index. /proc/swaps does not take swap_lock: I wouldn't want it to, but do take care with barriers when adding a new item to the array (never removed). Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Hugh Dickins authored
one other in-tree user: get_swap_bio(). Adjust its interface to map_swap_page(), so that we can then remove get_swap_info_struct(). But there is a popular user out-of-tree, TuxOnIce: so leave the declaration of swap_info_struct in linux/swap.h. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Nigel Cunningham <ncunningham@crca.org.au> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 13 Oct, 2009 2 commits
-
-
Jan Beulich authored
being called through vmalloc_32{,_user}() - explicitly allow using high memory here even if the outer allocation request doesn't allow it Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
David Rientjes authored
backed by slab caches that are not of large order, traditionally never greater than PAGE_ALLOC_COSTLY_ORDER. Thus, using GFP_KERNEL for these allocations on large machines when CONFIG_NODES_SHIFT > 8 will cause the page allocator to loop endlessly in the allocation attempt, each time invoking both direct reclaim or the oom killer. This is of particular interest when using NODEMASK_ALLOC() from a mempolicy context (either directly in mm/mempolicy.c or the mempolicy constrained hugetlb allocations) since the oom killer always kills current when allocations are constrained by mempolicies. So for all present use cases in the kernel, current would end up being oom killed when direct reclaim fails. That would allow the NODEMASK_ALLOC() to succeed but current would have sacrificed itself upon returning. This patch adds gfp flags to NODEMASK_ALLOC() to pass to kmalloc() on CONFIG_NODES_SHIFT > 8; this parameter is a nop on other configurations. All current use cases either directly from hugetlb code or indirectly via NODEMASK_SCRATCH() union __GFP_NORETRY to avoid direct reclaim and the oom killer when the slab allocator needs to allocate additional pages. The side-effect of this change is that all current use cases of either NODEMASK_ALLOC() or NODEMASK_SCRATCH() need appropriate -ENOMEM handling when the allocation fails (never for CONFIG_NODES_SHIFT <= 8). All current use cases were audited and do have appropriate error handling at this time. Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Rientjes <rientjes@google.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 03 Nov, 2009 1 commit
-
-
Lee Schermerhorn authored
drivers/base/node.c:484: error: implicit declaration of function 'init_node_hugetlb_work' drivers/base/node.c:582: error: 'node_memory_callback' undeclared (first use in this function) Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 13 Oct, 2009 2 commits
-
-
Lee Schermerhorn authored
attributes to a worker thread rather than attempt the allocation/attachment or detachment/freeing of the attributes in the context of the memory hotplug handler. I don't know that this is absolutely required, but the registration can sleep in allocations and other mem hot plug handlers do it this way. If it turns out this is NOT required, we can drop this patch. N.B., Only tested build, boot, libhugetlbfs regression. i.e., no memory hotplug testing. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Reviewed-by: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: David Rientjes <rientjes@google.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Lee Schermerhorn authored
suggested by David Rientjes. With Memory Hotplug, memory can be added to a memoryless node and a node with memory can become memoryless. Therefore, add a memory on/off-line notifier callback to [un]register a node's attributes on transition to/from memoryless state. N.B., Only tested build, boot, libhugetlbfs regression. i.e., no memory hotplug testing. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Reviewed-by: Andi Kleen <andi@firstfloor.org> Acked-by: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 09 Nov, 2009 1 commit
-
-
David Rientjes authored
there are no present pages left. In such a situation, kswapd must also be stopped since it has nothing left to do. Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Rik van Riel <riel@redhat.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Rientjes <rientjes@google.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 13 Oct, 2009 4 commits
-
-
Lee Schermerhorn authored
Global replacement of 'all online nodes" with "all nodes with memory" in mm/hugetlb.c. Suggested by David Rientjes. A subsequent patch will handle adding/removing of per node hstate sysfs attributes when nodes transition to/from memoryless state via memory hotplug. NOTE: this patch has not been tested with memoryless nodes. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Reviewed-by: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: David Rientjes <rientjes@google.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Lee Schermerhorn authored
policy based huge page management. Additionaly, the patch includes a fair amount of rework to improve consistency, eliminate duplication and set the context for documenting the memory policy interaction. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Lee Schermerhorn authored
sysdevs: /sys/devices/system/node/node<ID>/hugepages/hugepages-<size>/ nr_hugepages - r/w free_huge_pages - r/o surplus_huge_pages - r/o The patch attempts to re-use/share as much of the existing global hstate attribute initialization and handling, and the "nodes_allowed" constraint processing as possible. Calling set_max_huge_pages() with no node indicates a change to global hstate parameters. In this case, any non-default task mempolicy will be used to generate the nodes_allowed mask. A valid node id indicates an update to that node's hstate parameters, and the count argument specifies the target count for the specified node. From this info, we compute the target global count for the hstate and construct a nodes_allowed node mask contain only the specified node. Setting the node specific nr_hugepages via the per node attribute effectively ignores any task mempolicy or cpuset constraints. With this patch: (me):ls /sys/devices/system/node/node0/hugepages/hugepages-2048kB ./ ../ free_hugepages nr_hugepages surplus_hugepages Starting from: Node 0 HugePages_Total: 0 Node 0 HugePages_Free: 0 Node 0 HugePages_Surp: 0 Node 1 HugePages_Total: 0 Node 1 HugePages_Free: 0 Node 1 HugePages_Surp: 0 Node 2 HugePages_Total: 0 Node 2 HugePages_Free: 0 Node 2 HugePages_Surp: 0 Node 3 HugePages_Total: 0 Node 3 HugePages_Free: 0 Node 3 HugePages_Surp: 0 vm.nr_hugepages = 0 Allocate 16 persistent huge pages on node 2: (me):echo 16 >/sys/devices/system/node/node2/hugepages/hugepages-2048kB/nr_hugepages [Note that this is equivalent to: numactl -m 2 hugeadmin --pool-pages-min 2M:+16 ] Yields: Node 0 HugePages_Total: 0 Node 0 HugePages_Free: 0 Node 0 HugePages_Surp: 0 Node 1 HugePages_Total: 0 Node 1 HugePages_Free: 0 Node 1 HugePages_Surp: 0 Node 2 HugePages_Total: 16 Node 2 HugePages_Free: 16 Node 2 HugePages_Surp: 0 Node 3 HugePages_Total: 0 Node 3 HugePages_Free: 0 Node 3 HugePages_Surp: 0 vm.nr_hugepages = 16 Global controls work as expected--reduce pool to 8 persistent huge pages: (me):echo 8 >/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages Node 0 HugePages_Total: 0 Node 0 HugePages_Free: 0 Node 0 HugePages_Surp: 0 Node 1 HugePages_Total: 0 Node 1 HugePages_Free: 0 Node 1 HugePages_Surp: 0 Node 2 HugePages_Total: 8 Node 2 HugePages_Free: 8 Node 2 HugePages_Surp: 0 Node 3 HugePages_Total: 0 Node 3 HugePages_Free: 0 Node 3 HugePages_Surp: 0 Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: David Rientjes <rientjes@google.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Lee Schermerhorn authored
to generic header 'linux/numa.h' for use in generic code. NUMA_NO_NODE replaces bare '-1' where it's used in this series to indicate "no node id specified". Ultimately, it can be used to replace the -1 elsewhere where it is used similarly. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 12 Nov, 2009 1 commit
-
-
Lee Schermerhorn authored
the task modifying the number of persistent huge pages to control the allocation, freeing and adjusting of surplus huge pages when the pool page count is modified via the new sysctl or sysfs attribute "nr_hugepages_mempolicy". The nodes_allowed mask is derived as follows: * For "default" [NULL] task mempolicy, a NULL nodemask_t pointer is produced. This will cause the hugetlb subsystem to use node_online_map as the "nodes_allowed". This preserves the behavior before this patch. * For "preferred" mempolicy, including explicit local allocation, a nodemask with the single preferred node will be produced. "local" policy will NOT track any internode migrations of the task adjusting nr_hugepages. * For "bind" and "interleave" policy, the mempolicy's nodemask will be used. * Other than to inform the construction of the nodes_allowed node mask, the actual mempolicy mode is ignored. That is, all modes behave like interleave over the resulting nodes_allowed mask with no "fallback". See the updated documentation [next patch] for more information about the implications of this patch. Examples: Starting with: Node 0 HugePages_Total: 0 Node 1 HugePages_Total: 0 Node 2 HugePages_Total: 0 Node 3 HugePages_Total: 0 Default behavior [with or without this patch] balances persistent hugepage allocation across nodes [with sufficient contiguous memory]: sysctl vm.nr_hugepages[_mempolicy]=32 yields: Node 0 HugePages_Total: 8 Node 1 HugePages_Total: 8 Node 2 HugePages_Total: 8 Node 3 HugePages_Total: 8 Of course, we only have nr_hugepages_mempolicy with the patch, but with default mempolicy, nr_hugepages_mempolicy behaves the same as nr_hugepages. Applying mempolicy--e.g., with numactl [using '-m' a.k.a. '--membind' because it allows multiple nodes to be specified and it's easy to type]--we can allocate huge pages on individual nodes or sets of nodes. So, starting from the condition above, with 8 huge pages per node, add 8 more to node 2 using: numactl -m 2 sysctl vm.nr_hugepages_mempolicy=40 This yields: Node 0 HugePages_Total: 8 Node 1 HugePages_Total: 8 Node 2 HugePages_Total: 16 Node 3 HugePages_Total: 8 The incremental 8 huge pages were restricted to node 2 by the specified mempolicy. Similarly, we can use mempolicy to free persistent huge pages from specified nodes: numactl -m 0,1 sysctl vm.nr_hugepages_mempolicy=32 yields: Node 0 HugePages_Total: 4 Node 1 HugePages_Total: 4 Node 2 HugePages_Total: 16 Node 3 HugePages_Total: 8 The 8 huge pages freed were balanced over nodes 0 and 1. [rientjes@google.com: accomodate reworked NODEMASK_ALLOC] Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 13 Oct, 2009 5 commits
-
-
Lee Schermerhorn authored
This will be used to populate the huge pages "nodes_allowed" nodemask for a single node when basing nodes_allowed on a preferred/local mempolicy or when a persistent huge page pool page count is modified via a per node sysfs attribute. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
David Rientjes authored
> @@ -1144,14 +1156,15 @@ static void __init report_hugepages(void > } > > #ifdef CONFIG_HIGHMEM > -static void try_to_free_low(struct hstate *h, unsigned long count) > +static void try_to_free_low(struct hstate *h, unsigned long count, > + nodemask_t *nodes_allowed) > { > int i; > > if (h->order >= MAX_ORDER) > return; > > - for (i = 0; i < MAX_NUMNODES; ++i) { > + for_each_node_mask(node, nodes_allowed_) { > struct page *page, *next; > struct list_head *freel = &h->hugepage_freelists[i]; > list_for_each_entry_safe(page, next, freel, lru) { That's not looking good for i386, Andrew please fold the following into this patch when it's merged into -mm: [rientjes@google.com: fix HIGHMEM compile error] Signed-off-by: David Rientjes <rientjes@google.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Lee Schermerhorn authored
controlling task's numa mempolicy, add a "nodes_allowed" nodemask pointer to the allocate, free and surplus adjustment functions. For now, pass NULL to indicate default behavior--i.e., use node_online_map. A subsqeuent patch will derive a non-default mask from the controlling task's numa mempolicy. Note that this method of updating the global hstate nr_hugepages under the constraint of a nodemask simplifies keeping the global state consistent--especially the number of persistent and surplus pages relative to reservations and overcommit limits. There are undoubtedly other ways to do this, but this works for both interfaces: mempolicy and per node attributes. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Reviewed-by: Mel Gorman <mel@csn.ul.ie> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Lee Schermerhorn authored
obtain the "start_nid". Then, whereas prior to this patch we unconditionally called hstate_next_node_to_{alloc|free}(), whether or not we successfully allocated/freed a huge page on the node, now we only call these functions on failure to alloc/free to advance to next allowed node. Factor out the next_node_allowed() function to handle wrap at end of node_online_map. In this version, the allowed nodes include all of the online nodes. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Reviewed-by: Mel Gorman <mel@csn.ul.ie> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
David Rientjes authored
allocation and freeing of persistent huge pages on a NUMA platform. Please consider for merging into mmotm. This series uses two mechanisms to constrain the nodes from which persistent huge pages are allocated: 1) the task NUMA mempolicy of the task modifying a new sysctl "nr_hugepages_mempolicy", based on a suggestion by Mel Gorman; and 2) a subset of the hugepages hstate sysfs attributes have been added [in V4] to each node system device under: /sys/devices/node/node[0-9]*/hugepages The per node attibutes allow direct assignment of a huge page count on a specific node, regardless of the task's mempolicy or cpuset constraints. This patch: NODEMASK_ALLOC(x, m) assumes x is a type of struct, which is unnecessary. It's perfectly reasonable to use this macro to allocate a nodemask_t, which is anonymous, either dynamically or on the stack depending on NODES_SHIFT. Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Rientjes <rientjes@google.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 14 Nov, 2009 1 commit
-
-
KOSAKI Motohiro authored
in right after isolate_page(). This patch does it. Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 25 Sep, 2009 4 commits
-
-
Wu Fengguang authored
Cc: Andi Kleen <ak@linux.intel.com> Cc: Avi Kivity <avi@qumranet.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Wu Fengguang authored
> if (!kbuf) > return wrote ? wrote : -ENOMEM; > while (count > 0) { > - int len = size_inside_page(p, count); > + unsigned long sz = size_inside_page(p, count); > > - written = copy_from_user(kbuf, buf, len); > - if (written) { > + sz = copy_from_user(kbuf, buf, sz); Sorry, it introduced a bug: the "sz" will be zero in normal, > + if (sz) { > if (wrote + virtr) > break; > free_page((unsigned long)kbuf); > return -EFAULT; > } > - len = vwrite(kbuf, (char *)p, len); > + sz = vwrite(kbuf, (char *)p, sz); and get passed to vwrite here. This patch fixes it, the new var "n" will be used in another bug fixing patch following this one. Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Wu Fengguang authored
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Avi Kivity <avi@qumranet.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Wu Fengguang authored
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Avi Kivity <avi@qumranet.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 15 Sep, 2009 1 commit
-
-
Andrew Morton authored
Cc: Avi Kivity <avi@qumranet.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mark Brown <broonie@opensource.wolfsonmicro.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 12 Sep, 2009 4 commits
-
-
Andrew Morton authored
Cc: Andi Kleen <ak@linux.intel.com> Cc: Avi Kivity <avi@qumranet.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mark Brown <broonie@opensource.wolfsonmicro.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Wu Fengguang authored
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Mark Brown <broonie@opensource.wolfsonmicro.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Wu Fengguang authored
Also apply it to /dev/kmem, whose alignment logic was buggy. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Mark Brown <broonie@opensource.wolfsonmicro.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Wu Fengguang authored
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Mark Brown <broonie@opensource.wolfsonmicro.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 09 Nov, 2009 3 commits
-
-
Vincent Li authored
movement, either. Already, in commit 5343daceec, KOSAKI did it about shrink_inactive_list. This patch removes unnecessary overhead of page accounting and locking in shrink_active_list as follow-up work of commit 5343daceec. Signed-off-by: Vincent Li <macli@brc.ubc.ca> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Andrew Morton authored
#116: FILE: mm/mmap.c:1835: +static int __split_vma(struct mm_struct * mm, struct vm_area_struct * vma, ERROR: "foo * bar" should be "foo *bar" #138: FILE: mm/mmap.c:1888: +int split_vma(struct mm_struct * mm, struct vm_area_struct * vma, total: 2 errors, 0 warnings, 67 lines checked ./patches/mmap-dont-return-enomem-when-mapcount-is-temporarily-exceeded-in-munmap.patch has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
KOSAKI Motohiro authored
library called abort(). ======================================================== (gdb) bt #0 0xa000000000010620 in __kernel_syscall_via_break () #1 0x20000000003208e0 in raise () from /lib/libc.so.6.1 #2 0x2000000000324090 in abort () from /lib/libc.so.6.1 #3 0x200000000027c3e0 in __deallocate_stack () from /lib/libpthread.so.0 #4 0x200000000027f7c0 in start_thread () from /lib/libpthread.so.0 #5 0x200000000047ef60 in __clone2 () from /lib/libc.so.6.1 ======================================================== The fact is, glibc call munmap() when thread exitng time for freeing stack, and it assume munlock() never fail. However, munmap() often make vma splitting and it with many mapcount make -ENOMEM. Oh well, that's crazy, because stack unmapping never increase mapcount. The maxcount exceeding is only temporary. internal temporary exceeding shouldn't make ENOMEM. This patch does it. test_max_mapcount.c ================================================================== #include<stdio.h> #include<stdlib.h> #include<string.h> #include<pthread.h> #include<errno.h> #include<unistd.h> #define THREAD_NUM 30000 #define MAL_SIZE (8*1024*1024) void *wait_thread(void *args) { void *addr; addr = malloc(MAL_SIZE); sleep(10); return NULL; } void *wait_thread2(void *args) { sleep(60); return NULL; } int main(int argc, char *argv[]) { int i; pthread_t thread[THREAD_NUM], th; int ret, count = 0; pthread_attr_t attr; ret = pthread_attr_init(&attr); if(ret) { perror("pthread_attr_init"); } ret = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED); if(ret) { perror("pthread_attr_setdetachstate"); } for (i = 0; i < THREAD_NUM; i++) { ret = pthread_create(&th, &attr, wait_thread, NULL); if(ret) { fprintf(stderr, "[%d] ", count); perror("pthread_create"); } else { printf("[%d] create OK.\n", count); } count++; ret = pthread_create(&thread[i], &attr, wait_thread2, NULL); if(ret) { fprintf(stderr, "[%d] ", count); perror("pthread_create"); } else { printf("[%d] create OK.\n", count); } count++; } sleep(3600); return 0; } ================================================================== Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-