- 23 Jul, 2009 1 commit
-
-
john stultz authored
reducing the amount of arch specific code we need to maintain. Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 03 Nov, 2009 1 commit
-
-
Julia Lawall authored
test. If ldev can somehow be NULL, then the initialization of last_idx should be moved below the test. A simplified version of the semantic match that detects this problem is as follows (http://coccinelle.lip6.fr/): // <smpl> @match exists@ expression x, E; identifier fld; @@ * x->fld ... when != \(x = E\|&x\) * x == NULL // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 09 Nov, 2009 1 commit
-
-
Alexey Dobriyan authored
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 03 Nov, 2009 1 commit
-
-
Jie Zhang authored
is what we want in the default case, all memory allocation from userspace under NOMMU has to go through this interface, including malloc() which is allowed to return uninitialized memory. This can easily be a significant performance penalty. So for constrained embedded systems were security is irrelevant, allow people to avoid clearing memory unnecessarily. This also alters the ELF-FDPIC binfmt such that it obtains uninitialised memory for the brk and stack region. Signed-off-by: Jie Zhang <jie.zhang@analog.com> Signed-off-by: Robin Getz <rgetz@blackfin.uclinux.org> Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Paul Mundt <lethal@linux-sh.org> Acked-by: Greg Ungerer <gerg@snapgear.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 14 Feb, 2009 2 commits
-
-
Andrew Morton authored
#23: FILE: arch/frv/kernel/gdb-stub.c:1883: + gdbstub_strcpy(output_buffer,"E02"); ^ ERROR: space required after that ',' (ctx:VxV) #32: FILE: arch/frv/kernel/gdb-stub.c:1911: + gdbstub_strcpy(output_buffer,"E02"); ^ total: 2 errors, 0 warnings, 16 lines checked ./patches/frv-duplicate-output_buffer-of-e03.patch has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Roel Kluin authored
two. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 09 Nov, 2009 1 commit
-
-
H Hartley Sweeten authored
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 14 Nov, 2009 1 commit
-
-
Andrew Morton authored
Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Andi Kleen <andi@firstfloor.org> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 13 Nov, 2009 6 commits
-
-
Hugh Dickins authored
to the page while the node exists. Put a pointer to the stable_node into the ksm page's ->mapping. Then we don't need get_ksm_page() while traversing the stable tree: the page to compare against is sure to be present and correct, even if it's no longer visible through any of its existing rmap_items. And we can handle the forked ksm page case more efficiently: no need to memcmp our way through the tree to find its match. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Hugh Dickins authored
separate tree_item at the node, for several reasons it becomes awkward to keep rmap_items in the stable tree without a separate stable_node: lack of space in the nicely-sized rmap_item, the need for an anchor as rmap_items are removed, the need for a node even when temporarily no rmap_items are attached to it. So declare struct stable_node (rb_node to place it in the tree and hlist_head for the rmap_items hanging off it), and convert stable tree handling to use it: without yet taking advantage of it. Note how one stable_tree_insert() of a node now has _two_ stable_tree_append()s of the two rmap_items being merged. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Hugh Dickins authored
singly-linked list: we always traverse that list sequentially, and we don't even lose any prefetches (but should consider adding a few later). Name it rmap_list throughout. Do we need to free up that pointer? Not immediately, and in the end, we could continue to avoid it with a union; but having done the conversion, let's keep it this way, since there's no downside, and maybe we'll want more in future (struct rmap_item is a cache-friendly 32 bytes on 32-bit and 64 bytes on 64-bit, so we shall want to avoid expanding it). Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Hugh Dickins authored
down to replace_page(), so that it's easier to follow the rmap_item's page and the matching tree_page and the merged kpage through that code. In some places, e.g. break_cow(), pass rmap_item instead of separate mm and address. cmp_and_merge_page() initialize tree_page to NULL, to avoid a "may be used uninitialized" warning seen in one config by Anil SB. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Hugh Dickins authored
vm_page_prot must already be write-protected for an anonymous page (see mm/memory.c do_anonymous_page() for similar reliance on vm_page_prot). There is no need for try_to_merge_one_page() to get_page and put_page on newpage and oldpage: in every case we already hold a reference to each of them. But some instinct makes me move try_to_merge_one_page()'s unlock_page of oldpage down after replace_page(): that doesn't increase contention on the ksm page, and makes thinking about the transition easier. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Hugh Dickins authored
various places: don't dirty the rmap_item cacheline unnecessarily, just mask the flags out of the address when they have been set. 2. First get_next_rmap_item() removes an unstable rmap_item from its tree, then shortly afterwards cmp_and_merge_page() removes a stable rmap_item from its tree: it's easier just to do both at once (but definitely keep the BUG_ON(age > 1) which guards against a future omission). 3. When cmp_and_merge_page() moves an rmap_item from unstable to stable tree, it does its own rb_erase() and accounting: that's better expressed by remove_rmap_item_from_tree(). Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 12 Nov, 2009 16 commits
-
-
KOSAKI Motohiro authored
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
KOSAKI Motohiro authored
Then, we can remove it perfectly. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
KOSAKI Motohiro authored
sc.swap_cluster_max misuse. huge sc.swap_cluster_max might makes unnecessary OOM risk and no performance benefit. Now, we can stop its insane thing. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
KOSAKI Motohiro authored
memory shrinker) for hibernate performance improvement. and sc.swap_cluster_max was introduced by commit a06fe4d307 (Speed freeing memory for suspend). commit a06fe4d307 said Without the patch: Freed 14600 pages in 1749 jiffies = 32.61 MB/s (Anomolous!) Freed 88563 pages in 14719 jiffies = 23.50 MB/s Freed 205734 pages in 32389 jiffies = 24.81 MB/s With the patch: Freed 68252 pages in 496 jiffies = 537.52 MB/s Freed 116464 pages in 569 jiffies = 798.54 MB/s Freed 209699 pages in 705 jiffies = 1161.89 MB/s At that time, their patch was pretty worth. However, Modern Hardware trend and recent VM improvement broke its worth. From several reason, I think we should remove shrink_all_zones() at all. detail: 1) Old days, shrink_zone()'s slowness was mainly caused by stupid io-throttle at no i/o congestion. but current shrink_zone() is sane, not slow. 2) shrink_all_zone() try to shrink all pages at a time. but it doesn't works fine on numa system. example) System has 4GB memory and each node have 2GB. and hibernate need 1GB. optimal) steal 500MB from each node. shrink_all_zones) steal 1GB from node-0. Oh, Cache balancing logic was broken. ;) Unfortunately, Desktop system moved ahead NUMA at nowadays. (Side note, if hibernate require 2GB, shrink_all_zones() never success on above machine) 3) if the node has several I/O flighting pages, shrink_all_zones() makes pretty bad result. schenario) hibernate need 1GB 1) shrink_all_zones() try to reclaim 1GB from Node-0 2) but it only reclaimed 990MB 3) stupidly, shrink_all_zones() try to reclaim 1GB from Node-1 4) it reclaimed 990MB Oh, well. it reclaimed twice much than required. In the other hand, current shrink_zone() has sane baling out logic. then, it doesn't make overkill reclaim. then, we lost shrink_zones()'s risk. 4) SplitLRU VM always keep active/inactive ratio very carefully. inactive list only shrinking break its assumption. it makes unnecessary OOM risk. it obviously suboptimal. Now, shrink_all_memory() is only the wrapper function of do_try_to_free_pages(). it bring good reviewability and debuggability, and solve above problems. side note: Reclaim logic unificication makes two good side effect. - Fix recursive reclaim bug on shrink_all_memory(). it did forgot to use PF_MEMALLOC. it mean the system be able to stuck into deadlock. - Now, shrink_all_memory() got lockdep awareness. it bring good debuggability. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
KOSAKI Motohiro authored
1) reclaim batch size as isolate_lru_pages()'s argument 2) reclaim baling out thresolds The two meanings pretty unrelated. Thus, Let's separate it. this patch doesn't change any behavior. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Alex Chiang authored
Signed-off-by: Alex Chiang <achiang@hp.com> Cc: Greg KH <greg@kroah.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Gary Hade <garyhade@us.ibm.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Alex Chiang authored
/sys/devices/system/node/node#/ However, it's not convenient to go in the other direction, when looking at /sys/devices/system/cpu/cpu#/ Yes, you can muck about in sysfs, but adding these symlinks makes life a lot more convenient. Signed-off-by: Alex Chiang <achiang@hp.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Gary Hade <garyhade@us.ibm.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Rientjes <rientjes@google.com> Cc: Greg KH <greg@kroah.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Alex Chiang authored
interesting code by two levels. No functional change. Signed-off-by: Alex Chiang <achiang@hp.com> Cc: Gary Hade <garyhade@us.ibm.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Rientjes <rientjes@google.com> Cc: Greg KH <greg@kroah.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Alex Chiang authored
interesting code by one level. No functional change. Signed-off-by: Alex Chiang <achiang@hp.com> Cc: Gary Hade <garyhade@us.ibm.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Rientjes <rientjes@google.com> Cc: Greg KH <greg@kroah.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Alex Chiang authored
symlinks in sysfs) created symlinks from nodes to memory sections, e.g. /sys/devices/system/node/node1/memory135 -> ../../memory/memory135 If you're examining the memory section though and are wondering what node it might belong to, you can find it by grovelling around in sysfs, but it's a little cumbersome. Add a reverse symlink for each memory section that points back to the node to which it belongs. Signed-off-by: Alex Chiang <achiang@hp.com> Cc: Gary Hade <garyhade@us.ibm.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: David Rientjes <rientjes@google.com> Cc: Greg KH <greg@kroah.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Hugh Dickins authored
corrupted for it to have been called, it does print_bad_pte() and returns ... VM_FAULT_OOM, which is hard to understand. It made some sense when I did it for 2.6.15, when do_page_fault() just killed the current process; but nowadays it lets the OOM killer decide who to kill - so page table corruption in one process would be liable to kill another. Change it to return VM_FAULT_SIGBUS instead: that doesn't guarantee that the process will be killed, but is good enough for such a rare abnormality, accompanied as it is by the "BUG: Bad page map" message. And recent HWPOISON work has copied that code into do_swap_page(), when it finds an impossible swap entry: fix that to VM_FAULT_SIGBUS too. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Nick Piggin <npiggin@suse.de> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Cc: Andi Kleen <andi@firstfloor.org> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Hugh Dickins authored
and CONFIG_DEBUG_LOCK_ALLOC adds another 12 or 24 bytes to it: lockdep enables both of those, and CONFIG_LOCK_STAT adds 8 or 16 bytes to that. When 2.6.15 placed the split page table lock inside struct page (usually sized 32 or 56 bytes), only CONFIG_DEBUG_SPINLOCK was a possibility, and we ignored the enlargement (but fitted in CONFIG_GENERIC_LOCKBREAK's 4 by letting the spinlock_t occupy both page->private and page->mapping). Should these debugging options be allowed to double the size of a struct page, when only one minority use of the page (as a page table) needs to fit a spinlock in there? Perhaps not. Take the easy way out: switch off SPLIT_PTLOCK_CPUS when DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC is in force. I've sometimes tried to be cleverer, kmallocing a cacheline for the spinlock when it doesn't fit, but given up each time. Falling back to mm->page_table_lock (as we do when ptlock is not split) lets lockdep check out the strictest path anyway. And now that some arches allow 8192 cpus, use 999999 for infinity. (What has this got to do with KSM swapping? It doesn't care about the size of struct page, but may care about random junk in page->mapping - to be explained separately later.) Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Nick Piggin <npiggin@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Hugh Dickins authored
should look. It could hack page->index to get them to do what it wants, but it seems cleaner now to pass the address down to them. Make the same change to page_mkclean_one(), since it follows the same pattern; but there's no real need in its case. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Nick Piggin <npiggin@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Hugh Dickins authored
CONFIG_UNEVICTABLE_LRU. MLOCK_PAGES is CONFIG_HAVE_MLOCKED_PAGE_BIT is CONFIG_HAVE_MLOCK is CONFIG_MMU. rmap.o (and memory-failure.o) are only built when CONFIG_MMU, so don't need such conditions at all. Somehow, I feel no compulsion to remove the CONFIG_HAVE_MLOCK* lines from 169 defconfigs: leave those to evolve in due course. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Nick Piggin <npiggin@suse.de> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Hugh Dickins authored
try_to_unmap_file(), which we'd prefer not to repeat for KSM swapping. Simplify it by moving it all down into try_to_unmap_one(). One thing is then lost, try_to_munlock()'s distinction between when no vma holds the page mlocked, and when a vma does mlock it, but we could not get mmap_sem to set the page flag. But its only caller takes no interest in that distinction (and is better testing SWAP_MLOCK anyway), so let's keep the code simple and return SWAP_AGAIN for both cases. try_to_unmap_file()'s TTU_MUNLOCK nonlinear handling was particularly amusing: once unravelled, it turns out to have been choosing between two different ways of doing the same nothing. Ah, no, one way was actually returning SWAP_FAIL when it meant to return SWAP_SUCCESS. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Nick Piggin <npiggin@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Hugh Dickins authored
in page->mapping, with the higher bits a pointer to the anon_vma; and have defined PageKsm(page) as that with NULL anon_vma. But KSM swapping will need to store a pointer there: so in preparation for that, now define PAGE_MAPPING_FLAGS as the low two bits, including PAGE_MAPPING_KSM (always set along with PAGE_MAPPING_ANON, until some other use for the bit emerges). Declare page_rmapping(page) to return the pointer part of page->mapping, and page_anon_vma(page) to return the anon_vma pointer when that's what it is. Use these in a few appropriate places: notably, unuse_vma() has been testing page->mapping, but is better to be testing page_anon_vma() (cases may be added in which flag bits are set without any pointer). Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Nick Piggin <npiggin@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 14 Nov, 2009 1 commit
-
-
KOSAKI Motohiro authored
Once the priority is higher, kswapd starts waiting on congestion. However, if the zone is below the min watermark then kswapd needs to continue working without delay as there is a danger of an increased rate of GFP_ATOMIC allocation failure. This patch changes the conditions under which kswapd waits on congestion by only going to sleep if the min watermarks are being met. [mel@csn.ul.ie: Add stats to track how relevant the logic is] Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 12 Nov, 2009 4 commits
-
-
Mel Gorman authored
event of no IO congestion, kswapd can go to sleep very shortly after the high watermark was reached. If there are a constant stream of allocations from parallel processes, it can mean that kswapd went to sleep too quickly and the high watermark is not being maintained for sufficient length time. This patch makes kswapd go to sleep as a two-stage process. It first tries to sleep for HZ/10. If it is woken up by another process or the high watermark is no longer met, it's considered a premature sleep and kswapd continues work. Otherwise it goes fully to sleep. This adds more counters to distinguish between fast and slow breaches of watermarks. A "fast" premature sleep is one where the low watermark was hit in a very short time after kswapd going to sleep. A "slow" premature sleep indicates that the high watermark was breached after a very short interval. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Frans Pop <elendil@planet.nl> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Mel Gorman authored
that the commits 373c0a7e 8aa7e847 dramatically increased the number of GFP_ATOMIC failures that were occuring within a wireless driver. Reverting this patch seemed to help a lot even though it was pointed out that the congestion changes were very far away from high-order atomic allocations. The key to why the revert makes such a big difference is down to timing and how long direct reclaimers wait versus kswapd. With the patch reverted, the congestion_wait() is on the SYNC queue instead of the ASYNC. As a significant part of the workload involved reads, it makes sense that the SYNC list is what was truely congested and with the revert processes were waiting on congestion as expected. Hence, direct reclaimers stalled properly and kswapd was able to do its job with fewer stalls. This patch aims to fix the congestion_wait() behaviour for SYNC and ASYNC for direct reclaimers. Instead of making the congestion_wait() on the SYNC queue which would only fix a particular type of workload, this patch adds a third type of congestion_wait - BLK_RW_BOTH which first waits on the ASYNC and then the SYNC queue if the timeout has not been reached. In tests, this counter-intuitively results in kswapd stalling less and freeing up pages resulting in fewer allocation failures and fewer direct-reclaim-orientated stalls. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Frans Pop <elendil@planet.nl> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Andrew Morton authored
#99: FILE: mm/oom_kill.c:209: + ^I * to kill current.We have to random task kill in this case.$ ERROR: code indent should use tabs where possible #100: FILE: mm/oom_kill.c:210: + ^I * Hopefully, CONSTRAINT_THISNODE...but no way to handle it, now.$ ERROR: code indent should use tabs where possible #101: FILE: mm/oom_kill.c:211: + ^I */$ ERROR: code indent should use tabs where possible #107: FILE: mm/oom_kill.c:216: + ^I * The nodemask here is a nodemask passed to alloc_pages(). Now,$ ERROR: code indent should use tabs where possible #108: FILE: mm/oom_kill.c:217: + ^I * cpuset doesn't use this nodemask for its hardwall/softwall/hierarchy$ ERROR: code indent should use tabs where possible #109: FILE: mm/oom_kill.c:218: + ^I * feature. mempolicy is an only user of nodemask here.$ ERROR: code indent should use tabs where possible #111: FILE: mm/oom_kill.c:220: + ^I */$ ERROR: code indent should use tabs where possible #169: FILE: mm/page_alloc.c:1672: +^I ^I* GFP_THISNODE contains __GFP_NORETRY and we never hit this.$ ERROR: code indent should use tabs where possible #170: FILE: mm/page_alloc.c:1673: +^I ^I* Sanity check for bare calls of __GFP_THISNODE, not real OOM.$ ERROR: code indent should use tabs where possible #171: FILE: mm/page_alloc.c:1674: +^I ^I* The caller should handle page allocation failure by itself if$ ERROR: code indent should use tabs where possible #172: FILE: mm/page_alloc.c:1675: +^I ^I* it specifies __GFP_THISNODE.$ ERROR: code indent should use tabs where possible #173: FILE: mm/page_alloc.c:1676: +^I ^I* Note: Hugepage uses it but will hit PAGE_ALLOC_COSTLY_ORDER.$ ERROR: code indent should use tabs where possible #174: FILE: mm/page_alloc.c:1677: +^I ^I*/$ total: 13 errors, 0 warnings, 125 lines checked ./patches/oom-kill-fix-numa-consraint-check-with-nodemask-v42.patch has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hioryu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
KAMEZAWA Hiroyuki authored
as a bugfix not as an ehnancement. In these days, things are changed as - alloc_pages() eats nodemask as its arguments, __alloc_pages_nodemask(). - mempolicy don't maintain its own private zonelists. (And cpuset doesn't use nodemask for __alloc_pages_nodemask()) So, current oom-killer's check function is wrong. This patch does - check nodemask, if nodemask && nodemask doesn't cover all node_states[N_HIGH_MEMORY], this is CONSTRAINT_MEMORY_POLICY. - Scan all zonelist under nodemask, if it hits cpuset's wall this faiulre is from cpuset. And - modifies the caller of out_of_memory not to call oom if __GFP_THISNODE. This doesn't change "current" behavior. If callers use __GFP_THISNODE it should handle "page allocation failure" by itself. - handle __GFP_NOFAIL+__GFP_THISNODE path. This is something like a FIXME but this gfpmask is not used now. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hioryu@jp.fujitsu.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 11 Nov, 2009 1 commit
-
-
David Rientjes authored
On Tue, 10 Nov 2009, akpm@linux-foundation.org wrote: > diff -puN mm/oom_kill.c~oom-kill-show-virtual-size-and-rss-information-of-the-killed-process mm/oom_kill.c > --- a/mm/oom_kill.c~oom-kill-show-virtual-size-and-rss-information-of-the-killed-process > +++ a/mm/oom_kill.c > @@ -352,6 +352,8 @@ static void dump_header(gfp_t gfp_mask, > dump_tasks(mem); > } > > +#define K(x) ((x) << (PAGE_SHIFT-10)) > + > /* > * Send SIGKILL to the selected process irrespective of CAP_SYS_RAW_IO > * flag though it's unlikely that we select a process with CAP_SYS_RAW_IO > @@ -371,9 +373,16 @@ static void __oom_kill_task(struct task_ > return; > } > > - if (verbose) > - printk(KERN_ERR "Killed process %d (%s)\n", > - task_pid_nr(p), p->comm); > + if (verbose) { > + task_lock(p); > + printk(KERN_ERR "Killed process %d (%s) " > + "vsz:%lukB, anon-rss:%lukB, file-rss:%lukB\n", > + task_pid_nr(p), p->comm, > + K(p->mm->total_vm), > + K(get_mm_counter(p->mm, anon_rss)), > + K(get_mm_counter(p->mm, file_rss))); > + task_unlock(p); > + } > > /* > * We give our sacrificial lamb high priority and access to There's a race there which can dereference a NULL p->mm. p->mm is protected by task_lock(), but there's no check added here that ensures p->mm is still valid. The previous check for !p->mm in __oom_kill_task() is not protected by task_lock(), so there's a race: select_bad_process() oom_kill_process(p) do_exit() exit_signals(p) /* PF_EXITING */ oom_kill_task(p) __oom_kill_task(p) exit_mm(p) task_lock(p) p->mm = NULL task_unlock(p) printk() of p->mm->total_vm Please merge this as a fix. Signed-off-by: David Rientjes <rientjes@google.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 10 Nov, 2009 1 commit
-
-
KOSAKI Motohiro authored
killed process has a memory leak or not at the first step. This patch adds vsz and rss information to the oom log to help this analysis. To save time for the debugging. example: =================================================================== rsyslogd invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0 Pid: 1308, comm: rsyslogd Not tainted 2.6.32-rc6 #24 Call Trace: [<ffffffff8132e35b>] ?_spin_unlock+0x2b/0x40 [<ffffffff810f186e>] oom_kill_process+0xbe/0x2b0 (snip) 492283 pages non-shared Out of memory: kill process 2341 (memhog) score 527276 or a child Killed process 2341 (memhog) vsz:1054552kB, anon-rss:970588kB, file-rss:4kB =========================================================================== ^ | here Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 31 Oct, 2009 1 commit
-
-
KAMEZAWA Hiroyuki authored
reproduce it easily. Now, oom-killer uses mm->total_vm as its base value. But in recent applications, there are a big gap between VM size and RSS size. Because - Applications attaches much dynamic libraries. (Gnome, KDE, etc...) - Applications may alloc big VM area but use small part of them. (Java, and multi-threaded applications has this tendency because of default-size of stack.) I think using mm->total_vm as score for oom-kill is not good. By the same reason, overcommit memory can't work as expected. (In other words, if we depends on total_vm, using overcommit more positive is a good choice.) This patch uses mm->anon_rss/file_rss as base value for calculating badness. Following is changes to OOM score(badness) on an environment with 1.6G memory plus memory-eater(500M & 1G). Top 10 of badness score. (The highest one is the first candidate to be killed) Before badness program 91228 gnome-settings- 94210 clock-applet 103202 mixer_applet2 106563 tomboy 112947 gnome-terminal 128944 mmap <----------- 500M malloc 129332 nautilus 215476 bash <----------- parent of 2 mallocs. 256944 mmap <----------- 1G malloc 423586 gnome-session After badness 1911 mixer_applet2 1955 clock-applet 1986 xinit 1989 gnome-session 2293 nautilus 2955 gnome-terminal 4113 tomboy 104163 mmap <----------- 500M malloc. 168577 bash <----------- parent of 2 mallocs 232375 mmap <----------- 1G malloc seems good for me. Maybe we can tweak this patch more, but this one will be a good one as a start point. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 03 Nov, 2009 2 commits
-
-
Huang Shijie authored
no need to check it. Signed-off-by: Huang Shijie <shijie8@gmail.com> Acked-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Huang Shijie authored
Signed-off-by: Huang Shijie <shijie8@gmail.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-