  1. 09 Sep, 2009 10 commits
    • Hugh Dickins · 991ce3ef
      CONFIG_SHMEM off gives you (ramfs masquerading as) tmpfs, even when
      CONFIG_TMPFS is off: that's a little anomalous, and I'd intended to make
      more sense of it by removing CONFIG_TMPFS altogether, always enabling its
      code when CONFIG_SHMEM is set; but so many defconfigs have CONFIG_SHMEM
      on and CONFIG_TMPFS off that we'd better leave that as is.
      
      But there is no point in asking for CONFIG_TMPFS if CONFIG_SHMEM is off:
      make TMPFS depend on SHMEM, which also prevents TMPFS_POSIX_ACL
      shmem_acl.o being pointlessly built into the kernel when SHMEM is off.
      
      And a selfish change, to prevent the world from being rebuilt when I
      switch between CONFIG_SHMEM on and off: the only CONFIG_SHMEM in the
      header files is mm.h shmem_lock() - give that a shmem.c stub instead.
      Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Acked-by: Matt Mackall <mpm@selenic.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
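      For illustration, a minimal sketch of the header change described above.
      The declaration matches the 2.6.31-era prototype; placing the do-nothing
      stub in the !CONFIG_SHMEM half of mm/shmem.c is an assumption:

          /* include/linux/mm.h: declare shmem_lock() unconditionally, so the
           * header no longer tests CONFIG_SHMEM and toggling that option does
           * not rebuild everything that includes mm.h. */
          int shmem_lock(struct file *file, int lock, struct user_struct *user);

          /* mm/shmem.c, in the !CONFIG_SHMEM (ramfs-backed) half: pages are
           * not swap-backed there, so a do-nothing stub suffices. */
          #ifndef CONFIG_SHMEM
          int shmem_lock(struct file *file, int lock, struct user_struct *user)
          {
          	return 0;
          }
          #endif

      The Kconfig side is simply a "depends on SHMEM" added to the TMPFS entry.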
    • Huang Shijie · 47b7b6a1
      If (flags & MAP_LOCKED) is true, vm_flags already contains the
      VM_LOCKED bit, which is set by calc_vm_flag_bits().
      
      So there is no need to set it again; just remove the redundant assignment.
      Signed-off-by: Huang Shijie <shijie8@gmail.com>
      Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
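      For illustration, roughly the relevant 2.6.31-era code; the exact shape
      of the do_mmap_pgoff() block is reproduced from memory and may differ:

          /* include/linux/mman.h: MAP_* mmap flags are translated to VM_*
           * vma flags in one place. */
          static inline unsigned long calc_vm_flag_bits(unsigned long flags)
          {
          	return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
          	       _calc_vm_trans(flags, MAP_DENYWRITE,  VM_DENYWRITE ) |
          	       _calc_vm_trans(flags, MAP_EXECUTABLE, VM_EXECUTABLE) |
          	       _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    );
          }

          /* mm/mmap.c, do_mmap_pgoff(): */
          vm_flags = calc_vm_prot_bits(prot) | calc_vm_flag_bits(flags) |
          	   mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
          if (flags & MAP_LOCKED) {
          	if (!can_do_mlock())
          		return -EPERM;
          	vm_flags |= VM_LOCKED;	/* redundant: already set just above */
          }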
    • Hugh Dickins · fdf2984f
      __get_user_pages() has been taking its own GUP flags, then processing
      them into FOLL flags for follow_page().  Though oddly named, the FOLL
      flags are more widely used, so pass them to __get_user_pages() now.
      Sorry, VM flags, VM_FAULT flags and FAULT_FLAGs are still distinct.
      
      (The patch to __get_user_pages() looks peculiar, with both gup_flags
      and foll_flags: the gup_flags remain constant; but as before there's
      an exceptional case, out of scope of the patch, in which foll_flags
      per page have FOLL_WRITE masked off.)
      Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
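      A sketch of what a caller looks like once it passes FOLL flags down;
      close to the eventual mm/memory.c code, but reproduced from memory:

          int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
          		unsigned long start, int nr_pages, int write, int force,
          		struct page **pages, struct vm_area_struct **vmas)
          {
          	int flags = FOLL_TOUCH;

          	if (pages)
          		flags |= FOLL_GET;
          	if (write)
          		flags |= FOLL_WRITE;
          	if (force)
          		flags |= FOLL_FORCE;

          	/* __get_user_pages() now speaks FOLL_* directly, and hands the
          	 * (per-page adjusted) flags straight on to follow_page(). */
          	return __get_user_pages(tsk, mm, start, nr_pages, flags,
          				pages, vmas);
          }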
    • Hugh Dickins · f671a4e8
      KAMEZAWA Hiroyuki has observed customers of earlier kernels taking
      advantage of the ZERO_PAGE: which we stopped do_anonymous_page() from
      using in 2.6.24.  And there were a couple of regression reports on LKML.
      
      Following suggestions from Linus, reinstate do_anonymous_page() use of
      the ZERO_PAGE; but this time avoid dirtying its struct page cacheline
      with (map)count updates - let vm_normal_page() regard it as abnormal.
      
      Use it only on arches which __HAVE_ARCH_PTE_SPECIAL (x86, s390, sh32,
      most powerpc): that's not essential, but minimizes additional branches
      (keeping them in the unlikely pte_special case); and incidentally
      excludes mips (some models of which needed eight colours of ZERO_PAGE
      to avoid costly exceptions).
      
      Don't be fanatical about avoiding ZERO_PAGE updates: get_user_pages()
      callers won't want to make exceptions for it, so increment its count
      there.  Changes to mlock and migration?  Happily they seem not to be needed.
      
      In most places it's quicker to check pfn than struct page address:
      prepare a __read_mostly zero_pfn for that.  Does get_dump_page()
      still need its ZERO_PAGE check? probably not, but keep it anyway.
      Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
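      A heavily simplified sketch of the read-fault path described above;
      locking and the pte_none recheck are elided, names follow the patch:

          /* mm/memory.c: pfn of the empty zero page, cached for cheap tests */
          static unsigned long zero_pfn __read_mostly;	/* page_to_pfn(ZERO_PAGE(0)) */

          /* do_anonymous_page(), read fault: map the zero page with a
           * pte_special() pte, so vm_normal_page() reports "no struct page"
           * and the zero page's (map)count cacheline is never dirtied. */
          if (!(flags & FAULT_FLAG_WRITE)) {
          	entry = pte_mkspecial(pfn_pte(zero_pfn, vma->vm_page_prot));
          	goto setpte;			/* no page allocated at all */
          }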
    • Hugh Dickins · 1e9bc722
      do_anonymous_page() has been wrong to dirty the pte regardless.
      If it's not going to mark the pte writable, then it won't help
      to mark it dirty here, and clogs up memory with pages which will
      need swap instead of being thrown away.  Especially wrong if no
      overcommit is chosen, and this vma is not yet VM_ACCOUNTed -
      we could exceed the limit and OOM despite no overcommit.
      Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: <stable@kernel.org>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
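      The fix is tiny; roughly (reproduced from memory):

          /* mm/memory.c, do_anonymous_page(): only dirty the pte when the
           * vma is writable - a clean, read-only, zero-filled page can be
           * dropped under pressure instead of being written to swap. */
          entry = mk_pte(page, vma->vm_page_prot);
          if (vma->vm_flags & VM_WRITE)
          	entry = pte_mkwrite(pte_mkdirty(entry));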
    • Hugh Dickins · 7dc46b63
      follow_hugetlb_page() shouldn't be guessing about the coredump case
      either: pass the foll_flags down to it, instead of just the write bit.
      
      Remove that obscure huge_zeropage_ok() test.  The decision is easy,
      though unlike the non-huge case - here vm_ops->fault is always set.
      But we know that a fault would serve up zeroes, unless there's
      already a hugetlbfs pagecache page to back the range.
      
      (Alternatively, since hugetlb pages aren't swapped out under pressure,
      you could save more dump space by arguing that a page not yet faulted
      into this process cannot be relevant to the dump; but that would be
      more surprising.)
      Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
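      A sketch of the decision inside the follow_hugetlb_page() loop; the name
      of the pagecache-presence helper is an assumption:

          /* mm/hugetlb.c: when dumping core, an absent huge pte with no
           * hugetlbfs pagecache page behind it would only fault in zeroes -
           * report an error so get_dump_page() leaves a hole instead.
           * (h is the vma's hstate.) */
          absent = !pte || huge_pte_none(huge_ptep_get(pte));
          if (absent && (flags & FOLL_DUMP) &&
              !hugetlbfs_pagecache_present(h, vma, vaddr)) {
          	remainder = 0;
          	break;
          }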
    • Hugh Dickins · c2b1bd26
      The "FOLL_ANON optimization" and its use_zero_page() test have caused
      confusion and bugs: why does it test VM_SHARED? for the very good but
      unsatisfying reason that VMware crashed without it.  As we look to maybe
      reinstating anonymous use of the ZERO_PAGE, we need to sort this out.
      
      Easily done: it's silly for __get_user_pages() and follow_page() to
      be guessing whether it's safe to assume that they're being used for
      a coredump (which can take a shortcut snapshot where other uses must
      handle a fault) - just tell them with GUP_FLAGS_DUMP and FOLL_DUMP.
      
      get_dump_page() doesn't even want a ZERO_PAGE: an error suits fine.
      Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Acked-by: Rik van Riel <riel@redhat.com>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
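      The core of FOLL_DUMP, sketched; close to the "no page" path in
      follow_page(), but reproduced from memory:

          /* mm/memory.c: a dumper gets an error rather than a faulted-in or
           * zero page - and only where a fault would certainly have served
           * up zeroes (anonymous, or a vma with no ->fault method). */
          if ((foll_flags & FOLL_DUMP) &&
              (!vma->vm_ops || !vma->vm_ops->fault))
          	return ERR_PTR(-EFAULT);
          return NULL;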
    • Hugh Dickins · e35e64ad
      In preparation for the next patch, add a simple get_dump_page(addr)
      interface for the CONFIG_ELF_CORE dumpers to use, instead of calling
      get_user_pages() directly.  They're not interested in errors: they
      just want to use holes as much as possible, to save space and make
      sure that the data is aligned where the headers said it would be.
      
      Oh, and don't use that horrid DUMP_SEEK(off) macro!
      Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
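      A sketch of the new interface, in roughly its eventual form once the
      FOLL_DUMP change above lands (reproduced from memory, not the exact diff):

          #ifdef CONFIG_ELF_CORE
          /* mm/memory.c: one page for the dumper, or NULL to leave a hole. */
          struct page *get_dump_page(unsigned long addr)
          {
          	struct vm_area_struct *vma;
          	struct page *page;

          	if (__get_user_pages(current, current->mm, addr, 1,
          			FOLL_FORCE | FOLL_DUMP | FOLL_GET, &page, &vma) < 1)
          		return NULL;
          	flush_cache_page(vma, addr, page_to_pfn(page));
          	return page;
          }
          #endif /* CONFIG_ELF_CORE */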
    • Hugh Dickins · 94e9f60b
      GUP_FLAGS_IGNORE_VMA_PERMISSIONS and GUP_FLAGS_IGNORE_SIGKILL were
      flags added solely to prevent __get_user_pages() from doing some of
      what it usually does, in the munlock case: we can now remove them.
      Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Hugh Dickins · 70da9180
      Hiroaki Wakabayashi points out that when mlock() has been interrupted
      by SIGKILL, the subsequent munlock() takes unnecessarily long because
      its use of __get_user_pages() insists on faulting in all the pages
      which mlock() never reached.
      
      It's worse than slowness if mlock() is terminated by Out Of Memory kill:
      the munlock_vma_pages_all() in exit_mmap() insists on faulting in all the
      pages which mlock() could not find memory for; so innocent bystanders are
      killed too, and perhaps the system hangs.
      
      __get_user_pages() does a lot that's silly for munlock(): so remove the
      munlock option from __mlock_vma_pages_range(), and use a simple loop of
      follow_page()s in munlock_vma_pages_range() instead; ignoring absent
      pages, and not marking present pages as accessed or dirty.
      
      (Change munlock() to only go so far as mlock() reached?  That does not
      work out, given the convention that mlock() claims complete success even
      when it has to give up early - in part so that an underlying file can be
      extended later, and those pages locked which earlier would give SIGBUS.)
      Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: <stable@kernel.org>
      Acked-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Hiroaki Wakabayashi <primulaelatior@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
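      A sketch of the simplified munlock walk; error and truncation details of
      the real function are omitted:

          /* mm/mlock.c, munlock_vma_pages_range(): absent pages are simply
           * skipped - never faulted in - and present pages are unlocked
           * without being marked accessed or dirty. */
          vma->vm_flags &= ~VM_LOCKED;
          for (addr = start; addr < end; addr += PAGE_SIZE) {
          	struct page *page = follow_page(vma, addr, FOLL_GET);
          	if (page && !IS_ERR(page)) {
          		lock_page(page);
          		munlock_vma_page(page);
          		unlock_page(page);
          		put_page(page);
          	}
          	cond_resched();
          }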
  2. 02 Sep, 2009 3 commits
    • Andrew Morton · b83f33b7
      WARNING: line over 80 characters
      #45: FILE: mm/page_alloc.c:551:
      +		 * Remove pages from lists in a round-robin fashion. A batch_free
      
      total: 0 errors, 1 warnings, 46 lines checked
      
      ./patches/page-allocator-maintain-rolling-count-of-pages-to-free-from-the-pcp.patch has style problems, please review.  If any of these errors
      are false positives report them to the maintainer, see
      CHECKPATCH in MAINTAINERS.
      
      Please run checkpatch prior to sending patches
      
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Mel Gorman · b4712e0f
      When round-robin freeing pages from the PCP lists, empty lists may be
      encountered.  In the event one of the lists has more pages than another,
      there may be numerous checks for list_empty() which is undesirable.  This
      patch maintains a count of pages to free which is incremented when empty
      lists are encountered.  The intention is that more pages will then be
      freed from fuller lists than from the empty ones, reducing the number of
      empty-list checks in the free path.
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
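      A sketch of the batched round-robin drain, close to the resulting
      free_pcppages_bulk() loop; zone->lock handling and tracing are omitted,
      and the exact code is reproduced from memory:

          int migratetype = 0;
          int batch_free = 0;

          while (count) {
          	struct page *page;
          	struct list_head *list;

          	/* Pick the next non-empty list round-robin, counting in
          	 * batch_free how many empty lists were skipped ... */
          	do {
          		batch_free++;
          		if (++migratetype == MIGRATE_PCPTYPES)
          			migratetype = 0;
          		list = &pcp->lists[migratetype];
          	} while (list_empty(list));

          	/* ... then free that many extra pages from the fuller list,
          	 * so we don't re-test the empty ones on every iteration. */
          	do {
          		page = list_entry(list->prev, struct page, lru);
          		list_del(&page->lru);
          		__free_one_page(page, zone, 0, migratetype);
          	} while (--count && --batch_free && !list_empty(list));
          }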
    • Mel Gorman · aea38c74
      The following two patches remove searching in the page allocator fast-path
      by maintaining multiple free-lists in the per-cpu structure.  At the time
      the search was introduced, increasing the per-cpu structures would waste a
      lot of memory as per-cpu structures were statically allocated at
      compile-time.  This is no longer the case.
      
      The patches are as follows. They are based on mmotm-2009-08-27.
      
      Patch 1 adds multiple lists to struct per_cpu_pages, one per
      	migratetype that can be stored on the PCP lists.
      
      Patch 2 notes that the pcpu drain path checks empty lists multiple times. The
      	patch reduces the number of checks by keeping a count of the empty
      	lists encountered. Lists containing pages will then free multiple
      	pages in one batch.
      
      The patches were tested with kernbench, netperf udp/tcp, hackbench and
      sysbench.  The netperf tests were not bound to any CPU in particular and
      were run enough times that the reported results are within 1% of the
      estimated mean at 99% confidence.  sysbench was run with a postgres
      background and read-only tests.  Similar to netperf, it was run multiple
      times so that its results are within 1% at 99% confidence.  The patches
      were tested on x86, x86-64 and ppc64 as follows:
      x86:	Intel Pentium D 3GHz with 8G RAM (no-brand machine)
      	kernbench	- No significant difference, variance well within noise
      	netperf-udp	- 1.34% to 2.28% gain
      	netperf-tcp	- 0.45% to 1.22% gain
      	hackbench	- Small variances, very close to noise
      	sysbench	- Very small gains
      
      x86-64:	AMD Phenom 9950 1.3GHz with 8G RAM (no-brand machine)
      	kernbench	- No significant difference, variance well within noise
      	netperf-udp	- 1.83% to 10.42% gains
      	netperf-tcp	- Not conclusive until buffer >= PAGE_SIZE
      				4096	+15.83%
      				8192	+ 0.34% (not significant)
      				16384	+ 1%
      	hackbench	- Small gains, very close to noise
      	sysbench	- 0.79% to 1.6% gain
      
      ppc64:	PPC970MP 2.5GHz with 10GB RAM (it's a terrasoft powerstation)
      	kernbench	- No significant difference, variance well within noise
      	netperf-udp	- 2-3% gain for almost all buffer sizes tested
      	netperf-tcp	- losses on small buffers, gains on larger buffers,
      			  possibly indicating some bad caching effect.
      	hackbench	- No significant difference
      	sysbench	- 2-4% gain
      
      
      
      This patch:
      
      Currently the per-cpu page allocator searches the PCP list for pages of
      the correct migrate-type, to reduce the possibility of pages being
      inappropriately placed from a fragmentation perspective.  This search is
      potentially expensive in a fast-path and undesirable.  Splitting the
      per-cpu list into multiple lists increases the size of a per-cpu structure,
      and this was potentially a major problem at the time the search was
      introduced.  That problem has since been mitigated, as only the necessary
      number of structures is now allocated for the running system.
      
      This patch replaces a list search in the per-cpu allocator with one list
      per migrate type.  The potential snag with this approach is when bulk
      freeing pages.  We round-robin free pages based on migrate type which has
      little bearing on the cache hotness of the page and potentially checks
      empty lists repeatedly in the event the majority of PCP pages are of one
      type.
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Acked-by: Nick Piggin <npiggin@suse.de>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
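      A sketch of the data-structure change and the resulting fast path,
      close to the 2.6.33 code but reproduced from memory:

          /* include/linux/mmzone.h */
          struct per_cpu_pages {
          	int count;		/* number of pages in the lists */
          	int high;		/* high watermark, emptying needed */
          	int batch;		/* chunk size for buddy add/remove */
          	/* one list per migrate type kept on the PCP lists */
          	struct list_head lists[MIGRATE_PCPTYPES];
          };

          /* mm/page_alloc.c, buffered_rmqueue(): no searching - index
           * straight into the list for the wanted migratetype. */
          list = &pcp->lists[migratetype];
          if (list_empty(list)) {
          	pcp->count += rmqueue_bulk(zone, 0, pcp->batch, list,
          				   migratetype, cold);
          	if (unlikely(list_empty(list)))
          		goto failed;
          }
          page = cold ? list_entry(list->prev, struct page, lru)
          	    : list_entry(list->next, struct page, lru);
          list_del(&page->lru);
          pcp->count--;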
  3. 27 Aug, 2009 4 commits
    • KOSAKI Motohiro · 294de8ab
      Andrew Morton pointed out that oom_adjust_write() has very strange EIO
      and newline handling; this patch fixes it.
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
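      A hypothetical sketch of the corrected pattern for such a /proc write
      handler - not the actual diff - showing the usual shape: bound the copy,
      NUL-terminate, tolerate a trailing newline, and return -EINVAL rather
      than -EIO for malformed input:

          static ssize_t oom_adjust_write(struct file *file, const char __user *buf,
          				size_t count, loff_t *ppos)
          {
          	char buffer[PROC_NUMBUF];
          	long oom_adjust;

          	memset(buffer, 0, sizeof(buffer));
          	if (count > sizeof(buffer) - 1)
          		count = sizeof(buffer) - 1;
          	if (copy_from_user(buffer, buf, count))
          		return -EFAULT;

          	/* strstrip() drops the trailing '\n' that echo users send */
          	if (strict_strtol(strstrip(buffer), 0, &oom_adjust))
          		return -EINVAL;
          	if (oom_adjust < OOM_ADJUST_MIN || oom_adjust > OOM_ADJUST_MAX)
          		return -EINVAL;

          	/* ... look up the task and store the value under its lock ... */
          	return count;	/* consume the whole write, newline included */
          }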
    • KOSAKI Motohiro · ed2a6974
      Currently oom_kill doesn't only kill the victim process; it also kills
      all processes that share the same mm, which means a vfork parent will
      be killed too.

      This is definitely incorrect.  Other processes have their own oom_adj,
      and we shouldn't ignore it (they might have OOM_DISABLE set).

      The following caller hits the minefield:
      
      ===============================
              switch (constraint) {
              case CONSTRAINT_MEMORY_POLICY:
                      oom_kill_process(current, gfp_mask, order, 0, NULL,
                                      "No available memory (MPOL_BIND)");
                      break;
      
      Note: force_sig(SIGKILL) sends SIGKILL to every thread in the process,
      so we don't need to worry about multithreading here.
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • KOSAKI Motohiro · 2cd71712
      The oom-killer kills a process, not a task, so oom_score should be
      calculated per-process too.  That is more consistent and it speeds up
      select_bad_process().
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • KOSAKI Motohiro · 5bf2a6e3
      Currently, the OOM logic callflow is:
          __out_of_memory()
              select_bad_process()            for each task
                  badness()                   calculate badness of one task
                      oom_kill_process()      search child
                          oom_kill_task()     kill target task and mm shared tasks with it
      
      For example, suppose process-A has two threads, thread-A and thread-B,
      occupies a lot of memory, and its threads have the following oom_adj and
      oom_score values:
      
           thread-A: oom_adj = OOM_DISABLE, oom_score = 0
           thread-B: oom_adj = 0,           oom_score = very-high
      
      Then select_bad_process() selects thread-B, but oom_kill_task() refuses
      to kill it because thread-A has OOM_DISABLE.  So __out_of_memory() calls
      select_bad_process() again, which picks the same task: the kernel falls
      into a livelock.
      
      The fact is, select_bad_process() must select a killable task; otherwise
      the OOM logic goes into a livelock.
      
      The root cause is that oom_adj shouldn't be a per-thread value; it should
      be per-process, because the OOM-killer kills a process, not a thread.
      This patch therefore moves oomkilladj (now more appropriately named
      oom_adj) from struct task_struct to struct signal_struct, which naturally
      prevents select_bad_process() from choosing the wrong task.
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
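      A rough sketch of the move; the field placement and the badness() excerpt
      are reproduced from memory:

          /* include/linux/sched.h: one adjustment per process, shared by all
           * of its threads, instead of a per-task_struct oomkilladj. */
          struct signal_struct {
          	/* ... */
          	int oom_adj;	/* OOM kill score adjustment (bit shift) */
          };

          /* mm/oom_kill.c, badness(): every thread now reports the same
           * adjustment, so select_bad_process() can no longer pick a thread
           * whose OOM_DISABLEd sibling will later veto the kill. */
          int oom_adj = p->signal->oom_adj;
          if (oom_adj == OOM_DISABLE)
          	return 0;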