1. 23 Jul, 2009 4 commits
    • Izik Eidus's avatar
      ksm should try not to disturb other tasks as much as possible. · 87ca7e21
      Izik Eidus authored
      Signed-off-by: default avatarIzik Eidus <ieidus@redhat.com>
      Cc: Chris Wright <chrisw@redhat.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      87ca7e21
    • Izik Eidus's avatar
      Adding Hugh Dickins into the authors list. · 3fd63801
      Izik Eidus authored
      Signed-off-by: default avatarIzik Eidus <ieidus@redhat.com>
      Cc: Chris Wright <chrisw@redhat.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      3fd63801
    • Hugh Dickins's avatar
      KSM's scan allows for user pages to be COWed or unmapped at any time, · 3d39da78
      Hugh Dickins authored
      without requiring any notification.  But its stable tree does assume that
      when it finds a KSM page where it placed a KSM page, then it is the same
      KSM page that it placed there.
      
      mremap move could break that assumption: if an area containing a KSM page
      was unmapped, then an area containing a different KSM page was moved with
      mremap into the place of the original, before KSM's scan came around to
      notice.  That could then poison a node of the stable tree, so that memcmps
      would "lie" and upset the ordering of the tree.
      
      Probably noone will ever need mremap move on a VM_MERGEABLE area; except
      that prohibiting it would make trouble for schemes in which we try making
      everything VM_MERGEABLE e.g.  for testing: an mremap which normally works
      would then fail mysteriously.
      
      There's no need to go to any trouble, such as re-sorting KSM's list of
      rmap_items to match the new layout: simply unmerge the area to COW all its
      KSM pages before moving, but leave VM_MERGEABLE on so that they're
      remerged later.
      Signed-off-by: default avatarHugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: default avatarChris Wright <chrisw@redhat.com>
      Signed-off-by: default avatarIzik Eidus <ieidus@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      3d39da78
    • Izik Eidus's avatar
      Ksm is code that allows merging of identical pages between one or more · 90b39ee6
      Izik Eidus authored
      applications, in a way invisible to the applications that use it.  Pages
      that are merged are marked as read-only, then COWed when any application
      tries to change them.
      
      Whereas fork() allows sharing anonymous pages between parent and child,
      ksm can share anonymous pages between unrelated processes.
      
      Ksm works by walking over the memory pages of the applications it scans,
      in order to find identical pages.  It uses two sorted data structures,
      called the stable and unstable trees, to locate identical pages in an
      effective way.
      
      When ksm finds two identical pages, it marks them as readonly and merges
      them into a single page.  After the pages have been marked as readonly and
      merged into one, Linux treats them as normal copy-on-write pages, copying
      to a fresh anonymous page if write access is required later.
      
      Ksm scans and merges anonymous pages only in those memory areas that have
      been registered with it by madvise(addr, length, MADV_MERGEABLE).
      
      The ksm scanner is controlled by sysfs files in /sys/kernel/mm/ksm/:
      
      max_kernel_pages - the maximum number of unswappable kernel pages
                         which may be allocated by ksm (0 for unlimited).
      
      kernel_pages_allocated - how many ksm pages are currently allocated,
                               sharing identical content between different
                               processes (pages unswappable in this release).
      
      pages_shared - how many pages have been saved by sharing with ksm pages
                     (kernel_pages_allocated being excluded from this count).
      
      pages_to_scan - how many pages ksm should scan before sleeping.
      
      sleep_millisecs - how many milliseconds ksm should sleep between scans.
      
      run - write 0 to disable ksm, read 0 while ksm is disabled (default),
            write 1 to run ksm, read 1 while ksm is running,
            write 2 to disable ksm and unmerge all its pages.
      
      Includes contributions by Andrea Arcangeli Chris Wright and Hugh Dickins.
      Signed-off-by: default avatarIzik Eidus <ieidus@redhat.com>
      Signed-off-by: default avatarHugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: default avatarChris Wright <chrisw@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      90b39ee6
  2. 24 Aug, 2009 2 commits
    • Hugh Dickins's avatar
      KSM will need to identify its kernel merged pages unambiguously, and · 8e0eabbe
      Hugh Dickins authored
      /proc/kpageflags will probably like to do so too.
      
      Since KSM will only be substituting anonymous pages, statistics are best
      preserved by making a PageKsm page a special PageAnon page: one with no
      anon_vma.
      
      But KSM then needs its own page_add_ksm_rmap() - keep it in ksm.h near
      PageKsm; and do_wp_page() must COW them, unlike singly mapped PageAnons.
      Signed-off-by: default avatarHugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: default avatarChris Wright <chrisw@redhat.com>
      Signed-off-by: default avatarIzik Eidus <ieidus@redhat.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      8e0eabbe
    • Hugh Dickins's avatar
      page_dup_rmap(), used on each mapped page when forking, was originally · 9787d456
      Hugh Dickins authored
      just an inline atomic_inc of mapcount.  2.6.22 added CONFIG_DEBUG_VM
      out-of-line checks to it, which would need to be ever-so-slightly
      complicated to allow for the PageKsm() we're about to define.
      
      But I think these checks never caught anything.  And if it's coding errors
      we're worried about, such checks should be in page_remove_rmap() too, not
      just when forking; whereas if it's pagetable corruption we're worried
      about, then they shouldn't be limited to CONFIG_DEBUG_VM.
      
      Oh, just revert page_dup_rmap() to an inline atomic_inc of mapcount.
      Signed-off-by: default avatarHugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: default avatarChris Wright <chrisw@redhat.com>
      Signed-off-by: default avatarIzik Eidus <ieidus@redhat.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Avi Kivity <avi@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      9787d456
  3. 13 Aug, 2009 1 commit
    • Hugh Dickins's avatar
      This patch presents the mm interface to a dummy version of ksm.c, for · 58d5b7d3
      Hugh Dickins authored
      better scrutiny of that interface: the real ksm.c follows later.
      
      When CONFIG_KSM is not set, madvise(2) reject MADV_MERGEABLE and
      MADV_UNMERGEABLE with EINVAL, since that seems more helpful than
      pretending that they can be serviced.  But when CONFIG_KSM=y, accept them
      even if KSM is not currently running, and even on areas which KSM will not
      touch (e.g.  hugetlb or shared file or special driver mappings).
      
      Like other madvices, report ENOMEM despite success if any area in the
      range is unmapped, and use EAGAIN to report out of memory.
      
      Define vma flag VM_MERGEABLE to identify an area on which KSM may try
      merging pages: leave it to ksm_madvise() to decide whether to set it. 
      Define mm flag MMF_VM_MERGEABLE to identify an mm which might contain
      VM_MERGEABLE areas, to minimize callouts when forking or exiting.
      
      Based upon earlier patches by Chris Wright and Izik Eidus.
      Signed-off-by: default avatarHugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: default avatarChris Wright <chrisw@redhat.com>
      Signed-off-by: default avatarIzik Eidus <ieidus@redhat.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      58d5b7d3
  4. 22 Aug, 2009 1 commit
    • Hugh Dickins's avatar
      The out-of-tree KSM used ioctls on fds cloned from /dev/ksm to register a · 330d6131
      Hugh Dickins authored
      memory area for merging: we prefer now to use an madvise(2) interface.
      
      This patch just defines MADV_MERGEABLE (to tell KSM it may merge pages in
      this area found identical to pages in other mergeable areas) and
      MADV_UNMERGEABLE (to undo that).
      
      Most architectures use asm-generic, but alpha, mips, parisc, xtensa need
      their own definitions: included here for mmotm convenience, but we'll
      probably want to split this and feed pieces to arch maintainers.
      
      Based upon earlier patches by Chris Wright and Izik Eidus.
      Signed-off-by: default avatarHugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: default avatarChris Wright <chrisw@redhat.com>
      Signed-off-by: default avatarIzik Eidus <ieidus@redhat.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      330d6131
  5. 24 Aug, 2009 3 commits
    • Hugh Dickins's avatar
      madvise.c has several levels of switch statements, what to do in which? · 72eb3452
      Hugh Dickins authored
      Move MADV_DOFORK code down from madvise_vma() to madvise_behavior(), so
      madvise_vma() can be a simple router, to madvise_behavior() by default.
      
      vma->vm_flags is an unsigned long so use the same type for new_flags.  Add
      missing comment lines to describe MADV_DONTFORK and MADV_DOFORK.
      Signed-off-by: default avatarHugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: default avatarChris Wright <chrisw@redhat.com>
      Signed-off-by: default avatarIzik Eidus <ieidus@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      72eb3452
    • Izik Eidus's avatar
      KSM is a linux driver that allows dynamicly sharing identical memory pages · 9923cb2a
      Izik Eidus authored
      between one or more processes.
      
      Unlike tradtional page sharing that is made at the allocation of the
      memory, ksm do it dynamicly after the memory was created.  Memory is
      periodically scanned; identical pages are identified and merged.
      
      The sharing is made in a transparent way to the procsess that use it.
      
      Ksm is highly important for hypervisors (kvm), where in production
      enviorments there might be many copys of the same data data among the host
      memory.  This kind of data can be: similar kernels, librarys, cache, and
      so on.
      
      Even that ksm was wrote for kvm, any userspace application that want to
      use it to share its data can try it.
      
      Ksm may be useful for any application that might have similar (page
      aligment) data strctures among the memory, ksm will find this data merge
      it to one copy, and even if it will be changed and thereforew copy on
      writed, ksm will merge it again as soon as it will be identical again.
      
      Another reason to consider using ksm is the fact that it might simplify
      alot the userspace code of application that want to use shared private
      data, instead that the application will mange shared area, ksm will do
      this for the application, and even write to this data will be allowed
      without any synchinization acts from the application.
      
      Ksm was designed to be a loadable module that doesn't change the VM code
      of linux.
      
      
      
      This patch:
      
      The set_pte_at_notify() macro allows setting a pte in the shadow page
      table directly, instead of flushing the shadow page table entry and then
      getting vmexit to set it.  It uses a new change_pte() callback to do so.
      
      set_pte_at_notify() is an optimization for kvm, and other users of
      mmu_notifiers, for COW pages.  It is useful for kvm when ksm is used,
      because it allows kvm not to have to receive vmexit and only then map the
      ksm page into the shadow page table, but instead map it directly at the
      same time as Linux maps the page into the host page table.
      
      Users of mmu_notifiers who don't implement new mmu_notifier_change_pte()
      callback will just receive the mmu_notifier_invalidate_page() callback.
      Signed-off-by: default avatarIzik Eidus <ieidus@redhat.com>
      Signed-off-by: default avatarChris Wright <chrisw@redhat.com>
      Signed-off-by: default avatarHugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      9923cb2a
    • Johannes Weiner's avatar
      By the time PG_mlocked is cleared in the page freeing path, nobody else is · 848e1c4d
      Johannes Weiner authored
      looking at our page->flags anymore.
      
      It is thus safe to make the test-and-clear non-atomic and thereby removing
      an unnecessary and expensive operation from a hotpath.
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: default avatarChristoph Lameter <cl@linux-foundation.org>
      Reviewed-by: default avatarKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      848e1c4d
  6. 21 Jul, 2009 1 commit
  7. 24 Aug, 2009 7 commits
  8. 16 Jul, 2009 1 commit
  9. 24 Aug, 2009 1 commit
    • Rik van Riel's avatar
      When way too many processes go into direct reclaim, it is possible for all · 456a5e9f
      Rik van Riel authored
      of the pages to be taken off the LRU.  One result of this is that the next
      process in the page reclaim code thinks there are no reclaimable pages
      left and triggers an out of memory kill.
      
      One solution to this problem is to never let so many processes into the
      page reclaim path that the entire LRU is emptied.  Limiting the system to
      only having half of each inactive list isolated for reclaim should be
      safe.
      Signed-off-by: default avatarRik van Riel <riel@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      456a5e9f
  10. 16 Jul, 2009 1 commit
  11. 24 Aug, 2009 8 commits
  12. 14 Jul, 2009 1 commit
  13. 24 Aug, 2009 2 commits
  14. 17 Jul, 2009 1 commit
    • Lee Schermerhorn's avatar
      I noticed that alloc_bootmem_huge_page() will only advance to the next · 83d92902
      Lee Schermerhorn authored
      node on failure to allocate a huge page, potentially filling nodes with
      huge-pages.  I asked about this on linux-mm and linux-numa, cc'ing the
      usual huge page suspects.
      
      Mel Gorman responded:
      
      	I strongly suspect that the same node being used until allocation
      	failure instead of round-robin is an oversight and not deliberate
      	at all. It appears to be a side-effect of a fix made way back in
      	commit 63b4613c ["hugetlb: fix
      	hugepage allocation with memoryless nodes"]. Prior to that patch
      	it looked like allocations would always round-robin even when
      	allocation was successful.
      
      This patch--factored out of my "hugetlb mempolicy" series--moves the
      advance of the hstate next node from which to allocate up before the test
      for success of the attempted allocation.
      
      Note that alloc_bootmem_huge_page() is only used for order > MAX_ORDER
      huge pages.
      
      I'll post a separate patch for mainline/stable, as the above mentioned
      "balance freeing" series renamed the next node to alloc function.
      Signed-off-by: default avatarLee Schermerhorn <lee.schermerhorn@hp.com>
      Reviewed-by: default avatarMel Gorman <mel@csn.ul.ie>
      Reviewed-by: default avatarAndy Whitcroft <apw@canonical.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      83d92902
  15. 29 Jun, 2009 1 commit
  16. 30 Jun, 2009 1 commit
    • Lee Schermerhorn's avatar
      Fixes bug detected by libhugetlbfs test suite in: · 73c71f31
      Lee Schermerhorn authored
      hugetlb-use-free_pool_huge_page-to-return-unused-surplus-pages.patch
      
      Can't just "continue" for node with no surplus pages when returning
      unused surplus.  We need to advance to 'next node to free'.
      
      With this fix, the "hugetlb balance free across nodes" series passes
      the test suite.
      Signed-off-by: default avatarLee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Nishanth Aravamudan <nacc@us.ibm.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: Andy Whitcroft <apw@canonical.com>
      Cc: Eric Whitney <eric.whitney@hp.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      73c71f31
  17. 29 Jun, 2009 1 commit
  18. 24 Aug, 2009 1 commit
  19. 09 Sep, 2009 1 commit
  20. 24 Aug, 2009 1 commit