1. 09 Jan, 2006 18 commits
    • [PATCH] Swap Migration V5: migrate_pages() function · 49d2e9cc
      Christoph Lameter authored
      This adds the basic page migration function with a minimal implementation that
      only allows the eviction of pages to swap space.
      
      Page eviction and migration may be useful for migrating pages, suspending
      programs, or remapping single pages (for example, faulty pages or pages with
      soft ECC failures).
      
      The process is as follows:
      
      The function wanting to migrate pages must first build a list of pages to be
      migrated or evicted and take them off the LRU lists via isolate_lru_page().
      isolate_lru_page() treats a page as freeable only if its LRU bit is set.
      
      Then the actual migration or swapout can happen by calling migrate_pages().
      
      migrate_pages() makes a best effort to migrate or swap out the pages and
      makes multiple passes over the list.  Some pages can only be swapped out once
      they are clean, so migrate_pages() may start writing out dirty pages in the
      initial passes.  However, migrate_pages() may not be able to migrate or evict
      all pages, for a variety of reasons.
      
      The remaining pages may be returned to the LRU lists using putback_lru_pages().
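      The isolate, migrate and put-back sequence described above can be modelled
      in user space.  The following is a toy simulation under stated assumptions,
      not kernel code: struct toy_page, its fields, TOY_MAX_PASSES and the toy_*
      functions are hypothetical stand-ins for isolate_lru_page(), migrate_pages()
      and putback_lru_pages().  It assumes that writeback started on a dirty page
      in one pass has completed by the next pass, and that a pinned page can never
      be evicted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy page: NOT the kernel's struct page. */
struct toy_page {
    bool on_lru;     /* models the LRU bit */
    bool dirty;      /* must be written back before it can be evicted */
    bool writeback;  /* writeback was started in an earlier pass */
    bool pinned;     /* models a page that cannot be evicted at all */
    bool evicted;    /* page has been swapped out */
};

#define TOY_MAX_PASSES 3 /* hypothetical pass limit for this model */

/* Models isolate_lru_page(): only pages with the LRU bit set are taken. */
static bool toy_isolate_lru_page(struct toy_page *p)
{
    if (!p->on_lru)
        return false;
    p->on_lru = false;
    return true;
}

/* Models migrate_pages(): several passes over the isolated list.
 * Returns the number of pages that could not be evicted. */
static size_t toy_migrate_pages(struct toy_page **list, size_t n)
{
    for (int pass = 0; pass < TOY_MAX_PASSES; pass++) {
        for (size_t i = 0; i < n; i++) {
            struct toy_page *p = list[i];
            if (p->evicted || p->pinned)
                continue;
            if (p->writeback) {   /* assumed finished since the last pass */
                p->writeback = false;
                p->dirty = false;
            }
            if (p->dirty) {       /* start writeback, retry on a later pass */
                p->writeback = true;
                continue;
            }
            p->evicted = true;    /* clean page: evict to swap */
        }
    }
    size_t failed = 0;
    for (size_t i = 0; i < n; i++)
        if (!list[i]->evicted)
            failed++;
    return failed;
}

/* Models putback_lru_pages(): return the leftover pages to the LRU. */
static void toy_putback_lru_pages(struct toy_page **list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!list[i]->evicted)
            list[i]->on_lru = true;
}
```

      In this model a dirty page is only evicted on the pass after its writeback
      starts, which is why more than one pass is needed; the pinned page is
      counted as a failure and is the only page returned to the LRU.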
      
      Changelog V4->V5:
      - Use the lru caches to return pages to the LRU
      
      Changelog V3->V4:
      - Restructure code so that applying patches to support full migration
        requires only minimal changes. Rename swapout_pages() to migrate_pages().
      
      Changelog V2->V3:
      - Extract common code from shrink_list() and swapout_pages()
      Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: "Michael Kerrisk" <mtk-manpages@gmx.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Swap Migration V5: PF_SWAPWRITE to allow writing to swap · 930d9152
      Christoph Lameter authored
      Add PF_SWAPWRITE to control a process's permission to write to swap.
      
      - Use PF_SWAPWRITE in may_write_to_queue() instead of checking for kswapd
        and pdflush
      
      - Set PF_SWAPWRITE flag for kswapd and pdflush
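      A user-space model of the new may_write_to_queue() decision is sketched
      below.  It assumes the remaining checks are, in order, whether the queue is
      congested and whether the task already writes to that queue;
      TOY_PF_SWAPWRITE, struct toy_task, struct toy_bdi and its write_congested
      field are hypothetical simplifications, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag value, used only in this model. */
#define TOY_PF_SWAPWRITE 0x1u

/* Hypothetical stand-in for a backing device's request queue state. */
struct toy_bdi {
    bool write_congested; /* models bdi_write_congested() */
};

/* Hypothetical stand-in for the relevant parts of a task. */
struct toy_task {
    unsigned int flags;
    const struct toy_bdi *backing_dev_info; /* queue the task writes to */
};

/* Models may_write_to_queue() after the patch: a permission flag replaces
 * the old "is the caller kswapd or pdflush?" identity checks. */
static bool toy_may_write_to_queue(const struct toy_task *task,
                                   const struct toy_bdi *bdi)
{
    if (task->flags & TOY_PF_SWAPWRITE)
        return true;  /* e.g. kswapd and pdflush, which set the flag */
    if (!bdi->write_congested)
        return true;  /* an uncongested queue accepts writes from anyone */
    if (bdi == task->backing_dev_info)
        return true;  /* the task is already writing to this queue */
    return false;
}
```

      Because the decision now depends on a per-task flag rather than on whether
      the caller is kswapd or pdflush, granting another context permission to
      write to swap only requires setting the flag.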
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Swap Migration V5: LRU operations · 21eac81f
      Christoph Lameter authored
      This is the start of the `swap migration' patch series.
      
      Swap migration allows the physical location of pages to be moved between
      nodes in a NUMA system while the process is running.  This means that the
      virtual addresses that the process sees do not change.  However, the system
      rearranges the physical location of those pages.
      
      The main intent of these page migration patches is to reduce memory access
      latency by moving pages close to the processor on which the process
      accessing that memory is running.
      
      The patchset allows a process to manually move its pages to another node
      through the MPOL_MF_MOVE and MPOL_MF_MOVE_ALL options when setting a new
      memory policy.
      
      The pages of a process can also be relocated by another process using the
      sys_migrate_pages() system call, which requires CAP_SYS_ADMIN.  It takes
      two sets of nodes and moves those pages of the process that are located on
      the source ("from") nodes to the destination nodes.
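      From user space, sys_migrate_pages() is reached through the migrate_pages
      system call.  This sketch assumes a Linux C library whose <sys/syscall.h>
      defines SYS_migrate_pages and calls it through the raw syscall() wrapper;
      node_mask_set() and move_process_pages() are hypothetical helpers, and the
      single-word node masks are a simplification that only covers small node
      numbers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Simplification: one unsigned long holds each node bitmask. */
#define MASK_BITS (8 * sizeof(unsigned long))

/* Set the bit for a NUMA node in a single-word mask. */
static bool node_mask_set(unsigned long *mask, unsigned int node)
{
    if (node >= MASK_BITS)
        return false;
    *mask |= 1UL << node;
    return true;
}

/* Ask the kernel to move the pages of process pid that are located on
 * from_node to to_node. Returns the raw syscall result: -1 with errno
 * set on failure. */
static long move_process_pages(pid_t pid, unsigned int from_node,
                               unsigned int to_node)
{
    unsigned long from = 0, to = 0;

    if (!node_mask_set(&from, from_node) || !node_mask_set(&to, to_node)) {
        errno = EINVAL;
        return -1;
    }
    return syscall(SYS_migrate_pages, pid, (unsigned long)MASK_BITS,
                   &from, &to);
}
```

      For example, move_process_pages(target_pid, 0, 1) asks for the target's
      pages on node 0 to be moved to node 1.  Per the text above, the caller needs
      CAP_SYS_ADMIN, and with swap migration the pages end up wherever the
      allocation policy places them when they are faulted back in.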
      
      Manual migration is useful if, for example, the scheduler has relocated a
      process to a processor on a distant node.  A batch scheduler or an
      administrator can detect the situation and move the pages of the process
      nearer to the new processor.
      
      sys_migrate_pages() could be used on non-NUMA machines as well, to force all
      of a particular process's pages out to swap, if someone thinks that's useful.
      
      Larger installations usually partition the system into sections of nodes
      using cpusets.  Paul has equipped cpusets with the ability to move pages when
      a task is moved to another cpuset.  This allows automatic control over the
      locality of a process.  If a task is moved to a new cpuset, all of its pages
      are moved with it, so that the performance of the process does not drop
      dramatically (as it does today).
      
      Swap migration works simply by evicting the page.  The pages must then be
      faulted back in, at which point the system typically allocates them near the
      node where the process is executing.
      
      For swap migration the destination of the move is controlled by the allocation
      policy.  Cpusets set the allocation policy before calling sys_migrate_pages()
      in order to move the pages as intended.
      
      No allocation policy changes are performed for sys_migrate_pages().  This
      means that the pages may not be faulted in on the specified nodes if no
      allocation policy was set by other means.  The pages will just end up near
      the node where the fault occurred.
      
      There's another patch series in the pipeline which implements "direct
      migration".
      
      The direct migration patchset extends the migration functionality to avoid
      going through swap.  The destination node of the relocation is controllable
      during the actual moving of pages.  The crutch of using the allocation policy
      to relocate is not necessary and the pages are moved directly to the target.
      It's also faster since swap is not used.

      And sys_migrate_pages() can then move pages directly to the specified node.

      This patch:

      Implement functions to isolate pages from the LRU and put them back later.
      
      An earlier implementation was provided by Hirokazu Takahashi
      <taka@valinux.co.jp> and IWAMOTO Toshihiro <iwamoto@valinux.co.jp> for the
      memory hotplug project.
      
      From: Magnus
      
      This breaks out isolate_lru_page() and putback_lru_page().  Needed for swap
      migration.
      Signed-off-by: Magnus Damm <magnus.damm@gmail.com>
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] add schedule_on_each_cpu() · 15316ba8
      Christoph Lameter authored
      Swap migration's isolate_lru_page() currently uses an IPI to notify other
      processors that the LRU caches need to be drained if the page cannot be
      found on the LRU.  The IPI may interrupt a processor that is in the middle
      of processing LRU requests and cause a race condition.

      This patch introduces a new function, schedule_on_each_cpu(), that uses
      keventd to run the LRU draining on each processor.  Processors disable
      preemption when dealing with the LRU caches (which are per-processor), so
      executing the LRU draining from another process is safe.
      
      Thanks to Lee Schermerhorn <lee.schermerhorn@hp.com> for finding this race
      condition.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mm: free_pages opt · 48db57f8
      Nick Piggin authored
      Try to streamline free_pages_bulk by ensuring callers don't pass in a
      'count' that exceeds the list size.
      
      Some cleanups:
      Rename __free_pages_bulk to __free_one_page.
      Put the page list manipulation from __free_pages_ok into free_one_page.
      Make __free_pages_ok static.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mm: cleanup zone_pcp · 23316bc8
      Nick Piggin authored
      Use zone_pcp everywhere even though NUMA code "knows" the internal details
      of the zone.  This stops other people from trying to copy it, and it looks
      nicer.
      
      Also, only print the pagesets of online cpus in zoneinfo.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: "Seth, Rohit" <rohit.seth@intel.com>
      Cc: Christoph Lameter <christoph@lameter.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Make high and batch sizes of per_cpu_pagelists configurable · 8ad4b1fb
      Rohit Seth authored
      Recently there has been a lot of traffic on the right values for the batch
      and high watermarks of per_cpu_pagelists.  This patch makes these two
      variables configurable through a /proc interface.

      A new tunable, /proc/sys/vm/percpu_pagelist_fraction, is added.  This entry
      controls the maximum fraction of the pages in each zone that may be
      allocated to each per-CPU page list.  The minimum value is 8, which means
      that no single per_cpu_pagelist may hold more than 1/8th of the pages in a
      zone.
      
      The batch value of each per-CPU pagelist is also updated as a result.  It
      is set to pcp->high/4, with an upper limit of (PAGE_SHIFT * 8).
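      The arithmetic described above fits in a few lines.  This is a simplified
      sketch: it assumes high = zone pages / percpu_pagelist_fraction and
      batch = high / 4 (at least 1, at most PAGE_SHIFT * 8), with a PAGE_SHIFT of
      12, i.e. 4 KiB pages.  pcp_limits() and struct toy_pcp_limits are
      hypothetical names, and the sketch ignores anything else the kernel does
      when it sets up a per-CPU pageset.

```c
#include <assert.h>

#define TOY_PAGE_SHIFT 12UL               /* assumption: 4 KiB pages */
#define MIN_PERCPU_PAGELIST_FRACTION 8UL  /* minimum accepted value */

/* Hypothetical container for the two derived limits. */
struct toy_pcp_limits {
    unsigned long high;   /* max pages held by one per-CPU list */
    unsigned long batch;  /* pages moved to/from the zone at a time */
};

/* Derive the per-CPU list limits from the zone size and the tunable. */
static struct toy_pcp_limits pcp_limits(unsigned long zone_pages,
                                        unsigned long fraction)
{
    struct toy_pcp_limits l;

    assert(fraction >= MIN_PERCPU_PAGELIST_FRACTION);
    l.high = zone_pages / fraction;   /* at most 1/fraction of the zone */
    l.batch = l.high / 4;             /* batch is high/4 ... */
    if (l.batch < 1)
        l.batch = 1;                  /* ... but never zero */
    if (l.batch > TOY_PAGE_SHIFT * 8)
        l.batch = TOY_PAGE_SHIFT * 8; /* ... and capped at PAGE_SHIFT * 8 */
    return l;
}
```

      With 4 KiB pages and the minimum fraction of 8, a 262144-page (1 GiB) zone
      gets high = 32768 and a batch of 96, so for a zone of that size the
      PAGE_SHIFT * 8 cap, not high/4, sets the batch size.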
      Signed-off-by: Rohit Seth <rohit.seth@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] drop-pagecache · 9d0243bc
      Andrew Morton authored
      Add /proc/sys/vm/drop_caches.  When written to, this will cause the kernel to
      discard as much pagecache and/or reclaimable slab objects as it can.  This
      operation requires root permissions.
      
      It won't drop dirty data, so the user should run `sync' first.
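      A benchmark harness can perform the sync-then-drop sequence from C.  In this
      sketch drop_caches_at() is a hypothetical helper that takes the target path
      as a parameter so it can be exercised on an ordinary file; writing to
      /proc/sys/vm/drop_caches itself requires root.  It assumes the values
      documented for this interface: 1 drops pagecache, 2 drops reclaimable slab
      objects (dentries and inodes), and 3 drops both.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Flush dirty data, then write mode (1, 2 or 3) followed by a newline to
 * path. Dirty pages are never dropped, hence the sync() first.
 * Returns 0 on success, -1 on failure. */
static int drop_caches_at(const char *path, int mode)
{
    char buf[2];

    if (mode < 1 || mode > 3)
        return -1;
    sync(); /* write back dirty data so that more can be dropped */

    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    buf[0] = (char)('0' + mode);
    buf[1] = '\n';
    ssize_t n = write(fd, buf, sizeof buf);
    int rc = close(fd);
    return (n == (ssize_t)sizeof buf && rc == 0) ? 0 : -1;
}
```

      Run as root before each benchmark iteration,
      drop_caches_at("/proc/sys/vm/drop_caches", 3) flushes dirty data and then
      drops both the pagecache and the reclaimable slab objects.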
      
      Caveats:
      
      a) Holds inode_lock for exorbitant amounts of time.
      
      b) Needs to be taught about NUMA nodes: propagate these all the way through
         so the discarding can be controlled on a per-node basis.
      
      This is a debugging feature: useful for getting consistent results between
      filesystem benchmarks.  We could possibly put it under a config option, but
      it's less than 300 bytes.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] slab: remove nested #ifdef CONFIG_NUMA · bec6b0c8
      Christoph Lameter authored
      For some reason there is an #ifdef CONFIG_NUMA within another #ifdef
      CONFIG_NUMA in the page allocator.  Remove the innermost #ifdef CONFIG_NUMA.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] slab: fix code formatting · b28a02de
      Pekka Enberg authored
      The slab allocator code is inconsistent in coding style and messy.  For this
      patch, I ran Lindent for mm/slab.c and fixed up goofs by hand.
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] slab: extract slab order calculation to separate function · 4d268eba
      Pekka Enberg authored
      This patch moves the ugly loop that determines the 'optimal' size (page order)
      of cache slabs from kmem_cache_create() to a separate function and cleans it
      up a bit.
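      The idea behind the order loop can be shown with a simplified model:
      starting at order 0, pick the smallest page order whose leftover space,
      after packing in whole objects, is at most 1/8 of the slab.  This sketch
      assumes 4 KiB pages and that 1/8 acceptable-fragmentation test; it ignores
      on-slab management overhead and the allocator's other limits, and its names
      (slab_order, MAX_TOY_ORDER) are hypothetical.  It is not a copy of the
      kernel function.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL /* assumption: 4 KiB pages */
#define MAX_TOY_ORDER 5      /* hypothetical upper bound on the search */

/* Return the smallest order whose slab wastes at most 1/8 of its size, or
 * MAX_TOY_ORDER if none does. *num receives the objects per slab for the
 * returned order. */
static int slab_order(size_t obj_size, size_t *num)
{
    for (int order = 0; order <= MAX_TOY_ORDER; order++) {
        size_t slab_size = TOY_PAGE_SIZE << order;
        size_t objs = slab_size / obj_size;
        size_t left_over = slab_size % obj_size;

        *num = objs;
        if (objs == 0)
            continue;            /* object does not fit in this slab yet */
        if (left_over * 8 <= slab_size)
            return order;        /* acceptable internal fragmentation */
    }
    return MAX_TOY_ORDER;
}
```

      A 3000-byte object needs order 2 under this rule: one page would waste 1096
      of 4096 bytes and two pages 2192 of 8192, while four pages hold five objects
      and waste only 1384 of 16384 bytes.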
      
      Thanks to Matthew Wilcox for the help with this patch.
      Signed-off-by: Matthew Dobson <colpatch@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] slab: extract slabinfo header printing to separate function · 85289f98
      Pekka Enberg authored
      This patch extracts slabinfo header printing to a separate function
      print_slabinfo_header() to make s_start() more readable.
      Signed-off-by: Matthew Dobson <colpatch@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] slab: remove unused align parameter from alloc_percpu · f9f75005
      Pekka Enberg authored
      __alloc_percpu and alloc_percpu both take an 'align' argument which is
      completely ignored.  snmp6_mib_init() in net/ipv6/af_inet6.c attempts to use
      it, but it will be ignored.  Therefore, remove the 'align' argument and fix up
      the lone caller.
      Signed-off-by: Matthew Dobson <colpatch@us.ibm.com>
      Acked-by: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Fix compilation with CONFIG_MEMORY_HOTPLUG=y and gcc41. · b792de39
      Olaf Hering authored
      Fix compilation with CONFIG_MEMORY_HOTPLUG=y and gcc41.
      Also remove unneeded declarations and add a public function.
      
      drivers/base/memory.c:53: error: static declaration of 'register_memory_notifier' follows non-static declaration
      include/linux/memory.h:85: error: previous declaration of 'register_memory_notifier' was here
      drivers/base/memory.c:58: error: static declaration of 'unregister_memory_notifier' follows non-static declaration
      include/linux/memory.h:86: error: previous declaration of 'unregister_memory_notifier' was here
      drivers/base/memory.c:68: error: static declaration of 'register_memory' follows non-static declaration
      include/linux/memory.h:73: error: previous declaration of 'register_memory' was here
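      The errors above arise when a function is declared with external linkage in
      a header and later defined static in the .c file, which gcc41 rejects.  A
      minimal sketch of the fixed pattern follows; it is not the actual
      drivers/base/memory.c code, and struct toy_notifier and the function body
      are hypothetical simplifications of the real notifier registration.

```c
#include <assert.h>
#include <stddef.h>

/* --- what the header (e.g. include/linux/memory.h) declares --- */
struct toy_notifier { struct toy_notifier *next; }; /* hypothetical type */
int register_memory_notifier(struct toy_notifier *nb);

/* --- the .c file: the definition must NOT be static, otherwise GCC reports
 * "static declaration of 'register_memory_notifier' follows non-static
 * declaration" --- */
static struct toy_notifier *chain; /* file-local state may stay static */

int register_memory_notifier(struct toy_notifier *nb)
{
    if (nb == NULL)
        return -1;
    nb->next = chain; /* push onto the head of the notifier chain */
    chain = nb;
    return 0;
}
```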
      Signed-off-by: Olaf Hering <olh@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] ARM Netwinder watchdog wdt977 update · 4dab06fa
      Woody Suwalski authored
      Cleanup for the ARM-only watchdog driver wdt977.
      
      This is probably the last update, since we want to merge with w83977f_wdt.
      Jose Goncalves has ported this driver to i386, so probably we can iron out
      configuration differences.
      Signed-off-by: Woody Suwalski <woodys@xandros.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] small hp_sdc_rtc cleanup: use no_llseek · 70c00ba0
      Marcelo Tosatti authored
      Use no_llseek function.
      Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
      Cc: "Brian S. Julin" <bri@calyx.com>
      Acked-by: Vojtech Pavlik <vojtech@suse.cz>
      Cc: Dmitry Torokhov <dtor_core@ameritech.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] asm-generic/atomic.h needs types.h · 5998bf1d
      Andrew Morton authored
      For BITS_PER_LONG
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] revert "mm: page_state fixes" · 84c2008a
      Andrew Morton authored
      Hugh says:
      
      page_alloc_cpu_notify() specifically contains code to
      
       		/* Add dead cpu's page_states to our own. */
      
      which handles this more efficiently.
      
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  2. 07 Jan, 2006 22 commits