1. 07 Dec, 2006 40 commits
    • [PATCH] swsusp: Untangle thaw_processes · a9b6f562
      Rafael J. Wysocki authored
      Move the loop from thaw_processes() to a separate function and call it
      independently for kernel threads and user space processes so that the order
      of thawing tasks is clearly visible.
      
      Drop thaw_kernel_threads() which is never used.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Nigel Cunningham <nigel@suspend2.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
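The thaw ordering described in this commit can be modelled in user space. The following is only a sketch, not the kernel code: `struct task_model`, `thaw_tasks_model()` and `thaw_processes_model()` are hypothetical names, and a task counts as a kernel thread here simply because a flag says so.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model of a freezable task. */
struct task_model {
    int is_user;     /* 0: kernel thread, 1: user space process */
    int frozen;      /* 1 while the task is frozen */
    int thaw_order;  /* position in the thaw sequence, 0 if never thawed */
};

/* Counts how many tasks have been thawed so far. */
int thaw_counter;

/* One loop, parameterised by the class of tasks to thaw. */
void thaw_tasks_model(struct task_model *tasks, size_t n, int thaw_user)
{
    for (size_t i = 0; i < n; i++) {
        if (tasks[i].is_user != thaw_user || !tasks[i].frozen)
            continue;
        tasks[i].frozen = 0;
        tasks[i].thaw_order = ++thaw_counter;
    }
}

/* Call the loop twice so the order is visible: kernel threads first. */
void thaw_processes_model(struct task_model *tasks, size_t n)
{
    assert(tasks != NULL);
    thaw_tasks_model(tasks, n, 0);  /* kernel threads */
    thaw_tasks_model(tasks, n, 1);  /* user space processes */
}
```

Calling the same loop twice keeps the ordering decision in one obvious place instead of hiding it inside the loop body.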
    • [PATCH] convert pm_sem to a mutex · a6d70980
      Stephen Hemminger authored
      The power management semaphore is only used as a mutex, so convert it.
      
      [akpm@osdl.org: fix rotten bug]
      Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] suspend to disk fails if gdb is suspended with a traced child · 3eb1b3a4
      Rafael J. Wysocki authored
      Fix http://bugzilla.kernel.org/show_bug.cgi?id=7534
      
      Fix the freezing of processes so that it won't fail if there is a traced
      process the parent of which has been stopped.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Cc: maurice barnum <pixi+kbug@burble.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swsusp: Measure memory shrinking time · 0d3a9abe
      Rafael J. Wysocki authored
      Make swsusp measure and print the time needed to shrink memory during the
      suspend.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Nigel Cunningham <nigel@suspend2.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] suspend: don't change cpus_allowed for task initiating the suspend · 112cecb2
      Siddha, Suresh B authored
      Don't modify the cpus_allowed of the task initiating the suspend.
      _cpu_down() already makes sure that the task doing the suspend doesn't run
      on the dying CPU.
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Nigel Cunningham <nigel@suspend2.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swsusp: Support i386 systems with PAE or without PSE · 2d4a34c9
      Rafael J. Wysocki authored
      Make swsusp support i386 systems with PAE or without PSE.
      
      This is done by creating temporary page tables located in resume-safe page
      frames before the suspend image is restored in the same way as x86_64 does
      it.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Nigel Cunningham <ncunningham@linuxmail.org>
      Cc: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swsusp: thaw userspace and kernel space separately · ff39593a
      Nigel Cunningham authored
      Modify process thawing so that we can thaw kernel space without thawing
      userspace, and thaw kernelspace first.  This will be useful in later
      patches, where I intend to get swsusp thawing kernel threads only before
      seeking to free memory.
      Signed-off-by: Nigel Cunningham <nigel@suspend2.net>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swsusp: clean up whitespace in freezer output · 14b5b7cf
      Nigel Cunningham authored
      Minor whitespace and formatting modifications for the freezer.
      Signed-off-by: Nigel Cunningham <nigel@suspend2.net>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swsusp: quieten Freezer if !CONFIG_PM_DEBUG · 32d50f57
      Nigel Cunningham authored
      The freezer currently prints an '=' for every process that is frozen.  This
      is pretty pointless, as the equals sign says nothing about which process is
      frozen, and makes logs look messier (especially if there were a large
      number of processes running).  All we really need to know is that we
      started trying to freeze processes and what processes (if any) failed to
      freeze, or that we succeeded.
      Signed-off-by: Nigel Cunningham <nigel@suspend2.net>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Add include/linux/freezer.h and move definitions from sched.h · 7dfb7103
      Nigel Cunningham authored
      Move process freezing functions from include/linux/sched.h to freezer.h, so
      that modifications to the freezer or the kernel configuration don't require
      recompiling just about everything.
      
      [akpm@osdl.org: fix ueagle driver]
      Signed-off-by: Nigel Cunningham <nigel@suspend2.net>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swsusp: fix platform mode · 8a05aac2
      Stefan Seyfried authored
      At some point after 2.6.13, in-kernel software suspend got "incomplete" for
      the so-called "platform" mode.  pm_ops->prepare() is never called.  A
      visible sign of this is the "moon" light on thinkpads not flashing during
      suspend.  Fix by re-adding the pm_ops->prepare() call during suspend.
      Signed-off-by: Stefan Seyfried <seife@suse.de>
      Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swsusp: use __GFP_WAIT · 85949121
      Rafael J. Wysocki authored
      swsusp uses GFP_ATOMIC, but it can afford to use __GFP_WAIT, which will
      permit it to reclaim clean pagecache instead of emitting scary
      page-allocation-failure messages.
      
      Cc: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swsusp: Improve handling of highmem · 8357376d
      Rafael J. Wysocki authored
      Currently swsusp saves the contents of highmem pages by copying them to the
      normal zone which is quite inefficient (eg.  it requires two normal pages
      to be used for saving one highmem page).  This may be improved by using
      highmem for saving the contents of saveable highmem pages.
      
      Namely, during the suspend phase of the suspend-resume cycle we try to
      allocate as many free highmem pages as there are saveable highmem pages.
      If there are not enough highmem image pages to store the contents of all of
      the saveable highmem pages, some of them will be stored in the "normal"
      memory.  Next, we allocate as many free "normal" pages as needed to store
      the (remaining) image data.  We use a memory bitmap to mark the allocated
      free pages (ie.  highmem as well as "normal" image pages).
      
      Now, we use another memory bitmap to mark all of the saveable pages
      (highmem as well as "normal") and the contents of the saveable pages are
      copied into the image pages.  Then, the second bitmap is used to save the
      pfns corresponding to the saveable pages and the first one is used to save
      their data.
      
      During the resume phase the pfns of the pages that were saveable during the
      suspend are loaded from the image and used to mark the "unsafe" page
      frames.  Next, we try to allocate enough free highmem page frames to load
      all of the image data that had been in highmem before the suspend, and we
      allocate enough free "normal" page frames that the total number of
      allocated free pages (highmem and "normal") equals the size of the image.
      While doing this we have to make sure that there will be some extra free
      "normal" and "safe" page frames for two lists of PBEs constructed later.
      
      Now, the image data are loaded, if possible, into their "original" page
      frames.  The image data that cannot be written into their "original" page
      frames are loaded into "safe" page frames and their "original" kernel
      virtual addresses, as well as the addresses of the "safe" pages containing
      their copies, are stored in one of two lists of PBEs.
      
      One list of PBEs is for the copies of "normal" suspend pages (ie.  "normal"
      pages that were saveable during the suspend) and it is used in the same way
      as previously (ie.  by the architecture-dependent parts of swsusp).  The
      other list of PBEs is for the copies of highmem suspend pages.  The pages
      in this list are restored (in a reversible way) right before the
      arch-dependent code is called.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
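The suspend-side allocation described in this commit (fill highmem first, spill the rest into "normal" memory) comes down to a little arithmetic. The sketch below is an illustrative model only: `struct image_plan_model` and `plan_image_pages()` are hypothetical names, and the memory bitmaps and PBE lists from the commit message are left out.

```c
#include <assert.h>

/* Hypothetical summary of how many image pages of each kind get allocated. */
struct image_plan_model {
    unsigned long highmem_pages;  /* image pages taken from highmem */
    unsigned long normal_pages;   /* image pages taken from "normal" memory */
};

struct image_plan_model plan_image_pages(unsigned long saveable_highmem,
                                         unsigned long saveable_normal,
                                         unsigned long free_highmem)
{
    struct image_plan_model plan;

    /* Guard the model against overflow in the sum below. */
    assert(saveable_highmem <= (unsigned long)-1 - saveable_normal);

    /* Use highmem for as many saveable highmem pages as possible. */
    plan.highmem_pages = saveable_highmem < free_highmem
                             ? saveable_highmem : free_highmem;

    /* The spill-over plus all saveable "normal" pages go to "normal" memory. */
    plan.normal_pages = saveable_normal +
                        (saveable_highmem - plan.highmem_pages);
    return plan;
}
```

With 100 saveable highmem pages, 50 saveable "normal" pages and 60 free highmem pages, the model puts 60 image pages in highmem and 90 in "normal" memory.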
    • [PATCH] swsusp: update userland interface documentation · bf73bae6
      Rafael J. Wysocki authored
      The swsusp userland interface has recently changed a couple of times, but
      the changes have not been documented.  Fix this, and document the
      SNAPSHOT_SET_SWAP_AREA ioctl().
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swsusp: add ioctl for swap files support · 37b2ba12
      Rafael J. Wysocki authored
      To be able to use swap files as suspend storage from the userland suspend
      tools we need an additional ioctl() that will allow us to provide the kernel
      with both the swap header's offset and the identification of the resume
      partition.
      
      The new ioctl() should be regarded as a replacement for the
      SNAPSHOT_SET_SWAP_FILE ioctl() that from now on will be considered as
      obsolete, but has to stay for backwards compatibility of the interface.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swsusp: document support for swap files · ecbd0da1
      Rafael J. Wysocki authored
      Document the "resume_offset=" command line parameter as well as the way in
      which swap files are supported by swsusp.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swsusp: add resume_offset command line parameter · 9a154d9d
      Rafael J. Wysocki authored
      Add the kernel command line parameter "resume_offset=" allowing us to specify
      the offset, in <PAGE_SIZE> units, from the beginning of the partition pointed
      to by the "resume=" parameter at which the swap header is located.
      
      This offset can be determined, for example, by an application using the FIBMAP
      ioctl to obtain the swap header's block number for a given file.
      
      [akpm@osdl.org: we don't know what type sector_t is]
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
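As a rough illustration of the FIBMAP approach mentioned in this commit, the sketch below asks the filesystem for the physical block that holds the first block of a swap file and converts it to page-size units. The function names are hypothetical, the conversion assumes the block number returned by FIBMAP is expressed in the filesystem block size reported by FIGETBSZ, and FIBMAP normally requires root privileges.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/fs.h>   /* FIBMAP, FIGETBSZ */
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Convert a filesystem block number to an offset in page-size units.
 * Returns -1 if the block does not start on a page boundary.
 */
long long block_to_page_offset(unsigned long long block,
                               unsigned long fs_block_size,
                               unsigned long page_size)
{
    assert(fs_block_size > 0 && page_size > 0);
    unsigned long long bytes = block * fs_block_size;
    if (bytes % page_size != 0)
        return -1;
    return (long long)(bytes / page_size);
}

/*
 * Ask the filesystem where logical block 0 of the file at `path` lives.
 * Returns the offset in page-size units, or -1 on any failure.
 */
long long swap_header_page_offset(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    int fs_block_size = 0;
    int block = 0;  /* in: logical block 0; out: physical block number */
    long long result = -1;

    if (ioctl(fd, FIGETBSZ, &fs_block_size) == 0 && fs_block_size > 0 &&
        ioctl(fd, FIBMAP, &block) == 0 && block > 0)
        result = block_to_page_offset((unsigned long long)block,
                                      (unsigned long)fs_block_size,
                                      (unsigned long)sysconf(_SC_PAGESIZE));
    close(fd);
    return result;
}
```

The value returned for a real swap file is the kind of number one would pass as "resume_offset=", alongside "resume=" naming the partition that holds the file.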
    • [PATCH] swsusp: use block device offsets to identify swap locations · 3aef83e0
      Rafael J. Wysocki authored
      Make swsusp use block device offsets instead of swap offsets to identify swap
      locations and make it use the same code paths for writing as well as for
      reading data.
      
      This allows us to use the same code for handling swap files and swap
      partitions and to simplify the code, eg.  by dropping rw_swap_page_sync().
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swsusp: rearrange swap-handling code · 3fc6b34f
      Rafael J. Wysocki authored
      Rearrange the code in kernel/power/swap.c so that the next patch is more
      readable.
      
      [This patch only moves the existing code.]
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swsusp: use partition device and offset to identify swap areas · 915bae9e
      Rafael J. Wysocki authored
      The Linux kernel handles swap files almost in the same way as it handles swap
      partitions and there are only two differences between these two types of swap
      areas:
      
      (1) swap files need not be contiguous,
      
      (2) the header of a swap file is not in the first block of the partition
          that holds it.

      From the swsusp's point of view (1) is not a problem, because it is already
      taken care of by the swap-handling code, but (2) has to be taken into
      consideration.
      
      In principle the location of a swap file's header may be determined with the
      help of the appropriate filesystem driver.  Unfortunately, however, it requires
      the filesystem holding the swap file to be mounted, and if this filesystem is
      journaled, it cannot be mounted during a resume from disk.  For this reason we
      need some other means by which swap areas can be identified.
      
      For example, to identify a swap area we can use the partition that holds the
      area and the offset from the beginning of this partition at which the swap
      header is located.
      
      The following patch allows swsusp to identify swap areas this way.  It changes
      swap_type_of() so that it takes an additional argument representing an offset
      of the swap header within the partition represented by its first argument.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
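The identification scheme this commit describes, a partition plus the offset of the swap header within it, can be pictured as a lookup keyed on both values. The sketch below is a user-space model with hypothetical names (`struct swap_area_model`, `swap_type_of_model()`); it does not reproduce the kernel's swap_type_of() and uses a plain integer as a stand-in for the device number.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one active swap area. */
struct swap_area_model {
    unsigned int device;               /* stand-in for the partition's device number */
    unsigned long long header_offset;  /* swap header offset within the partition */
};

/* Return the index of the matching swap area, or -1 if there is none. */
int swap_type_of_model(const struct swap_area_model *areas, size_t n,
                       unsigned int device, unsigned long long offset)
{
    assert(areas != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (areas[i].device == device && areas[i].header_offset == offset)
            return (int)i;
    return -1;
}
```

In this model a swap partition has its header at offset 0, while a swap file on the same kind of device is told apart by a non-zero header offset.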
    • [PATCH] uswsusp: add pmops->{prepare,enter,finish} support (aka "platform mode") · 3592695c
      Stefan Seyfried authored
      Add an ioctl to the userspace swsusp code that enables the usage of the
      pmops->prepare, pmops->enter and pmops->finish methods (the in-kernel
      suspend knows these as "platform method").  These are needed on many
      machines to, among other things, speed up resuming by letting the BIOS skip
      some steps, or to let my hp nx5000 recognise the correct ac_adapter state
      again after resume.

      It also ensures, on many machines, that changed hardware (unplugged AC
      adapters) gets correctly detected and that kacpid does not run wild after
      resume.
      Signed-off-by: Stefan Seyfried <seife@suse.de>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] alpha: switch to pci_get API · 074cec54
      Alan Cox authored
      Now that we have pci_get_bus_and_slot we can do the job correctly.  Note that
      some of these calls intentionally leak a device - this is because the device
      in question is always needed from boot to reboot.
      Signed-off-by: Alan Cox <alan@redhat.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] h8300 stray bracket fix · 3869aa29
      Mariusz Kozlowski authored
      Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] avr32: fixup kprobes preemption handling · 4d3eeeac
      Paul Mundt authored
      While working on SH kprobes, I noticed that avr32 got the preemption
      handling wrong in the no probe case.  The idea is that upon entry of
      kprobe_handler() preemption is disabled outright across the life of the
      kprobe, only to be re-enabled in post_kprobe_handler().
      
      However, in the event that the probe is never activated, there's never any
      chance of hitting the post probe handler, which allows for the current
      avr32 implementation to disable preemption indefinitely, as it's currently
      missing a re-enable when no probe is activated.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
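The imbalance this commit fixes can be shown with a small user-space model in which a plain counter stands in for the preemption count. This is a sketch under that assumption, not the avr32 code: every name ending in `_model` and the `model_preempt_*` helpers are hypothetical.

```c
#include <assert.h>

/* Stand-in for the kernel's preemption counter. */
int model_preempt_count;

static void model_preempt_disable(void)
{
    model_preempt_count++;
}

static void model_preempt_enable(void)
{
    assert(model_preempt_count > 0);
    model_preempt_count--;
}

/* Returns 1 if a probe was activated, 0 otherwise. */
int kprobe_handler_model(int probe_found)
{
    model_preempt_disable();
    if (!probe_found) {
        /* No probe: the post handler will never run, so re-enable here. */
        model_preempt_enable();
        return 0;
    }
    return 1;
}

/* Runs only after an activated probe; balances the disable above. */
void post_kprobe_handler_model(void)
{
    model_preempt_enable();
}
```

Dropping the re-enable in the no-probe branch would leave the counter above zero after every miss, which is the indefinite preemption disable described in the commit message.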
    • [PATCH] arch/frv/kernel/futex.c must #include <linux/uaccess.h> · e9c1528a
      Adrian Bunk authored
      This patch fixes the following compile error with
      -Werror-implicit-function-declaration
      (without -Werror-implicit-function-declaration it's a link error):
      
          ...
            CC      arch/frv/kernel/futex.o
          /home/bunk/linux/kernel-2.6/linux-2.6.19-rc6-mm2/arch/frv/kernel/futex.c:
          In function 'futex_atomic_op_inuser':
          /home/bunk/linux/kernel-2.6/linux-2.6.19-rc6-mm2/arch/frv/kernel/futex.c:203:
          error: implicit declaration of function 'pagefault_disable'
          /home/bunk/linux/kernel-2.6/linux-2.6.19-rc6-mm2/arch/frv/kernel/futex.c:226:
          error: implicit declaration of function 'pagefault_enable'
          make[2]: *** [arch/frv/kernel/futex.o] Error 1
          ...
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Acked-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] radix-tree: RCU lockless readside · 7cf9c2c7
      Nick Piggin authored
      Make radix tree lookups safe to be performed without locks.  Readers are
      protected against nodes being deleted by using RCU based freeing.  Readers
      are protected against new node insertion by using memory barriers to ensure
      the node itself will be properly written before it is visible in the radix
      tree.
      
      Each radix tree node keeps a record of its height (above leaf nodes).
      This height does not change after insertion -- when the radix tree is
      extended, higher nodes are only inserted in the top.  So a lookup can take
      the pointer to what is *now* the root node, and traverse down it even if
      the tree is concurrently extended and this node becomes a subtree of a new
      root.
      
      "Direct" pointers (tree height of 0, where root->rnode points directly to
      the data item) are handled by using the low bit of the pointer to signal
      whether rnode is a direct pointer or a pointer to a radix tree node.
      
      When a reader wants to traverse the next branch, they will take a copy of
      the pointer.  This pointer will be either NULL (and the branch is empty) or
      non-NULL (and will point to a valid node).
      
      [akpm@osdl.org: cleanups]
      [Lee.Schermerhorn@hp.com: bugfixes, comments, simplifications]
      [clameter@sgi.com: build fix]
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
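The "direct pointer" trick mentioned in this commit relies on the low bit of an aligned pointer always being zero, so the bit can carry a flag. The sketch below shows the general technique in user space; the helper names are hypothetical and are not claimed to match the kernel's radix tree helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Low bit set: the slot holds a data item rather than a tree node. */
#define DIRECT_PTR_BIT ((uintptr_t)1)

/* Tag an item pointer as "direct"; the item must be at least 2-byte aligned. */
void *make_direct_ptr(void *item)
{
    assert(((uintptr_t)item & DIRECT_PTR_BIT) == 0);
    return (void *)((uintptr_t)item | DIRECT_PTR_BIT);
}

/* Non-zero if the slot value carries the "direct" tag. */
int is_direct_ptr(const void *slot)
{
    return ((uintptr_t)slot & DIRECT_PTR_BIT) != 0;
}

/* Strip the tag to recover the original item pointer. */
void *direct_to_item(void *slot)
{
    return (void *)((uintptr_t)slot & ~DIRECT_PTR_BIT);
}
```

A reader that has copied the root slot can test the tag once and then either use the item directly or descend into the node, without looking at the tree height separately.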
    • [PATCH] Save some bytes in struct mm_struct · 36de6437
      Arnaldo Carvalho de Melo authored
      Before:
      [acme@newtoy net-2.6.20]$ pahole --cacheline 32 kernel/sched.o mm_struct
      
      /* include2/asm/processor.h:542 */
      struct mm_struct {
              struct vm_area_struct *    mmap;                 /*     0     4 */
              struct rb_root             mm_rb;                /*     4     4 */
              struct vm_area_struct *    mmap_cache;           /*     8     4 */
              long unsigned int          (*get_unmapped_area)(); /*    12     4 */
              void                       (*unmap_area)();      /*    16     4 */
              long unsigned int          mmap_base;            /*    20     4 */
              long unsigned int          task_size;            /*    24     4 */
              long unsigned int          cached_hole_size;     /*    28     4 */
              /* ---------- cacheline 1 boundary ---------- */
              long unsigned int          free_area_cache;      /*    32     4 */
              pgd_t *                    pgd;                  /*    36     4 */
              atomic_t                   mm_users;             /*    40     4 */
              atomic_t                   mm_count;             /*    44     4 */
              int                        map_count;            /*    48     4 */
              struct rw_semaphore        mmap_sem;             /*    52    64 */
              spinlock_t                 page_table_lock;      /*   116    40 */
              struct list_head           mmlist;               /*   156     8 */
              mm_counter_t               _file_rss;            /*   164     4 */
              mm_counter_t               _anon_rss;            /*   168     4 */
              long unsigned int          hiwater_rss;          /*   172     4 */
              long unsigned int          hiwater_vm;           /*   176     4 */
              long unsigned int          total_vm;             /*   180     4 */
              long unsigned int          locked_vm;            /*   184     4 */
              long unsigned int          shared_vm;            /*   188     4 */
              /* ---------- cacheline 6 boundary ---------- */
              long unsigned int          exec_vm;              /*   192     4 */
              long unsigned int          stack_vm;             /*   196     4 */
              long unsigned int          reserved_vm;          /*   200     4 */
              long unsigned int          def_flags;            /*   204     4 */
              long unsigned int          nr_ptes;              /*   208     4 */
              long unsigned int          start_code;           /*   212     4 */
              long unsigned int          end_code;             /*   216     4 */
              long unsigned int          start_data;           /*   220     4 */
              /* ---------- cacheline 7 boundary ---------- */
              long unsigned int          end_data;             /*   224     4 */
              long unsigned int          start_brk;            /*   228     4 */
              long unsigned int          brk;                  /*   232     4 */
              long unsigned int          start_stack;          /*   236     4 */
              long unsigned int          arg_start;            /*   240     4 */
              long unsigned int          arg_end;              /*   244     4 */
              long unsigned int          env_start;            /*   248     4 */
              long unsigned int          env_end;              /*   252     4 */
              /* ---------- cacheline 8 boundary ---------- */
              long unsigned int          saved_auxv[44];       /*   256   176 */
              unsigned int               dumpable:2;           /*   432     4 */
              cpumask_t                  cpu_vm_mask;          /*   436     4 */
              mm_context_t               context;              /*   440    68 */
              long unsigned int          swap_token_time;      /*   508     4 */
              /* ---------- cacheline 16 boundary ---------- */
              char                       recent_pagein;        /*   512     1 */
      
              /* XXX 3 bytes hole, try to pack */
      
              int                        core_waiters;         /*   516     4 */
              struct completion *        core_startup_done;    /*   520     4 */
              struct completion          core_done;            /*   524    52 */
              rwlock_t                   ioctx_list_lock;      /*   576    36 */
              struct kioctx *            ioctx_list;           /*   612     4 */
      }; /* size: 616, sum members: 613, holes: 1, sum holes: 3, cachelines: 20,
            last cacheline: 8 bytes */
      
      After:
      
      [acme@newtoy net-2.6.20]$ pahole --cacheline 32 kernel/sched.o mm_struct
      /* include2/asm/processor.h:542 */
      struct mm_struct {
              struct vm_area_struct *    mmap;                 /*     0     4 */
              struct rb_root             mm_rb;                /*     4     4 */
              struct vm_area_struct *    mmap_cache;           /*     8     4 */
              long unsigned int          (*get_unmapped_area)(); /*    12     4 */
              void                       (*unmap_area)();      /*    16     4 */
              long unsigned int          mmap_base;            /*    20     4 */
              long unsigned int          task_size;            /*    24     4 */
              long unsigned int          cached_hole_size;     /*    28     4 */
              /* ---------- cacheline 1 boundary ---------- */
              long unsigned int          free_area_cache;      /*    32     4 */
              pgd_t *                    pgd;                  /*    36     4 */
              atomic_t                   mm_users;             /*    40     4 */
              atomic_t                   mm_count;             /*    44     4 */
              int                        map_count;            /*    48     4 */
              struct rw_semaphore        mmap_sem;             /*    52    64 */
              spinlock_t                 page_table_lock;      /*   116    40 */
              struct list_head           mmlist;               /*   156     8 */
              mm_counter_t               _file_rss;            /*   164     4 */
              mm_counter_t               _anon_rss;            /*   168     4 */
              long unsigned int          hiwater_rss;          /*   172     4 */
              long unsigned int          hiwater_vm;           /*   176     4 */
              long unsigned int          total_vm;             /*   180     4 */
              long unsigned int          locked_vm;            /*   184     4 */
              long unsigned int          shared_vm;            /*   188     4 */
              /* ---------- cacheline 6 boundary ---------- */
              long unsigned int          exec_vm;              /*   192     4 */
              long unsigned int          stack_vm;             /*   196     4 */
              long unsigned int          reserved_vm;          /*   200     4 */
              long unsigned int          def_flags;            /*   204     4 */
              long unsigned int          nr_ptes;              /*   208     4 */
              long unsigned int          start_code;           /*   212     4 */
              long unsigned int          end_code;             /*   216     4 */
              long unsigned int          start_data;           /*   220     4 */
              /* ---------- cacheline 7 boundary ---------- */
              long unsigned int          end_data;             /*   224     4 */
              long unsigned int          start_brk;            /*   228     4 */
              long unsigned int          brk;                  /*   232     4 */
              long unsigned int          start_stack;          /*   236     4 */
              long unsigned int          arg_start;            /*   240     4 */
              long unsigned int          arg_end;              /*   244     4 */
              long unsigned int          env_start;            /*   248     4 */
              long unsigned int          env_end;              /*   252     4 */
              /* ---------- cacheline 8 boundary ---------- */
              long unsigned int          saved_auxv[44];       /*   256   176 */
              cpumask_t                  cpu_vm_mask;          /*   432     4 */
              mm_context_t               context;              /*   436    68 */
              long unsigned int          swap_token_time;      /*   504     4 */
              char                       recent_pagein;        /*   508     1 */
              unsigned char              dumpable:2;           /*   509     1 */
      
              /* XXX 2 bytes hole, try to pack */
      
              int                        core_waiters;         /*   512     4 */
              struct completion *        core_startup_done;    /*   516     4 */
              struct completion          core_done;            /*   520    52 */
              rwlock_t                   ioctx_list_lock;      /*   572    36 */
              struct kioctx *            ioctx_list;           /*   608     4 */
      }; /* size: 612, sum members: 610, holes: 1, sum holes: 2, cachelines: 20,
            last cacheline: 4 bytes */
      
      [acme@newtoy net-2.6.20]$ codiff -V /tmp/sched.o.before kernel/sched.o
      /pub/scm/linux/kernel/git/acme/net-2.6.20/kernel/sched.c:
        struct mm_struct |   -4
          dumpable:2;
           from: unsigned int          /*   432(30)    4(2) */
           to:   unsigned char         /*   509(6)     1(2) */
      < SNIP other offset changes >
       1 struct changed
      [acme@newtoy net-2.6.20]$
      
      I'm not aware of any problem about using 2 byte wide bitfields where
      previously a 4 byte wide one was, holler if there is any, I wouldn't be
      surprised, bitfields are things from hell.
      
      For the curious, 432(30) means: at offset 432 from the struct start, at
      offset 30 in the bitfield (yeah, it comes backwards, hellish, huh?) ditto
      for 509(6), while 4(2) and 1(2) means "struct field size(bitfield size)".
      
      Now we have a 2 bytes hole and are using only 4 bytes of the last 32
      bytes cacheline, any takers? :-)
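
The effect of moving a small bitfield next to a lone char can be sketched with two toy structs. This is only an illustration under stated assumptions: the struct names and the cut-down member list are hypothetical, the offsets in the comments assume a typical ABI with 4-byte int, and a bitfield of type unsigned char is implementation-defined in ISO C (GCC and Clang accept it).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" layout: the 2-bit flag sits in its own int-sized unit. */
struct mm_before {
	unsigned int dumpable:2;	/* offset 0, 4-byte storage unit */
	uint32_t swap_token_time;	/* offset 4 */
	char recent_pagein;		/* offset 8, then a 3-byte hole */
	int core_waiters;		/* offset 12 */
};

/* Hypothetical "after" layout: the flag shares the gap behind the char. */
struct mm_after {
	uint32_t swap_token_time;	/* offset 0 */
	char recent_pagein;		/* offset 4 */
	unsigned char dumpable:2;	/* offset 5, then a 2-byte hole */
	int core_waiters;		/* offset 8 */
};

/* Bytes saved by the reordering (the "-4" a struct diff would report). */
static size_t bytes_saved(void)
{
	return sizeof(struct mm_before) - sizeof(struct mm_after);
}

/* Padding between recent_pagein and core_waiters in the old layout. */
static size_t hole_before(void)
{
	return offsetof(struct mm_before, core_waiters)
		- offsetof(struct mm_before, recent_pagein) - 1;
}

/* Same gap in the new layout; 2 bytes are used (char + bitfield byte). */
static size_t hole_after(void)
{
	return offsetof(struct mm_after, core_waiters)
		- offsetof(struct mm_after, recent_pagein) - 2;
}

/* Sanity check of the layout assumptions stated above. */
static void check_layout(void)
{
	assert(bytes_saved() == 4);
	assert(hole_before() == 3);
	assert(hole_after() == 2);
}
```

Calling check_layout() on a platform that matches the assumptions passes silently; a different ABI may lay the bitfields out differently, which is exactly the caveat raised above.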
      Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      36de6437
    • Andy Whitcroft's avatar
      [PATCH] mm: make compound page destructor handling explicit · 33f2ef89
      Andy Whitcroft authored
      Currently we use the lru head link of the second page of a compound page
      to hold its destructor.  This was ok when it was purely an internal
      implementation detail.  However, hugetlbfs overrides this destructor,
      violating the layering.  Abstract this out as explicit calls, and also
      introduce a type for the callback function, allowing callbacks to be type
      checked.  For each callback we pre-declare the function, causing a type
      error on definition rather than on use elsewhere.
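
The idea of a typed destructor with explicit accessors can be sketched in user space as follows. This is a minimal model, not the patch itself: struct page is reduced to two fields, the freed flag is invented for the demonstration, and the accessor names are assumptions chosen to match the description above. Storing a function pointer in a data pointer relies on a common compiler extension.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Toy stand-in for struct page; "freed" exists only for this demo. */
struct page { struct list_head lru; int freed; };

/* One function type for every destructor, so the compiler can check them. */
typedef void compound_page_dtor(struct page *);

/*
 * Pre-declaration through the typedef: if the definition further down had
 * a different signature, the error would show up at the definition.
 */
static compound_page_dtor free_compound_page;

/* Explicit setter: hides that the slot is the second page's lru.next. */
static void set_compound_page_dtor(struct page *page, compound_page_dtor *dtor)
{
	assert(dtor != NULL);
	page[1].lru.next = (void *)dtor;
}

/* Explicit getter: the only place that knows where the pointer is kept. */
static compound_page_dtor *get_compound_page_dtor(struct page *page)
{
	return (compound_page_dtor *)page[1].lru.next;
}

/* Tear down a compound page by invoking whatever destructor was registered. */
static void destroy_compound_page(struct page *page)
{
	(*get_compound_page_dtor(page))(page);
}

/* Default destructor for the demo: just records that it ran. */
static void free_compound_page(struct page *page)
{
	page->freed = 1;
}
```

A caller that wants its own destructor (as hugetlbfs does) would declare it with the same typedef and register it through the setter instead of writing to the lru link directly.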
      
      [akpm@osdl.org: cleanups]
      Signed-off-by: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      33f2ef89
    • Christoph Lameter's avatar
      [PATCH] slab: better fallback allocation behavior · 3c517a61
      Christoph Lameter authored
      Currently we simply attempt to allocate from all allowed nodes using
      GFP_THISNODE.  However, GFP_THISNODE does not do reclaim (it won't do any at
      all if the recent GFP_THISNODE patch is accepted).  If we truly run out of
      memory in the whole system then fallback_alloc() may return NULL although
      memory might still be available if we performed more thorough reclaim.
      
      This patch changes fallback_alloc() so that we first only inspect all the
      per node queues for available slabs.  If we find any then we allocate from
      those.  This avoids slab fragmentation by first using up all partially
      allocated slabs on every node before allocating new memory.
      
      If we cannot satisfy the allocation from any per node queue then we extend
      a slab.  We now call into the page allocator without specifying
      GFP_THISNODE.  The page allocator will then implement its own fallback (in
      the given cpuset context), perform necessary reclaim (again considering not
      a single node but the whole set of allowed nodes) and then return pages for
      a new slab.
      
      We identify from which node the pages were allocated and then insert the
      pages into the corresponding per node structure.  In order to do so we need
      to modify cache_grow() to take a parameter that specifies the new slab.
      kmem_getpages() can no longer set the GFP_THISNODE flag since we need to be
      able to use kmem_getpages() to allocate from an arbitrary node.  GFP_THISNODE
      needs to be specified when calling cache_grow().
      
      One key advantage is that the decision from which node to allocate new
      memory is removed from slab fallback processing.  The patch allows us to go
      back to using the page allocator's fallback/reclaim logic.
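
The two-pass strategy described above can be modelled in a few lines of user-space C. Everything here is a simplified sketch under assumptions: struct cache_model, its fields and both function names are hypothetical, node selection is a plain first-fit loop, and real reclaim and cpuset handling are not modelled.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 4

/* Hypothetical model of a cache's per-node state. */
struct cache_model {
	int free_objs[MAX_NODES];	/* objects on each node's slab queues */
	int allowed[MAX_NODES];		/* nonzero if the node may be used */
	int free_pages[MAX_NODES];	/* pages the page allocator still has */
	int objs_per_slab;		/* objects gained by growing one slab */
};

/*
 * Stand-in for a page allocator call without GFP_THISNODE: the allocator,
 * not the slab code, decides which allowed node supplies the page.
 * Returns the node id, or -1 if no allowed node has memory.
 */
static int page_alloc_any_node(struct cache_model *c)
{
	for (int n = 0; n < MAX_NODES; n++) {
		if (c->allowed[n] && c->free_pages[n] > 0) {
			c->free_pages[n]--;
			return n;
		}
	}
	return -1;
}

/* Returns the node the object came from, or -1 on failure. */
static int fallback_alloc_model(struct cache_model *c)
{
	assert(c->objs_per_slab > 0);

	/* Pass 1: only look at existing slabs on every allowed node. */
	for (int n = 0; n < MAX_NODES; n++) {
		if (c->allowed[n] && c->free_objs[n] > 0) {
			c->free_objs[n]--;
			return n;
		}
	}

	/* Pass 2: grow a slab wherever the page allocator found memory. */
	int node = page_alloc_any_node(c);
	if (node < 0)
		return -1;

	/* File the new slab under the node the pages actually came from. */
	c->free_objs[node] += c->objs_per_slab;
	c->free_objs[node]--;
	return node;
}
```

The point of the split is visible in the second pass: the slab code no longer picks a node itself, it only records where the page allocator placed the new slab.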
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      3c517a61
    • Christoph Lameter's avatar
      [PATCH] GFP_THISNODE must not trigger global reclaim · 952f3b51
      Christoph Lameter authored
      The intent of GFP_THISNODE is to make sure that an allocation occurs on a
      particular node.  If this is not possible then NULL needs to be returned so
      that the caller can choose what to do next on its own (the slab allocator
      depends on that).
      
      However, GFP_THISNODE currently triggers reclaim before returning a failure
      (GFP_THISNODE implies that __GFP_NORETRY is set).  If we have overallocated
      a node then we will currently do some reclaim before returning NULL.  The
      caller may want memory from other nodes before reclaim is triggered.  (If
      the caller wants reclaim then it can use __GFP_THISNODE directly instead).
      
      There is no flag to avoid reclaim in the page allocator and adding yet
      another GFP_xx flag would be difficult given that we are out of available
      flags.
      
      So just compare and see if all bits for GFP_THISNODE (__GFP_THISNODE,
      __GFP_NORETRY and __GFP_NOWARN) are set.  If so then we return NULL before
      waking up kswapd.
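
The "all bits set" comparison can be sketched as below. The DEMO_ names and the numeric bit values are placeholders chosen for this example and should not be read as the kernel's definitions; only the shape of the test (mask, then compare against the full combination) reflects the description above.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int gfp_t;

/* Placeholder bit values, assumed for illustration only. */
#define DEMO_GFP_NOWARN		0x200u
#define DEMO_GFP_NORETRY	0x1000u
#define DEMO_GFP_THISNODE_BIT	0x40000u

/* The composite flag: all three bits together. */
#define DEMO_GFP_THISNODE \
	(DEMO_GFP_THISNODE_BIT | DEMO_GFP_NORETRY | DEMO_GFP_NOWARN)

/*
 * True only when every bit of the composite is present.  A plain
 * "mask & DEMO_GFP_THISNODE" would also fire when just one of the
 * three bits is set, which is not what is wanted here.
 */
static bool gfp_skips_reclaim(gfp_t mask)
{
	return (mask & DEMO_GFP_THISNODE) == DEMO_GFP_THISNODE;
}

/* Small self-check showing the three interesting cases. */
static void gfp_thisnode_selfcheck(void)
{
	/* Full composite: fail early, no reclaim. */
	assert(gfp_skips_reclaim(DEMO_GFP_THISNODE));
	/* Node-restricted but reclaim wanted: only the single bit is set. */
	assert(!gfp_skips_reclaim(DEMO_GFP_THISNODE_BIT));
	/* Unrelated extra bits do not change the outcome. */
	assert(gfp_skips_reclaim(DEMO_GFP_THISNODE | 0x10u));
}
```

This is why no new flag is needed: a caller that wants node-local memory with reclaim passes only the single bit, and the composite acts as the "no reclaim" signal.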
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      952f3b51
    • Christoph Lameter's avatar
      [PATCH] slab: fix two issues in kmalloc_node / __cache_alloc_node · 5bcd234d
      Christoph Lameter authored
      This addresses two issues:
      
      1. kmalloc_node() may intermittently return NULL if we are allocating
         from the current node and are unable to obtain memory for the current
         node from the page allocator.  This is because we call ____cache_alloc()
         if nodeid == numa_node_id() and ____cache_alloc() is not able to fall back
         to other nodes.
      
         This was introduced in the 2.6.19 development cycle.  <= 2.6.18 in
         that case does not do a restricted allocation and blindly trusts the
         page allocator to have given us memory from the indicated node.  It
         inserts the page regardless of the node it came from into the queues for
         the current node.
      
      2. If kmalloc_node() is used on a node that has not been bootstrapped
         yet then we may try to pass an invalid node number to
         ____cache_alloc_node() triggering a BUG().
      
         Change the function to call fallback_alloc() instead.  Only call
         fallback_alloc() if we are allowed to fall back at all.  The need to
         handle a node not bootstrapped yet also first surfaced in the 2.6.19
         cycle.
      
      Update the comments since they were still describing the old kmalloc_node
      from 2.6.12.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      5bcd234d
    • Andrew Morton's avatar
      [PATCH] slab: deprecate kmem_cache_t · 1b1cec4b
      Andrew Morton authored
      Cc: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      1b1cec4b
    • Christoph Lameter's avatar
      [PATCH] slab: remove kmem_cache_t · e18b890b
      Christoph Lameter authored
      Replace all uses of kmem_cache_t with struct kmem_cache.
      
      The patch was generated using the following script:
      
      	#!/bin/sh
      	#
      	# Replace one string by another in all the kernel sources.
      	#
      
      	set -e
      
      	for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
      		quilt add $file
      		sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
      		mv /tmp/$$ $file
      		quilt refresh
      	done
      
      The script was run like this:
      
      	sh replace kmem_cache_t "struct kmem_cache"
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      e18b890b
    • Christoph Lameter's avatar
      [PATCH] slab: remove SLAB_DMA · 441e143e
      Christoph Lameter authored
      SLAB_DMA is an alias of GFP_DMA. This is the last one so we
      remove the leftover comment too.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      441e143e
    • Christoph Lameter's avatar
      [PATCH] slab: remove SLAB_KERNEL · e94b1766
      Christoph Lameter authored
      SLAB_KERNEL is an alias of GFP_KERNEL.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      e94b1766
    • Christoph Lameter's avatar
      [PATCH] slab: remove SLAB_ATOMIC · 54e6ecb2
      Christoph Lameter authored
      SLAB_ATOMIC is an alias of GFP_ATOMIC.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      54e6ecb2
    • Christoph Lameter's avatar
      [PATCH] slab: remove SLAB_USER · f7267c0c
      Christoph Lameter authored
      SLAB_USER is an alias of GFP_USER.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      f7267c0c
    • Christoph Lameter's avatar
      [PATCH] slab: remove SLAB_NOFS · e6b4f8da
      Christoph Lameter authored
      SLAB_NOFS is an alias of GFP_NOFS.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      e6b4f8da
    • Christoph Lameter's avatar
      [PATCH] slab: remove SLAB_NOIO · 55acbda0
      Christoph Lameter authored
      SLAB_NOIO is an alias of GFP_NOIO with a single instance of use.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      55acbda0