1. 02 Sep, 2009 2 commits
    • Mel Gorman authored · b4712e0f
      When round-robin freeing pages from the PCP lists, empty lists may be
      encountered.  In the event one of the lists has more pages than another,
      there may be numerous checks for list_empty() which is undesirable.  This
      patch maintains a count of pages to free which is incremented when empty
      lists are encountered.  The intention is that more pages will then be
      freed from fuller lists than the empty ones reducing the number of empty
      list checks in the free path.
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
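The batching idea in the commit above can be sketched as a small user-space model. This is only an illustration under assumptions, not the kernel code: the function name `free_pcp_pages_bulk`, the use of plain Python lists for the per-type PCP lists, and the `empty_checks` counter are all made up for the example.

```python
def free_pcp_pages_bulk(lists, count):
    """Free up to `count` pages round-robin; return (freed, empty_checks).

    `lists` is a toy stand-in for the per-migratetype PCP lists.
    """
    count = min(count, sum(len(lst) for lst in lists))  # never spin on nothing
    freed = []
    empty_checks = 0   # how often an empty list was looked at
    batch_free = 0     # pages to take from the next non-empty list
    idx = 0
    while count:
        # Move round-robin to the next list. Every list visited, empty or
        # not, grows the batch, so fuller lists give up more pages at once.
        while True:
            batch_free += 1
            idx = (idx + 1) % len(lists)
            if lists[idx]:
                break
            empty_checks += 1
        # Free up to batch_free pages from the list that was found.
        while True:
            freed.append(lists[idx].pop())
            count -= 1
            batch_free -= 1
            if not (count and batch_free and lists[idx]):
                break
    return freed, empty_checks
```

With two empty lists and one list holding all the pages, each pass over the empty lists raises the batch size, so later passes free several pages per visit rather than one.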
    • Mel Gorman authored · aea38c74
      The following two patches remove searching in the page allocator fast-path
      by maintaining multiple free-lists in the per-cpu structure.  At the time
      the search was introduced, increasing the per-cpu structures would waste a
      lot of memory as per-cpu structures were statically allocated at
      compile-time.  This is no longer the case.
      
      The patches are as follows. They are based on mmotm-2009-08-27.
      
      Patch 1 adds multiple lists to struct per_cpu_pages, one per
      	migratetype that can be stored on the PCP lists.
      
      Patch 2 notes that the pcpu drain path checks empty lists multiple times. The
      	patch reduces the number of checks by maintaining a count of empty
      	lists encountered. Lists containing pages will then free multiple
      	pages in batch.
      
      The patches were tested with kernbench, netperf udp/tcp, hackbench and
      sysbench.  The netperf tests were not bound to any CPU in particular and
      were run such that there is 99% confidence that the reported results are
      within 1% of the estimated mean.  sysbench was run with a postgres
      backend and read-only tests.  Similar to netperf, it was run multiple
      times so that there is 99% confidence the results are within 1%.  The
      patches were tested on x86, x86-64 and ppc64 as follows.
      
      x86:	Intel Pentium D 3GHz with 8G RAM (no-brand machine)
      	kernbench	- No significant difference, variance well within noise
      	netperf-udp	- 1.34% to 2.28% gain
      	netperf-tcp	- 0.45% to 1.22% gain
      	hackbench	- Small variances, very close to noise
      	sysbench	- Very small gains
      
      x86-64:	AMD Phenom 9950 1.3GHz with 8G RAM (no-brand machine)
      	kernbench	- No significant difference, variance well within noise
      	netperf-udp	- 1.83% to 10.42% gains
      	netperf-tcp	- Not conclusive until buffer >= PAGE_SIZE
      				4096	+15.83%
      				8192	+ 0.34% (not significant)
      				16384	+ 1%
      	hackbench	- Small gains, very close to noise
      	sysbench	- 0.79% to 1.6% gain
      
      ppc64:	PPC970MP 2.5GHz with 10GB RAM (it's a terrasoft powerstation)
      	kernbench	- No significant difference, variance well within noise
      	netperf-udp	- 2-3% gain for almost all buffer sizes tested
      	netperf-tcp	- losses on small buffers, gains on larger buffers
      			  possibly indicates some bad caching effect.
      	hackbench	- No significant difference
      	sysbench	- 2-4% gain
      
      
      
      This patch:
      
      Currently the per-cpu page allocator searches the PCP list for pages of
      the correct migrate-type to reduce the possibility of pages being
      inappropriately placed from a fragmentation perspective.  This search is
      potentially expensive in a fast-path and undesirable.  Splitting the
      per-cpu list into multiple lists increases the size of a per-cpu structure
      and this was potentially a major problem at the time the search was
      introduced.  This problem has been mitigated as now only the necessary
      number of structures is allocated for the running system.
      
      This patch replaces a list search in the per-cpu allocator with one list
      per migrate type.  The potential snag with this approach is when bulk
      freeing pages.  We round-robin free pages based on migrate type which has
      little bearing on the cache hotness of the page and potentially checks
      empty lists repeatedly in the event the majority of PCP pages are of one
      type.
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Acked-by: Nick Piggin <npiggin@suse.de>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
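The difference between searching one mixed list and indexing one list per migrate type can be shown with a toy model. It is a sketch under assumptions: pages are `(pfn, migratetype)` tuples, and the helper names `alloc_by_search` and `alloc_by_type` are hypothetical, not kernel functions.

```python
# Toy migrate types that may sit on the PCP lists (values are illustrative).
MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE = range(3)


def alloc_by_search(pcp_list, migratetype):
    """Old scheme: scan one mixed list for a page of the wanted type.

    Returns (page, entries_examined); page is None if no match exists.
    """
    for i, page in enumerate(pcp_list):
        if page[1] == migratetype:
            return pcp_list.pop(i), i + 1
    return None, len(pcp_list)


def alloc_by_type(pcp_lists, migratetype):
    """New scheme: one list per migrate type, so no search is needed."""
    lst = pcp_lists[migratetype]
    return lst.pop(0) if lst else None
```

In the mixed list the cost depends on how many pages of other types sit in front of the wanted one. With per-type lists the lookup is a single index, at the price of a larger per-cpu structure.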
  2. 27 Aug, 2009 5 commits
  3. 10 Sep, 2009 2 commits
    • Eric B Munson authored · dbfec8d1
      Add a flag for mmap that will be used to request a huge page region that
      will look like anonymous memory to userspace.  This is accomplished by
      using a file on the internal vfsmount.  MAP_HUGETLB is a modifier of
      MAP_ANONYMOUS and so must be specified with it.  The region will behave
      the same as a MAP_ANONYMOUS region using small pages.
      Signed-off-by: Eric B Munson <ebmunson@us.ibm.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
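From userspace, requesting such a region might look like the sketch below. It rests on several assumptions: the fallback value 0x40000 for MAP_HUGETLB is the x86 Linux value and may differ on other architectures, 2 MiB is assumed as the huge page size, and the helper name `map_huge_anonymous` is made up for the example.

```python
import mmap
import sys

# Assumption: 0x40000 is the Linux value of MAP_HUGETLB on x86; other
# architectures may use a different value. Use the module's own constant
# when it provides one.
MAP_HUGETLB = getattr(mmap, "MAP_HUGETLB", 0x40000)
HUGE_PAGE_SIZE = 2 * 1024 * 1024  # assumption: 2 MiB huge pages


def map_huge_anonymous(length=HUGE_PAGE_SIZE):
    """Return an anonymous huge-page mapping, or None if unavailable."""
    if not sys.platform.startswith("linux"):
        return None
    # MAP_HUGETLB modifies MAP_ANONYMOUS, so both flags are passed.
    flags = mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS | MAP_HUGETLB
    try:
        return mmap.mmap(-1, length, flags=flags,
                         prot=mmap.PROT_READ | mmap.PROT_WRITE)
    except (OSError, ValueError):
        # Typically no huge pages are reserved, or the kernel lacks support.
        return None
```

The length should be a multiple of the huge page size. A `None` result usually means no huge pages have been reserved on the system, in which case a caller could fall back to an ordinary anonymous mapping.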
    • Eric B Munson authored · 17ad0eea
      This patchset adds a flag to mmap that allows the user to request that an
      anonymous mapping be backed with huge pages.  This mapping will borrow
      functionality from the huge page shm code to create a file on the kernel
      internal mount and use it to approximate an anonymous mapping.  The
      MAP_HUGETLB flag is a modifier to MAP_ANONYMOUS and will not work without
      both flags being present.
      
      A new flag is necessary because there is no other way to hook into huge
      pages without creating a file on a hugetlbfs mount which wouldn't be
      MAP_ANONYMOUS.
      
      To userspace, this mapping will behave just like an anonymous mapping
      because the file is not accessible outside of the kernel.
      
      This patchset is meant to simplify the programming model.  Presently there
      is a large chunk of boilerplate code, contained in libhugetlbfs, required
      to create private, hugepage backed mappings.  This patch set would allow
      use of hugepages without linking to libhugetlbfs or having hugetlbfs
      mounted.
      
      Unification of the VM code would provide these same benefits, but it has
      been resisted each time that it has been suggested for several reasons: it
      would break PAGE_SIZE assumptions across the kernel, it makes page-table
      abstractions really expensive, and it incurs fast-path penalties on
      architectures that do not support huge pages without providing any
      benefit there.
      
      
      
      This patch:
      
      There are two means of creating mappings backed by huge pages:
      
              1. mmap() a file created on hugetlbfs
              2. Use shm which creates a file on an internal mount which essentially
                 maps it MAP_SHARED
      
      The internal mount is only used for shared mappings but there is very
      little that stops it being used for private mappings. This patch extends
      hugetlbfs_file_setup() to deal with the creation of files that will be
      mapped MAP_PRIVATE on the internal hugetlbfs mount. This extended API is
      used in a subsequent patch to implement the MAP_HUGETLB mmap() flag.
      Signed-off-by: Eric Munson <ebmunson@us.ibm.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  4. 25 Aug, 2009 1 commit
  5. 22 Aug, 2009 2 commits
  6. 21 Aug, 2009 1 commit
  7. 20 Aug, 2009 5 commits
    • Jan Beulich authored · 70e289a9
      This is being done by allowing boot time allocations to specify that they
      may want a sub-page sized amount of memory.
      
      Overall this seems more consistent with the other hash table allocations,
      and allows making two supposedly mm-only variables really mm-only
      (nr_{kernel,all}_pages).
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Jan Beulich authored · 9913a2c6
      Since alloc_bootmem() will never return inaccessible (via virtual
      addressing) memory anyway, using the ..._low() variant only makes sense
      when the physical address range of the allocated memory must fulfill
      further constraints, especially since on 64-bits (or more generally in all
      cases where the pools the two variants allocate from are than the full
      available range.
      
      Probably the use in alloc_tce_table() could also be eliminated (based on
      code inspection of pci-calgary_64.c), but that seems too risky given I
      know nothing about that hardware and have no way to test it.
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Jan Beulich authored · 0fad9958
      Sizing of memory allocations shouldn't depend on the number of physical
      pages found in a system, as that generally includes (perhaps a huge amount
      of) non-RAM pages.  The amount of what actually is usable as storage
      should instead be used as a basis here.
      
      Some of the calculations (i.e.  those not intending to use high memory)
      should likely even use (totalram_pages - totalhigh_pages).
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: Dave Airlie <airlied@linux.ie>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Jan Beulich authored · efabf0cf
      Sizing of memory allocations shouldn't depend on the number of physical
      pages found in a system, as that generally includes (perhaps a huge amount
      of) non-RAM pages.  The amount of what actually is usable as storage
      should instead be used as a basis here.
      
      In line with that, the memory hotplug code should update num_physpages in
      a way that it retains its original (post-boot) meaning; in particular,
      decreasing the value should at best be done with great care - this patch
      doesn't try to ever decrease this value at all as it doesn't really seem
      meaningful to do so.
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Mel Gorman authored · 369befaf
      After anti-fragmentation was merged, a bug was reported whereby devices
      that depended on high-order atomic allocations were failing.  The solution
      was to preserve a property in the buddy allocator which tended to keep the
      minimum number of free pages in the zone at the lower physical addresses
      and contiguous.  To preserve this property, MIGRATE_RESERVE was introduced
      and a number of pageblocks at the start of a zone would be marked
      "reserve", the number of which depended on min_free_kbytes.
      
      Anti-fragmentation works by avoiding the mixing of page migratetypes
      within the same pageblock.  One way of helping this is to increase
      min_free_kbytes because it becomes less likely that it will be necessary
      to place pages of one migratetype in a pageblock of another.  However,
      because the number of MIGRATE_RESERVE blocks is unbounded, the free
      memory is kept there in large contiguous blocks instead of helping
      anti-fragmentation as much as it should.  With the page-allocator
      tracepoint patches applied, it
      was found during anti-fragmentation tests that the number of
      fragmentation-related events were far higher than expected even with
      min_free_kbytes at higher values.
      
      This patch limits the number of MIGRATE_RESERVE blocks that exist per zone
      to two.  For example, with a sufficient min_free_kbytes, 4MB of memory
      will be kept aside on an x86-64 and remain more or less free and
      contiguous for the system's uptime.  This should be sufficient for devices
      depending on high-order atomic allocations while helping fragmentation
      control when min_free_kbytes is tuned appropriately.  A side-effect of
      this patch is that the reserve variable is converted to int as unsigned
      long was the wrong type to use when ensuring that only the required number
      of reserve blocks are created.
      
      With the patches applied, fragmentation-related events as measured by the
      page allocator tracepoints were significantly reduced when running some
      fragmentation stress-tests on systems with min_free_kbytes tuned to a
      value appropriate for hugepage allocations at runtime.  On x86, the events
      recorded were reduced by 99.8%, on x86-64 by 99.72% and on ppc64 by
      99.83%.
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
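The arithmetic behind the 4MB figure can be worked through in a few lines. This is a sketch with assumed inputs: a pageblock order of 9 and 4 KiB pages (2 MiB pageblocks) are taken as typical for x86-64, and the function names `reserve_pageblocks` and `reserve_bytes` are hypothetical.

```python
def reserve_pageblocks(min_wmark_pages, pageblock_order, max_blocks=2):
    """Number of reserve pageblocks: enough to cover the min watermark,
    rounded up to whole pageblocks, but capped at max_blocks."""
    pageblock_nr_pages = 1 << pageblock_order
    blocks = (min_wmark_pages + pageblock_nr_pages - 1) >> pageblock_order
    return min(blocks, max_blocks)


def reserve_bytes(blocks, pageblock_order, page_size=4096):
    """Memory held back by `blocks` reserve pageblocks."""
    return blocks * (1 << pageblock_order) * page_size
```

Under these assumptions, any watermark above 512 pages hits the cap of two blocks, and two blocks of 512 pages at 4 KiB each come to 4 MiB.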
  8. 13 Aug, 2009 2 commits
  9. 24 Aug, 2009 1 commit
  10. 27 Aug, 2009 1 commit
  11. 24 Aug, 2009 2 commits
  12. 11 Aug, 2009 2 commits
  13. 12 Aug, 2009 2 commits
  14. 18 Aug, 2009 1 commit
  15. 24 Aug, 2009 1 commit
  16. 12 Aug, 2009 1 commit
    • Daisuke Nishimura authored · f80362c0
      After commit 355cfa73 ("mm: modify swap_map and add SWAP_HAS_CACHE flag"),
      read_swap_cache_async() will busy-wait while an entry doesn't exist in swap
      cache but has the SWAP_HAS_CACHE flag set.
      
      Such entries can exist on add/delete path of swap cache.  On add path,
      add_to_swap_cache() is called soon after SWAP_HAS_CACHE flag is set, and
      on delete path, swapcache_free() will be called (SWAP_HAS_CACHE flag is
      cleared) soon after __delete_from_swap_cache() is called.  So, the
      busy-wait works well in most cases.
      
      But this mechanism can cause soft lockup if add_to_swap_cache() sleeps and
      read_swap_cache_async() tries to swap-in the same entry on the same cpu.
      
      This patch calls radix_tree_preload() before swapcache_prepare() and
      divides add_to_swap_cache() into two parts: a radix_tree_preload() part and
      a radix_tree_insert() part (defined as __add_to_swap_cache()).
      Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
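The reordering in the commit above can be modelled as a sequence of steps, checking whether any step that may sleep runs inside the window where SWAP_HAS_CACHE is set but the entry is not yet in the swap cache. This is a toy model only: the step lists and the helper `window_can_sleep` are illustrative assumptions, and the step names merely echo the functions named in the commit message.

```python
def window_can_sleep(steps):
    """Return True if a step that may sleep runs while the flag is set
    but the entry has not yet been inserted into the swap cache.

    `steps` is a list of (name, may_sleep) tuples in execution order.
    """
    flag_set = False
    in_cache = False
    for name, may_sleep in steps:
        if flag_set and not in_cache and may_sleep:
            return True
        if name == "swapcache_prepare":
            flag_set = True       # SWAP_HAS_CACHE is now visible to others
        elif name == "radix_tree_insert":
            in_cache = True       # busy-waiters can now find the entry
    return False


# Ordering before the patch: the preload sits inside the window.
OLD_ORDER = [("swapcache_prepare", False),
             ("radix_tree_preload", True),
             ("radix_tree_insert", False)]

# Ordering after the patch: the preload happens before the flag is set.
NEW_ORDER = [("radix_tree_preload", True),
             ("swapcache_prepare", False),
             ("radix_tree_insert", False)]
```

In the old ordering a busy-waiting reader on the same cpu can spin for as long as the preload sleeps. In the new ordering nothing inside the window can sleep, so the window stays short.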
  17. 11 Aug, 2009 3 commits
    • Mel Gorman authored · 0aba8dc8
      Knowing tracepoints exist is not quite the same as knowing what they
      should be used for.  This patch adds a document giving a basic description
      of the kmem tracepoints and why they might be useful to a performance
      analyst.
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Rik van Riel <riel@redhat.com>
      Reviewed-by: Ingo Molnar <mingo@elte.hu>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Li Ming Chun <macli@brc.ubc.ca>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Mel Gorman authored · 30e9dec2
      The documentation for ftrace, events and tracepoints is pretty extensive.
      Similarly, the perf PCL tools' --help output is there and the code is
      simple enough to figure out what most of the switches mean.  However,
      pulling the discrete bits and pieces together and translating that into
      "how do I solve a problem" requires a fair amount of imagination.
      
      This patch adds a simple document intended to get someone started on the
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Rik van Riel <riel@redhat.com>
      Reviewed-by: Ingo Molnar <mingo@elte.hu>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Li Ming Chun <macli@brc.ubc.ca>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Mel Gorman authored · 2e607c90
      This patch adds a simple post-processing script for the
      page-allocator-related trace events.  It can be used to give an indication
      of who the most allocator-intensive processes are and how often the zone
      lock was taken during the tracing period.  Example output looks like
      
      Process                   Pages      Pages      Pages    Pages       PCPU     PCPU     PCPU   Fragment Fragment  MigType Fragment Fragment  Unknown
      details                  allocd     allocd      freed    freed      pages   drains  refills   Fallback  Causing  Changed   Severe Moderate
                                      under lock     direct  pagevec      drain
      swapper-0                     0          0          2        0          0        0        0          0        0        0        0        0        0
      Xorg-3770                 10603       5952       3685     6978       5996      194      192          0        0        0        0        0        0
      modprobe-21397               51          0          0       86         31        1        0          0        0        0        0        0        0
      xchat-5370                  228         93          0        0          0        0        3          0        0        0        0        0        0
      awesome-4317                 32         32          0        0          0        0       32          0        0        0        0        0        0
      thinkfan-3863                 2          0          1        1          0        0        0          0        0        0        0        0        0
      hald-addon-stor-3935          2          0          0        0          0        0        0          0        0        0        0        0        0
      akregator-4506                1          1          0        0          0        0        1          0        0        0        0        0        0
      xmms-14888                    0          0          1        0          0        0        0          0        0        0        0        0        0
      khelper-12                    1          0          0        0          0        0        0          0        0        0        0        0        0
      
      Optionally, the output can include information on the parent, or aggregate
      based on process name instead of on each pid.  Example output including
      parent information and with the PID stripped out looks something like:
      
      Process                        Pages      Pages      Pages    Pages       PCPU     PCPU     PCPU   Fragment Fragment  MigType Fragment Fragment  Unknown
      details                       allocd     allocd      freed    freed      pages   drains  refills   Fallback  Causing  Changed   Severe Moderate
                                           under lock     direct  pagevec      drain
      gdm-3756 :: Xorg-3770           3796       2976         99     3813       3224      104       98          0        0        0        0        0        0
      init-1 :: hald-3892                1          0          0        0          0        0        0          0        0        0        0        0        0
      git-21447 :: editor-21448          4          0          4        0          0        0        0          0        0        0        0        0        0
      
      This says that Xorg allocated 3796 pages and its parent process is gdm
      with a PID of 3756.
      
      The postprocessor parses the text output of tracing.  While there is a
      binary format, the expectation is that the binary output can be readily
      translated into text and post-processed offline.  Obviously if the text
      format changes, the parser will break but the regular expression parser is
      fairly rudimentary so should be readily adjustable.
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Rik van Riel <riel@redhat.com>
      Reviewed-by: Ingo Molnar <mingo@elte.hu>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Li Ming Chun <macli@brc.ubc.ca>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
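A regular-expression parser of the kind described above can be sketched in a few lines. This is not the script from the commit: the line layout `comm-pid [cpu] timestamp: event: details` is an assumption about the text trace format, and the names `LINE_RE` and `summarise` are made up for the example. The event names are the ones that appear elsewhere in this log.

```python
import re
from collections import Counter, defaultdict

# Assumed layout of one text trace line:
#   "    Xorg-3770  [000]   100.000001: mm_page_alloc: page=... order=0"
LINE_RE = re.compile(
    r"^\s*(?P<comm>.+)-(?P<pid>\d+)\s+\[\d+\]\s+[\d.]+:\s+(?P<event>\w+):"
)


def summarise(lines):
    """Count trace events per process, keyed by "comm-pid"."""
    stats = defaultdict(Counter)
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # header, comment or unrecognised line
        stats[f"{m['comm']}-{m['pid']}"][m["event"]] += 1
    return stats
```

Aggregating by process name instead of by pid would only require keying on `m['comm']`. If the text format changes, only `LINE_RE` should need adjusting.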
  18. 13 Aug, 2009 1 commit
    • Andrew Morton authored · 2d04241d
      mm/page_alloc.c: In function 'free_pages_bulk':
      mm/page_alloc.c:549: error: implicit declaration of function 'trace_mm_page_pcpu_drain'
      mm/page_alloc.c: In function '__rmqueue_fallback':
      mm/page_alloc.c:879: error: implicit declaration of function 'trace_mm_page_alloc_extfrag'
      mm/page_alloc.c: In function '__rmqueue':
      mm/page_alloc.c:915: error: implicit declaration of function 'trace_mm_page_alloc_zone_locked'
      mm/page_alloc.c: In function 'free_hot_page':
      mm/page_alloc.c:1106: error: implicit declaration of function 'trace_mm_page_free_direct'
      mm/page_alloc.c: In function '__alloc_pages_nodemask':
      mm/page_alloc.c:1951: error: implicit declaration of function 'trace_mm_page_alloc'
      mm/page_alloc.c: In function '__pagevec_free':
      mm/page_alloc.c:1987: error: implicit declaration of function 'trace_mm_pagevec_free'
      
      
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Li Ming Chun <macli@brc.ubc.ca>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  19. 11 Aug, 2009 3 commits
    • Mel Gorman authored · de4c81a5
      The page allocation trace event reports that a page was successfully
      allocated but it does not specify where it came from.  When analysing
      performance, it can be important to distinguish between pages coming from
      the per-cpu allocator and pages coming from the buddy lists as the latter
      requires the zone lock to be taken and more data structures to be
      examined.
      
      This patch adds a trace event for __rmqueue reporting when a page is being
      allocated from the buddy lists.  It distinguishes between being called to
      refill the per-cpu lists or whether it is a high-order allocation. 
      Similarly, this patch adds an event to catch when the PCP lists are being
      drained a little and pages are going back to the buddy lists.
      
      This is trickier to draw conclusions from but high activity on those
      events could explain why there were a large number of cache misses on a
      page-allocator-intensive workload.  The coalescing and splitting of
      buddies involves a lot of writing of page metadata and cache line bounces
      not to mention the acquisition of an interrupt-safe lock necessary to
      enter this path.
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Acked-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Ingo Molnar <mingo@elte.hu>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Li Ming Chun <macli@brc.ubc.ca>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Mel Gorman authored · 3a561180
      Fragmentation avoidance depends on being able to use free pages from lists
      of the appropriate migrate type.  In the event this is not possible,
      __rmqueue_fallback() selects a different list and in some circumstances
      changes the migratetype of the pageblock.  Simplistically, the more times
      this event occurs, the more likely that fragmentation will be a problem
      later for hugepage allocation at least but there are other considerations
      such as the order of page being split to satisfy the allocation.
      
      This patch adds a trace event for __rmqueue_fallback() that reports what
      page is being used for the fallback, the orders of relevant pages, the
      desired migratetype and the migratetype of the lists being used, whether
      the pageblock changed type and whether this event is important with
      respect to fragmentation avoidance or not.  This information can be used
      to help analyse fragmentation avoidance and help decide whether
      min_free_kbytes should be increased or not.
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Acked-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Ingo Molnar <mingo@elte.hu>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Li Ming Chun <macli@brc.ubc.ca>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Mel Gorman authored · ec2ebbd0
      This patch adds trace events for the allocation and freeing of pages,
      including the freeing of pagevecs.  Using the events, it will be known
      what struct page and pfns are being allocated and freed and what the call
      site was in many cases.
      
      The page alloc tracepoints can be used as an indicator as to whether the
      workload was heavily dependent on the page allocator or not.  You can make
      a guess based on vmstat but you can't get a per-process breakdown. 
      Depending on the call path, the call_site for page allocation may be
      __get_free_pages() instead of a useful callsite.  Instead of passing down
      a return address similar to slab debugging, the user should enable the
      stacktrace and sym-addr options to get a proper stack trace.
      
      The pagevec free tracepoint has a different usecase.  It can be used to
      get an idea of how many pages are being dumped off the LRU and whether it
      is kswapd doing the work or a process doing direct reclaim.
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Acked-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Ingo Molnar <mingo@elte.hu>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Li Ming Chun <macli@brc.ubc.ca>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  20. 24 Aug, 2009 1 commit
  21. 04 Aug, 2009 1 commit
    • Andrew Morton authored · b2e67f8a
      ERROR: code indent should use tabs where possible
      #219: FILE: arch/s390/mm/init.c:108:
      +                nr_free_pages() << (PAGE_SHIFT-10),$
      
      total: 1 errors, 0 warnings, 162 lines checked
      
      ./patches/arches-drop-superfluous-casts-in-nr_free_pages-callers.patch has style problems, please review.  If any of these errors
      are false positives report them to the maintainer, see
      CHECKPATCH in MAINTAINERS.
      
      Please run checkpatch prior to sending patches
      
      Cc: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>