1. 13 Aug, 2009 2 commits
  2. 24 Aug, 2009 1 commit
  3. 27 Aug, 2009 1 commit
  4. 24 Aug, 2009 2 commits
  5. 11 Aug, 2009 2 commits
  6. 12 Aug, 2009 2 commits
  7. 18 Aug, 2009 1 commit
  8. 24 Aug, 2009 1 commit
  9. 12 Aug, 2009 1 commit
    • Daisuke Nishimura's avatar
      After commit 355cfa73 ("mm: modify swap_map and add SWAP_HAS_CACHE flag"), · f80362c0
      Daisuke Nishimura authored
      read_swap_cache_async() will busy-wait while a entry doesn't exist in swap
      cache but it has SWAP_HAS_CACHE flag.
      
      Such entries can exist on add/delete path of swap cache.  On add path,
      add_to_swap_cache() is called soon after SWAP_HAS_CACHE flag is set, and
      on delete path, swapcache_free() will be called (SWAP_HAS_CACHE flag is
      cleared) soon after __delete_from_swap_cache() is called.  So, the
      busy-wait works well in most cases.
      
      But this mechanism can cause soft lockup if add_to_swap_cache() sleeps and
      read_swap_cache_async() tries to swap-in the same entry on the same cpu.
      
      This patch calls radix_tree_preload() before swapcache_prepare() and
      divides add_to_swap_cache() into two part: radix_tree_preload() part and
      radix_tree_insert() part(define it as __add_to_swap_cache()).
      Signed-off-by: default avatarDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      f80362c0
  10. 11 Aug, 2009 3 commits
    • Mel Gorman's avatar
      Knowing tracepoints exist is not quite the same as knowing what they · 0aba8dc8
      Mel Gorman authored
      should be used for.  This patch adds a document giving a basic description
      of the kmem tracepoints and why they might be useful to a performance
      analyst.
      Signed-off-by: default avatarMel Gorman <mel@csn.ul.ie>
      Cc: Rik van Riel <riel@redhat.com>
      Reviewed-by: default avatarIngo Molnar <mingo@elte.hu>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Li Ming Chun <macli@brc.ubc.ca>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      0aba8dc8
    • Mel Gorman's avatar
      The documentation for ftrace, events and tracepoints is pretty extensive. · 30e9dec2
      Mel Gorman authored
      Similarly, the perf PCL tools help files --help are there and the code
      simple enough to figure out what much of the switches mean.  However,
      pulling the discrete bits and pieces together and translating that into
      "how do I solve a problem" requires a fair amount of imagination.
      
      This patch adds a simple document intended to get someone started on the
      Signed-off-by: default avatarMel Gorman <mel@csn.ul.ie>
      Cc: Rik van Riel <riel@redhat.com>
      Reviewed-by: default avatarIngo Molnar <mingo@elte.hu>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Li Ming Chun <macli@brc.ubc.ca>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      30e9dec2
    • Mel Gorman's avatar
      This patch adds a simple post-processing script for the · 2e607c90
      Mel Gorman authored
      page-allocator-related trace events.  It can be used to give an indication
      of who the most allocator-intensive processes are and how often the zone
      lock was taken during the tracing period.  Example output looks like
      
      Process                   Pages      Pages      Pages    Pages       PCPU     PCPU     PCPU   Fragment Fragment  MigType Fragment Fragment  Unknown
      details                  allocd     allocd      freed    freed      pages   drains  refills   Fallback  Causing  Changed   Severe Moderate
                                      under lock     direct  pagevec      drain
      swapper-0                     0          0          2        0          0        0        0          0        0        0        0        0        0
      Xorg-3770                 10603       5952       3685     6978       5996      194      192          0        0        0        0        0        0
      modprobe-21397               51          0          0       86         31        1        0          0        0        0        0        0        0
      xchat-5370                  228         93          0        0          0        0        3          0        0        0        0        0        0
      awesome-4317                 32         32          0        0          0        0       32          0        0        0        0        0        0
      thinkfan-3863                 2          0          1        1          0        0        0          0        0        0        0        0        0
      hald-addon-stor-3935          2          0          0        0          0        0        0          0        0        0        0        0        0
      akregator-4506                1          1          0        0          0        0        1          0        0        0        0        0        0
      xmms-14888                    0          0          1        0          0        0        0          0        0        0        0        0        0
      khelper-12                    1          0          0        0          0        0        0          0        0        0        0        0        0
      
      Optionally, the output can include information on the parent or aggregate
      based on process name instead of aggregating based on each pid. Example output
      including parent information and stripped out the PID looks something like;
      
      Process                        Pages      Pages      Pages    Pages       PCPU     PCPU     PCPU   Fragment Fragment  MigType Fragment Fragment  Unknown
      details                       allocd     allocd      freed    freed      pages   drains  refills   Fallback  Causing  Changed   Severe Moderate
                                           under lock     direct  pagevec      drain
      gdm-3756 :: Xorg-3770           3796       2976         99     3813       3224      104       98          0        0        0        0        0        0
      init-1 :: hald-3892                1          0          0        0          0        0        0          0        0        0        0        0        0
      git-21447 :: editor-21448          4          0          4        0          0        0        0          0        0        0        0        0        0
      
      This says that Xorg allocated 3796 pages and it's parent process is gdm
      with a PID of 3756;
      
      The postprocessor parses the text output of tracing.  While there is a
      binary format, the expectation is that the binary output can be readily
      translated into text and post-processed offline.  Obviously if the text
      format changes, the parser will break but the regular expression parser is
      fairly rudimentary so should be readily adjustable.
      Signed-off-by: default avatarMel Gorman <mel@csn.ul.ie>
      Cc: Rik van Riel <riel@redhat.com>
      Reviewed-by: default avatarIngo Molnar <mingo@elte.hu>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Li Ming Chun <macli@brc.ubc.ca>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      2e607c90
  11. 13 Aug, 2009 1 commit
    • Andrew Morton's avatar
      mm/page_alloc.c: In function 'free_pages_bulk': · 2d04241d
      Andrew Morton authored
      mm/page_alloc.c:549: error: implicit declaration of function 'trace_mm_page_pcpu_drain'
      mm/page_alloc.c: In function '__rmqueue_fallback':
      mm/page_alloc.c:879: error: implicit declaration of function 'trace_mm_page_alloc_extfrag'
      mm/page_alloc.c: In function '__rmqueue':
      mm/page_alloc.c:915: error: implicit declaration of function 'trace_mm_page_alloc_zone_locked'
      mm/page_alloc.c: In function 'free_hot_page':
      mm/page_alloc.c:1106: error: implicit declaration of function 'trace_mm_page_free_direct'
      mm/page_alloc.c: In function '__alloc_pages_nodemask':
      mm/page_alloc.c:1951: error: implicit declaration of function 'trace_mm_page_alloc'
      mm/page_alloc.c: In function '__pagevec_free':
      mm/page_alloc.c:1987: error: implicit declaration of function 'trace_mm_pagevec_free'
      
      
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Li Ming Chun <macli@brc.ubc.ca>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      2d04241d
  12. 11 Aug, 2009 3 commits
    • Mel Gorman's avatar
      The page allocation trace event reports that a page was successfully · de4c81a5
      Mel Gorman authored
      allocated but it does not specify where it came from.  When analysing
      performance, it can be important to distinguish between pages coming from
      the per-cpu allocator and pages coming from the buddy lists as the latter
      requires the zone lock to the taken and more data structures to be
      examined.
      
      This patch adds a trace event for __rmqueue reporting when a page is being
      allocated from the buddy lists.  It distinguishes between being called to
      refill the per-cpu lists or whether it is a high-order allocation. 
      Similarly, this patch adds an event to catch when the PCP lists are being
      drained a little and pages are going back to the buddy lists.
      
      This is trickier to draw conclusions from but high activity on those
      events could explain why there were a large number of cache misses on a
      page-allocator-intensive workload.  The coalescing and splitting of
      buddies involves a lot of writing of page metadata and cache line bounces
      not to mention the acquisition of an interrupt-safe lock necessary to
      enter this path.
      Signed-off-by: default avatarMel Gorman <mel@csn.ul.ie>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Reviewed-by: default avatarIngo Molnar <mingo@elte.hu>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Li Ming Chun <macli@brc.ubc.ca>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      de4c81a5
    • Mel Gorman's avatar
      Fragmentation avoidance depends on being able to use free pages from lists · 3a561180
      Mel Gorman authored
      of the appropriate migrate type.  In the event this is not possible,
      __rmqueue_fallback() selects a different list and in some circumstances
      change the migratetype of the pageblock.  Simplistically, the more times
      this event occurs, the more likely that fragmentation will be a problem
      later for hugepage allocation at least but there are other considerations
      such as the order of page being split to satisfy the allocation.
      
      This patch adds a trace event for __rmqueue_fallback() that reports what
      page is being used for the fallback, the orders of relevant pages, the
      desired migratetype and the migratetype of the lists being used, whether
      the pageblock changed type and whether this event is important with
      respect to fragmentation avoidance or not.  This information can be used
      to help analyse fragmentation avoidance and help decide whether
      min_free_kbytes should be increased or not.
      Signed-off-by: default avatarMel Gorman <mel@csn.ul.ie>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Reviewed-by: default avatarIngo Molnar <mingo@elte.hu>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Li Ming Chun <macli@brc.ubc.ca>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      3a561180
    • Mel Gorman's avatar
      This patch adds trace events for the allocation and freeing of pages, · ec2ebbd0
      Mel Gorman authored
      including the freeing of pagevecs.  Using the events, it will be known
      what struct page and pfns are being allocated and freed and what the call
      site was in many cases.
      
      The page alloc tracepoints be used as an indicator as to whether the
      workload was heavily dependant on the page allocator or not.  You can make
      a guess based on vmstat but you can't get a per-process breakdown. 
      Depending on the call path, the call_site for page allocation may be
      __get_free_pages() instead of a useful callsite.  Instead of passing down
      a return address similar to slab debugging, the user should enable the
      stacktrace and seg-addr options to get a proper stack trace.
      
      The pagevec free tracepoint has a different usecase.  It can be used to
      get a idea of how many pages are being dumped off the LRU and whether it
      is kswapd doing the work or a process doing direct reclaim.
      Signed-off-by: default avatarMel Gorman <mel@csn.ul.ie>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Reviewed-by: default avatarIngo Molnar <mingo@elte.hu>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Li Ming Chun <macli@brc.ubc.ca>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      ec2ebbd0
  13. 24 Aug, 2009 1 commit
  14. 04 Aug, 2009 1 commit
    • Andrew Morton's avatar
      ERROR: code indent should use tabs where possible · b2e67f8a
      Andrew Morton authored
      #219: FILE: arch/s390/mm/init.c:108:
      +                nr_free_pages() << (PAGE_SHIFT-10),$
      
      total: 1 errors, 0 warnings, 162 lines checked
      
      ./patches/arches-drop-superfluous-casts-in-nr_free_pages-callers.patch has style problems, please review.  If any of these errors
      are false positives report them to the maintainer, see
      CHECKPATCH in MAINTAINERS.
      
      Please run checkpatch prior to sending patches
      
      Cc: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      b2e67f8a
  15. 24 Aug, 2009 1 commit
  16. 04 Aug, 2009 1 commit
  17. 24 Aug, 2009 2 commits
  18. 09 Sep, 2009 1 commit
  19. 24 Aug, 2009 1 commit
  20. 10 Sep, 2009 1 commit
    • Mel Gorman's avatar
      Calculate the number of pageblocks within a range properly · 4553616e
      Mel Gorman authored
      Patch
      page-allocator-change-migratetype-for-all-pageblocks-within-a-high-order-page-during-__rmqueue_fallback
      is meant to change the pageblock ownership of each pageblock within a
      given range. This is necessary when the buddy to be split is of higher
      order than the pageblock_order. However, the calculation was wrong
      leading to crashes on ia-64 and slightly incorrect behaviour on x86.
      This patch corrects the calculation.
      Signed-off-by: default avatarMel Gorman <mel@csn.ul.ie>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      4553616e
  21. 24 Aug, 2009 4 commits
  22. 10 Sep, 2009 3 commits
  23. 24 Aug, 2009 1 commit
  24. 04 Aug, 2009 2 commits
  25. 03 Sep, 2009 1 commit
    • Andrea Arcangeli's avatar
      Rawhide users have reported hang at startup when cryptsetup is run: the · 93c93e98
      Andrea Arcangeli authored
      same problem can be simply reproduced by running a program int main() {
      mlockall(MCL_CURRENT | MCL_FUTURE); return 0; }
      
      The problem is that exit_mmap() applies munlock_vma_pages_all() to
      clean up VM_LOCKED areas, and its current implementation (stupidly)
      tries to fault in absent pages, for example where PROT_NONE prevented
      them being faulted in when mlocking.  Whereas the "ksm: fix oom
      deadlock" patch, knowing there's a race by which KSM might try to fault
      in pages after exit_mmap() had finally zapped the range, backs out of
      such faults doing nothing when its ksm_test_exit() notices mm_users 0.
      
      So revert that part of "ksm: fix oom deadlock" which moved the
      ksm_exit() call from before exit_mmap() to the middle of exit_mmap();
      and remove those ksm_test_exit() checks from the page fault paths, so
      allowing the munlocking to proceed without interference.
      
      ksm_exit, if there are rmap_items still chained on this mm slot, takes
      mmap_sem write side: so preventing KSM from working on an mm while
      exit_mmap runs.  And KSM will bail out as soon as it notices that
      mm_users is already zero, thanks to its internal ksm_test_exit checks. 
      So that when a task is killed by OOM killer or the user, KSM will not
      indefinitely prevent it from running exit_mmap to release its memory.
      
      This does break a part of what "ksm: fix oom deadlock" was trying to
      achieve.  When unmerging KSM (echo 2 >/sys/kernel/mm/ksm), and even
      when ksmd itself has to cancel a KSM page, it is possible that the
      first OOM-kill victim would be the KSM process being faulted: then its
      memory won't be freed until a second victim has been selected (freeing
      memory for the unmerging fault to complete).
      
      But the OOM killer is already liable to kill a second victim once the
      intended victim's p->mm goes to NULL: so there's not much point in
      rejecting this KSM patch before fixing that OOM behaviour.  It is very
      much more important to allow KSM users to boot up, than to haggle over
      an unlikely and poorly supported OOM case.
      
      We also intend to fix munlocking to not fault pages: at which point
      this patch _could_ be reverted; though that would be controversial, so
      we hope to find a better solution.
      Signed-off-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Acked-by: default avatarJustin M. Forbes <jforbes@redhat.com>
      Acked-for-now-by: default avatarHugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Izik Eidus <ieidus@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      93c93e98