1. 05 Jul, 2009 1 commit
  2. 04 Jul, 2009 1 commit
    • x86: atomic64: Inline atomic64_read() again · a79f0da8
      Eric Dumazet authored
      Now that atomic64_read() is lightweight (no register pressure and
      a small icache footprint), we can inline it again.
      
      Also use the "=&A" constraint instead of "+A" to avoid a warning
      about an uninitialized 'res' variable. (gcc had to force 0 into eax/edx.)
      
        $ size vmlinux.prev vmlinux.after
           text    data     bss     dec     hex filename
        4908667  451676 1684868 7045211  6b805b vmlinux.prev
        4908651  451676 1684868 7045195  6b804b vmlinux.after
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      LKML-Reference: <4A4E1AA2.30002@gmail.com>
      [ Also fix typo in atomic64_set() export ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  3. 03 Jul, 2009 15 commits
    • x86: atomic64: Clean up atomic64_sub_and_test() and atomic64_add_negative() · ddf9a003
      Ingo Molnar authored
      Linus noticed that the variable 'old_val' is confusingly
      named in these functions - the correct name is 'new_val'.
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      LKML-Reference: <alpine.LFD.2.01.0907030942260.3210@localhost.localdomain>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: atomic64: Improve atomic64_xchg() · 3a8d1788
      Ingo Molnar authored
      Remove the read-first logic from atomic64_xchg() and simplify
      the loop.
      
      This function was the last user of __atomic64_read() - remove it.
      
      Also, change the 'real_val' assumption from the somewhat quirky
      1ULL << 32 value to the (just as arbitrary, but simpler) value
      of 0.
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      LKML-Reference: <tip-05118ab8859492ac9ddda0154cf90e37b0a4a0b0@git.kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: atomic64: Export APIs to modules · 1fde902d
      Ingo Molnar authored
      atomic64_t primitives are used by a handful of drivers,
      so export the APIs consistently. These were inlined
      before.
      
      Also mark atomic64_32.o a core object, so that the symbols
      are available even if not linked to core kernel pieces.
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      LKML-Reference: <tip-05118ab8859492ac9ddda0154cf90e37b0a4a0b0@git.kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: atomic64: Improve atomic64_read() · 67d7178f
      Eric Dumazet authored
      Optimize atomic64_read() as a special open-coded
      cmpxchg8b variant. This generates nicer code:
      
      arch/x86/lib/atomic64_32.o:
      
         text	   data	    bss	    dec	    hex	filename
          435	      0	      0	    435	    1b3	atomic64_32.o.before
          431	      0	      0	    431	    1af	atomic64_32.o.after
      
      md5:
         bd8ab95e69c93518578bfaf0ea3be4d9  atomic64_32.o.before.asm
         2bdfd4bd1f6b7b61b7fc127aef90ce3b  atomic64_32.o.after.asm
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      LKML-Reference: <alpine.LFD.2.01.0907021653030.3210@localhost.localdomain>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: atomic64: Code atomic(64)_read and atomic(64)_set in C not CPP · 8e049ef0
      Paul Mackerras authored
      Occasionally we get bugs where atomic_read or atomic_set are
      used on atomic64_t variables or vice versa.  These bugs don't
      generate warnings on x86 because atomic_read and atomic_set are
      coded as macros rather than C functions, so we don't get any
      type-checking on their arguments; similarly for atomic64_read
      and atomic64_set in 64-bit kernels.
      
      This converts them to C functions so that the arguments are
      type-checked and bugs like this will get caught more easily. It
      also converts atomic_cmpxchg and atomic_xchg, and
      atomic64_cmpxchg and atomic64_xchg on 64-bit, so we get
      type-checking on their arguments too.
      
      Compiling a typical 64-bit x86 config, this generates no new
      warnings, and the vmlinux text is 86 bytes smaller.
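      
      The difference between the macro and function forms can be sketched
      as follows. The type definitions and the *_macro / *_fn names are
      simplified stand-ins chosen for illustration, not the kernel's actual
      definitions:

```c
/* Simplified stand-ins for the kernel types (assumption: the real
 * definitions carry extra qualifiers and alignment attributes). */
typedef struct { int counter; } atomic_t;
typedef struct { long long counter; } atomic64_t;

/* Macro form: accepts any pointer to a struct with a .counter member,
 * so passing an atomic64_t * compiles without a diagnostic. */
#define atomic_read_macro(v) ((v)->counter)

/* Function form: the parameter type is checked by the compiler, so
 * passing an atomic64_t * draws an incompatible-pointer warning. */
static inline int atomic_read_fn(const atomic_t *v)
{
	return v->counter;
}

static inline long long atomic64_read_fn(const atomic64_t *v)
{
	return v->counter;
}
```

      With these definitions, atomic_read_macro(&some_atomic64) compiles
      silently, while atomic_read_fn(&some_atomic64) is flagged at compile
      time.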
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      LKML-Reference: <alpine.LFD.2.01.0907021653030.3210@localhost.localdomain>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: atomic64: Fix unclean type use in atomic64_xchg() · 199e2378
      Ingo Molnar authored
      Linus noticed that atomic64_xchg() uses atomic_read(), which
      happens to work because atomic_read() is a macro so the
      .counter value gets u64-read on 32-bit too - but this is really
      bogus and serious bugs are waiting to happen.
      
      Fix atomic64_xchg() to use __atomic64_read() instead.
      
      No code changed:
      
      arch/x86/lib/atomic64_32.o:
      
         text	   data	    bss	    dec	    hex	filename
          435	      0	      0	    435	    1b3	atomic64_32.o.before
          435	      0	      0	    435	    1b3	atomic64_32.o.after
      
      md5:
         bd8ab95e69c93518578bfaf0ea3be4d9  atomic64_32.o.before.asm
         bd8ab95e69c93518578bfaf0ea3be4d9  atomic64_32.o.after.asm
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      LKML-Reference: <alpine.LFD.2.01.0907021653030.3210@localhost.localdomain>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: atomic64: Make atomic_read() type-safe · 32171208
      Ingo Molnar authored
      Linus noticed that atomic64_xchg() uses atomic_read(), which
      happens to work because atomic_read() is a macro so the
      .counter value gets u64-read on 32-bit too - but this is really
      bogus and serious bugs are waiting to happen.
      
      Change atomic_read() to be a type-safe inline, and this exposes
      the atomic64 bogosity as well:
      
        arch/x86/lib/atomic64_32.c: In function ‘atomic64_xchg’:
        arch/x86/lib/atomic64_32.c:39: warning: passing argument 1 of ‘atomic_read’ from incompatible pointer type
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      LKML-Reference: <alpine.LFD.2.01.0907021653030.3210@localhost.localdomain>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: atomic64: Reduce size of functions · 3ac805d2
      Ingo Molnar authored
      cmpxchg8b is a huge instruction in terms of register footprint,
      so we almost never want to inline it, not even within the same
      code module.
      
      GCC 4.3 still messes up for two functions, under-judging the
      true cost of this instruction - so annotate two key functions
      to reduce the bloat:
      
      arch/x86/lib/atomic64_32.o:
      
         text	   data	    bss	    dec	    hex	filename
         1763	      0	      0	   1763	    6e3	atomic64_32.o.before
          435	      0	      0	    435	    1b3	atomic64_32.o.after
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      LKML-Reference: <alpine.LFD.2.01.0907021653030.3210@localhost.localdomain>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: atomic64: Improve atomic64_add_return() · 824975ef
      Ingo Molnar authored
      Linus noted (based on Eric Dumazet's numbers) that we would
      probably be better off not trying an atomic_read() in
      atomic64_add_return() but instead intentionally let the first
      cmpxchg8b fail - to get a cache-friendly 'give me ownership
      of this cacheline' transaction. That can then be followed
      by the real cmpxchg8b which sets the value local to the CPU.
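      
      A minimal user-space sketch of this pattern, assuming C11 atomics as
      a stand-in for the kernel's cmpxchg8b wrapper. The function name
      add_return_sketch and the initial guess of 0 are illustrative
      choices, not taken from the patch:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Add delta to *ptr and return the new value, without reading *ptr
 * first: the loop starts from an arbitrary guess and lets the first
 * compare-and-swap fail if the guess is wrong. */
static uint64_t add_return_sketch(_Atomic uint64_t *ptr, uint64_t delta)
{
	uint64_t old_val = 0;	/* arbitrary guess, no prior read */
	uint64_t new_val;

	do {
		new_val = old_val + delta;
		/* On failure, old_val is overwritten with the value found
		 * in memory, so the next iteration uses the real contents. */
	} while (!atomic_compare_exchange_strong(ptr, &old_val, new_val));

	return new_val;
}
```

      When the guess is wrong, the loop body runs twice: once to fetch the
      current value through the failed exchange, and once to store the sum.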
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      LKML-Reference: <alpine.LFD.2.01.0907021653030.3210@localhost.localdomain>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: atomic64: Improve cmpxchg8b() · 69237f94
      Eric Dumazet authored
      Rewrite cmpxchg8b() to use a generic "+m" constraint instead of
      the %edi register, to give the compiler more freedom in code
      generation and possibly produce better code.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      LKML-Reference: <alpine.LFD.2.01.0907021653030.3210@localhost.localdomain>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: atomic64: Improve atomic64_read() · aacf682f
      Eric Dumazet authored
      Linus noticed that the 32-bit version of atomic64_read() was
      being overly complex with re-reading the value and doing a
      retry loop over that.
      
      Instead we can just rely on cmpxchg8b returning either the new
      value (on a match) or the current value (on a mismatch).
      
      We can use any 'old' value, which will be faster as it can be
      loaded via immediates. If that value is not equal to the real
      value in memory, the instruction gets faster.
      
      This also has the advantage that the CPU could avoid dirtying
      the cacheline.
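      
      The read-via-compare-exchange trick can be sketched in user space
      with C11 atomics standing in for cmpxchg8b. The name
      read_via_cmpxchg and the guess of 0 are assumptions for
      illustration; portable C11 code would simply call atomic_load():

```c
#include <stdatomic.h>
#include <stdint.h>

/* Read a 64-bit value using only a compare-and-exchange, with no
 * retry loop and no prior read of *ptr. */
static uint64_t read_via_cmpxchg(_Atomic uint64_t *ptr)
{
	uint64_t expected = 0;	/* arbitrary 'old' value */

	/* If *ptr equals the guess, the exchange stores the same value
	 * back and memory is unchanged. Otherwise the exchange fails and
	 * copies the current value into expected. Either way, expected
	 * ends up holding the value that was in memory. */
	atomic_compare_exchange_strong(ptr, &expected, expected);
	return expected;
}
```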
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      LKML-Reference: <alpine.LFD.2.01.0907021653030.3210@localhost.localdomain>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: atomic64: Move the 32-bit atomic64_t implementation to a .c file · b7882b7c
      Ingo Molnar authored
      Linus noted that the atomic64_t primitives are currently all
      inlines, which is crazy because these functions have a large
      register footprint anyway.
      
      Move them to a separate file: arch/x86/lib/atomic64_32.c
      
      Also, while at it, rename all uses of 'unsigned long long' to
      the much shorter u64.
      
      This makes the appearance of the prototypes a lot nicer - and
      it also uncovered a few bugs where (yet unused) API variants
      had 'long' as their return type instead of u64.
      
      [ More intrusive changes are not yet done in this patch. ]
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      LKML-Reference: <alpine.LFD.2.01.0907021653030.3210@localhost.localdomain>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: atomic64: The atomic64_t data type should be 8 bytes aligned on 32-bit too · bbf2a330
      Eric Dumazet authored
      Locked instructions on two cache lines at once are painful. If
      atomic64_t uses two cache lines, my test program is 10x slower.
      
      The chance for that is significant: 4/32 or 12.5%.
      
      Make sure an atomic64_t is 8 bytes aligned.
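      
      The 12.5% figure can be reproduced with a small sketch, assuming
      32-byte cache lines and 4-byte natural alignment for the 8-byte
      object (both inferred from the 4/32 ratio, not stated in the
      message). The type and helper names are hypothetical:

```c
#include <stdalign.h>
#include <stdint.h>

/* Stand-in for the 32-bit atomic64_t with the alignment fix applied
 * (assumption: simplified from the real kernel definition). */
typedef struct {
	uint64_t counter;
} __attribute__((aligned(8))) atomic64_sketch_t;

_Static_assert(alignof(atomic64_sketch_t) == 8, "must be 8-byte aligned");

/* Count the start offsets inside one cache line at which an object of
 * obj_size bytes, placed at multiples of align, crosses the line end. */
static unsigned straddling_offsets(unsigned line, unsigned obj_size,
				   unsigned align)
{
	unsigned count = 0;

	for (unsigned off = 0; off < line; off += align)
		if (off + obj_size > line)
			count++;
	return count;
}
```

      With 4-byte alignment there are eight possible start offsets in a
      32-byte line and one of them (offset 28) straddles, i.e. 1/8 or
      12.5%. With 8-byte alignment no offset straddles.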
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      LKML-Reference: <alpine.LFD.2.01.0907021653030.3210@localhost.localdomain>
      [ changed it to __aligned(8) as per Andrew's suggestion ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf report: Annotate variable initialization · 029e5b16
      Ingo Molnar authored
      Certain versions of GCC don't see the initialization that is done here:
      
        builtin-report.c: In function ‘__cmd_report’:
        builtin-report.c:1038: warning: ‘syms’ may be used uninitialized in this function
      
      So annotate it with a NULL initialization.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter tools: Adjust symbols in ET_EXEC files too · 30d7a77d
      Arnaldo Carvalho de Melo authored
      Ingo Molnar wrote:
      
      > i just bisected a 'perf report' bug that would cause us to not
      > resolve all user-space symbols in a 'git gc' run to:
      >
      > f5812a7a is first bad commit
      > commit f5812a7a
      > Author: Arnaldo Carvalho de Melo <acme@redhat.com>
      > Date:   Tue Jun 30 11:43:17 2009 -0300
      >
      >     perf_counter tools: Adjust only prelinked symbol's addresses
      
      Rename ->prelinked to ->adjust_symbols and apply what was done
      only for prelinked libraries to ET_EXEC binaries too, such as
      /usr/bin/git:
      
      [acme@doppio pahole]$ readelf -h /usr/bin/git | grep Type
        Type:                              EXEC (Executable file)
      [acme@doppio pahole]$
      
      And after installing the 'git-debuginfo' package, I get correct results:
      
      [acme@doppio linux-2.6-tip]$ perf report --sort comm,dso,symbol -d /usr/bin/git | head -20
      
       #
       # (1139614 samples)
       #
       # Overhead           Command  Shared Object              Symbol
       # ........  ................  .........................  ......
       #
          34.98%               git  /usr/bin/git               [.] send_sideband
          33.39%               git  /usr/bin/git               [.] enter_repo
           6.81%               git  /usr/bin/git               [.] diff_opt_parse
           4.95%               git  /usr/bin/git               [.] is_repository_shallow
           3.24%               git  /usr/bin/git               [.] odb_mkstemp
           1.39%               git  /usr/bin/git               [.] output
           1.34%               git  /usr/bin/git               [.] xmmap
           1.25%               git  /usr/bin/git               [.] receive_pack_config
           1.16%               git  /usr/bin/git               [.] git_pathdup
           0.90%               git  /usr/bin/git               [.] read_object_with_reference
           0.86%               git  /usr/bin/git               [.] show_patch_diff
           0.85%               git  /usr/bin/git               0x00000000095e2e
           0.69%               git  /usr/bin/git               [.] display
      [acme@doppio linux-2.6-tip]$
      
      I'll check later the remaining cases where we can't resolve
      symbols, like this 0x00000000095e2e.
      
      And I guess this will fix the problems Mike was seeing too:
      
       [acme@doppio linux-2.6-tip]$ readelf -h ../build/perf/vmlinux | grep Type
         Type:                              EXEC (Executable file)
       [acme@doppio linux-2.6-tip]$
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  4. 02 Jul, 2009 10 commits
    • perf_counter tools: Display percents of hits in callchain with overhead colors · 24b57c69
      Frederic Weisbecker authored
      This uses colors to signal at a glance the important
      overhead thresholds in callchain hit rates.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246558475-10624-3-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter tools: Provide helper to print percents color · 1e11fd82
      Frederic Weisbecker authored
      Among perf annotate, perf report and perf top, we can find the
      common colored printing of percents according to the following
      rules:
      
          High overhead =  > 5%, colored in red
          Mid overhead =  > 0.5%, colored in green
          Low overhead =  < 0.5%, default color
      
      Factorize these multiple checks in a single function named
      percent_color_fprintf() and also provide a get_percent_color()
      for sites which print percentages and other things at the same
      time.
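      
      A rough sketch of how the two helpers could fit together, following
      the thresholds listed above. The *_sketch names, the ANSI escape
      strings, and the behavior at exactly 5% and 0.5% are assumptions
      for illustration, not the perf tools' actual code:

```c
#include <stdio.h>

/* ANSI escape sequences (assumption: the real tools have their own
 * color definitions). */
#define COLOR_RED    "\033[31m"
#define COLOR_GREEN  "\033[32m"
#define COLOR_NORMAL ""
#define COLOR_RESET  "\033[m"

#define MIN_RED   5.0	/* above this: high overhead */
#define MIN_GREEN 0.5	/* above this: mid overhead */

/* Pick a color for callers that print other things alongside the
 * percentage. */
static const char *get_percent_color_sketch(double percent)
{
	if (percent > MIN_RED)
		return COLOR_RED;
	if (percent > MIN_GREEN)
		return COLOR_GREEN;
	return COLOR_NORMAL;
}

/* Print one percentage in its color; fmt must consume exactly one
 * double argument, e.g. "%6.2f%%". */
static int percent_color_fprintf_sketch(FILE *fp, const char *fmt,
					double percent)
{
	const char *color = get_percent_color_sketch(percent);
	int n;

	if (*color)
		fputs(color, fp);
	n = fprintf(fp, fmt, percent);
	if (*color)
		fputs(COLOR_RESET, fp);
	return n;
}
```

      A call such as percent_color_fprintf_sketch(stdout, "%6.2f%%", 34.98)
      would print the value in red under these assumptions.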
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246558475-10624-2-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter tools: Set the minimum percent for callchains to be displayed · c20ab37e
      Frederic Weisbecker authored
      Callchain output may become a burden on a trace because even
      rarely hit sites are exposed. This can be too much information.
      
      Let the user set a threshold as a minimum percent of hits using
      the new pattern for the -c option:
      
          -c mode,min_percent
      
      Example:
      
      $ perf report -s sym -c flat,4
      
           8.25%  [k] copy_user_generic_string
                   4.19%
                      copy_user_generic_string
                      generic_file_aio_read
                      do_sync_read
                      vfs_read
                      sys_pread64
                      system_call_fastpath
                      pread64
      
           5.39%  [k] search_by_key
           4.63%  0x00000000009e0a
           2.36%  [k] memcpy_c
      [...]
      
      $ perf report -s sym -c graph,2
      
           8.25%  [k] copy_user_generic_string
                      |
                      |--4.31%-- generic_file_aio_read
                      |          do_sync_read
                      |          vfs_read
                      |          |
                      |           --4.19%-- sys_pread64
                      |                     system_call_fastpath
                      |                     pread64
                      |
                       --3.24%-- generic_file_buffered_write
                                 __generic_file_aio_write_nolock
                                 generic_file_aio_write
                                 do_sync_write
                                 reiserfs_file_write
                                 vfs_write
                                 |
                                  --3.14%-- sys_pwrite64
                                            system_call_fastpath
                                            __pwrite64
      
           5.39%  [k] search_by_key
                      |
                       --2.23%-- reiserfs_update_sd_size
      
           4.63%  0x00000000009e0a
      
           2.36%  [k] memcpy_c
      [...]
      
      You can also omit it and it will default to 0.
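      
      One way to parse the "mode,min_percent" pattern is sketched below.
      The type and function names are hypothetical, and the prefix
      matching on the mode name is an assumption modeled on the
      abbreviated forms (-c g, -c gra) accepted for the graph mode:

```c
#include <stdlib.h>
#include <string.h>

enum chain_mode { MODE_FLAT, MODE_GRAPH };

struct callchain_opts {
	enum chain_mode mode;
	double min_percent;
};

/* Parse "mode[,min_percent]". A NULL arg selects the defaults
 * (flat mode, 0 percent). Returns 0 on success, -1 on bad input. */
static int parse_callchain_arg(const char *arg, struct callchain_opts *out)
{
	char buf[64];
	char *comma, *end;
	size_t len;

	out->mode = MODE_FLAT;
	out->min_percent = 0.0;
	if (!arg)
		return 0;
	if (strlen(arg) >= sizeof(buf))
		return -1;
	strcpy(buf, arg);

	/* Split at the first comma, if any. */
	comma = strchr(buf, ',');
	if (comma)
		*comma = '\0';

	/* Accept any non-empty prefix of a mode name. */
	len = strlen(buf);
	if (len == 0)
		return -1;
	if (!strncmp(buf, "graph", len))
		out->mode = MODE_GRAPH;
	else if (!strncmp(buf, "flat", len))
		out->mode = MODE_FLAT;
	else
		return -1;

	if (comma) {
		out->min_percent = strtod(comma + 1, &end);
		if (end == comma + 1 || *end != '\0')
			return -1;
	}
	return 0;
}
```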
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246558475-10624-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf report: Add support for callchain graph output · 4eb3e478
      Frederic Weisbecker authored
      Currently, the printing of callchains is done in a single
      vertical level; this is the "flat" mode:
      
      8.25%  [k] copy_user_generic_string
                   4.19%
                      copy_user_generic_string
                      generic_file_aio_read
                      do_sync_read
                      vfs_read
                      sys_pread64
                      system_call_fastpath
                      pread64
      
      This patch introduces a new "graph" mode which provides a
      hierarchical output of factorized paths recursively sorted:
      
       8.25%  [k] copy_user_generic_string
                      |
                      |--4.31%-- generic_file_aio_read
                      |          do_sync_read
                      |          vfs_read
                      |          |
                      |          |--4.19%-- sys_pread64
                      |          |          system_call_fastpath
                      |          |          pread64
                      |          |
                      |           --0.12%-- sys_read
                      |                     system_call_fastpath
                      |                     __read
                      |
                      |--3.24%-- generic_file_buffered_write
                      |          __generic_file_aio_write_nolock
                      |          generic_file_aio_write
                      |          do_sync_write
                      |          reiserfs_file_write
                      |          vfs_write
                      |          |
                      |          |--3.14%-- sys_pwrite64
                      |          |          system_call_fastpath
                      |          |          __pwrite64
                      |          |
                      |           --0.10%-- sys_write
      [...]
      
      The command line has changed accordingly.
      
      By providing the -c option, the callchain will output in the
      flat mode by default.
      
      But you can override it:
      
          perf report -c graph
      
      or
      
          perf report -c flat
      
      You can also pass an abbreviated mode:
      
          perf report -c g
      
      or
      
          perf report -c gra
      
      will both make use of the graph mode.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246550301-8954-3-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter tools: Add new OPT_CALLBACK_DEFAULT option · 5a4b1817
      Frederic Weisbecker authored
      There is no predefined macro to create an option that can have
      a custom value or a default one if none is given.
      
      This patch provides a new helper OPT_CALLBACK_DEFAULT() which
      defines this kind of option.
      
      For example, considering an option -c, we want to get the
      default value in the following cases:
      
          perf command -c -d
          perf command -d -c
      
      And the foo value when it's given:
      
          perf command -c foo -d
          perf command -d -c foo
      
      That's also why PARSE_OPT_LASTARG_DEFAULT is extended here to
      support default values regardless of the position of the option,
      not only at the end.
      
      Should it now be renamed to PARSE_OPT_ARG_DEFAULT ?
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: git@vger.kernel.org
      LKML-Reference: <1246550301-8954-2-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter tools: Create new chain_for_each_child() iterator · 14f4654c
      Frederic Weisbecker authored
      Iterating through the children of a node in the callchain tree
      can be quite confusing at first glance. The head is the children
      field of the parent and the list nodes are in the brothers field
      of the children.
      
      This is because the children are linked to the parent as a list
      of "brothers" using the "children" list of the parent as a
      head:
      
        ---------------
       | Parent (head) |-------------------------------------
        ---------------                                      |
           |                                                 |
        children                                             |
           |                                                 |
        -----------               -----------                |
       | 1st child |---brother---| 2nd child |---brother-----
        -----------               -----------
      
      This makes the following strange pattern occur often:
      
       list_for_each_entry(child, &parent->children, brothers) {
              // do something with children
       }
      
      Abstract it to chain_for_each_child() to factorize and simplify
      this pattern.
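      
      A self-contained sketch of such an iterator follows. The list
      helpers are minimal user-space stand-ins for the kernel-style list
      API, and the hits field and sum_child_hits() are hypothetical
      additions so the macro has something to iterate over:

```c
#include <stddef.h>

/* Minimal circular doubly linked list (stand-in for list.h). */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Uses the GNU __typeof__ extension, as the kernel-style macro does. */
#define list_for_each_entry(pos, head, member)				\
	for (pos = container_of((head)->next, __typeof__(*pos), member); \
	     &pos->member != (head);					\
	     pos = container_of(pos->member.next, __typeof__(*pos), member))

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static void list_add_tail(struct list_head *new, struct list_head *head)
{
	new->prev = head->prev;
	new->next = head;
	head->prev->next = new;
	head->prev = new;
}

struct callchain_node {
	struct list_head brothers;	/* link in the parent's child list */
	struct list_head children;	/* head of this node's child list */
	int hits;			/* hypothetical payload */
};

/* Hide the children/brothers pairing behind one name. */
#define chain_for_each_child(child, parent) \
	list_for_each_entry(child, &(parent)->children, brothers)

static int sum_child_hits(struct callchain_node *parent)
{
	struct callchain_node *child;
	int total = 0;

	chain_for_each_child(child, parent)
		total += child->hits;
	return total;
}
```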
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246550301-8954-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter tools: Enable kernel module symbol loading in tools · 42976487
      Mike Galbraith authored
      Add the -m/--modules option to perf report and perf annotate,
      which enables live module symbol/image loading. To be used
      with -k/--vmlinux.
      
      (Also give perf annotate a -P/--full-paths option.)
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246514986.13293.48.camel@marge.simson.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      42976487
    • Mike Galbraith's avatar
      perf_counter tools: Connect module support infrastructure to symbol loading infrastructure · 6cfcc53e
      Mike Galbraith authored
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246514916.13293.46.camel@marge.simson.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      6cfcc53e
    • Mike Galbraith's avatar
      perf_counter tools: Add infrastructure to support loading of kernel module symbols · 208b4b4a
      Mike Galbraith authored
      Add infrastructure for module path discovery and section load addresses.
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246514830.13293.44.camel@marge.simson.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      208b4b4a
    • Mike Galbraith's avatar
      perf_counter tools: Make symbol loading consistently return number of loaded symbols · 9974f496
      Mike Galbraith authored
      perf_counter tools: Make symbol loading consistently return number of loaded symbols.
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246514758.13293.42.camel@marge.simson.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      9974f496
  5. 01 Jul, 2009 13 commits
    • Frederic Weisbecker's avatar
      perf stat: Handle pipe read failures in perf stat · a92bef0f
      Frederic Weisbecker authored
      Building builtin-stat.c reports the following errors:
      
      cc1: warnings being treated as errors
      builtin-stat.c: In function ‘run_perf_stat’:
      builtin-stat.c:242: error: ignoring return value of ‘read’, declared with attribute warn_unused_result
      builtin-stat.c:255: error: ignoring return value of ‘read’, declared with attribute warn_unused_result
      make: *** [builtin-stat.o] Error 1
      
      This patch handles the possible pipe read failures.
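      
      As an illustration of what checking the result of read() on a
      pipe can look like, here is a small sketch. The helper name
      wait_for_token and its one-byte protocol are assumptions made
      for the example; this is not the code from builtin-stat.c.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: block until one byte arrives on a pipe.
 * Returns 0 when a byte was read, -1 on EOF or on a read error.
 */
static int wait_for_token(int fd)
{
	char buf;
	ssize_t n;

	/* Retry if a signal interrupted the call before any data arrived. */
	do {
		n = read(fd, &buf, 1);
	} while (n == -1 && errno == EINTR);

	/* n == 0 means the writer closed the pipe, n == -1 is an error. */
	return n == 1 ? 0 : -1;
}
```

      Using the return value this way also satisfies the
      warn_unused_result attribute, since the result is no longer
      discarded.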
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246474930-6088-2-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      a92bef0f
    • Frederic Weisbecker's avatar
      perf_counter: Ignore the nmi call frames in the x86-64 backtraces · 0406ca6d
      Frederic Weisbecker authored
      Almost every callchain recorded with perf record is filled up
      with the internal perfcounter nmi frames:
      
       perf_callchain
       perf_counter_overflow
       intel_pmu_handle_irq
       perf_counter_nmi_handler
       notifier_call_chain
       atomic_notifier_call_chain
       notify_die
       do_nmi
       nmi
      
      We want to ignore these frames as they are not interesting for
      instrumentation. To solve this, we simply ignore every frame
      from nmi context.
      
      New example of "perf report -s sym -c" after this patch:
      
      9.59%  [k] search_by_key
                   4.88%
                      search_by_key
                      reiserfs_read_locked_inode
                      reiserfs_iget
                      reiserfs_lookup
                      do_lookup
                      __link_path_walk
                      path_walk
                      do_path_lookup
                      user_path_at
                      vfs_fstatat
                      vfs_lstat
                      sys_newlstat
                      system_call_fastpath
                      __lxstat
                      0x406fb1
      
                   3.19%
                      search_by_key
                      search_by_entry_key
                      reiserfs_find_entry
                      reiserfs_lookup
                      do_lookup
                      __link_path_walk
                      path_walk
                      do_path_lookup
                      user_path_at
                      vfs_fstatat
                      vfs_lstat
                      sys_newlstat
                      system_call_fastpath
                      __lxstat
                      0x406fb1
      [...]
      
      For now this patch only solves the problem in x86-64.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246474930-6088-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      0406ca6d
    • Arnaldo Carvalho de Melo's avatar
      perf_counter tools: Share list.h with the kernel · 5da50258
      Arnaldo Carvalho de Melo authored
      The copy we were using came from another copy I made for the dwarves
      (pahole) package, which in turn came from the kernel years ago.
      
      The only function that is used by the perf tools and that isn't in the
      kernel is list_del_range, which I'm leaving in the perf tools only for
      now.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <20090701174608.GA5823@ghostprotocols.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      5da50258
    • Arnaldo Carvalho de Melo's avatar
      perf_counter tools: Share rbtree with the kernel · 43cbcd8a
      Arnaldo Carvalho de Melo authored
      The tools/perf/util/rbtree.c copy already drifted by three
      csets:
      
       4b324126
       4c601178
       16c047ad
      
      So remove the copy and use lib/rbtree.c directly, sharing
      the source code while still generating a separate object file,
      since tools/perf uses a far more aggressive -O6 switch.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <20090701152837.GG15682@ghostprotocols.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      43cbcd8a
    • Jaswinder Singh Rajput's avatar
      perf list: Add cache events · 73c24cb8
      Jaswinder Singh Rajput authored
      After:
      
      $ ./perf list
      
      List of pre-defined events (to be used in -e):
      
        cpu-cycles OR cycles                     [Hardware event]
        instructions                             [Hardware event]
        cache-references                         [Hardware event]
        cache-misses                             [Hardware event]
        branch-instructions OR branches          [Hardware event]
        branch-misses                            [Hardware event]
        bus-cycles                               [Hardware event]
      
        cpu-clock                                [Software event]
        task-clock                               [Software event]
        page-faults OR faults                    [Software event]
        minor-faults                             [Software event]
        major-faults                             [Software event]
        context-switches OR cs                   [Software event]
        cpu-migrations OR migrations             [Software event]
      
        L1-d$-loads                              [Hardware cache event]
        L1-d$-load-misses                        [Hardware cache event]
        L1-d$-stores                             [Hardware cache event]
        L1-d$-store-misses                       [Hardware cache event]
        L1-d$-prefetches                         [Hardware cache event]
        L1-d$-prefetch-misses                    [Hardware cache event]
        L1-i$-loads                              [Hardware cache event]
        L1-i$-load-misses                        [Hardware cache event]
        L1-i$-prefetches                         [Hardware cache event]
        L1-i$-prefetch-misses                    [Hardware cache event]
        LLC-loads                                [Hardware cache event]
        LLC-load-misses                          [Hardware cache event]
        LLC-stores                               [Hardware cache event]
        LLC-store-misses                         [Hardware cache event]
        LLC-prefetches                           [Hardware cache event]
        LLC-prefetch-misses                      [Hardware cache event]
        dTLB-loads                               [Hardware cache event]
        dTLB-load-misses                         [Hardware cache event]
        dTLB-stores                              [Hardware cache event]
        dTLB-store-misses                        [Hardware cache event]
        dTLB-prefetches                          [Hardware cache event]
        dTLB-prefetch-misses                     [Hardware cache event]
        iTLB-loads                               [Hardware cache event]
        iTLB-load-misses                         [Hardware cache event]
        branch-loads                             [Hardware cache event]
        branch-load-misses                       [Hardware cache event]
      
        rNNN                                     [raw hardware event descriptor]
      Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1246453578.3072.1.camel@ht.satnam>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      73c24cb8
    • Jaswinder Singh Rajput's avatar
      perf stat: Define MATCH_EVENT for easy attr checking · b9ebdcc0
      Jaswinder Singh Rajput authored
      MATCH_EVENT is useful because it:
      
       1. allows checking multiple attrs at once
       2. avoids repetition of PERF_TYPE_ and PERF_COUNT_ and saves space
       3. avoids line breakage
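      
      A macro of this kind could be sketched with token pasting as
      shown below. The enum values, struct event_attr, and the exact
      macro signature are stand-ins invented for this example so that
      it compiles on its own; the real definitions live in the
      perf_counter headers and in builtin-stat.c and may differ.

```c
/* Stand-in constants (assumption): only the name prefixes come from the commit text. */
enum { PERF_TYPE_HARDWARE, PERF_TYPE_SOFTWARE };
enum { PERF_COUNT_HW_CPU_CYCLES, PERF_COUNT_HW_INSTRUCTIONS };
enum { PERF_COUNT_SW_CPU_CLOCK, PERF_COUNT_SW_TASK_CLOCK };

/* Stand-in for the attribute structure: just the two fields being compared. */
struct event_attr {
	unsigned int type;
	unsigned long long config;
};

/*
 * Paste the short names onto the common prefixes, so callers write
 * MATCH_EVENT(HARDWARE, HW_CPU_CYCLES, attr) instead of two long comparisons.
 */
#define MATCH_EVENT(t, c, attr) \
	((attr)->type == PERF_TYPE_##t && (attr)->config == PERF_COUNT_##c)

static int is_cycles(const struct event_attr *attr)
{
	return MATCH_EVENT(HARDWARE, HW_CPU_CYCLES, attr);
}

static int is_task_clock(const struct event_attr *attr)
{
	return MATCH_EVENT(SOFTWARE, SW_TASK_CLOCK, attr);
}
```

      Both the type and the config are checked in one expression,
      which is what keeps each call on a single line.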
      Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <1246440909.3403.5.camel@hpdv5.satnam>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      b9ebdcc0
    • Ingo Molnar's avatar
      perf_counter tools: Add more warnings and fix/annotate them · f37a291c
      Ingo Molnar authored
      Enable -Wextra. This found a few real bugs plus a number
      of signed/unsigned type mismatches and other unclean spots. It
      also required a few annotations.
      
      All things considered it was still worth it, so let's try with
      this enabled for now.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      f37a291c
    • Ingo Molnar's avatar
      perf report: Fix HV bit mismerge · 88a69dfb
      Ingo Molnar authored
      Fix:
      
       builtin-report.c: In function ‘hist_entry__add’:
       builtin-report.c:1015: error: case label not within a switch statement
       builtin-report.c:1017: error: break statement not within loop or switch
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      88a69dfb
    • Paul Mackerras's avatar
      perf_counter tools: Rework event string parsing/syntax · 61c45981
      Paul Mackerras authored
      This reworks the parser for event descriptors to make it more
      consistent in what it accepts.  It is now structured as a
      recursive descent parser for the following grammar:
      
      events		::= event ( ("," | space) space* event )*
      event		::= ( raw_event | numeric_event | symbolic_event |
      		      generic_hw_event ) [ event_modifier ]
      raw_event	::= "r" hex_number
      numeric_event	::= number ":" number
      number		::= decimal_number | "0x" hex_number | "0" octal_number
      symbolic_event	::= string_from_event_symbols_array
      generic_hw_event::= cache_type ( "-" ( cache_op | cache_result ) )*
      event_modifier	::= ":" ( "u" | "k" | "h" )+
      
      with the extra restriction that you can have at most one
      cache_op and at most one cache_result.
      
      We pass the current string pointer by reference (i.e. as a
      const char **) to the various parsing functions so that they
      can advance the pointer to indicate how much they consumed.
      They return 0 if they didn't recognize the thing at the pointer
      or 1 if they did (and advance the pointer past it).
      
      This also fixes parse_aliases to take the longest matching
      alias from the table, not the first one.  Otherwise "l1-data"
      would match the "l1-d" alias and the "ata" would not be
      consumed.
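      
      The two conventions described above (the string pointer passed
      by reference, and the longest alias winning) can be sketched
      together as follows. The function parse_alias, its signature,
      and the case-insensitive comparison are assumptions made for
      this example; this is not the perf tools' parse_aliases().

```c
#include <string.h>
#include <strings.h>	/* strncasecmp (POSIX) */

/*
 * Hypothetical helper: try every alias against the text at *strp and keep
 * the longest one that matches. Returns 1 and advances *strp past the match,
 * or returns 0 and leaves *strp untouched when nothing matches.
 */
static int parse_alias(const char **strp, const char *const aliases[],
		       size_t n, size_t *idx)
{
	size_t i, best_len = 0;

	for (i = 0; i < n; i++) {
		size_t len = strlen(aliases[i]);

		/* Only a strictly longer match may replace the current best. */
		if (len > best_len && strncasecmp(*strp, aliases[i], len) == 0) {
			best_len = len;
			*idx = i;
		}
	}

	if (best_len == 0)
		return 0;

	*strp += best_len;	/* tell the caller how much was consumed */
	return 1;
}
```

      With a table containing both "l1-d" and "l1-data", the input
      "l1-data-loads" selects "l1-data" and leaves the pointer at
      "-loads", regardless of the order of the table entries.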
      
      This allows event modifiers indicating what processor modes to
      count in to be applied to any event, not just numeric events,
      and adds a ":h" modifier to indicate counting in hypervisor
      mode.  Specifying ":u" now sets both exclude_kernel and
      exclude_hv, and so on.  Multiple modes can be specified, e.g.
      ":uk" will count in user or hypervisor mode (i.e. only
      exclude_kernel will be set).
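      
      The modifier rule can be sketched as "start with every mode
      excluded, then clear the exclusion for each letter given". The
      struct mode_flags and the function below are hypothetical and
      only mirror the three exclude_* names mentioned above; the real
      parser may treat malformed input differently.

```c
/* Hypothetical flag holder mirroring the exclude_* attribute bits. */
struct mode_flags {
	int exclude_user;
	int exclude_kernel;
	int exclude_hv;
};

/*
 * Parse ":" ( "u" | "k" | "h" )+ at *strp.
 * Returns 1 and advances *strp past the modifier when one is present,
 * otherwise returns 0 and leaves both *strp and *f untouched.
 */
static int parse_event_modifier(const char **strp, struct mode_flags *f)
{
	const char *s = *strp;
	int eu = 1, ek = 1, eh = 1;	/* exclude everything by default */

	if (*s != ':')
		return 0;
	s++;

	/* The grammar requires at least one mode letter after the colon. */
	if (*s != 'u' && *s != 'k' && *s != 'h')
		return 0;

	for (; *s == 'u' || *s == 'k' || *s == 'h'; s++) {
		if (*s == 'u')
			eu = 0;
		else if (*s == 'k')
			ek = 0;
		else
			eh = 0;
	}

	f->exclude_user = eu;
	f->exclude_kernel = ek;
	f->exclude_hv = eh;
	*strp = s;
	return 1;
}
```

      Under this rule ":u" leaves exclude_kernel and exclude_hv set,
      and ":uk" clears exclude_user and exclude_kernel so that only
      exclude_hv remains set.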
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <19018.53826.843815.189847@cargo.ozlabs.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      61c45981
    • Anton Blanchard's avatar
      powerpc/perf_counter: Enable alternate PR/HV bits for POWER7 · 0a456fc5
      Anton Blanchard authored
      POWER7 has the same PR/HV bit layout as POWER6, so set the flag.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Cc: a.p.zijlstra@chello.nl
      Cc: benh@kernel.crashing.org
      LKML-Reference: <20090701030701.GI3563@kryten>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      0a456fc5
    • Frederic Weisbecker's avatar
      perf_counter tools: Various fixes for callchains · deac911c
      Frederic Weisbecker authored
      The symbol resolving has of course revealed some bugs in the
      callchain tree handling. This patch fixes some of them,
      including:
      
      - inherit the children from the parents while splitting a node
      - fix list range moving
      - fix indexes setting in callchains
      - create a child on the current node if the path doesn't match any of
        the existing children (was only done on the root)
      - compare using symbols when possible so that we can match a function
        using any ip inside by referring to its start address.
      
      The practical effects are:
      
      - remove double callchains
      - fix upside down or any random order of callchains
      - fix wrong paths
      - fix bad hits and percentage accounts
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246419315-9968-4-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      deac911c
    • Frederic Weisbecker's avatar
      perf_counter tools: Resolve symbols in callchains · 4424961a
      Frederic Weisbecker authored
      This patch resolves the names, when possible, of each ip
      present in the callchains while using the -c option with perf
      report.
      
      Example:
      
      5.40%  [k] __d_lookup
                   5.37%
                      perf_callchain
                      perf_counter_overflow
                      intel_pmu_handle_irq
                      perf_counter_nmi_handler
                      notifier_call_chain
                      atomic_notifier_call_chain
                      notify_die
                      do_nmi
                      nmi
                      do_lookup
                      __link_path_walk
                      path_walk
                      do_path_lookup
                      user_path_at
                      sys_faccessat
                      sys_access
                      system_call_fastpath
                      0x7fb609846f77
      
                   0.01%
                      perf_callchain
                      perf_counter_overflow
                      intel_pmu_handle_irq
                      perf_counter_nmi_handler
                      notifier_call_chain
                      atomic_notifier_call_chain
                      notify_die
                      do_nmi
                      nmi
                      do_lookup
                      __link_path_walk
                      path_walk
                      do_path_lookup
                      user_path_at
                      sys_faccessat
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246419315-9968-3-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      4424961a
    • Frederic Weisbecker's avatar
      perf_counter tools: Fix storage size allocation of callchain list · 9198aa77
      Frederic Weisbecker authored
      Fix a mix-up in the size passed when allocating a callchain
      list: we were using the wrong structure size.
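      
      One common way to avoid this class of bug is to take the size
      from the pointer being assigned rather than from a type name.
      The sketch below uses a hypothetical struct callchain_list with
      invented fields; it illustrates the idiom and is not the fix
      from this commit.

```c
#include <stdlib.h>

/* Hypothetical list element; the fields are invented for the example. */
struct callchain_list {
	unsigned long long ip;
	struct callchain_list *next;
};

/* Allocate one element; returns NULL if malloc fails. */
static struct callchain_list *callchain_list_new(unsigned long long ip)
{
	/*
	 * sizeof(*cl) always follows the type of cl, so the size cannot
	 * silently refer to a different structure.
	 */
	struct callchain_list *cl = malloc(sizeof(*cl));

	if (!cl)
		return NULL;

	cl->ip = ip;
	cl->next = NULL;
	return cl;
}
```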
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246419315-9968-2-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      9198aa77