1. 18 Jun, 2009 2 commits
    • function-graph: add stack frame test · 71e308a2
      Steven Rostedt authored
      In case gcc does something funny with the stack frames, or the return
      from function code, we would like to detect that.
      
      An arch may implement passing of a variable that is unique to the
      function and can be saved on entering a function and can be tested
      when exiting the function. Usually the frame pointer can be used for
      this purpose.
      
      This patch also implements this for x86, where it passes in the stack
      frame of the parent function and tests that frame on exit.
      
      There was a case on x86_32 with optimize for size (-Os) where, for a
      few functions, gcc would align the stack frame and place a copy of the
      return address into it. The function graph tracer modified the copy and
      not the actual return address. On return from the function, it did not go
      to the tracer hook, but returned to the parent. This broke the function
      graph tracer, because the return of the parent (where gcc did not do
      this funky manipulation) returned to the location that the child function
      was supposed to. This caused strange kernel crashes.
      
      This test detected the problem and pointed out where the issue was.
      
      This modifies the parameters of one of the functions that the arch
      specific code calls, so it includes changes to arch code to accommodate
      the new prototype.
      
      Note, I noticed that the parisc arch implements its own push_return_trace.
      This is now a generic function and ftrace_push_return_trace should be
      used instead. This patch does not touch that code.
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
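      The entry/exit check described in this commit can be sketched in user
      space as a small return stack that records a frame pointer next to each
      saved return address. This is only an illustration: the names
      ret_stack, push_return and pop_return, and the fixed depth, are
      hypothetical and are not the kernel's data structures.

```c
#include <assert.h>
#include <stddef.h>

#define RET_STACK_DEPTH 16	/* hypothetical depth for this sketch */

struct ret_entry {
	unsigned long ret;	/* original return address */
	unsigned long fp;	/* frame pointer recorded at function entry */
};

struct ret_stack {
	struct ret_entry entries[RET_STACK_DEPTH];
	int top;		/* index of the newest entry, -1 when empty */
};

static void ret_stack_init(struct ret_stack *s)
{
	s->top = -1;
}

/* Entry hook: save the return address together with the frame pointer. */
static int push_return(struct ret_stack *s, unsigned long ret,
		       unsigned long fp)
{
	if (s->top + 1 >= RET_STACK_DEPTH)
		return -1;	/* stack full: do not trace this call */
	s->top++;
	s->entries[s->top].ret = ret;
	s->entries[s->top].fp = fp;
	return 0;
}

/*
 * Exit hook: hand back the saved return address only if the frame
 * pointer seen on exit matches the one recorded on entry. On a
 * mismatch the entry is left in place and -1 is returned.
 */
static int pop_return(struct ret_stack *s, unsigned long fp,
		      unsigned long *ret)
{
	if (s->top < 0)
		return -1;
	if (s->entries[s->top].fp != fp)
		return -1;
	*ret = s->entries[s->top].ret;
	s->top--;
	return 0;
}
```

      A caller would treat a -1 from pop_return as the signal that the
      compiler did something unexpected with the frame and stop tracing.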
    • function-graph: disable when both x86_32 and optimize for size are configured · eb4a0378
      Steven Rostedt authored
      On x86_32, when optimize for size is set, gcc may align the frame pointer
      and make a copy of the return address inside the stack frame.
      The return address that is located in the stack frame may not be
      the one used to return to the calling function. This will break the
      function graph tracer.
      
      The function graph tracer replaces the return address with a jump to a hook
      function that can trace the exit of the function. If it only replaces
      a copy, then the hook will not be called when the function returns.
      Worse yet, when the parent function returns, the function graph tracer
      will return back to the location of the child function which will
      easily crash the kernel with weird results.
      
      To see the problem, when i386 is compiled with -Os we get:
      
      c106be03:       57                      push   %edi
      c106be04:       8d 7c 24 08             lea    0x8(%esp),%edi
      c106be08:       83 e4 e0                and    $0xffffffe0,%esp
      c106be0b:       ff 77 fc                pushl  0xfffffffc(%edi)
      c106be0e:       55                      push   %ebp
      c106be0f:       89 e5                   mov    %esp,%ebp
      c106be11:       57                      push   %edi
      c106be12:       56                      push   %esi
      c106be13:       53                      push   %ebx
      c106be14:       81 ec 8c 00 00 00       sub    $0x8c,%esp
      c106be1a:       e8 f5 57 fb ff          call   c1021614 <mcount>
      
      When it is compiled with -O2 instead we get:
      
      c10896f0:       55                      push   %ebp
      c10896f1:       89 e5                   mov    %esp,%ebp
      c10896f3:       83 ec 28                sub    $0x28,%esp
      c10896f6:       89 5d f4                mov    %ebx,0xfffffff4(%ebp)
      c10896f9:       89 75 f8                mov    %esi,0xfffffff8(%ebp)
      c10896fc:       89 7d fc                mov    %edi,0xfffffffc(%ebp)
      c10896ff:       e8 d0 08 fa ff          call   c1029fd4 <mcount>
      
      The compile with -Os will align the stack pointer then set up the
      frame pointer (%ebp), and it copies the return address back into
      the stack frame. The change to the return address in mcount is done
      to the copy and not the real place holder of the return address.
      
      The compile with -O2 sets up the frame pointer first, so the change
      that mcount makes to the return address does affect where the
      function will jump on exit.
      Reported-by: Jake Edge <jake@lwn.net>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
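      The failure mode described in this commit can be modeled with a toy
      frame that has two slots: the real return address and a compiler-made
      copy. This is a simplified sketch, not kernel code. The struct, the
      helper names and the two address constants are all made up for
      illustration.

```c
#include <assert.h>

#define HOOK_ADDR	0xAAAAUL	/* stand-in for the tracer's return hook */
#define PARENT_ADDR	0x1234UL	/* stand-in for the caller's address */

/* Toy frame: the slot the CPU returns through, plus an optional copy. */
struct toy_frame {
	unsigned long real_ret;
	unsigned long copy_ret;
	int has_copy;	/* nonzero when an -Os style prologue made a copy */
};

static void toy_frame_init(struct toy_frame *f, int has_copy)
{
	f->real_ret = PARENT_ADDR;
	f->copy_ret = has_copy ? PARENT_ADDR : 0;
	f->has_copy = has_copy;
}

/* The tracer patches the slot it finds relative to the frame pointer. */
static void toy_hook_return(struct toy_frame *f)
{
	if (f->has_copy)
		f->copy_ret = HOOK_ADDR;	/* only the copy changes */
	else
		f->real_ret = HOOK_ADDR;
}

/* The function always returns through the real slot. */
static unsigned long toy_return_target(const struct toy_frame *f)
{
	return f->real_ret;
}
```

      With has_copy set, toy_return_target still yields PARENT_ADDR after
      hooking, which corresponds to the exit hook never being called.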
  2. 17 Jun, 2009 7 commits
    • ring-buffer: have benchmark test print to trace buffer · 4b221f03
      Steven Rostedt authored
      Currently the output of the ring buffer benchmark/test prints to
      the console. This test runs for ten seconds every ten seconds and
      outputs the result after every iteration. This needlessly fills up
      the logs.
      
      This patch makes the ring buffer benchmark/test print to the ftrace
      buffer using trace_printk. To view the test results, you must examine
      the debug/tracing/trace file.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: do not grab locks in nmi · 8d707e8e
      Steven Rostedt authored
      If ftrace_dump_on_oops is set, and an NMI detects a lockup, then it
      will need to read from the ring buffer. But the read side of the
      ring buffer still takes locks. This patch adds a check on the read
      side that if it is in an NMI, then it will disable the ring buffer
      and not take any locks.
      
      Reads can still happen on a disabled ring buffer.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
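      The read-side behaviour described in this commit might look roughly
      like the following user-space sketch. The toy_rb type and its helpers
      are hypothetical, the lock is modeled as a counter, and the NMI state
      is passed in as a flag, whereas the kernel determines its own context.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_rb {
	bool disabled;		/* writers are refused once this is set */
	int lock_taken;		/* counts how often the reader lock was taken */
	int data;		/* single-slot stand-in for buffer contents */
};

/* Writers are rejected on a disabled buffer. */
static bool toy_rb_write(struct toy_rb *rb, int value)
{
	if (rb->disabled)
		return false;
	rb->data = value;
	return true;
}

/*
 * Read path: in NMI context, disable the buffer and read without
 * locking; otherwise take the (modeled) reader lock around the read.
 */
static int toy_rb_read(struct toy_rb *rb, bool in_nmi)
{
	int value;

	if (in_nmi) {
		rb->disabled = true;
		return rb->data;
	}

	rb->lock_taken++;	/* stand-in for taking the reader lock */
	value = rb->data;
	/* stand-in for releasing the reader lock */
	return value;
}
```

      Reads keep working after the buffer has been disabled; only writes
      are refused.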
    • ring-buffer: add locks around rb_per_cpu_empty · d4788207
      Steven Rostedt authored
      The checking of whether the buffer is empty or not needs to be serialized
      among the readers. Add the reader spin lock around it.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: check for less than two in size allocation · 5f78abee
      Steven Rostedt authored
      The ring buffer must have at least two pages allocated for the
      reader page swap to work.
      
      The page count check will miss the case of a zero size passed in.
      Even though a zero size ring buffer would probably fail an allocation,
      making the min size check for less than two instead of equal to one makes
      the code a bit more robust.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
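      The difference between the two checks can be sketched as below. This
      assumes the check bumps the page count up to two, which the commit
      message does not spell out. TOY_BUF_PAGE_SIZE and both helper names
      are hypothetical.

```c
#include <assert.h>

#define TOY_BUF_PAGE_SIZE 4080UL	/* hypothetical payload bytes per page */

/* Round a byte size up to whole buffer pages. */
static unsigned long toy_size_to_pages(unsigned long size)
{
	return (size + TOY_BUF_PAGE_SIZE - 1) / TOY_BUF_PAGE_SIZE;
}

/* Old style check: only a count of exactly one is corrected. */
static unsigned long toy_pages_old(unsigned long size)
{
	unsigned long pages = toy_size_to_pages(size);

	if (pages == 1)
		pages++;
	return pages;
}

/* New style check: anything below two, including zero, is corrected. */
static unsigned long toy_pages_new(unsigned long size)
{
	unsigned long pages = toy_size_to_pages(size);

	if (pages < 2)
		pages = 2;
	return pages;
}
```

      For a size of zero, toy_pages_old returns zero pages while
      toy_pages_new returns two.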
    • ring-buffer: remove useless compile check for buffer_page size · 0dcd4d6c
      Steven Rostedt authored
      The original version of the ring buffer had a hack that reused the
      page struct of each buffer page as the structure the ring buffer
      used to keep the pages in a linked list.
      
      This overlap of the page struct was very dangerous and that hack was
      removed a while ago.
      
      But there was a check to make sure the buffer_page never became bigger
      than the page struct, and would fail the compile if it did. The
      check was only meaningful when we had the hack. Now that we have separate
      allocated descriptors for the buffer pages, we can remove this check.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: remove useless warn on check · c6a9d7b5
      Steven Rostedt authored
      A check if "write > BUF_PAGE_SIZE" is done right after a
      
      	if (write > BUF_PAGE_SIZE)
      		return ...;
      
      Thus the check is actually testing the compiler and not the
      kernel. This is useless, remove it.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: use BUF_PAGE_HDR_SIZE in calculating index · 22f470f8
      Steven Rostedt authored
      The index of the event is found by masking PAGE_MASK to it and
      subtracting the header size. Currently the header size is calculated
      by PAGE_SIZE - BUF_PAGE_SIZE, when we already have a macro
      BUF_PAGE_HDR_SIZE to define it.
      
      If we want to change BUF_PAGE_SIZE to something less than filling
      the rest of the page (this is done for debugging), then we break
      the algorithm to find the index.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
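      The index calculation can be illustrated with the sketch below. The
      TOY_ constants and function names are hypothetical, the header size
      of 16 bytes is an assumed value, and the offset within the page is
      taken with the inverted page mask.

```c
#include <assert.h>

#define TOY_PAGE_SIZE		4096UL
#define TOY_PAGE_MASK		(~(TOY_PAGE_SIZE - 1))
#define TOY_BUF_PAGE_HDR_SIZE	16UL	/* assumed header size */

/* Index relative to the start of the data area, using the header macro. */
static unsigned long toy_event_index(unsigned long addr)
{
	return (addr & ~TOY_PAGE_MASK) - TOY_BUF_PAGE_HDR_SIZE;
}

/*
 * Older form: derive the header size from the data size. This is only
 * right when the data area fills the rest of the page.
 */
static unsigned long toy_event_index_old(unsigned long addr,
					 unsigned long buf_page_size)
{
	return (addr & ~TOY_PAGE_MASK) - (TOY_PAGE_SIZE - buf_page_size);
}
```

      Both forms agree while the data size equals the page size minus the
      header. Once the data size is shrunk for debugging, the older form
      no longer yields the event's offset.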
  3. 16 Jun, 2009 6 commits
    • tracing: update sample event documentation · 44ad18e0
      Steven Rostedt authored
      The comments in the sample code are a bit confusing. This patch
      cleans them up a little.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing/filters: fix race between filter setting and module unload · 00e95830
      Li Zefan authored
      Module unload is protected by event_mutex, while setting filter is
      protected by filter_mutex. This leads to the race:
      
      echo 'bar == 0 || bar == 10' \    |
                      > sample/filter   |
                                        |  insmod sample.ko
        add_pred("bar == 0")            |
          -> n_preds == 1               |
        add_pred("bar == 10")           |
          -> n_preds == 2               |
                                        |  rmmod sample.ko
                                        |  insmod sample.ko
        add_pred("||")                  |
          -> n_preds == 1 (should be 3) |
      
      Now event->filter->preds is corrupted. And then when filter_match_preds()
      is called, the WARN_ON() in it will be triggered.
      
      To avoid the race, we remove filter_mutex, and replace it with event_mutex.
      
      [ Impact: prevent corruption of filters by module removing and loading ]
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <4A375A4D.6000205@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing/filters: free filter_string in destroy_preds() · 57be8887
      Li Zefan authored
      filter->filter_string is not freed when unloading a module:
      
       # insmod trace-events-sample.ko
       # echo "bar < 100" > /mnt/tracing/events/sample/foo_bar/filter
       # rmmod trace-events-sample.ko
      
      [ Impact: fix memory leak when unloading module ]
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <4A375A30.9060802@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: use commit counters for commit pointer accounting · fa743953
      Steven Rostedt authored
      The ring buffer is made up of three sets of pointers.
      
      The head page pointer, which points to the next page for the reader to
      get.
      
      The commit pointer and commit index, which point to the page and index
      of the last committed write respectively.
      
      The tail pointer and tail index, which point to the page and the index
      of the last reserved data respectively (not yet committed).
      
      The commit pointer is only moved forward by the outermost writer.
      If a nested writer comes in, it will not move the pointer forward.
      
      The current implementation has a flaw. It assumes that the outermost
      writer successfully reserved data. There's a small race window where
      the outermost writer could find the tail pointer, but a nested
      writer could come in (via interrupt) and move the tail forward, and
      even the commit forward.
      
      The outer writer would not realize the commit moved forward and the
      accounting will break.
      
      This patch changes the design to use counters in the per cpu buffers
      to keep track of commits. The counters are incremented at the start
      of the commit, and decremented at the end. If the end commit counter
      is 1, then it moves the commit pointers. A loop is made to check for
      races between checking and moving the commit pointers. Only the outer
      commit should move the pointers anyway.
      
      The test of whether a reserve is equal to the last commit update is
      still needed for time keeping. The time code is much less racy than
      the commit updates.
      
      This change not only solves the mentioned race, but also makes the
      code simpler.
      
      [ Impact: fix commit race and simplify code ]
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
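      The counter scheme described in this commit can be sketched in a
      single-threaded model, where a nested writer is simulated by calling
      the start/end helpers again before the outer writer finishes. The
      toy_cpu_buffer type and helper names are hypothetical, positions are
      plain byte offsets rather than page/index pairs, and the retry loop
      that guards against races is left out.

```c
#include <assert.h>

struct toy_cpu_buffer {
	unsigned long tail;	/* end of the last reserved data */
	unsigned long commit;	/* end of the last committed data */
	int committing;		/* writers between start and end of commit */
};

/* Called by every writer, outer or nested, before reserving. */
static void toy_start_commit(struct toy_cpu_buffer *b)
{
	b->committing++;
}

/* Reserve len bytes at the tail and return where they begin. */
static unsigned long toy_reserve(struct toy_cpu_buffer *b, unsigned long len)
{
	unsigned long start = b->tail;

	b->tail += len;
	return start;
}

/*
 * Called by every writer when done. Only the writer that sees the
 * counter at 1 (the outermost one) moves the commit position forward.
 */
static void toy_end_commit(struct toy_cpu_buffer *b)
{
	if (b->committing == 1)
		b->commit = b->tail;
	b->committing--;
}
```

      In this model a nested writer that finishes first leaves the commit
      position alone, and the outer writer then publishes both
      reservations at once.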
    • ring-buffer: remove unused variable · 263294f3
      Steven Rostedt authored
      Fix the compiler warning:
      
      kernel/trace/ring_buffer.c: In function 'rb_move_tail':
      kernel/trace/ring_buffer.c:1236: warning: unused variable 'event'
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: have benchmark test handle discarded events · 9086c7b9
      Steven Rostedt authored
      With the addition of commit:
      
        c7b09308
        ring-buffer: prevent adding write in discarded area
      
      The ring buffer may now add discarded events when a write passes
      the end of a buffer page. Before, a discarded event was only added
      when the tracer deliberately created one. The ring buffer benchmark
      test does not handle discarded events when it reads the buffer and
      fails when it encounters one.
      
      Also fix the increment for large data entries (luckily, the test did
      not add any yet).
      
      [ Impact: fix false failure of ring buffer self test ]
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
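      A reader that tolerates discarded events, as this commit requires of
      the benchmark, might be structured like the sketch below. The event
      layout, the TOY_ type names and toy_count_events are hypothetical and
      much simpler than the ring buffer's real event format.

```c
#include <assert.h>
#include <stddef.h>

enum toy_type {
	TOY_DATA,	/* a normal event written by the producer */
	TOY_PADDING	/* a discarded event that only fills space */
};

struct toy_event {
	enum toy_type type;
	size_t len;	/* bytes occupied by this event */
};

/*
 * Walk n events. Every event, including padding, advances the byte
 * position, but only data events are counted. Padding is skipped
 * instead of being treated as a failure.
 */
static size_t toy_count_events(const struct toy_event *ev, size_t n,
			       size_t *bytes)
{
	size_t i, count = 0;

	*bytes = 0;
	for (i = 0; i < n; i++) {
		*bytes += ev[i].len;
		if (ev[i].type == TOY_PADDING)
			continue;
		count++;
	}
	return count;
}
```

      The walk advances by each event's own length rather than a fixed
      step, so larger entries are stepped over correctly as well.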
  4. 15 Jun, 2009 7 commits
  5. 14 Jun, 2009 18 commits