1. 20 Jun, 2009 3 commits
    • Ingo Molnar's avatar
      Merge branch 'tip/tracing/urgent' of... · 3daeb4da
      Ingo Molnar authored
      Merge branch 'tip/tracing/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/urgent
      3daeb4da
    • Frederic Weisbecker's avatar
      tracing/urgent: warn in case of ftrace_start_up inbalance · 9ea1a153
      Frederic Weisbecker authored
      Prevent from further ftrace_start_up inbalances so that we avoid
      future nop patching omissions with dynamic ftrace.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      9ea1a153
    • Frederic Weisbecker's avatar
      tracing/urgent: fix unbalanced ftrace_start_up · c85a17e2
      Frederic Weisbecker authored
      Perfcounter reports the following stats for a wide system
      profiling:
      
       #
       # (2364 samples)
       #
       # Overhead  Symbol
       # ........  ......
       #
          15.40%  [k] mwait_idle_with_hints
           8.29%  [k] read_hpet
           5.75%  [k] ftrace_caller
           3.60%  [k] ftrace_call
           [...]
      
      This snapshot has been taken while neither the function tracer nor
      the function graph tracer was running.
      With dynamic ftrace, such results show a wrong ftrace behaviour
      because all calls to ftrace_caller or ftrace_graph_caller (the patched
      calls to mcount) are supposed to be patched into nop if none of those
      tracers are running.
      
      The problem occurs after the first run of the function tracer. Once we
      launch it a second time, the callsites will never be nopped back,
      unless you set custom filters.
      For example it happens during the self tests at boot time.
      The function tracer selftest runs, and then the dynamic tracing is
      tested too. After that, the callsites are left un-nopped.
      
      This is because the reset callback of the function tracer tries to
      unregister two ftrace callbacks in once: the common function tracer
      and the function tracer with stack backtrace, regardless of which
      one is currently in use.
      It then creates an unbalance on ftrace_start_up value which is expected
      to be zero when the last ftrace callback is unregistered. When it
      reaches zero, the FTRACE_DISABLE_CALLS is set on the next ftrace
      command, triggering the patching into nop. But since it becomes
      unbalanced, ie becomes lower than zero, if the kernel functions
      are patched again (as in every further function tracer runs), they
      won't ever be nopped back.
      
      Note that ftrace_call and ftrace_graph_call are still patched back
      to ftrace_stub in the off case, but not the callers of ftrace_call
      and ftrace_graph_caller. It means that the tracing is well deactivated
      but we waste a useless call into every kernel function.
      
      This patch just unregisters the right ftrace_ops for the function
      tracer on its reset callback and ignores the other one which is
      not registered, fixing the unbalance. The problem also happens
      is .30
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: stable@kernel.org
      c85a17e2
  2. 17 Jun, 2009 7 commits
    • Steven Rostedt's avatar
      ring-buffer: have benchmark test print to trace buffer · 4b221f03
      Steven Rostedt authored
      Currently the output of the ring buffer benchmark/test prints to
      the console. This test runs for ten seconds every ten seconds and
      ouputs the result after every iteration. This needlessly fills up
      the logs.
      
      This patch makes the ring buffer benchmark/test print to the ftrace
      buffer using trace_printk. To view the test results, you must examine
      the debug/tracing/trace file.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      4b221f03
    • Steven Rostedt's avatar
      ring-buffer: do not grab locks in nmi · 8d707e8e
      Steven Rostedt authored
      If ftrace_dump_on_oops is set, and an NMI detects a lockup, then it
      will need to read from the ring buffer. But the read side of the
      ring buffer still takes locks. This patch adds a check on the read
      side that if it is in an NMI, then it will disable the ring buffer
      and not take any locks.
      
      Reads can still happen on a disabled ring buffer.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      8d707e8e
    • Steven Rostedt's avatar
      ring-buffer: add locks around rb_per_cpu_empty · d4788207
      Steven Rostedt authored
      The checking of whether the buffer is empty or not needs to be serialized
      among the readers. Add the reader spin lock around it.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      d4788207
    • Steven Rostedt's avatar
      ring-buffer: check for less than two in size allocation · 5f78abee
      Steven Rostedt authored
      The ring buffer must have at least two pages allocated for the
      reader page swap to work.
      
      The page count check will miss the case of a zero size passed in.
      Even though a zero size ring buffer would probably fail an allocation,
      making the min size check for less than two instead of equal to one makes
      the code a bit more robust.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      5f78abee
    • Steven Rostedt's avatar
      ring-buffer: remove useless compile check for buffer_page size · 0dcd4d6c
      Steven Rostedt authored
      The original version of the ring buffer had a hack to map the
      page struct that held the pages of the buffer to also be the structure
      that the ring buffer would keep the pages in a link list.
      
      This overlap of the page struct was very dangerous and that hack was
      removed a while ago.
      
      But there was a check to make sure the buffer_page never became bigger
      than the page struct, and would fail the compile if it did. The
      check was only meaningful when we had the hack. Now that we have separate
      allocated descriptors for the buffer pages, we can remove this check.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      0dcd4d6c
    • Steven Rostedt's avatar
      ring-buffer: remove useless warn on check · c6a9d7b5
      Steven Rostedt authored
      A check if "write > BUF_PAGE_SIZE" is done right after a
      
      	if (write > BUF_PAGE_SIZE)
      		return ...;
      
      Thus the check is actually testing the compiler and not the
      kernel. This is useless, remove it.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      c6a9d7b5
    • Steven Rostedt's avatar
      ring-buffer: use BUF_PAGE_HDR_SIZE in calculating index · 22f470f8
      Steven Rostedt authored
      The index of the event is found by masking PAGE_MASK to it and
      subtracting the header size. Currently the header size is calculate
      by PAGE_SIZE - BUF_PAGE_SIZE, when we already have a macro
      BUF_PAGE_HDR_SIZE to define it.
      
      If we want to change BUF_PAGE_SIZE to something less than filling
      the rest of the page (this is done for debugging), then we break
      the algorithm to find the index.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      22f470f8
  3. 16 Jun, 2009 6 commits
    • Steven Rostedt's avatar
      tracing: update sample event documentation · 44ad18e0
      Steven Rostedt authored
      The comments in the sample code is a bit confusing. This patch
      cleans them up a little.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      44ad18e0
    • Li Zefan's avatar
      tracing/filters: fix race between filter setting and module unload · 00e95830
      Li Zefan authored
      Module unload is protected by event_mutex, while setting filter is
      protected by filter_mutex. This leads to the race:
      
      echo 'bar == 0 || bar == 10' \    |
      		> sample/filter   |
                                        |  insmod sample.ko
        add_pred("bar == 0")            |
          -> n_preds == 1               |
        add_pred("bar == 100")          |
          -> n_preds == 2               |
                                        |  rmmod sample.ko
                                        |  insmod sample.ko
        add_pred("&&")                  |
          -> n_preds == 1 (should be 3) |
      
      Now event->filter->preds is corrupted. An then when filter_match_preds()
      is called, the WARN_ON() in it will be triggered.
      
      To avoid the race, we remove filter_mutex, and replace it with event_mutex.
      
      [ Impact: prevent corruption of filters by module removing and loading ]
      Signed-off-by: default avatarLi Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <4A375A4D.6000205@cn.fujitsu.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      00e95830
    • Li Zefan's avatar
      tracing/filters: free filter_string in destroy_preds() · 57be8887
      Li Zefan authored
      filter->filter_string is not freed when unloading a module:
      
       # insmod trace-events-sample.ko
       # echo "bar < 100" > /mnt/tracing/events/sample/foo_bar/filter
       # rmmod trace-events-sample.ko
      
      [ Impact: fix memory leak when unloading module ]
      Signed-off-by: default avatarLi Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <4A375A30.9060802@cn.fujitsu.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      57be8887
    • Steven Rostedt's avatar
      ring-buffer: use commit counters for commit pointer accounting · fa743953
      Steven Rostedt authored
      The ring buffer is made up of three sets of pointers.
      
      The head page pointer, which points to the next page for the reader to
      get.
      
      The commit pointer and commit index, which points to the page and index
      of the last committed write respectively.
      
      The tail pointer and tail index, which points to the page and the index
      of the last reserved data respectively (non committed).
      
      The commit pointer is only moved forward by the outer most writer.
      If a nested writer comes in, it will not move the pointer forward.
      
      The current implementation has a flaw. It assumes that the outer most
      writer successfully reserved data. There's a small race window where
      the outer most writer could find the tail pointer, but a nested
      writer could come in (via interrupt) and move the tail forward, and
      even the commit forward.
      
      The outer writer would not realized the commit moved forward and the
      accounting will break.
      
      This patch changes the design to use counters in the per cpu buffers
      to keep track of commits. The counters are incremented at the start
      of the commit, and decremented at the end. If the end commit counter
      is 1, then it moves the commit pointers. A loop is made to check for
      races between checking and moving the commit pointers. Only the outer
      commit should move the pointers anyway.
      
      The test of knowing if a reserve is equal to the last commit update
      is still needed to know for time keeping. The time code is much less
      racey than the commit updates.
      
      This change not only solves the mentioned race, but also makes the
      code simpler.
      
      [ Impact: fix commit race and simplify code ]
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      fa743953
    • Steven Rostedt's avatar
      ring-buffer: remove unused variable · 263294f3
      Steven Rostedt authored
      Fix the compiler error:
      
      kernel/trace/ring_buffer.c: In function 'rb_move_tail':
      kernel/trace/ring_buffer.c:1236: warning: unused variable 'event'
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      263294f3
    • Steven Rostedt's avatar
      ring-buffer: have benchmark test handle discarded events · 9086c7b9
      Steven Rostedt authored
      With the addition of commit:
      
        c7b09308
        ring-buffer: prevent adding write in discarded area
      
      The ring buffer may now add discarded events when a write passes
      the end of a buffer page. Before, a discarded event was only added
      when the tracer deliberately created one. The ring buffer benchmark
      test does not handle discarded events when it reads the buffer and
      fails when it encounters one.
      
      Also fix the increment for large data entries (luckily, the test did
      not add any yet).
      
      [ Impact: fix false failure of ring buffer self test ]
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      9086c7b9
  4. 15 Jun, 2009 7 commits
  5. 14 Jun, 2009 17 commits