1. 11 Dec, 2009 4 commits
    • Steven Rostedt's avatar
      tracing: Add stack trace to irqsoff tracer · cc51a0fc
      Steven Rostedt authored
      The irqsoff and friends tracers help in finding causes of latency in the
      kernel. The also work with the function tracer to show what was happening
      when interrupts or preemption are disabled. But the function tracer has
      a bit of an overhead and can cause exagerated readings.
      
      Currently, when tracing with /proc/sys/kernel/ftrace_enabled = 0, where the
      function tracer is disabled, the information that is provided can end up
      being useless. For example, a 2 and a half millisecond latency only showed:
      
       # tracer: preemptirqsoff
       #
       # preemptirqsoff latency trace v1.1.5 on 2.6.32
       # --------------------------------------------------------------------
       # latency: 2463 us, #4/4, CPU#2 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
       #    -----------------
       #    | task: -4242 (uid:0 nice:0 policy:0 rt_prio:0)
       #    -----------------
       #  => started at: _spin_lock_irqsave
       #  => ended at:   remove_wait_queue
       #
       #
       #                  _------=> CPU#
       #                 / _-----=> irqs-off
       #                | / _----=> need-resched
       #                || / _---=> hardirq/softirq
       #                ||| / _--=> preempt-depth
       #                |||| /_--=> lock-depth
       #                |||||/     delay
       #  cmd     pid   |||||| time  |   caller
       #     \   /      ||||||   \   |   /
       hackbenc-4242    2d....    0us!: trace_hardirqs_off <-_spin_lock_irqsave
       hackbenc-4242    2...1. 2463us+: _spin_unlock_irqrestore <-remove_wait_queue
       hackbenc-4242    2...1. 2466us : trace_preempt_on <-remove_wait_queue
      
      The above lets us know that hackbench with pid 2463 grabbed a spin lock
      somewhere and enabled preemption at remove_wait_queue. This helps a little
      but where this actually happened is not informative.
      
      This patch adds the stack dump to the end of the irqsoff tracer. This provides
      the following output:
      
       hackbenc-4242    2d....    0us!: trace_hardirqs_off <-_spin_lock_irqsave
       hackbenc-4242    2...1. 2463us+: _spin_unlock_irqrestore <-remove_wait_queue
       hackbenc-4242    2...1. 2466us : trace_preempt_on <-remove_wait_queue
       hackbenc-4242    2...1. 2467us : <stack trace>
        => sub_preempt_count
        => _spin_unlock_irqrestore
        => remove_wait_queue
        => free_poll_entry
        => poll_freewait
        => do_sys_poll
        => sys_poll
        => system_call_fastpath
      
      Now we see that the culprit of this latency was the free_poll_entry code.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      cc51a0fc
    • Steven Rostedt's avatar
      tracing: Add trace_dump_stack() · 03889384
      Steven Rostedt authored
      I've been asked a few times about how to find out what is calling
      some location in the kernel. One way is to use dynamic function tracing
      and implement the func_stack_trace. But this only finds out who is
      calling a particular function. It does not tell you who is calling
      that function and entering a specific if conditional.
      
      I have myself implemented a quick version of trace_dump_stack() for
      this purpose a few times, and just needed it now. This is when I realized
      that this would be a good tool to have in the kernel like trace_printk().
      
      Using trace_dump_stack() is similar to dump_stack() except that it
      writes to the trace buffer instead and can be used in critical locations.
      
      For example:
      
      @@ -5485,8 +5485,12 @@ need_resched_nonpreemptible:
       	if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
       		if (unlikely(signal_pending_state(prev->state, prev)))
       			prev->state = TASK_RUNNING;
      -		else
      +		else {
       			deactivate_task(rq, prev, 1);
      +			trace_printk("Deactivating task %s:%d\n",
      +				     prev->comm, prev->pid);
      +			trace_dump_stack();
      +		}
       		switch_count = &prev->nvcsw;
       	}
      
      Produces:
      
                 <...>-3249  [001]   296.105269: schedule: Deactivating task ntpd:3249
                 <...>-3249  [001]   296.105270: <stack trace>
       => schedule
       => schedule_hrtimeout_range
       => poll_schedule_timeout
       => do_select
       => core_sys_select
       => sys_select
       => system_call_fastpath
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      03889384
    • Steven Rostedt's avatar
      ring-buffer: Move resize integrity check under reader lock · dd7f5943
      Steven Rostedt authored
      While using an application that does splice on the ftrace ring
      buffer at start up, I triggered an integrity check failure.
      
      Looking into this, I discovered that resizing the buffer performs
      an integrity check after the buffer is resized. This check unfortunately
      is preformed after it releases the reader lock. If a reader is
      reading the buffer it may cause the integrity check to trigger a
      false failure.
      
      This patch simply moves the integrity checker under the protection
      of the ring buffer reader lock.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      dd7f5943
    • Steven Rostedt's avatar
      ring-buffer: Use sync sched protection on ring buffer resizing · 18421015
      Steven Rostedt authored
      There was a comment in the ring buffer code that says the calling
      layers should prevent tracing or reading of the ring buffer while
      resizing. I have discovered that the tracers do not honor this
      arrangement.
      
      This patch moves the disabling and synchronizing the ring buffer to
      a higher layer during resizing. This guarantees that no writes
      are occurring while the resize takes place.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      18421015
  2. 10 Dec, 2009 2 commits
  3. 09 Dec, 2009 9 commits
    • Carsten Emde's avatar
      tracing: Remove comparing of NULL to va_list in trace_array_vprintk() · f2942487
      Carsten Emde authored
      Olof Johansson stated the following:
      
        Comparing a va_list with NULL is bogus. It's supposed to be treated like
        an opaque type and only be manipulated with va_* accessors.
      
      Olof noticed that this code broke the ARM builds:
      
          kernel/trace/trace.c: In function 'trace_array_vprintk':
          kernel/trace/trace.c:1364: error: invalid operands to binary == (have 'va_list' and 'void *')
          kernel/trace/trace.c: In function 'tracing_mark_write':
          kernel/trace/trace.c:3349: error: incompatible type for argument 3 of 'trace_vprintk'
      
      This patch partly reverts c13d2f7c and
      re-installs the original mark_printk() mechanism.
      Reported-by: default avatarOlof Johansson <olof@lixom.net>
      Signed-off-by: default avatarCarsten Emde <C.Emde@osadl.org>
      LKML-Reference: <4B1BAB74.104@osadl.org>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      f2942487
    • Jiri Olsa's avatar
      tracing: Fix function graph trace_pipe to properly display failed entries · be1eca39
      Jiri Olsa authored
      There is a case where the graph tracer might get confused and omits
      displaying of a single record.  This applies mostly with the trace_pipe
      since it is unlikely that the trace_seq buffer will overflow with the
      trace file.
      
      As the function_graph tracer goes through the trace entries keeping a
      pointer to the current record:
      
      current ->  func1 ENTRY
                  func2 ENTRY
                  func2 RETURN
                  func1 RETURN
      
      When an function ENTRY is encountered, it moves the pointer to the
      next entry to check if the function is a nested or leaf function.
      
                  func1 ENTRY
      current ->  func2 ENTRY
                  func2 RETURN
                  func1 RETURN
      
      If the rest of the writing of the function fills the trace_seq buffer,
      then the trace_pipe read will ignore this entry. The next read will
      Now start at the current location, but the first entry (func1) will
      be discarded.
      
      This patch keeps a copy of the current entry in the iterator private
      storage and will keep track of when the trace_seq buffer fills. When
      the trace_seq buffer fills, it will reuse the copy of the entry in the
      next iteration.
      
      [
        This patch has been largely modified by Steven Rostedt in order to
        clean it up and simplify it. The original idea and concept was from
        Jirka and for that, this patch will go under his name to give him
        the credit he deserves. But because this was modify by Steven Rostedt
        anything wrong with the patch should be blamed on Steven.
      ]
      Signed-off-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1259067458-27143-1-git-send-email-jolsa@redhat.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      be1eca39
    • Johannes Berg's avatar
      tracing: Add full state to trace_seq · d184b31c
      Johannes Berg authored
      The trace_seq buffer might fill up, and right now one needs to check the
      return value of each printf into the buffer to check for that.
      
      Instead, have the buffer keep track of whether it is full or not, and
      reject more input if it is full or would have overflowed with an input
      that wasn't added.
      
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: default avatarJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      d184b31c
    • Steven Rostedt's avatar
      tracing: Buffer the output of seq_file in case of filled buffer · a63ce5b3
      Steven Rostedt authored
      If the seq_read fills the buffer it will call s_start again on the next
      itertation with the same position. This causes a problem with the
      function_graph tracer because it consumes the iteration in order to
      determine leaf functions.
      
      What happens is that the iterator stores the entry, and the function
      graph plugin will look at the next entry. If that next entry is a return
      of the same function and task, then the function is a leaf and the
      function_graph plugin calls ring_buffer_read which moves the ring buffer
      iterator forward (the trace iterator still points to the function start
      entry).
      
      The copying of the trace_seq to the seq_file buffer will fail if the
      seq_file buffer is full. The seq_read will not show this entry.
      The next read by userspace will cause seq_read to again call s_start
      which will reuse the trace iterator entry (the function start entry).
      But the function return entry was already consumed. The function graph
      plugin will think that this entry is a nested function and not a leaf.
      
      To solve this, the trace code now checks the return status of the
      seq_printf (trace_print_seq). If the writing to the seq_file buffer
      fails, we set a flag in the iterator (leftover) and we do not reset
      the trace_seq buffer. On the next call to s_start, we check the leftover
      flag, and if it is set, we just reuse the trace_seq buffer and do not
      call into the plugin print functions.
      
      Before this patch:
      
       2)               |      fput() {
       2)               |        __fput() {
       2)   0.550 us    |          inotify_inode_queue_event();
       2)               |          __fsnotify_parent() {
       2)   0.540 us    |          inotify_dentry_parent_queue_event();
      
      After the patch:
      
       2)               |      fput() {
       2)               |        __fput() {
       2)   0.550 us    |          inotify_inode_queue_event();
       2)   0.548 us    |          __fsnotify_parent();
       2)   0.540 us    |          inotify_dentry_parent_queue_event();
      
      [
        Updated the patch to fix a missing return 0 from the trace_print_seq()
        stub when CONFIG_TRACING is disabled.
      Reported-by: default avatarIngo Molnar <mingo@elte.hu>
      ]
      Reported-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      a63ce5b3
    • Steven Rostedt's avatar
      tracing: Only call pipe_close if pipe_close is defined · 29bf4a5e
      Steven Rostedt authored
      This fixes a cut and paste error that had pipe_close get called
      if pipe_open was defined (not pipe_close).
      Reported-by: default avatarKosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
      LKML-Reference: <20091209153204.F4CD.A69D9226@jp.fujitsu.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      29bf4a5e
    • Linus Torvalds's avatar
      Merge branches 'timers-for-linus-ntp' and 'irq-core-for-linus' of... · 2b876f95
      Linus Torvalds authored
      Merge branches 'timers-for-linus-ntp' and 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
      
      * 'timers-for-linus-ntp' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
        ntp: Provide compability defines (You say MOD_NANO, I say ADJ_NANO)
      
      * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
        genirq: do not execute DEBUG_SHIRQ when irq setup failed
      2b876f95
    • Linus Torvalds's avatar
      Merge branch 'timers-for-linus-urgent' of... · fbf07eac
      Linus Torvalds authored
      Merge branch 'timers-for-linus-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
      
      * 'timers-for-linus-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
        hrtimer: Fix /proc/timer_list regression
        itimers: Fix racy writes to cpu_itimer fields
        timekeeping: Fix clock_gettime vsyscall time warp
      fbf07eac
    • Linus Torvalds's avatar
      Merge branch 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip · 60d8ce2c
      Linus Torvalds authored
      * 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
        timers, init: Limit the number of per cpu calibration bootup messages
        posix-cpu-timers: optimize and document timer_create callback
        clockevents: Add missing include to pacify sparse
        x86: vmiclock: Fix printk format
        x86: Fix printk format due to variable type change
        sparc: fix printk for change of variable type
        clocksource/events: Fix fallout of generic code changes
        nohz: Allow 32-bit machines to sleep for more than 2.15 seconds
        nohz: Track last do_timer() cpu
        nohz: Prevent clocksource wrapping during idle
        nohz: Type cast printk argument
        mips: Use generic mult/shift factor calculation for clocks
        clocksource: Provide a generic mult/shift factor calculation
        clockevents: Use u32 for mult and shift factors
        nohz: Introduce arch_needs_cpu
        nohz: Reuse ktime in sub-functions of tick_check_idle.
        time: Remove xtime_cache
        time: Implement logarithmic time accumulation
      60d8ce2c
    • Linus Torvalds's avatar
      Merge branch 'timers-for-linus-hpet' of... · 849e8dea
      Linus Torvalds authored
      Merge branch 'timers-for-linus-hpet' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
      
      * 'timers-for-linus-hpet' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
        x86: hpet: Make WARN_ON understandable
        x86: arch specific support for remapping HPET MSIs
        intr-remap: generic support for remapping HPET MSIs
        x86, hpet: Simplify the HPET code
        x86, hpet: Disable per-cpu hpet timer if ARAT is supported
      849e8dea
  4. 08 Dec, 2009 25 commits