Commit d2ac742d authored by Carsten Emde, committed by Thomas Gleixner

ftrace: Add latency histograms of missed timer offsets

The histograms of effective latencies do not yet cover one source of
system latencies: delayed timer interrupts. Such latencies are mainly
due to disabled interrupts. Recording effective latencies makes it
possible to continuously monitor a system's real-time capabilities
under real-world conditions.

This patch adds latency histograms of missed timer offsets. If the
timer belongs to a sleeper that is about to wake up a task and the
latency is higher than previous latencies of such timers, some data
of this task are recorded as well.

Adapted and expanded Documentation/trace/histograms.txt.
Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
parent 16731e6f
@@ -26,24 +26,41 @@ when the end of the section is reached, and increments the frequency
counter of that latency value - irrespective of whether any concurrently
running process is affected by this latency or not.
- Configuration items (in the Kernel hacking/Tracers submenu)
CONFIG_INTERRUPT_OFF_LATENCY
CONFIG_PREEMPT_OFF_LATENCY
CONFIG_INTERRUPT_OFF_HIST
CONFIG_PREEMPT_OFF_HIST
* Effective latencies
Effective latencies are actually occuring during wakeup of a process. To
determine effective latencies, the kernel stores the time stamp when a
process is scheduled to be woken up, and determines the duration of the
wakeup time shortly before control is passed over to this process. Note
that the apparent latency in user space may be considerably longer,
since
i) interrupts may be disabled preventing the scheduler from initiating
the wakeup mechanism, and
There are two types of effective latencies, wakeup latencies and missed
timer latencies
* Wakeup latencies
Wakeup latencies may occur during wakeup of a process. To determine
wakeup latencies, the kernel stores the time stamp when a process is
scheduled to be woken up, and determines the duration of the wakeup time
shortly before control is passed over to this process. Note that the
apparent latency in user space may be considerably longer, since
i) interrupts may be disabled preventing the timer from waking up a process
in time
ii) the process may be interrupted after control is passed over to it
but before user space execution takes place.
If a particular wakeup latency is highest so far, details of the task
that is suffering from this latency are stored as well (see below).
- Configuration item (in the Kernel hacking/Tracers submenu)
CONFIG_WAKEUP_LATENCY_HIST
* Missed timer latencies
Missed timer latencies occur when a timer interrupt is serviced later
than it should. This is mainly due to disabled interrupts. To determine
the missed timer latency, the expected and the real execution time of a
timer are compared. If the former precedes the latter, the difference is
entered into the missed timer offsets histogram. If the timer is
responsible to wakeup a sleeping process and the latency is highest so
far among previous wakeup timers, details of the related task are stored
as well (see below).
- Configuration item (in the Kernel hacking/Tracers submenu)
CONFIG_WAKEUP_LATENCY
CONFIG_MISSED_TIMER_OFFSETS_HIST
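The comparison of expected and real execution time described above can be illustrated with a small user-space sketch. It is not the kernel implementation: the struct `missed_timer_hist`, the function `record_missed_timer()` and the constant `HIST_ENTRIES` are hypothetical names chosen for illustration, and the sketch assumes nanosecond time stamps, one bucket per microsecond and an upper limit of 10,240 microseconds as stated in the data format section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HIST_ENTRIES 10240          /* assumed: one bucket per microsecond, 0..10239 */

/* Hypothetical user-space model of one per-CPU histogram. */
struct missed_timer_hist {
    uint64_t samples[HIST_ENTRIES]; /* frequency counter per latency value */
    uint64_t overflow;              /* samples >= HIST_ENTRIES microseconds */
    uint64_t total;                 /* all recorded samples */
    int64_t  max_us;                /* highest latency recorded so far */
};

/*
 * Compare the expected and the real execution time of a timer (both in
 * nanoseconds). Returns the recorded latency in microseconds, or -1 if
 * the timer was not late and nothing was recorded.
 */
int64_t record_missed_timer(struct missed_timer_hist *h,
                            int64_t expected_ns, int64_t real_ns)
{
    int64_t latency_us;

    assert(h != NULL);
    if (expected_ns >= real_ns)
        return -1;                  /* expected time does not precede real time */

    latency_us = (real_ns - expected_ns) / 1000;
    if (latency_us >= HIST_ENTRIES)
        h->overflow++;              /* beyond the histogram range */
    else
        h->samples[latency_us]++;
    h->total++;
    if (latency_us > h->max_us)
        h->max_us = latency_us;     /* new maximum: task details would be stored here */
    return latency_us;
}
```

A caller would keep one zero-initialized `struct missed_timer_hist` per CPU and call `record_missed_timer()` each time a timer is serviced.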
* Usage
@@ -59,19 +76,23 @@ from shell command line level, or add
nodev /sys sysfs defaults 0 0
nodev /sys/kernel/debug debugfs defaults 0 0
to the file /etc/fstab. All latency histogram related files are
to the file /etc/fstab in order to implicitly mount the debug file
system at every reboot. All latency histogram related files are
available in the directory /sys/kernel/debug/tracing/latency_hist. A
particular histogram type is enabled by writing non-zero to the related
variable in the /sys/kernel/debug/tracing/latency_hist/enable directory.
Select "preemptirqsoff" for the histograms of potential sources of
latencies and "wakeup" for histograms of effective latencies. The
histogram data - one per CPU - are available in the files
Select "preemptirqsoff" for histograms of potential sources of
latencies, "wakeup" for histograms of wakeup latencies and
"missed_timer_offsets" for histograms of missed timer offsets,
respectively.
The histogram data - one per CPU - are available in the files
/sys/kernel/debug/tracing/latency_hist/preemptoff/CPUx
/sys/kernel/debug/tracing/latency_hist/irqsoff/CPUx
/sys/kernel/debug/tracing/latency_hist/preemptirqsoff/CPUx
/sys/kernel/debug/tracing/latency_hist/wakeup/CPUx.
/sys/kernel/debug/tracing/latency_hist/wakeup/sharedprio/CPUx.
/sys/kernel/debug/tracing/latency_hist/missed_timer_offsets/CPUx.
The histograms are reset by writing non-zero to the file "reset" in a
particular latency directory. To reset all latency data, use
@@ -94,19 +115,19 @@ fi
* Data format
Latency data are stored with a resolution of one microsecond. The
maximum latency is 10,240 microseconds. The data are only valid, if the
overflow register is empty. Every output line contains the latency in
microseconds in the first row and the number of samples in the second
row. To display only lines with a positive latency count, use, for
example,
maximum latency is 10,240 microseconds. Every output line contains the
latency in microseconds in the first row and the number of samples in
the second row. To display only lines with a positive latency count,
use, for example,
grep -v " 0$" /sys/kernel/debug/tracing/latency_hist/preemptoff/CPU0
#Minimum latency: 0 microseconds.
#Average latency: 0 microseconds.
#Maximum latency: 25 microseconds.
#Minimum latency: 0 microseconds
#Average latency: 0 microseconds
#Maximum latency: 25 microseconds
#Total samples: 3104770694
#There are 0 samples greater or equal than 10240 microseconds
#There are 0 samples lower than 0 microseconds.
#There are 0 samples greater or equal than 10240 microseconds.
#usecs samples
0 2984486876
1 49843506
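As an alternative to the grep filter, the per-CPU histogram files can be read programmatically. The following is a minimal sketch, assuming that header lines start with '#' and that every data line holds the latency in microseconds followed by the sample count, as in the excerpt above. The function name `parse_hist_row()` is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse one line of a latency histogram file.
 * Returns 1 if the line is a data row with a positive sample count,
 * 0 for header lines ('#'), malformed lines and rows with zero samples.
 */
int parse_hist_row(const char *line, long *usecs, unsigned long long *samples)
{
    assert(line != NULL && usecs != NULL && samples != NULL);
    if (line[0] == '#')
        return 0;                   /* statistics or column header */
    if (sscanf(line, "%ld %llu", usecs, samples) != 2)
        return 0;                   /* not a "<usecs> <samples>" row */
    return *samples > 0;
}
```

Feeding each line obtained with fgets() from a CPUx file to this function and printing only the rows for which it returns 1 yields roughly the same data rows as the grep command, without the header lines.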
@@ -140,11 +161,16 @@ grep -v " 0$" /sys/kernel/debug/tracing/latency_hist/preemptoff/CPU0
Two different algorithms are used to determine the wakeup latency of a
process. One of them only considers processes that exclusively use the
highest priority of the system, the other one records the wakeup latency
of a process even if it shares the highest systemm latency with other
processes. The former is used to improve hardware and system software;
the related histograms are located it the wakeup subdirectory. The
latter is used to optimize the priority design of a given system; the
related histograms are located in the wakeup/sharedprio subdirectory.
of a process even if it shares the highest system latency with other
processes. The former is used to determine the worst-case latency of a
system; if higher than expected, the hardware and or system software
(e.g. the Linux kernel) may need to be debugged and fixed. The latter
reflects the priority design of a given system; if higher than expected,
the system design may need to be re-evaluated - the hardware
manufacturer or the kernel developers must not be blamed for such
latencies. The exclusive-priority wakeup latency histograms are located
in the "wakeup" subdirectory, the shared-priority histograms are located
in the "wakeup/sharedprio" subdirectory.
* Wakeup latency of a selected process
@@ -157,20 +183,18 @@ PID of the requested process to
PIDs are not considered, if this variable is set to 0.
* Details of the process with the highest wakeup latency so far
* Details of processes with the highest wakeup or missed timer
latency so far
Selected data of the process that suffered from the highest wakeup
latency that occurred in a particular CPU are available in the files
Selected data of processes that suffered from the highest wakeup or
missed timer latency that occurred on a particular CPU are available in
the files
/sys/kernel/debug/tracing/latency_hist/wakeup/max_latency-CPUx
and
/sys/kernel/debug/tracing/latency_hist/wakeup/sharedprio/max_latency-CPUx,
respectively.
/sys/kernel/debug/tracing/latency_hist/wakeup/sharedprio/max_latency-CPUx
/sys/kernel/debug/tracing/latency_hist/missed_timer_offsets/max_latency-CPUx
The format of the data is
<PID> <Priority> <Latency> <Command>
These data are also reset when the related wakeup histograms are reset.
These data are also reset when the related histograms are reset.
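A line in the format `<PID> <Priority> <Latency> <Command>` can be decoded in user space roughly as follows. This is a sketch under assumptions: the struct `max_latency` and the function `parse_max_latency()` are hypothetical names, the latency is assumed to be given in microseconds like the histogram data, and the command buffer is sized to the kernel's TASK_COMM_LEN of 16 bytes.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for one max_latency-CPUx record. */
struct max_latency {
    int  pid;
    int  prio;
    long latency_us;                /* assumed unit: microseconds */
    char comm[16];                  /* command name, at most 15 characters */
};

/* Returns 0 on success, -1 if the line does not match the expected format. */
int parse_max_latency(const char *line, struct max_latency *out)
{
    assert(line != NULL && out != NULL);
    /* %15[^\n] keeps command names that contain spaces in one field */
    if (sscanf(line, "%d %d %ld %15[^\n]",
               &out->pid, &out->prio, &out->latency_us, out->comm) != 4)
        return -1;
    return 0;
}
```

The same function could be applied to the wakeup, wakeup/sharedprio and missed_timer_offsets variants of the file, since the text describes a single format for all of them.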
@@ -17,8 +17,8 @@ TRACE_EVENT(preemptirqsoff_hist,
TP_ARGS(reason, starthist),
TP_STRUCT__entry(
__field( int, reason )
__field( int, starthist )
__field(int, reason )
__field(int, starthist )
),
TP_fast_assign(
@@ -31,6 +31,31 @@ TRACE_EVENT(preemptirqsoff_hist,
);
#endif
#ifndef CONFIG_MISSED_TIMER_OFFSETS_HIST
#define trace_hrtimer_interrupt(a,b,c)
#else
TRACE_EVENT(hrtimer_interrupt,
TP_PROTO(int cpu, long long offset, struct task_struct *task),
TP_ARGS(cpu, offset, task),
TP_STRUCT__entry(
__array(char, comm, TASK_COMM_LEN)
__field(int, cpu )
__field(long long, offset )
),
TP_fast_assign(
strncpy(__entry->comm, task != NULL ? task->comm : "", TASK_COMM_LEN);
__entry->cpu = cpu;
__entry->offset = offset;
),
TP_printk("cpu=%d offset=%lld thread=%s", __entry->cpu, __entry->offset, __entry->comm)
);
#endif
#endif /* _TRACE_HIST_H */
/* This part must be outside protection */
@@ -936,7 +936,15 @@ static int __init kernel_init(void * unused)
WARN_ON(irqs_disabled());
#endif
#define DEBUG_COUNT (defined(CONFIG_DEBUG_RT_MUTEXES) + defined(CONFIG_IRQSOFF_TRACER) + defined(CONFIG_PREEMPT_TRACER) + defined(CONFIG_STACK_TRACER) + defined(CONFIG_INTERRUPT_OFF_HIST) + defined(CONFIG_PREEMPT_OFF_HIST) + defined(CONFIG_WAKEUP_LATENCY_HIST) + defined(CONFIG_DEBUG_SLAB) + defined(CONFIG_DEBUG_PAGEALLOC) + defined(CONFIG_LOCKDEP) + (defined(CONFIG_FTRACE) - defined(CONFIG_FTRACE_MCOUNT_RECORD)))
#define DEBUG_COUNT (defined(CONFIG_DEBUG_RT_MUTEXES) + \
defined(CONFIG_IRQSOFF_TRACER) + defined(CONFIG_PREEMPT_TRACER) + \
defined(CONFIG_STACK_TRACER) + defined(CONFIG_INTERRUPT_OFF_HIST) + \
defined(CONFIG_PREEMPT_OFF_HIST) + \
defined(CONFIG_WAKEUP_LATENCY_HIST) + \
defined(CONFIG_MISSED_TIMER_OFFSETS_HIST) + \
defined(CONFIG_DEBUG_SLAB) + defined(CONFIG_DEBUG_PAGEALLOC) + \
defined(CONFIG_LOCKDEP) + \
(defined(CONFIG_FTRACE) - defined(CONFIG_FTRACE_MCOUNT_RECORD)))
#if DEBUG_COUNT > 0
printk(KERN_ERR "*****************************************************************************\n");
@@ -968,6 +976,9 @@ static int __init kernel_init(void * unused)
#ifdef CONFIG_WAKEUP_LATENCY_HIST
printk(KERN_ERR "* CONFIG_WAKEUP_LATENCY_HIST *\n");
#endif
#ifdef CONFIG_MISSED_TIMER_OFFSETS_HIST
printk(KERN_ERR "* CONFIG_MISSED_TIMER_OFFSETS_HIST *\n");
#endif
#ifdef CONFIG_DEBUG_SLAB
printk(KERN_ERR "* CONFIG_DEBUG_SLAB *\n");
#endif
@@ -48,6 +48,8 @@
#include <asm/uaccess.h>
#include <trace/events/hist.h>
/*
* The timer bases:
*
@@ -1349,6 +1351,7 @@ static inline int hrtimer_rt_defer(struct hrtimer *timer) { return 0; }
#ifdef CONFIG_HIGH_RES_TIMERS
static int force_clock_reprogram;
static enum hrtimer_restart hrtimer_wakeup(struct hrtimer *timer);
/*
* After 5 iteration's attempts, we consider that hrtimer_interrupt()
@@ -1419,6 +1422,13 @@ void hrtimer_interrupt(struct clock_event_device *dev)
timer = rb_entry(node, struct hrtimer, node);
trace_hrtimer_interrupt(raw_smp_processor_id(),
ktime_to_ns(ktime_sub(
hrtimer_get_expires(timer), basenow)),
timer->function == hrtimer_wakeup ?
container_of(timer, struct hrtimer_sleeper,
timer)->task : NULL);
/*
* The immediate goal for using the softexpires is
* minimizing wakeups, not running timers at the
@@ -179,7 +179,10 @@ config INTERRUPT_OFF_HIST
If PREEMPT_OFF_HIST is also selected, additional histograms (one
per cpu) are generated that accumulate the duration of time periods
when both interrupts and preemption are disabled.
when both interrupts and preemption are disabled. The histogram data
will be located in the debug file system at
/sys/kernel/debug/tracing/latency_hist/irqsoff
config PREEMPT_TRACER
bool "Preemption-off Latency Tracer"
@@ -216,7 +219,10 @@ config PREEMPT_OFF_HIST
If INTERRUPT_OFF_HIST is also selected, additional histograms (one
per cpu) are generated that accumulate the duration of time periods
when both interrupts and preemption are disabled.
when both interrupts and preemption are disabled. The histogram data
will be located in the debug file system at
/sys/kernel/debug/tracing/latency_hist/preemptoff
config SCHED_TRACER
bool "Scheduling Latency Tracer"
@@ -243,7 +249,29 @@ config WAKEUP_LATENCY_HIST
another one to determine the latency of processes that share the
highest system priority with other processes. The former is used to
improve hardware and system software, the latter to optimize the
priority design of a given system.
priority design of a given system. The histogram data will be
located in the debug file system at
/sys/kernel/debug/tracing/latency_hist/wakeup
and
/sys/kernel/debug/tracing/latency_hist/wakeup/sharedprio
config MISSED_TIMER_OFFSETS_HIST
depends on GENERIC_TIME
select GENERIC_TRACER
bool "Missed timer offsets histogram"
help
Generate a histogram of missed timer offsets in microseconds. The
histograms are disabled by default. To enable them, write a non-zero
number to
/sys/kernel/debug/tracing/latency_hist/enable/missed_timer_offsets
The histogram data will be located in the debug file system at
/sys/kernel/debug/tracing/latency_hist/missed_timer_offsets
config SYSPROF_TRACER
bool "Sysprof Tracer"
@@ -38,6 +38,7 @@ obj-$(CONFIG_SCHED_TRACER) += trace_sched_wakeup.o
obj-$(CONFIG_INTERRUPT_OFF_HIST) += latency_hist.o
obj-$(CONFIG_PREEMPT_OFF_HIST) += latency_hist.o
obj-$(CONFIG_WAKEUP_LATENCY_HIST) += latency_hist.o
obj-$(CONFIG_MISSED_TIMER_OFFSETS_HIST) += latency_hist.o
obj-$(CONFIG_NOP_TRACER) += trace_nop.o
obj-$(CONFIG_STACK_TRACER) += trace_stack.o
obj-$(CONFIG_MMIOTRACE) += trace_mmiotrace.o
This diff is collapsed.