Commit 16731e6f authored by Carsten Emde, committed by Thomas Gleixner

ftrace: Consider shared max priority in latency histograms

The algorithm used so far to trace the process with the highest priority
requires that no other processes with the same priority are being woken
up simultaneously. Otherwise, a process with a lower priority may be
picked up for tracing, which leads to an erroneously high latency value.

Generally, the wakeup latency of a process that exclusively uses the
highest priority of the system is due to software or hardware issues we
would like to solve or, at least, keep as small as possible. This is
what latency measurements are made for, after all. The wakeup latency of
a process that shares the highest priority of the system with other
processes is quite another story. It may contain the worst-case runtime
durations of the other processes; thus, it is the result of the priority
design of a given system and not something a kernel developer or hardware
engineer would want to fix.

That said, we need to separately record latencies i) of processes that
exclusively use the highest priority of the system and ii) of processes
that share the highest priority of the system with other processes.

The above-mentioned shortcoming of the tracing algorithm also applies to
the variable tracing_max_latency that the wakeup latency tracer uses,
since it is based on the same procedure as the original version of the
latency histogram. Consequently, if several processes share the
highest priority of the system, the variable tracing_max_latency may
contain erroneously high values. We could now patch the wakeup latency
tracer as well and separately record the various latencies, but it is
better to document this behavior and recommend the latency histograms
for reliably determining a system's worst-case wakeup latency.

Simplified and cleaned up a bit. Added some more help info to Kconfig.

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
parent d9a4a1d0
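The following minimal C sketch illustrates the commit message's argument for recording the two kinds of latency separately. It works under stated assumptions. `struct wakeup_sample`, `struct wakeup_maxima` and `record_wakeup()` are hypothetical illustrations, not the kernel's latency histogram code. The sketch simply assumes that each sample already knows whether its task shared the highest priority with another woken task. Alongside the two separate maxima, it keeps one naive maximum, as a single variable such as tracing_max_latency would.

```c
#include <stdbool.h>

/*
 * Hypothetical model, not the kernel's data structures: one wakeup
 * latency sample of the highest-priority task that was woken up.
 */
struct wakeup_sample {
	bool shares_max_prio; /* another woken task has the same priority */
	long latency_us;      /* measured wakeup latency in microseconds */
};

/* Running maxima; the split mirrors wakeup/ versus wakeup/sharedprio/. */
struct wakeup_maxima {
	long naive_us;      /* one maximum over all samples */
	long exclusive_us;  /* highest priority not shared: hw/sw issue */
	long sharedprio_us; /* highest priority shared: priority design */
};

static void update_max(long *max, long value)
{
	if (value > *max)
		*max = value;
}

/* Record a sample in the naive maximum and in exactly one split maximum. */
void record_wakeup(struct wakeup_maxima *m, const struct wakeup_sample *s)
{
	update_max(&m->naive_us, s->latency_us);
	if (s->shares_max_prio)
		update_max(&m->sharedprio_us, s->latency_us);
	else
		update_max(&m->exclusive_us, s->latency_us);
}
```

As a made-up example, take samples of 12 us and 15 us with an unshared priority, and 9 us and 409 us with a shared priority. Assume the 409 us sample includes the runtime of a peer task with the same priority. Then naive_us ends at 409, exclusive_us at 15 and sharedprio_us at 409. The single maximum reports a value caused by the priority design, while the exclusive maximum keeps the 15 us attributable to hardware and system software.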
@@ -111,9 +111,14 @@ of ftrace. Here is a list of some of the key files:
 For example, the time interrupts are disabled.
 This time is saved in this file. The max trace
 will also be stored, and displayed by "trace".
-A new max trace will only be recorded if the
+A new max trace will only be recorded, if the
 latency is greater than the value in this
-file. (in microseconds)
+file (in microseconds). Note that the max latency
+recorded by the wakeup and the wakeup_rt tracer
+does not necessarily reflect the worst-case latency
+of the system, but may be erroneously high in
+case two or more processes share the maximum
+priority of the system.
 buffer_size_kb:
@@ -24,7 +24,7 @@ histograms of potential sources of latency, the kernel stores the time
 stamp at the start of a critical section, determines the time elapsed
 when the end of the section is reached, and increments the frequency
 counter of that latency value - irrespective of whether any concurrently
-running process is affected by latency or not.
+running process is affected by this latency or not.
 - Configuration items (in the Kernel hacking/Tracers submenu)
 CONFIG_INTERRUPT_OFF_LATENCY
 CONFIG_PREEMPT_OFF_LATENCY
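The measurement described in the context lines above can be sketched in C. It stores a time stamp at the start of a critical section, computes the elapsed time at its end, and increments the frequency counter for that latency value. The 1 us bucket width, HIST_BUCKETS and all names are assumptions for illustration, not the kernel implementation. One such structure would exist per CPU, and the caller is assumed to pass monotonic timestamps in microseconds.

```c
#include <stdint.h>

/* Hypothetical size: one bucket per microsecond, 0 .. HIST_BUCKETS-1. */
#define HIST_BUCKETS 1000

struct latency_hist {
	uint64_t count[HIST_BUCKETS]; /* frequency counter per latency value */
	uint64_t above_bound;         /* samples too large for any bucket */
	uint64_t start_us;            /* time stamp at start of the section */
};

/* Called at the start of a critical section. */
void hist_start(struct latency_hist *h, uint64_t now_us)
{
	h->start_us = now_us;
}

/* Called at the end: compute the elapsed time and bump its counter. */
void hist_stop(struct latency_hist *h, uint64_t now_us)
{
	uint64_t latency = now_us - h->start_us;

	if (latency < HIST_BUCKETS)
		h->count[latency]++;
	else
		h->above_bound++;
}
```

Printing each non-zero `count[i]` as a line "i count[i]" would give output shaped like the "25 1" line shown in a later hunk of this diff. Samples that do not fit any bucket are counted separately rather than dropped.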
@@ -71,18 +71,20 @@ histogram data - one per CPU - are available in the files
 /sys/kernel/debug/tracing/latency_hist/irqsoff/CPUx
 /sys/kernel/debug/tracing/latency_hist/preemptirqsoff/CPUx
 /sys/kernel/debug/tracing/latency_hist/wakeup/CPUx.
+/sys/kernel/debug/tracing/latency_hist/wakeup/sharedprio/CPUx.
 The histograms are reset by writing non-zero to the file "reset" in a
 particular latency directory. To reset all latency data, use
-#!/bin/sh
-HISTDIR=/sys/kernel/debug/tracing/latency_hist
+#!/bin/bash
+TRACINGDIR=/sys/kernel/debug/tracing
+HISTDIR=$TRACINGDIR/latency_hist
 if test -d $HISTDIR
 then
 cd $HISTDIR
-for i in */reset
+for i in `find . | grep /reset$`
 do
 echo 1 >$i
 done
@@ -133,6 +135,18 @@ grep -v " 0$" /sys/kernel/debug/tracing/latency_hist/preemptoff/CPU0
 25 1
+* Two types of wakeup latency histograms
+Two different algorithms are used to determine the wakeup latency of a
+process. One of them only considers processes that exclusively use the
+highest priority of the system, the other one records the wakeup latency
+of a process even if it shares the highest system priority with other
+processes. The former is used to improve hardware and system software;
+the related histograms are located in the wakeup subdirectory. The
+latter is used to optimize the priority design of a given system; the
+related histograms are located in the wakeup/sharedprio subdirectory.
 * Wakeup latency of a selected process
 To only collect wakeup latency data of a particular process, write the
@@ -146,11 +160,17 @@ PIDs are not considered, if this variable is set to 0.
 * Details of the process with the highest wakeup latency so far
 Selected data of the process that suffered from the highest wakeup
-latency that occurred in a particular CPU are available in the file
-/sys/kernel/debug/tracing/latency_hist/wakeup/max_latency-CPUx.
+latency that occurred in a particular CPU are available in the files
+/sys/kernel/debug/tracing/latency_hist/wakeup/max_latency-CPUx
+and
+/sys/kernel/debug/tracing/latency_hist/wakeup/sharedprio/max_latency-CPUx,
+respectively.
 The format of the data is
 <PID> <Priority> <Latency> <Command>
-These data are also reset when the wakeup histogram ist reset.
+These data are also reset when the related wakeup histograms are reset.
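A program that reads the max_latency-CPUx files could parse one record of the documented `<PID> <Priority> <Latency> <Command>` format as sketched below. The sketch assumes whitespace-separated fields and a command name without spaces. It also assumes the name fits the kernel's 16-byte TASK_COMM_LEN buffer. The struct and function names are hypothetical, and the latency unit is assumed to be microseconds.

```c
#include <stdio.h>

/* One record in the "<PID> <Priority> <Latency> <Command>" format. */
struct max_latency_rec {
	int pid;
	int prio;
	long latency;  /* assumed to be in microseconds */
	char comm[16]; /* assumed: no spaces, at most 15 chars plus NUL */
};

/* Returns 0 on success, -1 if the line does not match the format. */
int parse_max_latency(const char *line, struct max_latency_rec *rec)
{
	if (sscanf(line, "%d %d %ld %15s", &rec->pid, &rec->prio,
		   &rec->latency, rec->comm) != 4)
		return -1;
	return 0;
}
```

A caller would read each file line by line with fgets() and pass every line to parse_max_latency(). The width limit in "%15s" prevents a long command name from overflowing `comm`.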
@@ -1548,6 +1548,9 @@ struct task_struct {
 unsigned long trace;
 /* bitmask of trace recursion */
 unsigned long trace_recursion;
+#ifdef CONFIG_WAKEUP_LATENCY_HIST
+u64 preempt_timestamp_hist;
+#endif
 #endif /* CONFIG_TRACING */
 #ifdef CONFIG_PREEMPT_RT
 /*
@@ -143,7 +143,6 @@ config FUNCTION_GRAPH_TRACER
 the return value. This is done by setting the current return
 address on the current task structure into a stack of calls.
 config IRQSOFF_TRACER
 bool "Interrupts-off Latency Tracer"
 default n
@@ -171,15 +170,15 @@ config INTERRUPT_OFF_HIST
 bool "Interrupts-off Latency Histogram"
 depends on IRQSOFF_TRACER
 help
-This option generates a continuously updated histogram (one per cpu)
+This option generates continuously updated histograms (one per cpu)
 of the duration of time periods with interrupts disabled. The
-histogram is disabled by default. To enable it, write a non-zero
-number to the related file in
+histograms are disabled by default. To enable them, write a non-zero
+number to
 /sys/kernel/debug/tracing/latency_hist/enable/preemptirqsoff
-If PREEMPT_OFF_HIST is also selected, an additional histogram (one
-per cpu) is generated that accumulates the duration of time periods
+If PREEMPT_OFF_HIST is also selected, additional histograms (one
+per cpu) are generated that accumulate the duration of time periods
 when both interrupts and preemption are disabled.
 config PREEMPT_TRACER
@@ -208,15 +207,15 @@ config PREEMPT_OFF_HIST
 bool "Preemption-off Latency Histogram"
 depends on PREEMPT_TRACER
 help
-This option generates a continuously updated histogram (one per cpu)
+This option generates continuously updated histograms (one per cpu)
 of the duration of time periods with preemption disabled. The
-histogram is disabled by default. To enable it, write a non-zero
+histograms are disabled by default. To enable them, write a non-zero
 number to
 /sys/kernel/debug/tracing/latency_hist/enable/preemptirqsoff
-If INTERRUPT_OFF_HIST is also selected, an additional histogram (one
-per cpu) is generated that accumulates the duration of time periods
+If INTERRUPT_OFF_HIST is also selected, additional histograms (one
+per cpu) are generated that accumulate the duration of time periods
 when both interrupts and preemption are disabled.
 config SCHED_TRACER
@@ -232,12 +231,20 @@ config WAKEUP_LATENCY_HIST
 bool "Scheduling Latency Histogram"
 depends on SCHED_TRACER
 help
-This option generates a continuously updated histogram (one per cpu)
-of the scheduling latency of the highest priority task. The histogram
-is disabled by default. To enable it, write a non-zero number to
+This option generates continuously updated histograms (one per cpu)
+of the scheduling latency of the highest priority task.
+The histograms are disabled by default. To enable them, write a
+non-zero number to
 /sys/kernel/debug/tracing/latency_hist/enable/wakeup
+Two different algorithms are used, one to determine the latency of
+processes that exclusively use the highest priority of the system and
+another one to determine the latency of processes that share the
+highest system priority with other processes. The former is used to
+improve hardware and system software, the latter to optimize the
+priority design of a given system.
 config SYSPROF_TRACER
 bool "Sysprof Tracer"
 depends on X86
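The help texts say a histogram is enabled by writing a non-zero number to its file under latency_hist/enable. The documentation resets histograms the same way through "reset" files. A small C helper for either case might look like the sketch below. hist_write_nonzero() is a hypothetical name, and writing the literal "1" is just one choice of non-zero value.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write "1" (any non-zero number is accepted per the
 * help text) to a histogram control file, e.g. an enable or reset file.
 * Returns 0 on success, -1 on failure.
 */
int hist_write_nonzero(const char *path)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fputs("1\n", f) == EOF) {
		fclose(f);
		return -1;
	}
	/* fclose() flushes the buffer, so report its result as well. */
	return fclose(f) == 0 ? 0 : -1;
}
```

For example, hist_write_nonzero("/sys/kernel/debug/tracing/latency_hist/enable/wakeup") would enable the wakeup histograms on a kernel built with CONFIG_WAKEUP_LATENCY_HIST. This assumes debugfs is mounted at /sys/kernel/debug and the caller has sufficient privileges.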