Commit cc2fa446 authored by Carsten Emde, committed by Thomas Gleixner

tracing: Add histograms of potential and effective wakeup latencies

Resuscitated and enhanced the kernel latency histograms provided
originally by Yi Yang and adapted and converted by Steven Rostedt.

Latency histograms in the current version
- can be enabled online and independently
- have virtually no performance penalty when configured but not enabled
- have very little performance penalty when enabled
- use already available wakeup and switch tracepoints
- give results that are consistent with those of the related tracer
- allow recording wakeup latency histograms of a single process
- record the process in which the highest wakeup latency occurred
- are documented in Documentation/trace/histograms.txt
Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <4AAEDDD5.4040505@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
parent edc1a1c3
Using the Linux Kernel Latency Histograms
This document gives a short explanation of how to enable, configure and use
latency histograms. Latency histograms are primarily relevant in the
context of real-time enabled kernels (CONFIG_PREEMPT/CONFIG_PREEMPT_RT)
and are used in the quality management of the Linux real-time
capabilities.
* Purpose of latency histograms
A latency histogram continuously accumulates the frequencies of latency
data. There are two types of histograms:
- potential sources of latencies
- effective latencies
* Potential sources of latencies
Potential sources of latencies are code segments where interrupts,
preemption or both are disabled (aka critical sections). To create
histograms of potential sources of latency, the kernel stores the time
stamp at the start of a critical section, determines the time elapsed
when the end of the section is reached, and increments the frequency
counter of that latency value - irrespective of whether any concurrently
running process is affected by latency or not.
- Configuration items (in the Kernel hacking/Tracers submenu)
CONFIG_INTERRUPT_OFF_HIST
CONFIG_PREEMPT_OFF_HIST
* Effective latencies
Effective latencies are those that actually occur during the wakeup of a process. To
determine effective latencies, the kernel stores the time stamp when a
process is scheduled to be woken up, and determines the duration of the
wakeup time shortly before control is passed over to this process. Note
that the apparent latency in user space may be considerably longer,
since
i) interrupts may be disabled preventing the scheduler from initiating
the wakeup mechanism, and
ii) the process may be interrupted after control is passed over to it
but before user space execution takes place.
- Configuration item (in the Kernel hacking/Tracers submenu)
CONFIG_WAKEUP_LATENCY_HIST
* Usage
The interface to the administration of the latency histograms is located
in the debugfs file system. To mount it, either enter
mount -t sysfs nodev /sys
mount -t debugfs nodev /sys/kernel/debug
from shell command line level, or add
nodev /sys sysfs defaults 0 0
nodev /sys/kernel/debug debugfs defaults 0 0
to the file /etc/fstab. All latency histogram related files are
available in the directory /sys/kernel/debug/tracing/latency_hist. A
particular histogram type is enabled by writing non-zero to the related
variable in the /sys/kernel/debug/tracing/latency_hist/enable directory.
Select "preemptirqsoff" for the histograms of potential sources of
latencies and "wakeup" for histograms of effective latencies. The
histogram data - one per CPU - are available in the files
/sys/kernel/debug/tracing/latency_hist/preemptoff/CPUx
/sys/kernel/debug/tracing/latency_hist/irqsoff/CPUx
/sys/kernel/debug/tracing/latency_hist/preemptirqsoff/CPUx
/sys/kernel/debug/tracing/latency_hist/wakeup/CPUx.
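As a sketch of how enabling could be scripted, a small helper might look as follows; the helper name is invented for illustration, and the optional second argument overrides the tracing directory purely so the helper can be exercised against a mock tree:

```shell
# Illustrative helper: write a non-zero value to the "enable" file
# of a given histogram type (wakeup, preemptirqsoff, ...). The
# directory defaults to the standard debugfs mount point.
enable_hist() {
    hist=$1
    dir=${2:-/sys/kernel/debug/tracing}
    echo 1 > "$dir/latency_hist/enable/$hist"
}

# Typical use on a real system:
# enable_hist wakeup
```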
The histograms are reset by writing non-zero to the file "reset" in a
particular latency directory. To reset all latency data, use
#!/bin/sh
HISTDIR=/sys/kernel/debug/tracing/latency_hist
if test -d "$HISTDIR"
then
cd "$HISTDIR"
for i in */reset
do
echo 1 >"$i"
done
fi
* Data format
Latency data are stored with a resolution of one microsecond. The
maximum latency is 10,240 microseconds. The data are only valid if the
overflow register is empty. Every output line contains the latency in
microseconds in the first column and the number of samples in the
second column. To display only lines with a nonzero sample count, use,
for example,
grep -v " 0$" /sys/kernel/debug/tracing/latency_hist/preemptoff/CPU0
#Minimum latency: 0 microseconds.
#Average latency: 0 microseconds.
#Maximum latency: 25 microseconds.
#Total samples: 3104770694
#There are 0 samples greater or equal than 10240 microseconds
#usecs samples
0 2984486876
1 49843506
2 58219047
3 5348126
4 2187960
5 3388262
6 959289
7 208294
8 40420
9 4485
10 14918
11 18340
12 25052
13 19455
14 5602
15 969
16 47
17 18
18 14
19 1
20 3
21 2
22 5
23 2
25 1
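The two-column format lends itself to simple post-processing. As a sketch, the total sample count and the weighted average latency over all buckets can be computed with awk; the input lines in the here-document below are made up for illustration, and in practice one of the CPUx files would be piped in instead:

```shell
# Compute total sample count and weighted average latency from
# latency_hist output: comment lines start with '#', data lines
# hold the latency (usecs) and the number of samples.
awk '!/^#/ { total += $2; weighted += $1 * $2 }
     END { printf "samples=%d avg=%.2f usecs\n", total, weighted / total }' <<'EOF'
#usecs samples
0 10
1 20
2 10
EOF
```

With the sample input above this prints "samples=40 avg=1.00 usecs".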
* Wakeup latency of a selected process
To only collect wakeup latency data of a particular process, write the
PID of the requested process to
/sys/kernel/debug/tracing/latency_hist/wakeup/pid.
PID filtering is disabled if this variable is set to 0.
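A small sketch of how this could be scripted; the helper names are invented for illustration, and the optional directory argument exists only so the helpers can be exercised against a mock tree instead of the real debugfs mount:

```shell
# Illustrative helpers for the wakeup PID filter. set_wakeup_pid
# restricts wakeup latency collection to a single process;
# clear_wakeup_pid writes 0, which disables PID filtering again.
set_wakeup_pid() {
    echo "$1" > "${2:-/sys/kernel/debug/tracing}/latency_hist/wakeup/pid"
}
clear_wakeup_pid() {
    set_wakeup_pid 0 "$1"
}

# Typical use on a real system:
# set_wakeup_pid "$(pidof cyclictest)"
```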
* Details of the process with the highest wakeup latency so far
Selected data of the process that suffered from the highest wakeup
latency on a particular CPU are available in the file
/sys/kernel/debug/tracing/latency_hist/wakeup/max_latency-CPUx.
The format of the data is
<PID> <Priority> <Latency> <Command>
These data are also reset when the wakeup histogram is reset.
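The single-line record format is easy to pick apart in a script; the record in the here-document below is invented for illustration:

```shell
# Pretty-print a max_latency-CPUx record of the form
# <PID> <Priority> <Latency> <Command>.
awk '{ printf "pid=%s prio=%s latency=%sus comm=%s\n", $1, $2, $3, $4 }' <<'EOF'
1234 99 25 cyclictest
EOF
```

With the sample record above this prints "pid=1234 prio=99 latency=25us comm=cyclictest".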
...@@ -1969,6 +1969,7 @@ extern int sched_setscheduler_nocheck(struct task_struct *, int,
extern struct task_struct *idle_task(int cpu);
extern struct task_struct *curr_task(int cpu);
extern void set_curr_task(int cpu, struct task_struct *p);
extern struct task_struct *rq_curr(struct rq *rq);
void yield(void);
void __yield(void);
......
#undef TRACE_SYSTEM
#define TRACE_SYSTEM hist
#if !defined(_TRACE_HIST_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_HIST_H
#include "latency_hist.h"
#include <linux/tracepoint.h>
#if !defined(CONFIG_PREEMPT_OFF_HIST) && !defined(CONFIG_INTERRUPT_OFF_HIST)
#define trace_preemptirqsoff_hist(a,b)
#else
TRACE_EVENT(preemptirqsoff_hist,
TP_PROTO(int reason, int starthist),
TP_ARGS(reason, starthist),
TP_STRUCT__entry(
__field( int, reason )
__field( int, starthist )
),
TP_fast_assign(
__entry->reason = reason;
__entry->starthist = starthist;
),
TP_printk("reason=%s starthist=%s", getaction(__entry->reason),
__entry->starthist ? "start" : "stop")
);
#endif
#endif /* _TRACE_HIST_H */
/* This part must be outside protection */
#include <trace/define_trace.h>
#ifndef _LATENCY_HIST_H
#define _LATENCY_HIST_H
enum hist_action {
IRQS_ON,
PREEMPT_ON,
TRACE_STOP,
IRQS_OFF,
PREEMPT_OFF,
TRACE_START,
};
static char *actions[] = {
"IRQS_ON",
"PREEMPT_ON",
"TRACE_STOP",
"IRQS_OFF",
"PREEMPT_OFF",
"TRACE_START",
};
static inline char *getaction(int action)
{
	if (action >= 0 && action < sizeof(actions)/sizeof(actions[0]))
		return actions[action];
	return "unknown";
}
#endif /* _LATENCY_HIST_H */
...@@ -936,7 +936,7 @@ static int __init kernel_init(void * unused)
WARN_ON(irqs_disabled());
#endif
#define DEBUG_COUNT (defined(CONFIG_DEBUG_RT_MUTEXES) + defined(CONFIG_IRQSOFF_TRACER) + defined(CONFIG_PREEMPT_TRACER) + defined(CONFIG_STACK_TRACER) + defined(CONFIG_INTERRUPT_OFF_HIST) + defined(CONFIG_PREEMPT_OFF_HIST) + defined(CONFIG_WAKEUP_LATENCY_HIST) + defined(CONFIG_DEBUG_SLAB) + defined(CONFIG_DEBUG_PAGEALLOC) + defined(CONFIG_LOCKDEP) + (defined(CONFIG_FTRACE) - defined(CONFIG_FTRACE_MCOUNT_RECORD)))
#if DEBUG_COUNT > 0
printk(KERN_ERR "*****************************************************************************\n");
...@@ -956,9 +956,15 @@ static int __init kernel_init(void * unused)
#ifdef CONFIG_PREEMPT_TRACER
printk(KERN_ERR "* CONFIG_PREEMPT_TRACER *\n");
#endif
#if defined(CONFIG_FTRACE) && !defined(CONFIG_FTRACE_MCOUNT_RECORD)
printk(KERN_ERR "* CONFIG_FTRACE *\n");
#endif
#ifdef CONFIG_INTERRUPT_OFF_HIST
printk(KERN_ERR "* CONFIG_INTERRUPT_OFF_HIST *\n");
#endif
#ifdef CONFIG_PREEMPT_OFF_HIST
printk(KERN_ERR "* CONFIG_PREEMPT_OFF_HIST *\n");
#endif
#ifdef CONFIG_WAKEUP_LATENCY_HIST
printk(KERN_ERR "* CONFIG_WAKEUP_LATENCY_HIST *\n");
#endif
......
...@@ -717,6 +717,11 @@ struct rq {
#endif
};
struct task_struct *rq_curr(struct rq *rq)
{
	return rq->curr;
}
static DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
static inline void check_preempt_curr(struct rq *rq, struct task_struct *p, int sync)
......
...@@ -167,6 +167,21 @@ config IRQSOFF_TRACER
enabled. This option and the preempt-off timing option can be
used together or separately.)
config INTERRUPT_OFF_HIST
bool "Interrupts-off Latency Histogram"
depends on IRQSOFF_TRACER
help
This option generates a continuously updated histogram (one per cpu)
of the duration of time periods with interrupts disabled. The
histogram is disabled by default. To enable it, write a non-zero
number to the related file in
/sys/kernel/debug/tracing/latency_hist/enable/preemptirqsoff
If PREEMPT_OFF_HIST is also selected, an additional histogram (one
per cpu) is generated that accumulates the duration of time periods
when both interrupts and preemption are disabled.
config PREEMPT_TRACER
bool "Preemption-off Latency Tracer"
default n
...@@ -189,14 +204,20 @@ config PREEMPT_TRACER
enabled. This option and the irqs-off timing option can be
used together or separately.)
config PREEMPT_OFF_HIST
bool "Preemption-off Latency Histogram"
depends on PREEMPT_TRACER
help
This option generates a continuously updated histogram (one per cpu)
of the duration of time periods with preemption disabled. The
histogram is disabled by default. To enable it, write a non-zero
number to
/sys/kernel/debug/tracing/latency_hist/enable/preemptirqsoff
If INTERRUPT_OFF_HIST is also selected, an additional histogram (one
per cpu) is generated that accumulates the duration of time periods
when both interrupts and preemption are disabled.
config SCHED_TRACER
bool "Scheduling Latency Tracer"
...@@ -207,6 +228,25 @@ config SCHED_TRACER
This tracer tracks the latency of the highest priority task
to be scheduled in, starting from the point it has woken up.
config WAKEUP_LATENCY_HIST
bool "Scheduling Latency Histogram"
depends on SCHED_TRACER
help
This option generates a continuously updated histogram (one per cpu)
of the scheduling latency of the highest priority task. The histogram
is disabled by default. To enable it, write a non-zero number to
/sys/kernel/debug/tracing/latency_hist/enable/wakeup
config SYSPROF_TRACER
bool "Sysprof Tracer"
depends on X86
select GENERIC_TRACER
select CONTEXT_SWITCH_TRACER
help
This tracer provides the trace needed by the 'Sysprof' userspace
tool.
config ENABLE_DEFAULT_TRACERS
bool "Trace process context switches and events"
depends on !GENERIC_TRACER
......
...@@ -35,6 +35,9 @@ obj-$(CONFIG_FUNCTION_TRACER) += trace_functions.o
obj-$(CONFIG_IRQSOFF_TRACER) += trace_irqsoff.o
obj-$(CONFIG_PREEMPT_TRACER) += trace_irqsoff.o
obj-$(CONFIG_SCHED_TRACER) += trace_sched_wakeup.o
obj-$(CONFIG_INTERRUPT_OFF_HIST) += latency_hist.o
obj-$(CONFIG_PREEMPT_OFF_HIST) += latency_hist.o
obj-$(CONFIG_WAKEUP_LATENCY_HIST) += latency_hist.o
obj-$(CONFIG_NOP_TRACER) += trace_nop.o
obj-$(CONFIG_STACK_TRACER) += trace_stack.o
obj-$(CONFIG_MMIOTRACE) += trace_mmiotrace.o
......
...@@ -17,6 +17,7 @@
#include <linux/fs.h>
#include "trace.h"
#include <trace/events/hist.h>
static struct trace_array *irqsoff_trace __read_mostly;
static int tracer_enabled __read_mostly;
...@@ -247,11 +248,13 @@ void start_critical_timings(void)
{
	if (preempt_trace() || irq_trace())
		start_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
	trace_preemptirqsoff_hist(TRACE_START, 1);
}
EXPORT_SYMBOL_GPL(start_critical_timings);
void stop_critical_timings(void)
{
	trace_preemptirqsoff_hist(TRACE_STOP, 0);
	if (preempt_trace() || irq_trace())
		stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
}
...@@ -261,6 +264,7 @@ EXPORT_SYMBOL_GPL(stop_critical_timings);
#ifdef CONFIG_PROVE_LOCKING
void time_hardirqs_on(unsigned long a0, unsigned long a1)
{
	trace_preemptirqsoff_hist(IRQS_ON, 0);
	if (!preempt_trace() && irq_trace())
		stop_critical_timing(a0, a1);
}
...@@ -269,6 +273,7 @@ void time_hardirqs_off(unsigned long a0, unsigned long a1)
{
	if (!preempt_trace() && irq_trace())
		start_critical_timing(a0, a1);
	trace_preemptirqsoff_hist(IRQS_OFF, 1);
}
#else /* !CONFIG_PROVE_LOCKING */
...@@ -302,6 +307,7 @@ inline void print_irqtrace_events(struct task_struct *curr)
 */
void trace_hardirqs_on(void)
{
	trace_preemptirqsoff_hist(IRQS_ON, 0);
	if (!preempt_trace() && irq_trace())
		stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
}
...@@ -311,11 +317,13 @@ void trace_hardirqs_off(void)
{
	if (!preempt_trace() && irq_trace())
		start_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
	trace_preemptirqsoff_hist(IRQS_OFF, 1);
}
EXPORT_SYMBOL(trace_hardirqs_off);
void trace_hardirqs_on_caller(unsigned long caller_addr)
{
	trace_preemptirqsoff_hist(IRQS_ON, 0);
	if (!preempt_trace() && irq_trace())
		stop_critical_timing(CALLER_ADDR0, caller_addr);
}
...@@ -325,6 +333,7 @@ void trace_hardirqs_off_caller(unsigned long caller_addr)
{
	if (!preempt_trace() && irq_trace())
		start_critical_timing(CALLER_ADDR0, caller_addr);
	trace_preemptirqsoff_hist(IRQS_OFF, 1);
}
EXPORT_SYMBOL(trace_hardirqs_off_caller);
...@@ -334,12 +343,14 @@ EXPORT_SYMBOL(trace_hardirqs_off_caller);
#ifdef CONFIG_PREEMPT_TRACER
void trace_preempt_on(unsigned long a0, unsigned long a1)
{
	trace_preemptirqsoff_hist(PREEMPT_ON, 0);
	if (preempt_trace())
		stop_critical_timing(a0, a1);
}
void trace_preempt_off(unsigned long a0, unsigned long a1)
{
	trace_preemptirqsoff_hist(PREEMPT_OFF, 1);
	if (preempt_trace())
		start_critical_timing(a0, a1);
}
......