Commit a79155b7 authored by Thomas Gleixner's avatar Thomas Gleixner

Merge commit 'origin/tracing/core' into rt/trace

parents 75977eeb 60ba7702
......@@ -2480,6 +2480,11 @@ and is between 256 and 4096 characters. It is defined in the file
trace_buf_size=nn[KMG]
[FTRACE] will set tracing buffer size.
trace_event=[event-list]
[FTRACE] Set and start specified trace events in order
to facilitate early boot debugging.
See also Documentation/trace/events.txt
trix= [HW,OSS] MediaTrix AudioTrix Pro
Format:
<io>,<irq>,<dma>,<dma2>,<sb_io>,<sb_irq>,<sb_dma>,<mpu_io>,<mpu_irq>
......
Event Tracing
Documentation written by Theodore Ts'o
Updated by Li Zefan
Updated by Li Zefan and Tom Zanussi
1. Introduction
===============
......@@ -83,8 +83,199 @@ When reading one of these enable files, there are four results:
X - there is a mixture of events enabled and disabled
? - this file does not affect any event
2.3 Boot option
---------------
In order to facilitate early boot debugging, use boot option:
trace_event=[event-list]
The format of this boot option is the same as described in section 2.1.
3. Defining an event-enabled tracepoint
=======================================
See The example provided in samples/trace_events
4. Event formats
================
Each trace event has a 'format' file associated with it that contains
a description of each field in a logged event. This information can
be used to parse the binary trace stream, and is also the place to
find the field names that can be used in event filters (see section 5).
It also displays the format string that will be used to print the
event in text mode, along with the event name and ID used for
profiling.
Every event has a set of 'common' fields associated with it; these are
the fields prefixed with 'common_'. The other fields vary between
events and correspond to the fields defined in the TRACE_EVENT
definition for that event.
Each field in the format has the form:
field:field-type field-name; offset:N; size:N;
where offset is the offset of the field in the trace record and size
is the size of the data item, in bytes.
For example, here's the information displayed for the 'sched_wakeup'
event:
# cat /debug/tracing/events/sched/sched_wakeup/format
name: sched_wakeup
ID: 60
format:
field:unsigned short common_type; offset:0; size:2;
field:unsigned char common_flags; offset:2; size:1;
field:unsigned char common_preempt_count; offset:3; size:1;
field:int common_pid; offset:4; size:4;
field:int common_tgid; offset:8; size:4;
field:char comm[TASK_COMM_LEN]; offset:12; size:16;
field:pid_t pid; offset:28; size:4;
field:int prio; offset:32; size:4;
field:int success; offset:36; size:4;
field:int cpu; offset:40; size:4;
print fmt: "task %s:%d [%d] success=%d [%03d]", REC->comm, REC->pid,
REC->prio, REC->success, REC->cpu
This event contains 10 fields, the first 5 common and the remaining 5
event-specific. All the fields for this event are numeric, except for
'comm' which is a string, a distinction important for event filtering.
5. Event filtering
==================
Trace events can be filtered in the kernel by associating boolean
'filter expressions' with them. As soon as an event is logged into
the trace buffer, its fields are checked against the filter expression
associated with that event type. An event with field values that
'match' the filter will appear in the trace output, and an event whose
values don't match will be discarded. An event with no filter
associated with it matches everything, and is the default when no
filter has been set for an event.
5.1 Expression syntax
---------------------
A filter expression consists of one or more 'predicates' that can be
combined using the logical operators '&&' and '||'. A predicate is
simply a clause that compares the value of a field contained within a
logged event with a constant value and returns either 0 or 1 depending
on whether the field value matched (1) or didn't match (0):
field-name relational-operator value
Parentheses can be used to provide arbitrary logical groupings and
double-quotes can be used to prevent the shell from interpreting
operators as shell metacharacters.
The field-names available for use in filters can be found in the
'format' files for trace events (see section 4).
The relational-operators depend on the type of the field being tested:
The operators available for numeric fields are:
==, !=, <, <=, >, >=
And for string fields they are:
==, !=
Currently, only exact string matches are supported.
Currently, the maximum number of predicates in a filter is 16.
5.2 Setting filters
-------------------
A filter for an individual event is set by writing a filter expression
to the 'filter' file for the given event.
For example:
# cd /debug/tracing/events/sched/sched_wakeup
# echo "common_preempt_count > 4" > filter
A slightly more involved example:
# cd /debug/tracing/events/sched/sched_signal_send
# echo "((sig >= 10 && sig < 15) || sig == 17) && comm != bash" > filter
If there is an error in the expression, you'll get an 'Invalid
argument' error when setting it, and the erroneous string along with
an error message can be seen by looking at the filter e.g.:
# cd /debug/tracing/events/sched/sched_signal_send
# echo "((sig >= 10 && sig < 15) || dsig == 17) && comm != bash" > filter
-bash: echo: write error: Invalid argument
# cat filter
((sig >= 10 && sig < 15) || dsig == 17) && comm != bash
^
parse_error: Field not found
Currently the caret ('^') for an error always appears at the beginning of
the filter string; the error message should still be useful though
even without more accurate position info.
5.3 Clearing filters
--------------------
To clear the filter for an event, write a '0' to the event's filter
file.
To clear the filters for all events in a subsystem, write a '0' to the
subsystem's filter file.
5.3 Subsystem filters
---------------------
For convenience, filters for every event in a subsystem can be set or
cleared as a group by writing a filter expression into the filter file
at the root of the subsytem. Note however, that if a filter for any
event within the subsystem lacks a field specified in the subsystem
filter, or if the filter can't be applied for any other reason, the
filter for that event will retain its previous setting. This can
result in an unintended mixture of filters which could lead to
confusing (to the user who might think different filters are in
effect) trace output. Only filters that reference just the common
fields can be guaranteed to propagate successfully to all events.
Here are a few subsystem filter examples that also illustrate the
above points:
Clear the filters on all events in the sched subsytem:
# cd /sys/kernel/debug/tracing/events/sched
# echo 0 > filter
# cat sched_switch/filter
none
# cat sched_wakeup/filter
none
Set a filter using only common fields for all events in the sched
subsytem (all events end up with the same filter):
# cd /sys/kernel/debug/tracing/events/sched
# echo common_pid == 0 > filter
# cat sched_switch/filter
common_pid == 0
# cat sched_wakeup/filter
common_pid == 0
Attempt to set a filter using a non-common field for all events in the
sched subsytem (all events but those that have a prev_pid field retain
their old filters):
# cd /sys/kernel/debug/tracing/events/sched
# echo prev_pid == 0 > filter
# cat sched_switch/filter
prev_pid == 0
# cat sched_wakeup/filter
common_pid == 0
......@@ -85,26 +85,19 @@ of ftrace. Here is a list of some of the key files:
This file holds the output of the trace in a human
readable format (described below).
latency_trace:
This file shows the same trace but the information
is organized more to display possible latencies
in the system (described below).
trace_pipe:
The output is the same as the "trace" file but this
file is meant to be streamed with live tracing.
Reads from this file will block until new data
is retrieved. Unlike the "trace" and "latency_trace"
files, this file is a consumer. This means reading
from this file causes sequential reads to display
more current data. Once data is read from this
file, it is consumed, and will not be read
again with a sequential read. The "trace" and
"latency_trace" files are static, and if the
tracer is not adding more data, they will display
the same information every time they are read.
Reads from this file will block until new data is
retrieved. Unlike the "trace" file, this file is a
consumer. This means reading from this file causes
sequential reads to display more current data. Once
data is read from this file, it is consumed, and
will not be read again with a sequential read. The
"trace" file is static, and if the tracer is not
adding more data,they will display the same
information every time they are read.
trace_options:
......@@ -117,10 +110,10 @@ of ftrace. Here is a list of some of the key files:
Some of the tracers record the max latency.
For example, the time interrupts are disabled.
This time is saved in this file. The max trace
will also be stored, and displayed by either
"trace" or "latency_trace". A new max trace will
only be recorded if the latency is greater than
the value in this file. (in microseconds)
will also be stored, and displayed by "trace".
A new max trace will only be recorded if the
latency is greater than the value in this
file. (in microseconds)
buffer_size_kb:
......@@ -210,7 +203,7 @@ Here is the list of current tracers that may be configured.
the trace with the longest max latency.
See tracing_max_latency. When a new max is recorded,
it replaces the old trace. It is best to view this
trace via the latency_trace file.
trace with the latency-format option enabled.
"preemptoff"
......@@ -307,8 +300,8 @@ the lowest priority thread (pid 0).
Latency trace format
--------------------
For traces that display latency times, the latency_trace file
gives somewhat more information to see why a latency happened.
When the latency-format option is enabled, the trace file gives
somewhat more information to see why a latency happened.
Here is a typical trace.
# tracer: irqsoff
......@@ -380,9 +373,10 @@ explains which is which.
The above is mostly meaningful for kernel developers.
time: This differs from the trace file output. The trace file output
includes an absolute timestamp. The timestamp used by the
latency_trace file is relative to the start of the trace.
time: When the latency-format option is enabled, the trace file
output includes a timestamp relative to the start of the
trace. This differs from the output when latency-format
is disabled, which includes an absolute timestamp.
delay: This is just to help catch your eye a bit better. And
needs to be fixed to be only relative to the same CPU.
......@@ -440,7 +434,8 @@ Here are the available options:
sym-addr:
bash-4000 [01] 1477.606694: simple_strtoul <c0339346>
verbose - This deals with the latency_trace file.
verbose - This deals with the trace file when the
latency-format option is enabled.
bash 4000 1 0 00000000 00010a95 [58127d26] 1720.415ms \
(+0.000ms): simple_strtoul (strict_strtoul)
......@@ -472,7 +467,7 @@ Here are the available options:
the app is no longer running
The lookup is performed when you read
trace,trace_pipe,latency_trace. Example:
trace,trace_pipe. Example:
a.out-1623 [000] 40874.465068: /root/a.out[+0x480] <-/root/a.out[+0
x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6]
......@@ -481,6 +476,11 @@ x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6]
every scheduling event. Will add overhead if
there's a lot of tasks running at once.
latency-format - This option changes the trace. When
it is enabled, the trace displays
additional information about the
latencies, as described in "Latency
trace format".
sched_switch
------------
......@@ -596,12 +596,13 @@ To reset the maximum, echo 0 into tracing_max_latency. Here is
an example:
# echo irqsoff > current_tracer
# echo latency-format > trace_options
# echo 0 > tracing_max_latency
# echo 1 > tracing_enabled
# ls -ltr
[...]
# echo 0 > tracing_enabled
# cat latency_trace
# cat trace
# tracer: irqsoff
#
irqsoff latency trace v1.1.5 on 2.6.26
......@@ -703,12 +704,13 @@ which preemption was disabled. The control of preemptoff tracer
is much like the irqsoff tracer.
# echo preemptoff > current_tracer
# echo latency-format > trace_options
# echo 0 > tracing_max_latency
# echo 1 > tracing_enabled
# ls -ltr
[...]
# echo 0 > tracing_enabled
# cat latency_trace
# cat trace
# tracer: preemptoff
#
preemptoff latency trace v1.1.5 on 2.6.26-rc8
......@@ -850,12 +852,13 @@ Again, using this trace is much like the irqsoff and preemptoff
tracers.
# echo preemptirqsoff > current_tracer
# echo latency-format > trace_options
# echo 0 > tracing_max_latency
# echo 1 > tracing_enabled
# ls -ltr
[...]
# echo 0 > tracing_enabled
# cat latency_trace
# cat trace
# tracer: preemptirqsoff
#
preemptirqsoff latency trace v1.1.5 on 2.6.26-rc8
......@@ -1012,11 +1015,12 @@ Instead of performing an 'ls', we will run 'sleep 1' under
'chrt' which changes the priority of the task.
# echo wakeup > current_tracer
# echo latency-format > trace_options
# echo 0 > tracing_max_latency
# echo 1 > tracing_enabled
# chrt -f 5 sleep 1
# echo 0 > tracing_enabled
# cat latency_trace
# cat trace
# tracer: wakeup
#
wakeup latency trace v1.1.5 on 2.6.26-rc8
......
" Enable folding for ftrace function_graph traces.
"
" To use, :source this file while viewing a function_graph trace, or use vim's
" -S option to load from the command-line together with a trace. You can then
" use the usual vim fold commands, such as "za", to open and close nested
" functions. While closed, a fold will show the total time taken for a call,
" as would normally appear on the line with the closing brace. Folded
" functions will not include finish_task_switch(), so folding should remain
" relatively sane even through a context switch.
"
" Note that this will almost certainly only work well with a
" single-CPU trace (e.g. trace-cmd report --cpu 1).
function! FunctionGraphFoldExpr(lnum)
let line = getline(a:lnum)
if line[-1:] == '{'
if line =~ 'finish_task_switch() {$'
return '>1'
endif
return 'a1'
elseif line[-1:] == '}'
return 's1'
else
return '='
endif
endfunction
function! FunctionGraphFoldText()
let s = split(getline(v:foldstart), '|', 1)
if getline(v:foldend+1) =~ 'finish_task_switch() {$'
let s[2] = ' task switch '
else
let e = split(getline(v:foldend), '|', 1)
let s[2] = e[2]
endif
return join(s, '|')
endfunction
setlocal foldexpr=FunctionGraphFoldExpr(v:lnum)
setlocal foldtext=FunctionGraphFoldText()
setlocal foldcolumn=12
setlocal foldmethod=expr
This diff is collapsed.
......@@ -84,7 +84,7 @@ config S390
select HAVE_FUNCTION_TRACER
select HAVE_FUNCTION_TRACE_MCOUNT_TEST
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_FTRACE_SYSCALLS
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_DYNAMIC_FTRACE
select HAVE_FUNCTION_GRAPH_TRACER
select HAVE_DEFAULT_NO_SPIN_MUTEXES
......
......@@ -900,7 +900,7 @@ CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_FTRACE_SYSCALLS=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_TRACING_SUPPORT=y
CONFIG_FTRACE=y
# CONFIG_FUNCTION_TRACER is not set
......
......@@ -92,7 +92,7 @@ static inline struct thread_info *current_thread_info(void)
#define TIF_SYSCALL_TRACE 8 /* syscall trace active */
#define TIF_SYSCALL_AUDIT 9 /* syscall auditing active */
#define TIF_SECCOMP 10 /* secure computing */
#define TIF_SYSCALL_FTRACE 11 /* ftrace syscall instrumentation */
#define TIF_SYSCALL_TRACEPOINT 11 /* syscall tracepoint instrumentation */
#define TIF_USEDFPU 16 /* FPU was used by this task this quantum (SMP) */
#define TIF_POLLING_NRFLAG 17 /* true if poll_idle() is polling
TIF_NEED_RESCHED */
......@@ -111,7 +111,7 @@ static inline struct thread_info *current_thread_info(void)
#define _TIF_SYSCALL_TRACE (1<<TIF_SYSCALL_TRACE)
#define _TIF_SYSCALL_AUDIT (1<<TIF_SYSCALL_AUDIT)
#define _TIF_SECCOMP (1<<TIF_SECCOMP)
#define _TIF_SYSCALL_FTRACE (1<<TIF_SYSCALL_FTRACE)
#define _TIF_SYSCALL_TRACEPOINT (1<<TIF_SYSCALL_TRACEPOINT)
#define _TIF_USEDFPU (1<<TIF_USEDFPU)
#define _TIF_POLLING_NRFLAG (1<<TIF_POLLING_NRFLAG)
#define _TIF_31BIT (1<<TIF_31BIT)
......
......@@ -54,7 +54,7 @@ _TIF_WORK_SVC = (_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_NEED_RESCHED | \
_TIF_WORK_INT = (_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_NEED_RESCHED | \
_TIF_MCCK_PENDING)
_TIF_SYSCALL = (_TIF_SYSCALL_TRACE>>8 | _TIF_SYSCALL_AUDIT>>8 | \
_TIF_SECCOMP>>8 | _TIF_SYSCALL_FTRACE>>8)
_TIF_SECCOMP>>8 | _TIF_SYSCALL_TRACEPOINT>>8)
STACK_SHIFT = PAGE_SHIFT + THREAD_ORDER
STACK_SIZE = 1 << STACK_SHIFT
......
......@@ -57,7 +57,7 @@ _TIF_WORK_SVC = (_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_NEED_RESCHED | \
_TIF_WORK_INT = (_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_NEED_RESCHED | \
_TIF_MCCK_PENDING)
_TIF_SYSCALL = (_TIF_SYSCALL_TRACE>>8 | _TIF_SYSCALL_AUDIT>>8 | \
_TIF_SECCOMP>>8 | _TIF_SYSCALL_FTRACE>>8)
_TIF_SECCOMP>>8 | _TIF_SYSCALL_TRACEPOINT>>8)
#define BASED(name) name-system_call(%r13)
......
......@@ -220,6 +220,29 @@ struct syscall_metadata *syscall_nr_to_meta(int nr)
return syscalls_metadata[nr];
}
int syscall_name_to_nr(char *name)
{
int i;
if (!syscalls_metadata)
return -1;
for (i = 0; i < NR_syscalls; i++)
if (syscalls_metadata[i])
if (!strcmp(syscalls_metadata[i]->name, name))
return i;
return -1;
}
void set_syscall_enter_id(int num, int id)
{
syscalls_metadata[num]->enter_id = id;
}
void set_syscall_exit_id(int num, int id)
{
syscalls_metadata[num]->exit_id = id;
}
static struct syscall_metadata *find_syscall_meta(unsigned long syscall)
{
struct syscall_metadata *start;
......@@ -237,24 +260,19 @@ static struct syscall_metadata *find_syscall_meta(unsigned long syscall)
return NULL;
}
void arch_init_ftrace_syscalls(void)
static int __init arch_init_ftrace_syscalls(void)
{
struct syscall_metadata *meta;
int i;
static atomic_t refs;
if (atomic_inc_return(&refs) != 1)
goto out;
syscalls_metadata = kzalloc(sizeof(*syscalls_metadata) * NR_syscalls,
GFP_KERNEL);
if (!syscalls_metadata)
goto out;
return -ENOMEM;
for (i = 0; i < NR_syscalls; i++) {
meta = find_syscall_meta((unsigned long)sys_call_table[i]);
syscalls_metadata[i] = meta;
}
return;
out:
atomic_dec(&refs);
return 0;
}
arch_initcall(arch_init_ftrace_syscalls);
#endif
......@@ -51,6 +51,9 @@
#include "compat_ptrace.h"
#endif
#define CREATE_TRACE_POINTS
#include <trace/events/syscalls.h>
enum s390_regset {
REGSET_GENERAL,
REGSET_FP,
......@@ -661,8 +664,8 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
ret = -1;
}
if (unlikely(test_thread_flag(TIF_SYSCALL_FTRACE)))
ftrace_syscall_enter(regs);
if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
trace_sys_enter(regs, regs->gprs[2]);
if (unlikely(current->audit_context))
audit_syscall_entry(is_compat_task() ?
......@@ -679,8 +682,8 @@ asmlinkage void do_syscall_trace_exit(struct pt_regs *regs)
audit_syscall_exit(AUDITSC_RESULT(regs->gprs[2]),
regs->gprs[2]);
if (unlikely(test_thread_flag(TIF_SYSCALL_FTRACE)))
ftrace_syscall_exit(regs);
if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
trace_sys_exit(regs, regs->gprs[2]);
if (test_thread_flag(TIF_SYSCALL_TRACE))
tracehook_report_syscall_exit(regs, 0);
......
......@@ -38,7 +38,7 @@ config X86
select HAVE_FUNCTION_GRAPH_FP_TEST
select HAVE_FUNCTION_TRACE_MCOUNT_TEST
select HAVE_FTRACE_NMI_ENTER if DYNAMIC_FTRACE
select HAVE_FTRACE_SYSCALLS
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_KVM
select HAVE_ARCH_KGDB
select HAVE_ARCH_TRACEHOOK
......
......@@ -2355,7 +2355,7 @@ CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_HW_BRANCH_TRACER=y
CONFIG_HAVE_FTRACE_SYSCALLS=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_RING_BUFFER=y
CONFIG_TRACING=y
CONFIG_TRACING_SUPPORT=y
......
......@@ -2329,7 +2329,7 @@ CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_HW_BRANCH_TRACER=y
CONFIG_HAVE_FTRACE_SYSCALLS=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_RING_BUFFER=y
CONFIG_TRACING=y
CONFIG_TRACING_SUPPORT=y
......
......@@ -28,13 +28,6 @@
#endif
/* FIXME: I don't want to stay hardcoded */
#ifdef CONFIG_X86_64
# define FTRACE_SYSCALL_MAX 296
#else
# define FTRACE_SYSCALL_MAX 333
#endif
#ifdef CONFIG_FUNCTION_TRACER
#define MCOUNT_ADDR ((long)(mcount))
#define MCOUNT_INSN_SIZE 5 /* sizeof mcount call */
......
......@@ -65,6 +65,8 @@
6: osp nopl 0x00(%eax,%eax,1)
7: nopl 0x00000000(%eax)
8: nopl 0x00000000(%eax,%eax,1)
Note: All the above are assumed to be a single instruction.
There is kernel code that depends on this.
*/
#define P6_NOP1 GENERIC_NOP1
#define P6_NOP2 ".byte 0x66,0x90\n"
......
......@@ -95,7 +95,7 @@ struct thread_info {
#define TIF_DEBUGCTLMSR 25 /* uses thread_struct.debugctlmsr */
#define TIF_DS_AREA_MSR 26 /* uses thread_struct.ds_area_msr */
#define TIF_LAZY_MMU_UPDATES 27 /* task is updating the mmu lazily */
#define TIF_SYSCALL_FTRACE 28 /* for ftrace syscall instrumentation */
#define TIF_SYSCALL_TRACEPOINT 28 /* syscall tracepoint instrumentation */
#define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE)
#define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME)
......@@ -118,17 +118,17 @@ struct thread_info {
#define _TIF_DEBUGCTLMSR (1 << TIF_DEBUGCTLMSR)
#define _TIF_DS_AREA_MSR (1 << TIF_DS_AREA_MSR)
#define _TIF_LAZY_MMU_UPDATES (1 << TIF_LAZY_MMU_UPDATES)
#define _TIF_SYSCALL_FTRACE (1 << TIF_SYSCALL_FTRACE)
#define _TIF_SYSCALL_TRACEPOINT (1 << TIF_SYSCALL_TRACEPOINT)
/* work to do in syscall_trace_enter() */
#define _TIF_WORK_SYSCALL_ENTRY \
(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU | _TIF_SYSCALL_FTRACE | \
_TIF_SYSCALL_AUDIT | _TIF_SECCOMP | _TIF_SINGLESTEP)
(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU | _TIF_SYSCALL_AUDIT | \
_TIF_SECCOMP | _TIF_SINGLESTEP | _TIF_SYSCALL_TRACEPOINT)
/* work to do in syscall_trace_leave() */
#define _TIF_WORK_SYSCALL_EXIT \
(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | _TIF_SINGLESTEP | \
_TIF_SYSCALL_FTRACE)
_TIF_SYSCALL_TRACEPOINT)
/* work to do on interrupt/exception return */
#define _TIF_WORK_MASK \
......@@ -137,7 +137,8 @@ struct thread_info {
_TIF_SINGLESTEP|_TIF_SECCOMP|_TIF_SYSCALL_EMU))
/* work to do on any return to user space */
#define _TIF_ALLWORK_MASK ((0x0000FFFF & ~_TIF_SECCOMP) | _TIF_SYSCALL_FTRACE)
#define _TIF_ALLWORK_MASK \
((0x0000FFFF & ~_TIF_SECCOMP) | _TIF_SYSCALL_TRACEPOINT)
/* Only used for 64 bit */
#define _TIF_DO_NOTIFY_MASK \
......
......@@ -345,6 +345,8 @@
#ifdef __KERNEL__
#define NR_syscalls 337
#define __ARCH_WANT_IPC_PARSE_VERSION
#define __ARCH_WANT_OLD_READDIR
#define __ARCH_WANT_OLD_STAT
......
......@@ -688,6 +688,12 @@ __SYSCALL(__NR_perf_counter_open, sys_perf_counter_open)
#endif /* __NO_STUBS */
#ifdef __KERNEL__
#ifndef COMPILE_OFFSETS
#include <asm/asm-offsets.h>
#define NR_syscalls (__NR_syscall_max + 1)
#endif
/*
* "Conditional" syscalls
*
......
......@@ -3,6 +3,7 @@
* This code generates raw asm output which is post-processed to extract
* and format the required data.
*/
#define COMPILE_OFFSETS
#include <linux/crypto.h>
#include <linux/sched.h>
......
......@@ -146,7 +146,7 @@ ENTRY(ftrace_graph_caller)
END(ftrace_graph_caller)
GLOBAL(return_to_handler)
subq $80, %rsp
subq $24, %rsp
/* Save the return values */
movq %rax, (%rsp)
......@@ -155,10 +155,10 @@ GLOBAL(return_to_handler)
call ftrace_return_to_handler
movq %rax, 72(%rsp)
movq %rax, 16(%rsp)
movq 8(%rsp), %rdx
movq (%rsp), %rax
addq $72, %rsp
addq $16, %rsp
retq
#endif
......
......@@ -417,10 +417,6 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr,
unsigned long return_hooker = (unsigned long)
&return_to_handler;
/* Nmi's are currently unsupported */
if (unlikely(in_nmi()))
return;
if (unlikely(atomic_read(&current->tracing_graph_pause)))
return;
......@@ -498,37 +494,56 @@ static struct syscall_metadata *find_syscall_meta(unsigned long *syscall)
struct syscall_metadata *syscall_nr_to_meta(int nr)
{
if (!syscalls_metadata || nr >= FTRACE_SYSCALL_MAX || nr < 0)
if (!syscalls_metadata || nr >= NR_syscalls || nr < 0)
return NULL;
return syscalls_metadata[nr];
}
void arch_init_ftrace_syscalls(void)
int syscall_name_to_nr(char *name)
{
int i;
if (!syscalls_metadata)
return -1;
for (i = 0; i < NR_syscalls; i++) {
if (syscalls_metadata[i]) {
if (!strcmp(syscalls_metadata[i]->name, name))
return i;
}
}
return -1;
}
void set_syscall_enter_id(int num, int id)
{
syscalls_metadata[num]->enter_id = id;
}
void set_syscall_exit_id(int num, int id)
{
syscalls_metadata[num]->exit_id = id;
}
static int __init arch_init_ftrace_syscalls(void)
{
int i;
struct syscall_metadata *meta;
unsigned long **psys_syscall_table = &sys_call_table;
static atomic_t refs;
if (atomic_inc_return(&refs) != 1)
goto end;
syscalls_metadata = kzalloc(sizeof(*syscalls_metadata) *
FTRACE_SYSCALL_MAX, GFP_KERNEL);
NR_syscalls, GFP_KERNEL);
if (!syscalls_metadata) {
WARN_ON(1);
return;
return -ENOMEM;
}
for (i = 0; i < FTRACE_SYSCALL_MAX; i++) {
for (i = 0; i < NR_syscalls; i++) {
meta = find_syscall_meta(psys_syscall_table[i]);
syscalls_metadata[i] = meta;
}
return;
/* Paranoid: avoid overflow */
end:
atomic_dec(&refs);
return 0;
}
arch_initcall(arch_init_ftrace_syscalls);
#endif
......@@ -35,10 +35,11 @@
#include <asm/proto.h>
#include <asm/ds.h>
#include <trace/syscall.h>
#include "tls.h"
#define CREATE_TRACE_POINTS
#include <trace/events/syscalls.h>
enum x86_regset {
REGSET_GENERAL,
REGSET_FP,
......@@ -1497,8 +1498,8 @@ asmregparm long syscall_trace_enter(struct pt_regs *regs)
tracehook_report_syscall_entry(regs))
ret = -1L;
if (unlikely(test_thread_flag(TIF_SYSCALL_FTRACE)))
ftrace_syscall_enter(regs);
if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
trace_sys_enter(regs, regs->orig_ax);
if (unlikely(current->audit_context)) {
if (IS_IA32)
......@@ -1523,8 +1524,8 @@ asmregparm void syscall_trace_leave(struct pt_regs *regs)
if (unlikely(current->audit_context))
audit_syscall_exit(AUDITSC_RESULT(regs->ax), regs->ax);
if (unlikely(test_thread_flag(TIF_SYSCALL_FTRACE)))
ftrace_syscall_exit(regs);
if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
trace_sys_exit(regs, regs->ax);
if (test_thread_flag(TIF_SYSCALL_TRACE))
tracehook_report_syscall_exit(regs, 0);
......
......@@ -18,9 +18,9 @@
#include <asm/ia32.h>
#include <asm/syscalls.h>
asmlinkage long sys_mmap(unsigned long addr, unsigned long len,
unsigned long prot, unsigned long flags,
unsigned long fd, unsigned long off)
SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
unsigned long, prot, unsigned long, flags,
unsigned long, fd, unsigned long, off)
{
long error;
struct file *file;
......@@ -226,7 +226,7 @@ bottomup:
}
asmlinkage long sys_uname(struct new_utsname __user *name)
SYSCALL_DEFINE1(uname, struct new_utsname __user *, name)
{
int err;
down_read(&uts_sem);
......
......@@ -91,7 +91,8 @@
#endif
#ifdef CONFIG_FTRACE_MCOUNT_RECORD
#define MCOUNT_REC() VMLINUX_SYMBOL(__start_mcount_loc) = .; \
#define MCOUNT_REC() . = ALIGN(8); \
VMLINUX_SYMBOL(__start_mcount_loc) = .; \
*(__mcount_loc) \
VMLINUX_SYMBOL(__stop_mcount_loc) = .;
#else
......@@ -331,7 +332,6 @@
/* __*init sections */ \
__init_rodata : AT(ADDR(__init_rodata) - LOAD_OFFSET) { \
*(.ref.rodata) \
MCOUNT_REC() \
DEV_KEEP(init.rodata) \
DEV_KEEP(exit.rodata) \
CPU_KEEP(init.rodata) \
......@@ -455,6 +455,7 @@
MEM_DISCARD(init.data) \
KERNEL_CTORS() \
*(.init.rodata) \
MCOUNT_REC() \
DEV_DISCARD(init.rodata) \
CPU_DISCARD(init.rodata) \
MEM_DISCARD(init.rodata)
......
#ifndef _LINUX_FTRACE_EVENT_H
#define _LINUX_FTRACE_EVENT_H
#include <linux/trace_seq.h>
#include <linux/ring_buffer.h>
#include <linux/trace_seq.h>
#include <linux/percpu.h>
struct trace_array;
......@@ -34,7 +34,7 @@ struct trace_entry {
unsigned char flags;
unsigned char preempt_count;
int pid;
int tgid;
int lock_depth;
};
#define FTRACE_MAX_EVENT \
......@@ -93,16 +93,22 @@ void tracing_generic_entry_update(struct trace_entry *entry,
unsigned long flags,
int pc);
struct ring_buffer_event *
trace_current_buffer_lock_reserve(int type, unsigned long len,
trace_current_buffer_lock_reserve(struct ring_buffer **current_buffer,
int type, unsigned long len,
unsigned long flags, int pc);
void trace_current_buffer_unlock_commit(struct ring_buffer_event *event,
void trace_current_buffer_unlock_commit(struct ring_buffer *buffer,
struct ring_buffer_event *event,
unsigned long flags, int pc);
void trace_nowake_buffer_unlock_commit(struct ring_buffer_event *event,
void trace_nowake_buffer_unlock_commit(struct ring_buffer *buffer,
struct ring_buffer_event *event,
unsigned long flags, int pc);
void trace_current_buffer_discard_commit(struct ring_buffer_event *event);
void trace_current_buffer_discard_commit(struct ring_buffer *buffer,
struct ring_buffer_event *event);
void tracing_record_cmdline(struct task_struct *tsk);
struct event_filter;
struct ftrace_event_call {
struct list_head list;
char *name;
......@@ -110,16 +116,18 @@ struct ftrace_event_call {
struct dentry *dir;
struct trace_event *event;
int enabled;
int (*regfunc)(void);
void (*unregfunc)(void);
int (*regfunc)(void *);
void (*unregfunc)(void *);
int id;
int (*raw_init)(void);
int (*show_format)(struct trace_seq *s);
int (*define_fields)(void);
int (*show_format)(struct ftrace_event_call *call,
struct trace_seq *s);
int (*define_fields)(struct ftrace_event_call *);
struct list_head fields;
int filter_active;
void *filter;
struct event_filter *filter;
void *mod;
void *data;
atomic_t profile_count;
int (*profile_enable)(struct ftrace_event_call *);
......@@ -127,17 +135,27 @@ struct ftrace_event_call {
};
#define MAX_FILTER_PRED 32
#define MAX_FILTER_STR_VAL 128
#define MAX_FILTER_STR_VAL 256 /* Should handle KSYM_SYMBOL_LEN */
extern int init_preds(struct ftrace_event_call *call);
extern void destroy_preds(struct ftrace_event_call *call);
extern int filter_match_preds(struct ftrace_event_call *call, void *rec);
extern int filter_current_check_discard(struct ftrace_event_call *call,
extern int filter_current_check_discard(struct ring_buffer *buffer,
struct ftrace_event_call *call,
void *rec,
struct ring_buffer_event *event);
extern int trace_define_field(struct ftrace_event_call *call, char *type,
char *name, int offset, int size, int is_signed);
enum {
FILTER_OTHER = 0,
FILTER_STATIC_STRING,
FILTER_DYN_STRING,
FILTER_PTR_STRING,
};
extern int trace_define_field(struct ftrace_event_call *call,
const char *type, const char *name,
int offset, int size, int is_signed,
int filter_type);
extern int trace_define_common_fields(struct ftrace_event_call *call);
#define is_signed_type(type) (((type)(-1)) < 0)
......@@ -162,11 +180,4 @@ do { \
__trace_printk(ip, fmt, ##args); \
} while (0)
#define __common_field(type, item, is_signed) \
ret = trace_define_field(event_call, #type, "common_" #item, \
offsetof(typeof(field.ent), item), \
sizeof(field.ent.item), is_signed); \
if (ret) \
return ret;
#endif /* _LINUX_FTRACE_EVENT_H */
......@@ -17,10 +17,12 @@
#include <linux/moduleparam.h>
#include <linux/marker.h>
#include <linux/tracepoint.h>
#include <asm/local.h>
#include <asm/local.h>
#include <asm/module.h>
#include <trace/events/module.h>
/* Not Yet Implemented */
#define MODULE_SUPPORTED_DEVICE(name)
......@@ -462,7 +464,10 @@ static inline local_t *__module_ref_addr(struct module *mod, int cpu)
static inline void __module_get(struct module *module)
{
if (module) {
local_inc(__module_ref_addr(module, get_cpu()));
unsigned int cpu = get_cpu();
local_inc(__module_ref_addr(module, cpu));
trace_module_get(module, _THIS_IP_,
local_read(__module_ref_addr(module, cpu)));
put_cpu();
}
}
......@@ -473,8 +478,11 @@ static inline int try_module_get(struct module *module)
if (module) {
unsigned int cpu = get_cpu();
if (likely(module_is_live(module)))
if (likely(module_is_live(module))) {
local_inc(__module_ref_addr(module, cpu));
trace_module_get(module, _THIS_IP_,
local_read(__module_ref_addr(module, cpu)));
}
else
ret = 0;
put_cpu();
......
......@@ -761,6 +761,8 @@ extern int sysctl_perf_counter_mlock;
extern int sysctl_perf_counter_sample_rate;
extern void perf_counter_init(void);
extern void perf_tpcounter_event(int event_id, u64 addr, u64 count,
void *record, int entry_size);
#ifndef perf_misc_flags
#define perf_misc_flags(regs) (user_mode(regs) ? PERF_EVENT_MISC_USER : \
......
......@@ -74,20 +74,6 @@ ring_buffer_event_time_delta(struct ring_buffer_event *event)
return event->time_delta;
}
/*
* ring_buffer_event_discard can discard any event in the ring buffer.
* it is up to the caller to protect against a reader from
* consuming it or a writer from wrapping and replacing it.
*
* No external protection is needed if this is called before
* the event is commited. But in that case it would be better to
* use ring_buffer_discard_commit.
*
* Note, if an event that has not been committed is discarded
* with ring_buffer_event_discard, it must still be committed.
*/
void ring_buffer_event_discard(struct ring_buffer_event *event);
/*
* ring_buffer_discard_commit will remove an event that has not
* ben committed yet. If this is used, then ring_buffer_unlock_commit
......@@ -154,8 +140,17 @@ unsigned long ring_buffer_size(struct ring_buffer *buffer);
void ring_buffer_reset_cpu(struct ring_buffer *buffer, int cpu);
void ring_buffer_reset(struct ring_buffer *buffer);
#ifdef CONFIG_RING_BUFFER_ALLOW_SWAP
int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
struct ring_buffer *buffer_b, int cpu);
#else
static inline int
ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
struct ring_buffer *buffer_b, int cpu)
{
return -ENODEV;
}
#endif
int ring_buffer_empty(struct ring_buffer *buffer);
int ring_buffer_empty_cpu(struct ring_buffer *buffer, int cpu);
......@@ -170,7 +165,6 @@ unsigned long ring_buffer_overruns(struct ring_buffer *buffer);
unsigned long ring_buffer_entries_cpu(struct ring_buffer *buffer, int cpu);
unsigned long ring_buffer_overrun_cpu(struct ring_buffer *buffer, int cpu);
unsigned long ring_buffer_commit_overrun_cpu(struct ring_buffer *buffer, int cpu);
unsigned long ring_buffer_nmi_dropped_cpu(struct ring_buffer *buffer, int cpu);
u64 ring_buffer_time_stamp(struct ring_buffer *buffer, int cpu);
void ring_buffer_normalize_time_stamp(struct ring_buffer *buffer,
......
......@@ -64,6 +64,7 @@ struct perf_counter_attr;
#include <linux/sem.h>
#include <asm/siginfo.h>
#include <asm/signal.h>
#include <linux/unistd.h>
#include <linux/quota.h>
#include <linux/key.h>
#include <trace/syscall.h>
......@@ -97,6 +98,53 @@ struct perf_counter_attr;
#define __SC_TEST5(t5, a5, ...) __SC_TEST(t5); __SC_TEST4(__VA_ARGS__)
#define __SC_TEST6(t6, a6, ...) __SC_TEST(t6); __SC_TEST5(__VA_ARGS__)
#ifdef CONFIG_EVENT_PROFILE
#define TRACE_SYS_ENTER_PROFILE(sname) \
static int prof_sysenter_enable_##sname(struct ftrace_event_call *event_call) \
{ \
int ret = 0; \
if (!atomic_inc_return(&event_enter_##sname.profile_count)) \
ret = reg_prof_syscall_enter("sys"#sname); \
return ret; \
} \
\
static void prof_sysenter_disable_##sname(struct ftrace_event_call *event_call)\
{ \
if (atomic_add_negative(-1, &event_enter_##sname.profile_count)) \
unreg_prof_syscall_enter("sys"#sname); \
}
#define TRACE_SYS_EXIT_PROFILE(sname) \
static int prof_sysexit_enable_##sname(struct ftrace_event_call *event_call) \
{ \
int ret = 0; \
if (!atomic_inc_return(&event_exit_##sname.profile_count)) \
ret = reg_prof_syscall_exit("sys"#sname); \
return ret; \
} \
\
static void prof_sysexit_disable_##sname(struct ftrace_event_call *event_call) \
{ \
if (atomic_add_negative(-1, &event_exit_##sname.profile_count)) \
unreg_prof_syscall_exit("sys"#sname); \
}
#define TRACE_SYS_ENTER_PROFILE_INIT(sname) \
.profile_count = ATOMIC_INIT(-1), \
.profile_enable = prof_sysenter_enable_##sname, \
.profile_disable = prof_sysenter_disable_##sname,
#define TRACE_SYS_EXIT_PROFILE_INIT(sname) \
.profile_count = ATOMIC_INIT(-1), \
.profile_enable = prof_sysexit_enable_##sname, \
.profile_disable = prof_sysexit_disable_##sname,
#else
#define TRACE_SYS_ENTER_PROFILE(sname)
#define TRACE_SYS_ENTER_PROFILE_INIT(sname)
#define TRACE_SYS_EXIT_PROFILE(sname)
#define TRACE_SYS_EXIT_PROFILE_INIT(sname)
#endif
#ifdef CONFIG_FTRACE_SYSCALLS
#define __SC_STR_ADECL1(t, a) #a
#define __SC_STR_ADECL2(t, a, ...) #a, __SC_STR_ADECL1(__VA_ARGS__)
......@@ -112,7 +160,81 @@ struct perf_counter_attr;
#define __SC_STR_TDECL5(t, a, ...) #t, __SC_STR_TDECL4(__VA_ARGS__)
#define __SC_STR_TDECL6(t, a, ...) #t, __SC_STR_TDECL5(__VA_ARGS__)
#define SYSCALL_TRACE_ENTER_EVENT(sname) \
static struct ftrace_event_call event_enter_##sname; \
struct trace_event enter_syscall_print_##sname = { \
.trace = print_syscall_enter, \
}; \
static int init_enter_##sname(void) \
{ \
int num, id; \
num = syscall_name_to_nr("sys"#sname); \
if (num < 0) \
return -ENOSYS; \
id = register_ftrace_event(&enter_syscall_print_##sname);\
if (!id) \
return -ENODEV; \
event_enter_##sname.id = id; \
set_syscall_enter_id(num, id); \
INIT_LIST_HEAD(&event_enter_##sname.fields); \
return 0; \
} \
TRACE_SYS_ENTER_PROFILE(sname); \
static struct ftrace_event_call __used \
__attribute__((__aligned__(4))) \
__attribute__((section("_ftrace_events"))) \
event_enter_##sname = { \
.name = "sys_enter"#sname, \
.system = "syscalls", \
.event = &event_syscall_enter, \
.raw_init = init_enter_##sname, \
.show_format = syscall_enter_format, \
.define_fields = syscall_enter_define_fields, \
.regfunc = reg_event_syscall_enter, \
.unregfunc = unreg_event_syscall_enter, \
.data = "sys"#sname, \
TRACE_SYS_ENTER_PROFILE_INIT(sname) \
}
#define SYSCALL_TRACE_EXIT_EVENT(sname) \
static struct ftrace_event_call event_exit_##sname; \
struct trace_event exit_syscall_print_##sname = { \
.trace = print_syscall_exit, \
}; \
static int init_exit_##sname(void) \
{ \
int num, id; \
num = syscall_name_to_nr("sys"#sname); \
if (num < 0) \
return -ENOSYS; \
id = register_ftrace_event(&exit_syscall_print_##sname);\
if (!id) \
return -ENODEV; \
event_exit_##sname.id = id; \
set_syscall_exit_id(num, id); \
INIT_LIST_HEAD(&event_exit_##sname.fields); \
return 0; \
} \
TRACE_SYS_EXIT_PROFILE(sname); \
static struct ftrace_event_call __used \
__attribute__((__aligned__(4))) \
__attribute__((section("_ftrace_events"))) \
event_exit_##sname = { \
.name = "sys_exit"#sname, \
.system = "syscalls", \
.event = &event_syscall_exit, \
.raw_init = init_exit_##sname, \
.show_format = syscall_exit_format, \
.define_fields = syscall_exit_define_fields, \
.regfunc = reg_event_syscall_exit, \
.unregfunc = unreg_event_syscall_exit, \
.data = "sys"#sname, \
TRACE_SYS_EXIT_PROFILE_INIT(sname) \
}
#define SYSCALL_METADATA(sname, nb) \
SYSCALL_TRACE_ENTER_EVENT(sname); \
SYSCALL_TRACE_EXIT_EVENT(sname); \
static const struct syscall_metadata __used \
__attribute__((__aligned__(4))) \
__attribute__((section("__syscalls_metadata"))) \
......@@ -121,18 +243,23 @@ struct perf_counter_attr;
.nb_args = nb, \
.types = types_##sname, \
.args = args_##sname, \
}
.enter_event = &event_enter_##sname, \
.exit_event = &event_exit_##sname, \
};
#define SYSCALL_DEFINE0(sname) \
SYSCALL_TRACE_ENTER_EVENT(_##sname); \
SYSCALL_TRACE_EXIT_EVENT(_##sname); \
static const struct syscall_metadata __used \
__attribute__((__aligned__(4))) \
__attribute__((section("__syscalls_metadata"))) \
__syscall_meta_##sname = { \
.name = "sys_"#sname, \
.nb_args = 0, \
.enter_event = &event_enter__##sname, \
.exit_event = &event_exit__##sname, \
}; \
asmlinkage long sys_##sname(void)
#else
#define SYSCALL_DEFINE0(name) asmlinkage long sys_##name(void)
#endif
......
......@@ -23,6 +23,8 @@ struct tracepoint;
struct tracepoint {
const char *name; /* Tracepoint name */
int state; /* State. */
void (*regfunc)(void);
void (*unregfunc)(void);
void **funcs;
} __attribute__((aligned(32))); /*
* Aligned on 32 bytes because it is
......@@ -78,12 +80,16 @@ struct tracepoint {
return tracepoint_probe_unregister(#name, (void *)probe);\
}
#define DEFINE_TRACE(name) \
#define DEFINE_TRACE_FN(name, reg, unreg) \
static const char __tpstrtab_##name[] \
__attribute__((section("__tracepoints_strings"))) = #name; \
struct tracepoint __tracepoint_##name \
__attribute__((section("__tracepoints"), aligned(32))) = \
{ __tpstrtab_##name, 0, NULL }
{ __tpstrtab_##name, 0, reg, unreg, NULL }
#define DEFINE_TRACE(name) \
DEFINE_TRACE_FN(name, NULL, NULL);
#define EXPORT_TRACEPOINT_SYMBOL_GPL(name) \
EXPORT_SYMBOL_GPL(__tracepoint_##name)
......@@ -108,6 +114,7 @@ extern void tracepoint_update_probe_range(struct tracepoint *begin,
return -ENOSYS; \
}
#define DEFINE_TRACE_FN(name, reg, unreg)
#define DEFINE_TRACE(name)
#define EXPORT_TRACEPOINT_SYMBOL_GPL(name)
#define EXPORT_TRACEPOINT_SYMBOL(name)
......@@ -158,6 +165,15 @@ static inline void tracepoint_synchronize_unregister(void)
#define PARAMS(args...) args
#endif /* _LINUX_TRACEPOINT_H */
/*
* Note: we keep the TRACE_EVENT outside the include file ifdef protection.
* This is due to the way trace events work. If a file includes two
* trace event headers under one "CREATE_TRACE_POINTS" the first include
* will override the TRACE_EVENT and break the second include.
*/
#ifndef TRACE_EVENT
/*
* For use with the TRACE_EVENT macro:
......@@ -259,10 +275,15 @@ static inline void tracepoint_synchronize_unregister(void)
* can also by used by generic instrumentation like SystemTap), and
* it is also used to expose a structured trace record in
* /sys/kernel/debug/tracing/events/.
*
* A set of (un)registration functions can be passed to the variant
* TRACE_EVENT_FN to perform any (un)registration work.
*/
#define TRACE_EVENT(name, proto, args, struct, assign, print) \
DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
#endif
#define TRACE_EVENT_FN(name, proto, args, struct, \
assign, print, reg, unreg) \
DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
#endif
#endif /* ifdef TRACE_EVENT (see note above) */
......@@ -26,6 +26,11 @@
#define TRACE_EVENT(name, proto, args, tstruct, assign, print) \
DEFINE_TRACE(name)
#undef TRACE_EVENT_FN
#define TRACE_EVENT_FN(name, proto, args, tstruct, \
assign, print, reg, unreg) \
DEFINE_TRACE_FN(name, reg, unreg)
#undef DECLARE_TRACE
#define DECLARE_TRACE(name, proto, args) \
DEFINE_TRACE(name)
......@@ -56,6 +61,8 @@
#include <trace/ftrace.h>
#endif
#undef TRACE_EVENT
#undef TRACE_EVENT_FN
#undef TRACE_HEADER_MULTI_READ
/* Only undef what we defined in this file */
......
......@@ -171,6 +171,7 @@ TRACE_EVENT(block_rq_complete,
(unsigned long long)__entry->sector,
__entry->nr_sector, __entry->errors)
);
TRACE_EVENT(block_bio_bounce,
TP_PROTO(struct request_queue *q, struct bio *bio),
......@@ -186,7 +187,8 @@ TRACE_EVENT(block_bio_bounce,
),
TP_fast_assign(
__entry->dev = bio->bi_bdev->bd_dev;
__entry->dev = bio->bi_bdev ?
bio->bi_bdev->bd_dev : 0;
__entry->sector = bio->bi_sector;
__entry->nr_sector = bio->bi_size >> 9;
blk_fill_rwbs(__entry->rwbs, bio->bi_rw, bio->bi_size);
......
#undef TRACE_SYSTEM
#define TRACE_SYSTEM module
#if !defined(_TRACE_MODULE_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_MODULE_H
#include <linux/tracepoint.h>
#ifdef CONFIG_MODULES
struct module;
#define show_module_flags(flags) __print_flags(flags, "", \
{ (1UL << TAINT_PROPRIETARY_MODULE), "P" }, \
{ (1UL << TAINT_FORCED_MODULE), "F" }, \
{ (1UL << TAINT_CRAP), "C" })
TRACE_EVENT(module_load,
TP_PROTO(struct module *mod),
TP_ARGS(mod),
TP_STRUCT__entry(
__field( unsigned int, taints )
__string( name, mod->name )
),
TP_fast_assign(
__entry->taints = mod->taints;
__assign_str(name, mod->name);
),
TP_printk("%s %s", __get_str(name), show_module_flags(__entry->taints))
);
TRACE_EVENT(module_free,
TP_PROTO(struct module *mod),
TP_ARGS(mod),
TP_STRUCT__entry(
__string( name, mod->name )
),
TP_fast_assign(
__assign_str(name, mod->name);
),
TP_printk("%s", __get_str(name))
);
TRACE_EVENT(module_get,
TP_PROTO(struct module *mod, unsigned long ip, int refcnt),
TP_ARGS(mod, ip, refcnt),
TP_STRUCT__entry(
__field( unsigned long, ip )
__field( int, refcnt )
__string( name, mod->name )
),
TP_fast_assign(
__entry->ip = ip;
__entry->refcnt = refcnt;
__assign_str(name, mod->name);
),
TP_printk("%s call_site=%pf refcnt=%d",
__get_str(name), (void *)__entry->ip, __entry->refcnt)
);
TRACE_EVENT(module_put,
TP_PROTO(struct module *mod, unsigned long ip, int refcnt),
TP_ARGS(mod, ip, refcnt),
TP_STRUCT__entry(
__field( unsigned long, ip )
__field( int, refcnt )
__string( name, mod->name )
),
TP_fast_assign(
__entry->ip = ip;
__entry->refcnt = refcnt;
__assign_str(name, mod->name);
),
TP_printk("%s call_site=%pf refcnt=%d",
__get_str(name), (void *)__entry->ip, __entry->refcnt)
);
TRACE_EVENT(module_request,
TP_PROTO(char *name, bool wait, unsigned long ip),
TP_ARGS(name, wait, ip),
TP_STRUCT__entry(
__field( bool, wait )
__field( unsigned long, ip )
__string( name, name )
),
TP_fast_assign(
__entry->wait = wait;
__entry->ip = ip;
__assign_str(name, name);
),
TP_printk("%s wait=%d call_site=%pf",
__get_str(name), (int)__entry->wait, (void *)__entry->ip)
);
#endif /* CONFIG_MODULES */
#endif /* _TRACE_MODULE_H */
/* This part must be outside protection */
#include <trace/define_trace.h>
......@@ -94,6 +94,7 @@ TRACE_EVENT(sched_wakeup,
__field( pid_t, pid )
__field( int, prio )
__field( int, success )
__field( int, cpu )
),
TP_fast_assign(
......@@ -101,11 +102,12 @@ TRACE_EVENT(sched_wakeup,
__entry->pid = p->pid;
__entry->prio = p->prio;
__entry->success = success;
__entry->cpu = task_cpu(p);
),
TP_printk("task %s:%d [%d] success=%d",
TP_printk("task %s:%d [%d] success=%d [%03d]",
__entry->comm, __entry->pid, __entry->prio,
__entry->success)
__entry->success, __entry->cpu)
);
/*
......@@ -125,6 +127,7 @@ TRACE_EVENT(sched_wakeup_new,
__field( pid_t, pid )
__field( int, prio )
__field( int, success )
__field( int, cpu )
),
TP_fast_assign(
......@@ -132,11 +135,12 @@ TRACE_EVENT(sched_wakeup_new,
__entry->pid = p->pid;
__entry->prio = p->prio;
__entry->success = success;
__entry->cpu = task_cpu(p);
),
TP_printk("task %s:%d [%d] success=%d",
TP_printk("task %s:%d [%d] success=%d [%03d]",
__entry->comm, __entry->pid, __entry->prio,
__entry->success)
__entry->success, __entry->cpu)
);
/*
......
#undef TRACE_SYSTEM
#define TRACE_SYSTEM syscalls
#if !defined(_TRACE_EVENTS_SYSCALLS_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_EVENTS_SYSCALLS_H
#include <linux/tracepoint.h>
#include <asm/ptrace.h>
#include <asm/syscall.h>
#ifdef CONFIG_HAVE_SYSCALL_TRACEPOINTS
extern void syscall_regfunc(void);
extern void syscall_unregfunc(void);
TRACE_EVENT_FN(sys_enter,
TP_PROTO(struct pt_regs *regs, long id),
TP_ARGS(regs, id),
TP_STRUCT__entry(
__field( long, id )
__array( unsigned long, args, 6 )
),
TP_fast_assign(
__entry->id = id;
syscall_get_arguments(current, regs, 0, 6, __entry->args);
),
TP_printk("NR %ld (%lx, %lx, %lx, %lx, %lx, %lx)",
__entry->id,
__entry->args[0], __entry->args[1], __entry->args[2],
__entry->args[3], __entry->args[4], __entry->args[5]),
syscall_regfunc, syscall_unregfunc
);
TRACE_EVENT_FN(sys_exit,
TP_PROTO(struct pt_regs *regs, long ret),
TP_ARGS(regs, ret),
TP_STRUCT__entry(
__field( long, id )
__field( long, ret )
),
TP_fast_assign(
__entry->id = syscall_get_nr(current, regs);
__entry->ret = ret;
),
TP_printk("NR %ld = %ld",
__entry->id, __entry->ret),
syscall_regfunc, syscall_unregfunc
);
#endif /* CONFIG_HAVE_SYSCALL_TRACEPOINTS */
#endif /* _TRACE_EVENTS_SYSCALLS_H */
/* This part must be outside protection */
#include <trace/define_trace.h>
......@@ -21,11 +21,14 @@
#undef __field
#define __field(type, item) type item;
#undef __field_ext
#define __field_ext(type, item, filter_type) type item;
#undef __array
#define __array(type, item, len) type item[len];
#undef __dynamic_array
#define __dynamic_array(type, item, len) unsigned short __data_loc_##item;
#define __dynamic_array(type, item, len) u32 __data_loc_##item;
#undef __string
#define __string(item, src) __dynamic_array(char, item, -1)
......@@ -42,6 +45,16 @@
}; \
static struct ftrace_event_call event_##name
#undef __cpparg
#define __cpparg(arg...) arg
/* Callbacks are meaningless to ftrace. */
#undef TRACE_EVENT_FN
#define TRACE_EVENT_FN(name, proto, args, tstruct, \
assign, print, reg, unreg) \
TRACE_EVENT(name, __cpparg(proto), __cpparg(args), \
__cpparg(tstruct), __cpparg(assign), __cpparg(print)) \
#include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
......@@ -51,23 +64,27 @@
* Include the following:
*
* struct ftrace_data_offsets_<call> {
* int <item1>;
* int <item2>;
* u32 <item1>;
* u32 <item2>;
* [...]
* };
*
* The __dynamic_array() macro will create each int <item>, this is
* The __dynamic_array() macro will create each u32 <item>, this is
* to keep the offset of each array from the beginning of the event.
* The size of an array is also encoded, in the higher 16 bits of <item>.
*/
#undef __field
#define __field(type, item);
#define __field(type, item)
#undef __field_ext
#define __field_ext(type, item, filter_type)
#undef __array
#define __array(type, item, len)
#undef __dynamic_array
#define __dynamic_array(type, item, len) int item;
#define __dynamic_array(type, item, len) u32 item;
#undef __string
#define __string(item, src) __dynamic_array(char, item, -1)
......@@ -109,6 +126,9 @@
if (!ret) \
return 0;
#undef __field_ext
#define __field_ext(type, item, filter_type) __field(type, item)
#undef __array
#define __array(type, item, len) \
ret = trace_seq_printf(s, "\tfield:" #type " " #item "[" #len "];\t" \
......@@ -120,7 +140,7 @@
#undef __dynamic_array
#define __dynamic_array(type, item, len) \
ret = trace_seq_printf(s, "\tfield:__data_loc " #item ";\t" \
ret = trace_seq_printf(s, "\tfield:__data_loc " #type "[] " #item ";\t"\
"offset:%u;\tsize:%u;\n", \
(unsigned int)offsetof(typeof(field), \
__data_loc_##item), \
......@@ -150,7 +170,8 @@
#undef TRACE_EVENT
#define TRACE_EVENT(call, proto, args, tstruct, func, print) \
static int \
ftrace_format_##call(struct trace_seq *s) \
ftrace_format_##call(struct ftrace_event_call *unused, \
struct trace_seq *s) \
{ \
struct ftrace_raw_##call field __attribute__((unused)); \
int ret = 0; \
......@@ -210,7 +231,7 @@ ftrace_format_##call(struct trace_seq *s) \
#undef __get_dynamic_array
#define __get_dynamic_array(field) \
((void *)__entry + __entry->__data_loc_##field)
((void *)__entry + (__entry->__data_loc_##field & 0xffff))
#undef __get_str
#define __get_str(field) (char *)__get_dynamic_array(field)
......@@ -263,28 +284,33 @@ ftrace_raw_output_##call(struct trace_iterator *iter, int flags) \
#include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
#undef __field
#define __field(type, item) \
#undef __field_ext
#define __field_ext(type, item, filter_type) \
ret = trace_define_field(event_call, #type, #item, \
offsetof(typeof(field), item), \
sizeof(field.item), is_signed_type(type)); \
sizeof(field.item), \
is_signed_type(type), filter_type); \
if (ret) \
return ret;
#undef __field
#define __field(type, item) __field_ext(type, item, FILTER_OTHER)
#undef __array
#define __array(type, item, len) \
BUILD_BUG_ON(len > MAX_FILTER_STR_VAL); \
ret = trace_define_field(event_call, #type "[" #len "]", #item, \
offsetof(typeof(field), item), \
sizeof(field.item), 0); \
sizeof(field.item), 0, FILTER_OTHER); \
if (ret) \
return ret;
#undef __dynamic_array
#define __dynamic_array(type, item, len) \
ret = trace_define_field(event_call, "__data_loc" "[" #type "]", #item,\
offsetof(typeof(field), __data_loc_##item), \
sizeof(field.__data_loc_##item), 0);
ret = trace_define_field(event_call, "__data_loc " #type "[]", #item, \
offsetof(typeof(field), __data_loc_##item), \
sizeof(field.__data_loc_##item), 0, \
FILTER_OTHER);
#undef __string
#define __string(item, src) __dynamic_array(char, item, -1)
......@@ -292,17 +318,14 @@ ftrace_raw_output_##call(struct trace_iterator *iter, int flags) \
#undef TRACE_EVENT
#define TRACE_EVENT(call, proto, args, tstruct, func, print) \
int \
ftrace_define_fields_##call(void) \
ftrace_define_fields_##call(struct ftrace_event_call *event_call) \
{ \
struct ftrace_raw_##call field; \
struct ftrace_event_call *event_call = &event_##call; \
int ret; \
\
__common_field(int, type, 1); \
__common_field(unsigned char, flags, 0); \
__common_field(unsigned char, preempt_count, 0); \
__common_field(int, pid, 1); \
__common_field(int, tgid, 1); \
ret = trace_define_common_fields(event_call); \
if (ret) \
return ret; \
\
tstruct; \
\
......@@ -321,6 +344,9 @@ ftrace_define_fields_##call(void) \
#undef __field
#define __field(type, item)
#undef __field_ext
#define __field_ext(type, item, filter_type)
#undef __array
#define __array(type, item, len)
......@@ -328,6 +354,7 @@ ftrace_define_fields_##call(void) \
#define __dynamic_array(type, item, len) \
__data_offsets->item = __data_size + \
offsetof(typeof(*entry), __data); \
__data_offsets->item |= (len * sizeof(type)) << 16; \
__data_size += (len) * sizeof(type);
#undef __string
......@@ -433,13 +460,15 @@ static void ftrace_profile_disable_##call(struct ftrace_event_call *event_call)\
* {
* struct ring_buffer_event *event;
* struct ftrace_raw_<call> *entry; <-- defined in stage 1
* struct ring_buffer *buffer;
* unsigned long irq_flags;
* int pc;
*
* local_save_flags(irq_flags);
* pc = preempt_count();
*
* event = trace_current_buffer_lock_reserve(event_<call>.id,
* event = trace_current_buffer_lock_reserve(&buffer,
* event_<call>.id,
* sizeof(struct ftrace_raw_<call>),
* irq_flags, pc);
* if (!event)
......@@ -449,7 +478,7 @@ static void ftrace_profile_disable_##call(struct ftrace_event_call *event_call)\
* <assign>; <-- Here we assign the entries by the __field and
* __array macros.
*
* trace_current_buffer_unlock_commit(event, irq_flags, pc);
* trace_current_buffer_unlock_commit(buffer, event, irq_flags, pc);
* }
*
* static int ftrace_raw_reg_event_<call>(void)
......@@ -541,6 +570,7 @@ static void ftrace_raw_event_##call(proto) \
struct ftrace_event_call *event_call = &event_##call; \
struct ring_buffer_event *event; \
struct ftrace_raw_##call *entry; \
struct ring_buffer *buffer; \
unsigned long irq_flags; \
int __data_size; \
int pc; \
......@@ -550,7 +580,8 @@ static void ftrace_raw_event_##call(proto) \
\
__data_size = ftrace_get_offsets_##call(&__data_offsets, args); \
\
event = trace_current_buffer_lock_reserve(event_##call.id, \
event = trace_current_buffer_lock_reserve(&buffer, \
event_##call.id, \
sizeof(*entry) + __data_size, \
irq_flags, pc); \
if (!event) \
......@@ -562,11 +593,12 @@ static void ftrace_raw_event_##call(proto) \
\
{ assign; } \
\
if (!filter_current_check_discard(event_call, entry, event)) \
trace_nowake_buffer_unlock_commit(event, irq_flags, pc); \
if (!filter_current_check_discard(buffer, event_call, entry, event)) \
trace_nowake_buffer_unlock_commit(buffer, \
event, irq_flags, pc); \
} \
\
static int ftrace_raw_reg_event_##call(void) \
static int ftrace_raw_reg_event_##call(void *ptr) \
{ \
int ret; \
\
......@@ -577,7 +609,7 @@ static int ftrace_raw_reg_event_##call(void) \
return ret; \
} \
\
static void ftrace_raw_unreg_event_##call(void) \
static void ftrace_raw_unreg_event_##call(void *ptr) \
{ \
unregister_trace_##call(ftrace_raw_event_##call); \
} \
......@@ -595,7 +627,6 @@ static int ftrace_raw_init_event_##call(void) \
return -ENODEV; \
event_##call.id = id; \
INIT_LIST_HEAD(&event_##call.fields); \
init_preds(&event_##call); \
return 0; \
} \
\
......
#ifndef _TRACE_SYSCALL_H
#define _TRACE_SYSCALL_H
#include <linux/tracepoint.h>
#include <linux/unistd.h>
#include <linux/ftrace_event.h>
#include <asm/ptrace.h>
/*
* A syscall entry in the ftrace syscalls array.
*
......@@ -10,26 +15,49 @@
* @nb_args: number of parameters it takes
* @types: list of types as strings
* @args: list of args as strings (args[i] matches types[i])
* @enter_id: associated ftrace enter event id
* @exit_id: associated ftrace exit event id
* @enter_event: associated syscall_enter trace event
* @exit_event: associated syscall_exit trace event
*/
struct syscall_metadata {
const char *name;
int nb_args;
const char **types;
const char **args;
int enter_id;
int exit_id;
struct ftrace_event_call *enter_event;
struct ftrace_event_call *exit_event;
};
#ifdef CONFIG_FTRACE_SYSCALLS
extern void arch_init_ftrace_syscalls(void);
extern struct syscall_metadata *syscall_nr_to_meta(int nr);
extern void start_ftrace_syscalls(void);
extern void stop_ftrace_syscalls(void);
extern void ftrace_syscall_enter(struct pt_regs *regs);
extern void ftrace_syscall_exit(struct pt_regs *regs);
#else
static inline void start_ftrace_syscalls(void) { }
static inline void stop_ftrace_syscalls(void) { }
static inline void ftrace_syscall_enter(struct pt_regs *regs) { }
static inline void ftrace_syscall_exit(struct pt_regs *regs) { }
extern int syscall_name_to_nr(char *name);
void set_syscall_enter_id(int num, int id);
void set_syscall_exit_id(int num, int id);
extern struct trace_event event_syscall_enter;
extern struct trace_event event_syscall_exit;
extern int reg_event_syscall_enter(void *ptr);
extern void unreg_event_syscall_enter(void *ptr);
extern int reg_event_syscall_exit(void *ptr);
extern void unreg_event_syscall_exit(void *ptr);
extern int syscall_enter_format(struct ftrace_event_call *call,
struct trace_seq *s);
extern int syscall_exit_format(struct ftrace_event_call *call,
struct trace_seq *s);
extern int syscall_enter_define_fields(struct ftrace_event_call *call);
extern int syscall_exit_define_fields(struct ftrace_event_call *call);
enum print_line_t print_syscall_enter(struct trace_iterator *iter, int flags);
enum print_line_t print_syscall_exit(struct trace_iterator *iter, int flags);
#endif
#ifdef CONFIG_EVENT_PROFILE
int reg_prof_syscall_enter(char *name);
void unreg_prof_syscall_enter(char *name);
int reg_prof_syscall_exit(char *name);
void unreg_prof_syscall_exit(char *name);
#endif
#endif /* _TRACE_SYSCALL_H */
......@@ -37,6 +37,8 @@
#include <linux/suspend.h>
#include <asm/uaccess.h>
#include <trace/events/module.h>
extern int max_threads;
static struct workqueue_struct *khelper_wq;
......@@ -108,6 +110,8 @@ int __request_module(bool wait, const char *fmt, ...)
return -ENOMEM;
}
trace_module_request(module_name, wait, _RET_IP_);
ret = call_usermodehelper(modprobe_path, argv, envp,
wait ? UMH_WAIT_PROC : UMH_WAIT_EXEC);
atomic_dec(&kmod_concurrent);
......
......@@ -103,7 +103,7 @@ static struct kprobe_blackpoint kprobe_blacklist[] = {
#define INSNS_PER_PAGE (PAGE_SIZE/(MAX_INSN_SIZE * sizeof(kprobe_opcode_t)))
struct kprobe_insn_page {
struct hlist_node hlist;
struct list_head list;
kprobe_opcode_t *insns; /* Page of instruction slots */
char slot_used[INSNS_PER_PAGE];
int nused;
......@@ -117,7 +117,7 @@ enum kprobe_slot_state {
};
static DEFINE_MUTEX(kprobe_insn_mutex); /* Protects kprobe_insn_pages */
static struct hlist_head kprobe_insn_pages;
static LIST_HEAD(kprobe_insn_pages);
static int kprobe_garbage_slots;
static int collect_garbage_slots(void);
......@@ -152,10 +152,9 @@ loop_end:
static kprobe_opcode_t __kprobes *__get_insn_slot(void)
{
struct kprobe_insn_page *kip;
struct hlist_node *pos;
retry:
hlist_for_each_entry(kip, pos, &kprobe_insn_pages, hlist) {
list_for_each_entry(kip, &kprobe_insn_pages, list) {
if (kip->nused < INSNS_PER_PAGE) {
int i;
for (i = 0; i < INSNS_PER_PAGE; i++) {
......@@ -189,8 +188,8 @@ static kprobe_opcode_t __kprobes *__get_insn_slot(void)
kfree(kip);
return NULL;
}
INIT_HLIST_NODE(&kip->hlist);
hlist_add_head(&kip->hlist, &kprobe_insn_pages);
INIT_LIST_HEAD(&kip->list);
list_add(&kip->list, &kprobe_insn_pages);
memset(kip->slot_used, SLOT_CLEAN, INSNS_PER_PAGE);
kip->slot_used[0] = SLOT_USED;
kip->nused = 1;
......@@ -219,12 +218,8 @@ static int __kprobes collect_one_slot(struct kprobe_insn_page *kip, int idx)
* so as not to have to set it up again the
* next time somebody inserts a probe.
*/
hlist_del(&kip->hlist);
if (hlist_empty(&kprobe_insn_pages)) {
INIT_HLIST_NODE(&kip->hlist);
hlist_add_head(&kip->hlist,
&kprobe_insn_pages);
} else {
if (!list_is_singular(&kprobe_insn_pages)) {
list_del(&kip->list);
module_free(NULL, kip->insns);
kfree(kip);
}
......@@ -235,14 +230,13 @@ static int __kprobes collect_one_slot(struct kprobe_insn_page *kip, int idx)
static int __kprobes collect_garbage_slots(void)
{
struct kprobe_insn_page *kip;
struct hlist_node *pos, *next;
struct kprobe_insn_page *kip, *next;
/* Ensure no-one is preepmted on the garbages */
if (check_safety())
return -EAGAIN;
hlist_for_each_entry_safe(kip, pos, next, &kprobe_insn_pages, hlist) {
list_for_each_entry_safe(kip, next, &kprobe_insn_pages, list) {
int i;
if (kip->ngarbage == 0)
continue;
......@@ -260,19 +254,17 @@ static int __kprobes collect_garbage_slots(void)
void __kprobes free_insn_slot(kprobe_opcode_t * slot, int dirty)
{
struct kprobe_insn_page *kip;
struct hlist_node *pos;
mutex_lock(&kprobe_insn_mutex);
hlist_for_each_entry(kip, pos, &kprobe_insn_pages, hlist) {
list_for_each_entry(kip, &kprobe_insn_pages, list) {
if (kip->insns <= slot &&
slot < kip->insns + (INSNS_PER_PAGE * MAX_INSN_SIZE)) {
int i = (slot - kip->insns) / MAX_INSN_SIZE;
if (dirty) {
kip->slot_used[i] = SLOT_DIRTY;
kip->ngarbage++;
} else {
} else
collect_one_slot(kip, i);
}
break;
}
}
......
......@@ -55,6 +55,11 @@
#include <linux/percpu.h>
#include <linux/kmemleak.h>
#define CREATE_TRACE_POINTS
#include <trace/events/module.h>
EXPORT_TRACEPOINT_SYMBOL(module_get);
#if 0
#define DEBUGP printk
#else
......@@ -942,6 +947,8 @@ void module_put(struct module *module)
if (module) {
unsigned int cpu = get_cpu();
local_dec(__module_ref_addr(module, cpu));
trace_module_put(module, _RET_IP_,
local_read(__module_ref_addr(module, cpu)));
/* Maybe they're waiting for us to drop reference? */
if (unlikely(!module_is_live(module)))
wake_up_process(module->waiter);
......@@ -1497,6 +1504,8 @@ static int __unlink_module(void *_mod)
/* Free a module, remove from lists, etc (must hold module_mutex). */
static void free_module(struct module *mod)
{
trace_module_free(mod);
/* Delete from various lists */
stop_machine(__unlink_module, mod, NULL);
remove_notes_attrs(mod);
......@@ -2364,6 +2373,8 @@ static noinline struct module *load_module(void __user *umod,
/* Get rid of temporary copy */
vfree(hdr);
trace_module_load(mod);
/* Done! */
return mod;
......
......@@ -41,7 +41,7 @@ config HAVE_FTRACE_MCOUNT_RECORD
config HAVE_HW_BRANCH_TRACER
bool
config HAVE_FTRACE_SYSCALLS
config HAVE_SYSCALL_TRACEPOINTS
bool
config TRACER_MAX_TRACE
......@@ -60,9 +60,14 @@ config EVENT_TRACING
bool
config CONTEXT_SWITCH_TRACER
select MARKERS
bool
config RING_BUFFER_ALLOW_SWAP
bool
help
Allow the use of ring_buffer_swap_cpu.
Adds a very slight overhead to tracing when enabled.
# All tracer options should select GENERIC_TRACER. For those options that are
# enabled by all tracers (context switch and event tracer) they select TRACING.
# This allows those options to appear when no other tracer is selected. But the
......@@ -147,6 +152,7 @@ config IRQSOFF_TRACER
select TRACE_IRQFLAGS
select GENERIC_TRACER
select TRACER_MAX_TRACE
select RING_BUFFER_ALLOW_SWAP
help
This option measures the time spent in irqs-off critical
sections, with microsecond accuracy.
......@@ -168,6 +174,7 @@ config PREEMPT_TRACER
depends on PREEMPT
select GENERIC_TRACER
select TRACER_MAX_TRACE
select RING_BUFFER_ALLOW_SWAP
help
This option measures the time spent in preemption off critical
sections, with microsecond accuracy.
......@@ -211,7 +218,7 @@ config ENABLE_DEFAULT_TRACERS
config FTRACE_SYSCALLS
bool "Trace syscalls"
depends on HAVE_FTRACE_SYSCALLS
depends on HAVE_SYSCALL_TRACEPOINTS
select GENERIC_TRACER
select KALLSYMS
help
......
......@@ -65,13 +65,15 @@ static void trace_note(struct blk_trace *bt, pid_t pid, int action,
{
struct blk_io_trace *t;
struct ring_buffer_event *event = NULL;
struct ring_buffer *buffer = NULL;
int pc = 0;
int cpu = smp_processor_id();
bool blk_tracer = blk_tracer_enabled;
if (blk_tracer) {
buffer = blk_tr->buffer;
pc = preempt_count();
event = trace_buffer_lock_reserve(blk_tr, TRACE_BLK,
event = trace_buffer_lock_reserve(buffer, TRACE_BLK,
sizeof(*t) + len,
0, pc);
if (!event)
......@@ -96,7 +98,7 @@ record_it:
memcpy((void *) t + sizeof(*t), data, len);
if (blk_tracer)
trace_buffer_unlock_commit(blk_tr, event, 0, pc);
trace_buffer_unlock_commit(buffer, event, 0, pc);
}
}
......@@ -179,6 +181,7 @@ static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
{
struct task_struct *tsk = current;
struct ring_buffer_event *event = NULL;
struct ring_buffer *buffer = NULL;
struct blk_io_trace *t;
unsigned long flags = 0;
unsigned long *sequence;
......@@ -204,8 +207,9 @@ static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
if (blk_tracer) {
tracing_record_cmdline(current);
buffer = blk_tr->buffer;
pc = preempt_count();
event = trace_buffer_lock_reserve(blk_tr, TRACE_BLK,
event = trace_buffer_lock_reserve(buffer, TRACE_BLK,
sizeof(*t) + pdu_len,
0, pc);
if (!event)
......@@ -252,7 +256,7 @@ record_it:
memcpy((void *) t + sizeof(*t), pdu_data, pdu_len);
if (blk_tracer) {
trace_buffer_unlock_commit(blk_tr, event, 0, pc);
trace_buffer_unlock_commit(buffer, event, 0, pc);
return;
}
}
......
This diff is collapsed.
......@@ -183,11 +183,9 @@ static void kmemtrace_stop_probes(void)
static int kmem_trace_init(struct trace_array *tr)
{
int cpu;
kmemtrace_array = tr;
for_each_cpu(cpu, cpu_possible_mask)
tracing_reset(tr, cpu);
tracing_reset_online_cpus(tr);
kmemtrace_start_probes();
......@@ -239,12 +237,52 @@ struct kmemtrace_user_event_alloc {
};
static enum print_line_t
kmemtrace_print_alloc_user(struct trace_iterator *iter,
struct kmemtrace_alloc_entry *entry)
kmemtrace_print_alloc(struct trace_iterator *iter, int flags)
{
struct kmemtrace_user_event_alloc *ev_alloc;
struct trace_seq *s = &iter->seq;
struct kmemtrace_alloc_entry *entry;
int ret;
trace_assign_type(entry, iter->ent);
ret = trace_seq_printf(s, "type_id %d call_site %pF ptr %lu "
"bytes_req %lu bytes_alloc %lu gfp_flags %lu node %d\n",
entry->type_id, (void *)entry->call_site, (unsigned long)entry->ptr,
(unsigned long)entry->bytes_req, (unsigned long)entry->bytes_alloc,
(unsigned long)entry->gfp_flags, entry->node);
if (!ret)
return TRACE_TYPE_PARTIAL_LINE;
return TRACE_TYPE_HANDLED;
}
static enum print_line_t
kmemtrace_print_free(struct trace_iterator *iter, int flags)
{
struct trace_seq *s = &iter->seq;
struct kmemtrace_free_entry *entry;
int ret;
trace_assign_type(entry, iter->ent);
ret = trace_seq_printf(s, "type_id %d call_site %pF ptr %lu\n",
entry->type_id, (void *)entry->call_site,
(unsigned long)entry->ptr);
if (!ret)
return TRACE_TYPE_PARTIAL_LINE;
return TRACE_TYPE_HANDLED;
}
static enum print_line_t
kmemtrace_print_alloc_user(struct trace_iterator *iter, int flags)
{
struct trace_seq *s = &iter->seq;
struct kmemtrace_alloc_entry *entry;
struct kmemtrace_user_event *ev;
struct kmemtrace_user_event_alloc *ev_alloc;
trace_assign_type(entry, iter->ent);
ev = trace_seq_reserve(s, sizeof(*ev));
if (!ev)
......@@ -271,12 +309,14 @@ kmemtrace_print_alloc_user(struct trace_iterator *iter,
}
static enum print_line_t
kmemtrace_print_free_user(struct trace_iterator *iter,
struct kmemtrace_free_entry *entry)
kmemtrace_print_free_user(struct trace_iterator *iter, int flags)
{
struct trace_seq *s = &iter->seq;
struct kmemtrace_free_entry *entry;
struct kmemtrace_user_event *ev;
trace_assign_type(entry, iter->ent);
ev = trace_seq_reserve(s, sizeof(*ev));
if (!ev)
return TRACE_TYPE_PARTIAL_LINE;
......@@ -294,12 +334,14 @@ kmemtrace_print_free_user(struct trace_iterator *iter,
/* The two other following provide a more minimalistic output */
static enum print_line_t
kmemtrace_print_alloc_compress(struct trace_iterator *iter,
struct kmemtrace_alloc_entry *entry)
kmemtrace_print_alloc_compress(struct trace_iterator *iter)
{
struct kmemtrace_alloc_entry *entry;
struct trace_seq *s = &iter->seq;
int ret;
trace_assign_type(entry, iter->ent);
/* Alloc entry */
ret = trace_seq_printf(s, " + ");
if (!ret)
......@@ -345,29 +387,24 @@ kmemtrace_print_alloc_compress(struct trace_iterator *iter,
if (!ret)
return TRACE_TYPE_PARTIAL_LINE;
/* Node */
ret = trace_seq_printf(s, "%4d ", entry->node);
if (!ret)
return TRACE_TYPE_PARTIAL_LINE;
/* Call site */
ret = seq_print_ip_sym(s, entry->call_site, 0);
/* Node and call site*/
ret = trace_seq_printf(s, "%4d %pf\n", entry->node,
(void *)entry->call_site);
if (!ret)
return TRACE_TYPE_PARTIAL_LINE;
if (!trace_seq_printf(s, "\n"))
return TRACE_TYPE_PARTIAL_LINE;
return TRACE_TYPE_HANDLED;
}
static enum print_line_t
kmemtrace_print_free_compress(struct trace_iterator *iter,
struct kmemtrace_free_entry *entry)
kmemtrace_print_free_compress(struct trace_iterator *iter)
{
struct kmemtrace_free_entry *entry;
struct trace_seq *s = &iter->seq;
int ret;
trace_assign_type(entry, iter->ent);
/* Free entry */
ret = trace_seq_printf(s, " - ");
if (!ret)
......@@ -401,19 +438,11 @@ kmemtrace_print_free_compress(struct trace_iterator *iter,
if (!ret)
return TRACE_TYPE_PARTIAL_LINE;
/* Skip node */
ret = trace_seq_printf(s, " ");
/* Skip node and print call site*/
ret = trace_seq_printf(s, " %pf\n", (void *)entry->call_site);
if (!ret)
return TRACE_TYPE_PARTIAL_LINE;
/* Call site */
ret = seq_print_ip_sym(s, entry->call_site, 0);
if (!ret)
return TRACE_TYPE_PARTIAL_LINE;
if (!trace_seq_printf(s, "\n"))
return TRACE_TYPE_PARTIAL_LINE;
return TRACE_TYPE_HANDLED;
}
......@@ -421,32 +450,31 @@ static enum print_line_t kmemtrace_print_line(struct trace_iterator *iter)
{
struct trace_entry *entry = iter->ent;
switch (entry->type) {
case TRACE_KMEM_ALLOC: {
struct kmemtrace_alloc_entry *field;
trace_assign_type(field, entry);
if (kmem_tracer_flags.val & TRACE_KMEM_OPT_MINIMAL)
return kmemtrace_print_alloc_compress(iter, field);
else
return kmemtrace_print_alloc_user(iter, field);
}
case TRACE_KMEM_FREE: {
struct kmemtrace_free_entry *field;
trace_assign_type(field, entry);
if (kmem_tracer_flags.val & TRACE_KMEM_OPT_MINIMAL)
return kmemtrace_print_free_compress(iter, field);
else
return kmemtrace_print_free_user(iter, field);
}
if (!(kmem_tracer_flags.val & TRACE_KMEM_OPT_MINIMAL))
return TRACE_TYPE_UNHANDLED;
switch (entry->type) {
case TRACE_KMEM_ALLOC:
return kmemtrace_print_alloc_compress(iter);
case TRACE_KMEM_FREE:
return kmemtrace_print_free_compress(iter);
default:
return TRACE_TYPE_UNHANDLED;
}
}
static struct trace_event kmem_trace_alloc = {
.type = TRACE_KMEM_ALLOC,
.trace = kmemtrace_print_alloc,
.binary = kmemtrace_print_alloc_user,
};
static struct trace_event kmem_trace_free = {
.type = TRACE_KMEM_FREE,
.trace = kmemtrace_print_free,
.binary = kmemtrace_print_free_user,
};
static struct tracer kmem_tracer __read_mostly = {
.name = "kmemtrace",
.init = kmem_trace_init,
......@@ -463,6 +491,21 @@ void kmemtrace_init(void)
static int __init init_kmem_tracer(void)
{
return register_tracer(&kmem_tracer);
if (!register_ftrace_event(&kmem_trace_alloc)) {
pr_warning("Warning: could not register kmem events\n");
return 1;
}
if (!register_ftrace_event(&kmem_trace_free)) {
pr_warning("Warning: could not register kmem events\n");
return 1;
}
if (!register_tracer(&kmem_tracer)) {
pr_warning("Warning: could not register the kmem tracer\n");
return 1;
}
return 0;
}
device_initcall(init_kmem_tracer);
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
......@@ -5,6 +5,7 @@
*
*/
#include <linux/module.h>
#include "trace.h"
int ftrace_profile_enable(int event_id)
......@@ -14,7 +15,8 @@ int ftrace_profile_enable(int event_id)
mutex_lock(&event_mutex);
list_for_each_entry(event, &ftrace_events, list) {
if (event->id == event_id && event->profile_enable) {
if (event->id == event_id && event->profile_enable &&
try_module_get(event->mod)) {
ret = event->profile_enable(event);
break;
}
......@@ -32,6 +34,7 @@ void ftrace_profile_disable(int event_id)
list_for_each_entry(event, &ftrace_events, list) {
if (event->id == event_id) {
event->profile_disable(event);
module_put(event->mod);
break;
}
}
......
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
......@@ -288,11 +288,9 @@ static int
ftrace_trace_onoff_print(struct seq_file *m, unsigned long ip,
struct ftrace_probe_ops *ops, void *data)
{
char str[KSYM_SYMBOL_LEN];
long count = (long)data;
kallsyms_lookup(ip, NULL, NULL, NULL, str);
seq_printf(m, "%s:", str);
seq_printf(m, "%pf:", (void *)ip);
if (ops == &traceon_probe_ops)
seq_printf(m, "traceon");
......
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment