= Tracing and Logging =

Starting with version 2014.1, there are three tracing mechanisms available in LITMUS^RT^, which are exposed as four sets of device files:

 1. `litmus_log`: This trace contains text messages (created with the `TRACE()` macro, see `debug_trace.h`) that convey information useful for debugging. There is one global `litmus_log` buffer for the whole system. The `litmus_log` facility is essentially a replacement for `printk()`, which cannot be invoked from scheduling code without risking deadlock. Debug tracing must be enabled at compile time. Note that debug tracing creates significant overhead because string formatting takes place.

 2. `ft_cpu_traceX` and `ft_msg_traceX`: These per-CPU traces (`X` denotes the CPU) contain binary-encoded time stamps and are used for overhead tracing. There are two buffers for each processor to enable processor-local tracing. The "ft" stands for [[http://www.cs.unc.edu/~bbb/feathertrace|Feather-Trace]]. Feather-Trace is designed to create negligible overhead when event sources are disabled, and to incur only low overhead when recording time stamps. The trace `ft_cpu_traceX` contains time stamps of CPU-local scheduling events (such as scheduler invocations, context switches, etc.); the trace `ft_msg_traceX` contains time stamps related to inter-processor events (primarily rescheduling IPIs).

 3. `sched_traceX`: This trace contains binary-encoded scheduling event information, ''e.g.'', an event can be recorded whenever a task is scheduled, a job is released, a job completes, etc. There is one `sched_traceX` buffer per processor (again, `X` denotes the CPU). `sched_trace` is based on Feather-Trace and hence also incurs very little overhead when event sources are disabled.

== Accessing Trace Buffers ==

`litmus_log` functionality is provided through the kernel's `misc` driver interface. The other two traces are exported to user space through standard character device drivers.
On systems with default udev rules, the devices are created in the `/dev/litmus/` directory:

 * The `litmus_log` trace is exported as `/dev/litmus/log`.
 * The `ft_cpu_traceX` traces are exported as `/dev/litmus/ft_cpu_trace0`, `/dev/litmus/ft_cpu_trace1`, etc.
 * The `ft_msg_traceX` traces are exported as `/dev/litmus/ft_msg_trace0`, `/dev/litmus/ft_msg_trace1`, etc.
 * The `sched_trace` traces are exported as `/dev/litmus/sched_trace0`, `/dev/litmus/sched_trace1`, etc.

== Recording Debug Traces ==

The `litmus_log` buffer can be read by simply opening the file and reading its contents:

{{{
cat /dev/litmus/log > my_debug_log
}}}

Kill the `cat` process to stop recording debug messages. No post-processing is required since the debug messages are plain text. However, note that messages may appear in an order that differs from the sequence of events at runtime. If order is important (for example, when debugging race conditions), the recorded messages can be sorted offline with the help of the ''sequence number'' at the start of each recorded message. Example:

{{{
sort -n my_debug_log > my_sorted_debug_log
}}}

'''Hint''': It is possible to use `netcat(1)` to send the debug messages to another host to avoid filesystem activity.

'''Hint''': If `cat` is being starved, you can make it a real-time task itself with `rt_launch` from `liblitmus`.

== Recording Overhead Traces ==

Feather-Trace allows for much more fine-grained tracing than the simple debug stream realized by `litmus_log` and hence requires special-purpose tools. These tools are available as part of the `ft_tools` package (see the [[Repositories|list of repositories]]).

Feather-Trace events can be enabled on a per-event basis. Each event is identified by a unique 32-bit ID. Initially, when the device buffer is not being accessed by any user space programs (i.e., when the device driver is unused), all events are disabled.
Events can be enabled (and subsequently disabled again) by writing binary commands to the buffer device file. Once events are enabled, they can generate trace records, which are stored in the trace buffer. User space programs can obtain these records by reading from the device file. Reading a trace record removes it from the buffer, i.e., each record can be read exactly once. Records should be consumed shortly after they are created since buffer capacity is limited.

The tool `ftcat`, which is part of the `ft_tools` package (see above), automates the process of enabling events and retrieving trace records. It takes the name of the `ft_trace` device file and the events of interest as arguments:

{{{
ftcat ...
}}}

The tool enables the specified events and copies all recorded events to `stdout`. Events can be specified either by their ID (see `include/litmus/trace.h` in the kernel directory for a list of time-stamp-generating events) or by their symbolic name. The following symbolic names are recognized:

 * `SCHED_START`, `SCHED_END`: Used to measure the time spent to make a scheduling decision.
 * `CXS_START`, `CXS_END`: Used to measure the time spent to carry out a context switch.
 * `SCHED2_START`, `SCHED2_END`: Used to measure the time spent to perform post-context-switch cleanup and management activities. This is part of the scheduling overhead, but for technical reasons it cannot be measured as part of the interval [`SCHED_START`, `SCHED_END`].
 * `TICK_START`, `TICK_END`: Used to measure the overhead incurred due to Linux's periodic system tick. As of LITMUS^RT^ 2014.2, no LITMUS^RT^-specific activity takes place during a system tick.
 * `QUANTUM_BOUNDARY_START`, `QUANTUM_BOUNDARY_END`: Used to measure the overhead incurred in quantum-driven plugins (e.g., PFAIR) at the start of each quantum. Available since LITMUS^RT^ 2014.2.
 * `PLUGIN_SCHED_START`, `PLUGIN_SCHED_END`: Like [`SCHED_START`, `SCHED_END`], but measures only the time spent by the active scheduling plugin. There is no equivalent `SCHED2` counterpart because the scheduling plugins do not directly contribute to the `SCHED2` overhead.
 * `RELEASE_START`, `RELEASE_END`: Used to measure the time spent to enqueue a newly released job in a ready queue.
 * `RELEASE_LATENCY`: Used to measure the difference between when a timer ''should'' have fired and when it actually ''did'' fire. In contrast to all other time stamps, this overhead is measured directly in nanoseconds (and not in processor cycles, as the other overheads are).

For example, the following command records the lengths of context switches and scheduling decisions on CPU 0 in the file `my_trace_of_scheduling_events_on_cpu0`:

{{{
$ ftcat /dev/litmus/ft_cpu_trace0 CXS_START CXS_END SCHED_START SCHED_END SCHED2_START SCHED2_END > my_trace_of_scheduling_events_on_cpu0
}}}

The tracing can be stopped by interrupting `ftcat` with `^C` or by sending `SIGTERM` with `kill(1)`.

To record IPI time stamps, use the following command:

{{{
$ ftcat /dev/litmus/ft_msg_trace1 SEND_RESCHED_START SEND_RESCHED_END > my_trace_of_IPIs_received_by_cpu1
}}}

To record all overheads on all CPUs, one must start two `ftcat` instances for each `${CPU}`: one to record `/dev/litmus/ft_cpu_trace${CPU}` and one to record `/dev/litmus/ft_msg_trace${CPU}`. This is best automated with a script.

Note that all overhead traces are stored in the byte order of the host.

== Post-Processing Overhead Traces ==

The binary event stream recorded by `ftcat` is of limited direct use; the data must first be exported for analysis. This can be achieved with the tool `ft2csv`, which is also part of the `ft_tools` package. As the name suggests, `ft2csv` extracts the intervals defined by pairs of time stamps in a recorded trace and displays them as ''comma-separated values'' (CSV). It takes the name of an overhead trace and one start event as arguments:

{{{
ft2csv
}}}

Events are specified in the same way as with `ftcat`.
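As an aside, the "two `ftcat` instances per CPU" recording setup described above is easily scripted. The following is a minimal sketch, not part of `ft_tools`: it assumes `ftcat` is in the `PATH`, uses the device names and event lists from the examples above, and the output file names are arbitrary.

```shell
# Sketch: start one ftcat per CPU-local trace and one per message trace.
# Assumes ftcat (from ft_tools) is in the PATH; output file names are
# arbitrary. The existence checks make this a no-op on kernels without
# the Feather-Trace devices.

NCPUS=$(getconf _NPROCESSORS_ONLN)

for CPU in $(seq 0 $((NCPUS - 1))); do
    if [ -e "/dev/litmus/ft_cpu_trace${CPU}" ]; then
        ftcat "/dev/litmus/ft_cpu_trace${CPU}" \
            CXS_START CXS_END SCHED_START SCHED_END SCHED2_START SCHED2_END \
            > "overheads_cpu${CPU}.bin" &
    fi
    if [ -e "/dev/litmus/ft_msg_trace${CPU}" ]; then
        ftcat "/dev/litmus/ft_msg_trace${CPU}" \
            SEND_RESCHED_START SEND_RESCHED_END \
            > "overheads_msg${CPU}.bin" &
    fi
done

# Run the workload here; then stop tracing by terminating the ftcat
# instances (e.g., with kill(1)) and wait for them to exit.
wait
```
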
For example, the following command can be used to print the context-switch lengths that were recorded in the example above to `stdout`:

{{{
$ ft2csv CXS_START my_trace_of_scheduling_events_on_cpu0
2397634444579592, 2397634444583328, 3736
2397634477323130, 2397634477326686, 3556
2397634477346366, 2397634477348986, 2620
2397634611910924, 2397634611914348, 3424
...
}}}

For each event, the start time (in clock cycles) is given in the first column, the end time is given in the second column, and the length is given in the third column (again, in cycles).

`ft2csv` accepts a few options that affect how events are filtered. By default, events that do not involve real-time tasks are ignored; this can be changed by specifying the `-b` option. If one happens to be processing output on a little-endian host that was produced on a big-endian host, then the `-e` option can come in handy.

Once the recorded overheads have been exported to CSV files, they can easily be analyzed with tools such as Python's `csv` module, Octave, or Matlab. One can either analyze the trace from each CPU individually, or simply concatenate all traces (e.g., `cat my_trace_of_scheduling_events_on_cpu0 my_trace_of_scheduling_events_on_cpu1 ... > all_events`) and then run `ft2csv` on the resulting set of all events.

== Recording Scheduling Traces ==

Scheduling traces are also recorded using Feather-Trace; however, a different binary format is used. The definition and an explanation of the format can be found in the file `include/litmus/sched_trace.h` (in the kernel directory).

Since `sched_trace` uses per-processor buffers, several `ftcat` instances have to be launched. The `ft_tools` package includes a wrapper script called `st_trace` that automates this setup procedure. `st_trace` accepts a tag as its only argument; the tag is used to assign a unique name to the trace files. The following example illustrates how `st_trace` is used with the tag `global-exp`.
{{{
$ st_trace global-exp
CPU 0: 8204 > st-global-exp-0.bin [0]
CPU 1: 8205 > st-global-exp-1.bin [0]
CPU 2: 8206 > st-global-exp-2.bin [0]
CPU 3: 8207 > st-global-exp-3.bin [0]
Press Enter to end tracing...
# enter pressed
Disabling 9 events.
Disabling 9 events.
Disabling 9 events.
Disabling 9 events.
/home/litmus/log0: 1234 bytes read.
/home/litmus/log1: 1234 bytes read.
/home/litmus/log2: 1234 bytes read.
/home/litmus/log3: 1234 bytes read.
}}}

Note that `st_trace` may have to be modified to change the default `sched_trace` device locations and the path to the `ftcat` binary.

'''Hint''': The dummy real-time task `rtspin` (distributed as part of the `liblitmus` package) may be useful when studying/testing the behavior of a scheduler plugin.

Recorded scheduling traces can be analyzed with [[http://cs.unc.edu/~mollison/unit-trace|Unit-Trace]].

== Calibrating Cycle Counter Offsets in Feather-Trace ==

On some machines, the cycle counters on different cores may be offset by a constant or slowly changing value. For the purpose of recording per-processor events (such as context switches or scheduler invocations), any such offset is irrelevant. However, offsets are problematic when recording events across processors. In particular, in the main version of LITMUS^RT^, this affects IPI tracing (i.e., `SEND_RESCHED`).

As of version 2014.1, to deal with such offsets, LITMUS^RT^ supports ''calibrating'' Feather-Trace to account for observed offsets between cycle counters on different cores. To calibrate Feather-Trace, simply invoke `ftcat` with the `-c` option on `/dev/litmus/ft_msg_trace0`. For example, on a system with three CPUs:

{{{
$ ftcat -v -c /dev/litmus/ft_msg_trace0
Calibrating CPU 0... done.
Calibrating CPU 1... done.
Calibrating CPU 2... done.
/dev/litmus/ft_msg_trace0: 0 bytes read.
}}}

The system log (`dmesg`) will contain messages similar to the following example. (Note: these values come from QEMU and are not representative of a real system.)
{{{
Feather-Trace: determining TSC offsets for P0
Feather-Trace: TSC offset for P0->P1 is 90 cycles.
Feather-Trace: TSC offset for P0->P2 is 38 cycles.
Feather-Trace: determining TSC offsets for P1
Feather-Trace: TSC offset for P1->P0 is 15 cycles.
Feather-Trace: TSC offset for P1->P2 is 174 cycles.
Feather-Trace: determining TSC offsets for P2
Feather-Trace: TSC offset for P2->P0 is 27 cycles.
Feather-Trace: TSC offset for P2->P1 is 84 cycles.
}}}

Without the `-v` (verbose) option, no output is produced.

The calibration code measures the offset between two cores A and B as follows: after turning off interrupts on both cores, both cores first wait on a barrier (i.e., a simple flag set by core A) and then each records the local time. The offset between the two times is stored and used to adjust the start times of IPI time stamps (i.e., `SEND_RESCHED_START`). The relevant code can be found in `litmus/trace.c`.

Note that, because the measurement code relies on a simple spin-based barrier, the accuracy of this approach is limited by delays inherent in the cache-coherence protocol. We thus recommend '''not''' using the calibration support unless a system is known to suffer from cycle counter offsets that are larger in magnitude than typical cache-coherence latencies.

When using Feather-Trace's calibration support, we recommend calibrating the cycle counters before each measurement (e.g., before running each task set) to minimize the risk of drift.

== Concluding Remarks ==

If you have any questions, please ask for help on the [[Mailinglist|mailing list]].