| Age | Commit message (Collapse) | Author | Lines |
|
The patch 'Replace atoi() with a robust strtoi()' introduced a bug
in parse_cpu_set(), which relies on partial parsing of the input string.
The function parses CPU specifications like '0-3,5' by incrementing
a pointer through the string. strtoi() rejects strings with trailing
characters, causing parse_cpu_set() to fail on any CPU list with
multiple entries.
Restore the original use of atoi() in parse_cpu_set().
Fixes: 7e9dfccf8f11 ("rtla: Replace atoi() with a robust strtoi()")
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20260112192642.212848-2-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
|
|
Correct the return value documentation for parse_cpu_set() function
in utils.c. The comment incorrectly stated that the function returns
1 on success and 0 on failure, but the actual implementation returns
0 on success and 1 on failure, following the common error-on-nonzero
convention used throughout the codebase.
This documentation fix ensures that developers reading the code
understand the correct return value semantics and prevents potential
misuse of the function's return value in conditional checks.
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Link: https://lore.kernel.org/r/20260106133655.249887-18-wander@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
|
|
Add explicit null termination and buffer initialization for read()
operations in procfs_is_workload_pid() and get_self_cgroup() functions.
The read() system call does not null-terminate the data it reads, and
when the buffer is filled to capacity, subsequent string operations
will read past the buffer boundary searching for a null terminator.
In procfs_is_workload_pid(), explicitly set buffer[MAX_PATH-1] to '\0'
to ensure the buffer is always null-terminated before passing it to
strncmp(). In get_self_cgroup(), use memset() to zero the path buffer
before reading, which ensures null termination when retval is less than
MAX_PATH. Additionally, set path[MAX_PATH-1] to '\0' after the read to
handle the case where the buffer is filled completely.
These defensive buffer handling practices prevent potential buffer
overruns and align with the ongoing buffer safety improvements across
the rtla codebase.
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Link: https://lore.kernel.org/r/20260106133655.249887-17-wander@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
|
|
The stop_tracing global variable is accessed from both the signal
handler context and the main program flow without synchronization.
This creates a potential race condition where compiler optimizations
could cache the variable value in registers, preventing the signal
handler's updates from being visible to other parts of the program.
Add the volatile qualifier to stop_tracing in both common.c and
common.h to ensure all accesses to this variable bypass compiler
optimizations and read directly from memory. This guarantees that
when the signal handler sets stop_tracing, the change is immediately
visible to the main program loop, preventing potential hangs or
delayed shutdown when termination signals are received.
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Link: https://lore.kernel.org/r/20260106133655.249887-16-wander@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
|
|
The actions_parse() function uses strtok() to tokenize the trigger
string, but does not check if the returned token is NULL before
passing it to strcmp(). If the trigger parameter is an empty string
or contains only delimiter characters, strtok() returns NULL, causing
strcmp() to dereference a NULL pointer and crash the program.
This issue can be triggered by malformed user input or edge cases in
trigger string parsing. Add a NULL check immediately after the strtok()
call to validate that a token was successfully extracted before using
it. If no token is found, the function now returns -1 to indicate a
parsing error.
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Link: https://lore.kernel.org/r/20260106133655.249887-13-wander@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
|
|
Remove unused includes for <errno.h> and <signal.h> to clean up the
code and reduce unnecessary dependencies.
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Link: https://lore.kernel.org/r/20260106133655.249887-12-wander@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
|
|
The actions struct is allocated using calloc, which already returns
zeroed memory. The subsequent memset call to zero the 'present' member
is therefore redundant.
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Link: https://lore.kernel.org/r/20260106133655.249887-10-wander@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
|
|
The result enum defines custom values for PASSED, ERROR, and FAILED.
These values correspond to standard exit codes EXIT_SUCCESS and
EXIT_FAILURE.
Update the enum to use the standard macros EXIT_SUCCESS and
EXIT_FAILURE to improve readability and adherence to standard C
practices.
The FAILED value is implicitly assigned EXIT_FAILURE + 1, so there
is no need to assign an explicit value.
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Link: https://lore.kernel.org/r/20260106133655.249887-9-wander@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
|
|
The atoi() function does not perform error checking, which can lead to
undefined behavior when parsing invalid or out-of-range strings. This
can cause issues when parsing user-provided numerical inputs, such as
signal numbers, PIDs, or CPU lists.
To address this, introduce a new strtoi() helper function that safely
converts a string to an integer. This function validates the input and
checks for overflows, returning a negative value on failure.
Replace all calls to atoi() with the new strtoi() function and add
proper error handling to make the parsing more robust and prevent
potential issues.
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Link: https://lore.kernel.org/r/20260106133655.249887-5-wander@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
|
|
The for loop to iterate over the list of actions is used in
more than one place. To avoid code duplication and improve
readability, introduce a for_each_action() helper macro.
Replace the open-coded for loops with the new helper.
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Link: https://lore.kernel.org/r/20260106133655.249887-4-wander@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
|
|
Both set_pid_cgroup() and set_comm_cgroup() functions contain
identical code for opening the cgroup.procs file.
Extract this common code into a new helper function open_cgroup_procs()
to reduce code duplication and improve maintainability.
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Link: https://lore.kernel.org/r/20251224125058.1771519-1-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
|
|
Each rtla tool duplicates parsing of -H/--house-keeping.
Migrate the option parsing from individual tools to the
common_parse_options().
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Link: https://lore.kernel.org/r/20251209100047.2692515-8-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
|
|
Each rtla tool duplicates parsing of -P/--priority.
Migrate the option parsing from individual tools to the
common_parse_options().
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Link: https://lore.kernel.org/r/20251209100047.2692515-7-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
|
|
Each rtla tool duplicates parsing of -e/--event.
Migrate the option parsing from individual tools to the
common_parse_options().
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Link: https://lore.kernel.org/r/20251209100047.2692515-6-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
|
|
Each rtla tool duplicates parsing of -d/--duration.
Migrate the option parsing from individual tools to the
common_parse_options().
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Link: https://lore.kernel.org/r/20251209100047.2692515-5-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
|
|
Each rtla tool duplicates parsing of -D/--debug.
Migrate the option parsing from individual tools to the
common_parse_options().
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Link: https://lore.kernel.org/r/20251209100047.2692515-4-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
|
|
Each rtla tool duplicates parsing of -C/--cgroup.
Migrate the option parsing from individual tools to the
common_parse_options().
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Link: https://lore.kernel.org/r/20251209100047.2692515-3-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
|
|
Each rtla tool duplicates parsing of -c/--cpus.
Migrate the option parsing from individual tools to the
common_parse_options().
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Link: https://lore.kernel.org/r/20251209100047.2692515-2-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
|
|
Each rtla tool duplicates parsing of many common options. This creates
maintenance overhead and risks inconsistencies when updating these
options.
Add common_parse_options() to centralize parsing of options used across
all tools.
Common options to be migrated in future patches.
Changes since v1:
- restore opterr
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Link: https://lore.kernel.org/r/20251209100047.2692515-1-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
|
|
Add option --bpf-action that allows the user to attach an external BPF
program that will be executed via BPF tail call on latency threshold
overflow.
Executing additional BPF code on latency threshold overflow allows doing
low-latency and in-kernel troubleshooting of the cause of the overflow.
The option takes an argument, which is a path to a BPF ELF file
expected to contain a function named "action_handler" in a section named
"tp/timerlat_action" (the section is necessary for libbpf to assign the
correct BPF program type to it).
Link: https://lore.kernel.org/r/20251126144205.331954-3-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
|
|
Add a map to the rtla-timerlat BPF program that holds a file descriptor
of another BPF program, to be executed on threshold overflow.
timerlat_bpf_set_action() is added as an interface to set the program.
Link: https://lore.kernel.org/r/20251126144205.331954-2-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
|
|
The rtla tools have significant code quadruplication in their usage
functions. Each tool implements its own version of the same help text
formatting and option descriptions, leading to maintenance overhead and
inconsistencies. Documentation/tools/rtla/common_options.rst lists 14
common options.
Add common_usage() infrastructure to consolidate help formatting.
Subsequent patches will extend this to handle other common options.
The refactored output is almost identical to the original, with the
following changes:
- add square brackets to specify optionality: `usage: [rtla] ...`
- remove `-q` from timerlat hist because hist tools don't support it
- minor spacing
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Link: https://lore.kernel.org/r/20251124063204.845425-1-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
|
|
This avoids startup races where one of the instances hit a threshold
before all instances were enabled, and thus tracing stops without
the relevant event. In particular, this is not uncommon with the
tests that set a very tight threshold and then complain if there's
no analysis.
This also ensures that we don't stop tracing during a warmup.
The downside is a small chance of having an event over the threshold
early in the output, without stopping on it, which could cause user
confusion. This should be less likely if the warmup feature is used, but
that doesn't eliminate the race window, just the odds of an unusual spike
right at that moment.
Signed-off-by: Crystal Wood <crwood@redhat.com>
Link: https://lore.kernel.org/r/20251112152529.956778-6-crwood@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
|
|
Historically four function declarations remain orphaned or duplicated.
Remove them to keep the source clean.
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Link: https://lore.kernel.org/r/20251012071133.290225-1-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
|
|
Comparing to exactly 1 will fail if more than one ring buffer
event was seen since the last call to timerlat_bpf_wait(), which
can happen in some race scenarios.
Signed-off-by: Crystal Wood <crwood@redhat.com>
Link: https://lore.kernel.org/r/20251112152529.956778-5-crwood@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
|
|
When running rtla as
`rtla <timerlat|osnoise> <top|hist> -t custom_file.txt -a 100`
-a options override trace output filename specified by -t option.
Running the command above will create <timerlat|osnoise>_trace.txt file
instead of custom_file.txt. Fix this by making sure that -a option does
not override trace output filename even if it's passed after trace
output filename is specified.
Fixes: 173a3b014827 ("rtla/timerlat: Add the automatic trace option")
Signed-off-by: Ivan Pravdin <ipravdin.official@gmail.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Link: https://lore.kernel.org/r/b6ae60424050b2c1c8709e18759adead6012b971.1762186418.git.ipravdin.official@gmail.com
[ use capital letter in subject, as required by tracing subsystem ]
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
|
|
Currently, user can only specify cgroup to the tracer's thread the
following ways:
`-C[cgroup]`
`-C[=cgroup]`
`--cgroup[=cgroup]`
If user tries to specify cgroup as `-C [cgroup]` or `--cgroup [cgroup]`,
the parser silently fails and rtla's cgroup is used for the tracer
threads.
To make interface more user-friendly, allow user to specify cgroup in
the aforementioned way, i.e. `-C [cgroup]` and `--cgroup [cgroup]`.
Refactor identical logic between -t/--trace and -C/--cgroup into a
common function.
Change documentation to reflect this user interface change.
Fixes: a957cbc02531 ("rtla: Add -C cgroup support")
Signed-off-by: Ivan Pravdin <ipravdin.official@gmail.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Link: https://lore.kernel.org/r/16132f1565cf5142b5fbd179975be370b529ced7.1762186418.git.ipravdin.official@gmail.com
[ use capital letter in subject, as required by tracing subsystem ]
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
|
|
A long time ago, when the usage help was short, it was a favor
to the user to show it on error. Now that the usage help has
become very long, it is too noisy to dump the complete help text
for each typo after the error message itself.
Replace osnoise_hist_usage("...") with fatal("...") on errors.
Remove the already unused 'usage' argument from osnoise_hist_usage().
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Link: https://lore.kernel.org/r/20251011082738.173670-6-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
|
|
A long time ago, when the usage help was short, it was a favor
to the user to show it on error. Now that the usage help has
become very long, it is too noisy to dump the complete help text
for each typo after the error message itself.
Replace osnoise_top_usage("...") with fatal("...") on errors.
Remove the already unused 'usage' argument from osnoise_top_usage().
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Link: https://lore.kernel.org/r/20251011082738.173670-5-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
|
|
A long time ago, when the usage help was short, it was a favor
to the user to show it on error. Now that the usage help has
become very long, it is too noisy to dump the complete help text
for each typo after the error message itself.
Replace timerlat_hist_usage("...\n") with fatal("...") on errors.
Remove the already unused 'usage' argument from timerlat_hist_usage().
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Link: https://lore.kernel.org/r/20251011082738.173670-4-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
|
|
A long time ago, when the usage help was short, it was a favor
to the user to show it on error. Now that the usage help has
become very long, it is too noisy to dump the complete help text
for each typo after the error message itself.
Replace timerlat_top_usage("...\n") with fatal("...") on errors.
Remove the already unused 'usage' argument from timerlat_top_usage().
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Link: https://lore.kernel.org/r/20251011082738.173670-3-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
|
|
The code contains some technical debt in error handling,
which complicates the consolidation of duplicated code.
Introduce an fatal() function to replace the common pattern of
err_msg() followed by exit(EXIT_FAILURE), reducing the length of an
already long function.
Further patches using fatal() follow.
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Link: https://lore.kernel.org/r/20251011082738.173670-2-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
|
|
Commit 8d933d5c89e8 ("rtla/timerlat: Add continue action") moved the
code performing on-threshold actions (enabled through --on-threshold
option) to inside the RTLA main loop.
The condition in the loop does not check whether the threshold was
actually exceeded or if stop tracing was requested by the user through
SIGINT or duration. This leads to a bug where on-threshold actions are
always performed, even when the threshold was not hit.
(BPF mode is not affected, since it uses a different condition in the
while loop.)
Add a condition that checks for !stop_tracing before executing the
actions. Also, fix incorrect brackets in hist_main_loop to match the
semantics of top_main_loop.
Fixes: 8d933d5c89e8 ("rtla/timerlat: Add continue action")
Fixes: 2f3172f9dd58 ("tools/rtla: Consolidate code between osnoise/timerlat and hist/top")
Reviewed-by: Crystal Wood <crwood@redhat.com>
Reviewed-by: Wander Lairson Costa <wander@redhat.com>
Link: https://lore.kernel.org/r/20251007095341.186923-1-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
|
|
rtla-timerlat allows a *thread* latency threshold to be set via the
-T/--thread option. However, the timerlat tracer calls this *total*
latency (stop_tracing_total_us), and stops tracing also when the
return-to-user latency is over the threshold.
Change the behavior of the timerlat BPF program to reflect what the
timerlat tracer is doing, to avoid discrepancy between stopping
collecting data in the BPF program and stopping tracing in the timerlat
tracer.
Cc: stable@vger.kernel.org
Fixes: e34293ddcebd ("rtla/timerlat: Add BPF skeleton to collect samples")
Reviewed-by: Wander Lairson Costa <wander@redhat.com>
Link: https://lore.kernel.org/r/20251006143100.137255-1-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
|
|
In recently introduced timerlat_free(),
the variable 'nr_cpus' is not assigned.
Assign it with sysconf(_SC_NPROCESSORS_CONF) as done elsewhere.
Remove the culprit: -Wno-maybe-uninitialized. The rest of the
code is clean.
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Fixes: 2f3172f9dd58 ("tools/rtla: Consolidate code between osnoise/timerlat and hist/top")
Link: https://lore.kernel.org/r/20251002170846.437888-1-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
|
|
The longindex argument of getopt_long() is optional
and tied to the unused local variable option_index.
Remove it to shorten the four longest functions
and make the code neater.
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Link: https://lore.kernel.org/r/20251002123553.389467-2-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
|
|
The rtla tools have many instances of iterating over CPUs while
checking if they are monitored.
Add a for_each_monitored_cpu() helper macro to make the code
more readable and reduce code duplication.
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Link: https://lore.kernel.org/r/20251002123553.389467-1-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing tools updates from Steven Rostedt
- This is mostly just consolidating code between osnoise/timerlat and
top/hist for easier maintenance and less future divergence
* tag 'trace-tools-v6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tools/rtla: Add remaining support for osnoise actions
tools/rtla: Add test engine support for unexpected output
tools/rtla: Fix -A option name in test comment
tools/rtla: Consolidate code between osnoise/timerlat and hist/top
tools/rtla: Create common_apply_config()
tools/rtla: Move top/hist params into common struct
tools/rtla: Consolidate common parameters into shared structure
|
|
The condition to check if the actions buffer needs to be resized was
incorrect. The check `self->size >= self->len` would evaluate to
true on almost every call to `actions_new()`, causing the buffer to
be reallocated unnecessarily each time an action was added.
Fix the condition to `self->len >= self.size`, ensuring
that the buffer is only resized when it is actually full.
Cc: John Kacur <jkacur@redhat.com>
Cc: Luis Goncalves <lgoncalv@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Chang Yin <cyin@redhat.com>
Cc: Costa Shulyupin <costa.shul@redhat.com>
Cc: Crystal Wood <crwood@redhat.com>
Cc: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/20250915181101.52513-1-wander@redhat.com
Fixes: 6ea082b171e00 ("rtla/timerlat: Add action on threshold feature")
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Currently, tests 3 and 13-22 in tests/timerlat.t fail with error:
*** buffer overflow detected ***: terminated
timeout: the monitored command dumped core
The result of running `sudo make check` is
tests/timerlat.t (Wstat: 0 Tests: 22 Failed: 11)
Failed tests: 3, 13-22
Files=3, Tests=34, 140 wallclock secs ( 0.07 usr 0.01 sys + 27.63 cusr
27.96 csys = 55.67 CPU)
Result: FAIL
Fix buffer overflow in actions_parse to avoid this error. After this
change, the tests results are
tests/hwnoise.t ... ok
tests/osnoise.t ... ok
tests/timerlat.t .. ok
All tests successful.
Files=3, Tests=34, 186 wallclock secs ( 0.06 usr 0.01 sys + 41.10 cusr
44.38 csys = 85.55 CPU)
Result: PASS
Link: https://lore.kernel.org/164ffc2ec8edacaf1295789dad82a07817b6263d.1757034919.git.ipravdin.official@gmail.com
Fixes: 6ea082b171e0 ("rtla/timerlat: Add action on threshold feature")
Signed-off-by: Ivan Pravdin <ipravdin.official@gmail.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The basic functionality came with the consolidation; now hook up the
command line options, and add documentation and tests.
Cc: John Kacur <jkacur@redhat.com>
Cc: Costa Shulyupin <costa.shul@redhat.com>
Link: https://lore.kernel.org/20250907022325.243930-8-crwood@redhat.com
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Signed-off-by: Crystal Wood <crwood@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Currently a lot of code is duplicated between the different rtla tools,
making maintenance more difficult, and encouraging divergence such as
features that are only implemented for certain tools even though they
could be more broadly applicable.
Merge the various main() functions into a common run_tool() with an ops
struct for tool-specific details.
Implement enough support for actions on osnoise to not need to keep the
old params->trace_output path.
Cc: John Kacur <jkacur@redhat.com>
Cc: Costa Shulyupin <costa.shul@redhat.com>
Link: https://lore.kernel.org/20250907022325.243930-5-crwood@redhat.com
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Signed-off-by: Crystal Wood <crwood@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Merge the common bits of osnoise_apply_config() and
timerlat_apply_config(). Put the result in a new common.c, and move
enough things to common.h so that common.c does not need to include
osnoise.h.
Cc: John Kacur <jkacur@redhat.com>
Cc: Costa Shulyupin <costa.shul@redhat.com>
Link: https://lore.kernel.org/20250907022325.243930-4-crwood@redhat.com
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Signed-off-by: Crystal Wood <crwood@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The hist members were very similar between timerlat and top, so
just use one common hist struct.
output_divisor, quiet, and pretty printing are pretty generic
concepts that can go in the main struct even if not every
specific tool (currently) uses them.
Cc: John Kacur <jkacur@redhat.com>
Cc: Costa Shulyupin <costa.shul@redhat.com>
Link: https://lore.kernel.org/20250907022325.243930-3-crwood@redhat.com
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Signed-off-by: Crystal Wood <crwood@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
timerlat_params and osnoise_params structures contain 15 identical
fields.
Introduce a new header common.h and define a common_params structure to
consolidate shared fields, reduce code duplication, and enhance
maintainability.
Cc: John Kacur <jkacur@redhat.com>
Link: https://lore.kernel.org/20250907022325.243930-2-crwood@redhat.com
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Signed-off-by: Crystal Wood <crwood@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Implement actions on end next to actions on threshold. A new option,
--on-end is added, parallel to --on-threshold. Instead of being
executed whenever a latency threshold is reached, it is executed at the
end of the measurement.
For example:
$ rtla timerlat hist -d 5s --on-end trace
will save the trace output at the end.
All actions supported by --on-threshold are also supported by --on-end,
except for continue, which does nothing with --on-end.
Cc: John Kacur <jkacur@redhat.com>
Cc: Luis Goncalves <lgoncalv@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Chang Yin <cyin@redhat.com>
Cc: Costa Shulyupin <costa.shul@redhat.com>
Cc: Crystal Wood <crwood@redhat.com>
Cc: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/20250626123405.1496931-6-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Introduce option to resume tracing after a latency threshold overflow.
The option is implemented as an action named "continue".
Example:
$ rtla timerlat top -q -T 200 -d 1s --on-threshold \
exec,command="echo Threshold" --on-threshold continue
Threshold
Threshold
Threshold
Timer Latency
...
The feature is supported for both hist and top. After the continue
action is executed, processing of the list of actions is stopped and
tracing is resumed.
Cc: John Kacur <jkacur@redhat.com>
Cc: Luis Goncalves <lgoncalv@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Chang Yin <cyin@redhat.com>
Cc: Costa Shulyupin <costa.shul@redhat.com>
Cc: Crystal Wood <crwood@redhat.com>
Cc: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/20250626123405.1496931-5-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Currently, rtla-timerlat BPF program uses a global variable stored in a
.bss section to store whether tracing has been stopped.
Move the information to a separate map, so that it is easily writable
from userspace, and add a function that clears the value, resuming
tracing after it has been stopped.
Cc: John Kacur <jkacur@redhat.com>
Cc: Luis Goncalves <lgoncalv@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Chang Yin <cyin@redhat.com>
Cc: Costa Shulyupin <costa.shul@redhat.com>
Cc: Crystal Wood <crwood@redhat.com>
Cc: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/20250626123405.1496931-4-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Extend the functionality provided by the -t/--trace option, which
triggers saving the contents of a tracefs buffer after tracing is
stopped, to support implementing arbitrary actions.
A new option, --on-threshold, is added, taking an argument
that further specifies the action. Actions added in this patch are:
- trace[,file=<filename>]: Saves tracefs buffer, optionally taking a
filename.
- signal,num=<sig>,pid=<pid>: Sends signal to process. "parent" might
be specified instead of number to send signal to parent process.
- shell,command=<command>: Execute shell command.
Multiple actions may be specified and will be executed in order,
including multiple actions of the same type. Trace output requested via
-t and -a now adds a trace action to the end of the list.
If an action fails, the following actions are not executed. For
example, this command:
$ rtla timerlat -T 20 --on-threshold trace \
--on-threshold shell,command="grep ipi_send timerlat_trace.txt" \
--on-threshold signal,num=2,pid=parent
will send signal 2 (SIGINT) to parent process, but only if saved trace
contains the text "ipi_send".
This way, the feature can be used for flexible reactions on latency
spikes, and allows combining rtla with other tooling like perf.
Cc: John Kacur <jkacur@redhat.com>
Cc: Luis Goncalves <lgoncalv@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Chang Yin <cyin@redhat.com>
Cc: Costa Shulyupin <costa.shul@redhat.com>
Cc: Crystal Wood <crwood@redhat.com>
Cc: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/20250626123405.1496931-3-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
After the introduction of BPF-based sample collection, rtla-timerlat
effectively runs in one of three modes:
- Pure BPF mode, with tracefs only being used to set up the timerlat
tracer. Sample processing and stop on threshold are handled by BPF.
- tracefs mode. BPF is unsupported or kernel is lacking the necessary
trace event (osnoise:timerlat_sample). Stop on theshold is handled by
timerlat tracer stopping tracing in all instances.
- BPF/tracefs mixed mode - BPF is used for sample collection for top or
histogram, tracefs is used for trace output and/or auto-analysis. Stop
on threshold is handled both through BPF program, which stops sample
collection for top/histogram and wakes up rtla, and by timerlat
tracer, which stops tracing for trace output/auto-analysis instances.
Add enum timerlat_tracing_mode, with three values:
- TRACING_MODE_BPF
- TRACING_MODE_TRACEFS
- TRACING_MODE_MIXED
Those represent the modes described above. A field of this type is added
to struct timerlat_params, named "mode", replacing the no_bpf variable.
params->mode is set in timerlat_{top,hist}_parse_args to
TRACING_MODE_BPF or TRACING_MODE_MIXED based on whether trace output
and/or auto-analysis is requested. timerlat_{top,hist}_main then checks
if BPF is not unavailable or disabled, in that case, it sets
params->mode to TRACING_MODE_TRACEFS.
A condition is added to timerlat_apply_config that skips setting
timerlat tracer thresholds if params->mode is TRACING_MODE_BPF (those
are unnecessary, since they only turn off tracing, which is already
turned off in that case, since BPF is used to collect samples).
Cc: John Kacur <jkacur@redhat.com>
Cc: Luis Goncalves <lgoncalv@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Chang Yin <cyin@redhat.com>
Cc: Costa Shulyupin <costa.shul@redhat.com>
Cc: Crystal Wood <crwood@redhat.com>
Cc: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/20250626123405.1496931-2-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|