path: root/tools/tracing/rtla/src
Age | Commit message | Author | Lines
2026-01-13 | rtla: Fix parse_cpu_set() bug introduced by strtoi() | Costa Shulyupin | -6/+4
The patch 'Replace atoi() with a robust strtoi()' introduced a bug in parse_cpu_set(), which relies on partial parsing of the input string. The function parses CPU specifications like '0-3,5' by incrementing a pointer through the string. strtoi() rejects strings with trailing characters, causing parse_cpu_set() to fail on any CPU list with multiple entries. Restore the original use of atoi() in parse_cpu_set(). Fixes: 7e9dfccf8f11 ("rtla: Replace atoi() with a robust strtoi()") Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20260112192642.212848-2-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
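The partial-parsing pattern described above can be sketched as follows. This is only an illustration, not the rtla implementation: `count_cpus()` is a hypothetical helper that walks a CPU list with a moving pointer and relies on atoi() stopping at the first non-digit, which is exactly the behavior a converter that rejects trailing characters cannot provide.

```c
#include <ctype.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from rtla): count the CPUs named in a list
 * such as "0-3,5". Returns the count, or -1 on a malformed list.
 */
static int count_cpus(const char *list)
{
	const char *p = list;
	int count = 0;

	while (*p) {
		int first, last;

		if (!isdigit((unsigned char)*p))
			return -1;
		/* atoi() parses "0" out of "0-3,5" and ignores the rest. */
		first = last = atoi(p);
		while (isdigit((unsigned char)*p))
			p++;

		if (*p == '-') {
			p++;
			if (!isdigit((unsigned char)*p))
				return -1;
			last = atoi(p);
			while (isdigit((unsigned char)*p))
				p++;
		}
		if (last < first)
			return -1;
		count += last - first + 1;

		if (*p == ',')
			p++;
		else if (*p)
			return -1;
	}
	return count;
}
```

Calling `count_cpus("0-3,5")` visits the range 0-3 and then CPU 5; a strict whole-string converter would already reject the first call because of the trailing "-3,5".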
2026-01-07 | rtla: Fix parse_cpu_set() return value documentation | Wander Lairson Costa | -1/+1
Correct the return value documentation for parse_cpu_set() function in utils.c. The comment incorrectly stated that the function returns 1 on success and 0 on failure, but the actual implementation returns 0 on success and 1 on failure, following the common error-on-nonzero convention used throughout the codebase. This documentation fix ensures that developers reading the code understand the correct return value semantics and prevents potential misuse of the function's return value in conditional checks. Signed-off-by: Wander Lairson Costa <wander@redhat.com> Link: https://lore.kernel.org/r/20260106133655.249887-18-wander@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 | rtla: Ensure null termination after read operations in utils.c | Wander Lairson Costa | -0/+3
Add explicit null termination and buffer initialization for read() operations in procfs_is_workload_pid() and get_self_cgroup() functions. The read() system call does not null-terminate the data it reads, and when the buffer is filled to capacity, subsequent string operations will read past the buffer boundary searching for a null terminator. In procfs_is_workload_pid(), explicitly set buffer[MAX_PATH-1] to '\0' to ensure the buffer is always null-terminated before passing it to strncmp(). In get_self_cgroup(), use memset() to zero the path buffer before reading, which ensures null termination when retval is less than MAX_PATH. Additionally, set path[MAX_PATH-1] to '\0' after the read to handle the case where the buffer is filled completely. These defensive buffer handling practices prevent potential buffer overruns and align with the ongoing buffer safety improvements across the rtla codebase. Signed-off-by: Wander Lairson Costa <wander@redhat.com> Link: https://lore.kernel.org/r/20260106133655.249887-17-wander@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
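A minimal sketch of the defensive pattern, assuming a hypothetical helper `read_string()` and a small `BUF_SIZE` in place of rtla's MAX_PATH. It is a variant of the approach in the commit message: it zeroes the buffer, reads at most one byte less than the buffer size, and terminates the string explicitly.

```c
#include <string.h>
#include <unistd.h>

#define BUF_SIZE 16	/* hypothetical; stands in for MAX_PATH */

/*
 * Hypothetical helper: read from fd into buf and guarantee that buf
 * holds a NUL-terminated string. Returns bytes read, or -1 on error.
 */
static ssize_t read_string(int fd, char buf[BUF_SIZE])
{
	ssize_t n;

	memset(buf, 0, BUF_SIZE);
	/* Leave room for the terminator: read() never adds one. */
	n = read(fd, buf, BUF_SIZE - 1);
	if (n < 0)
		return -1;
	buf[n] = '\0';
	return n;
}
```

Even when the source holds more data than fits, string functions such as strncmp() or strlen() applied to `buf` stay inside the buffer.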
2026-01-07 | rtla: Make stop_tracing variable volatile | Wander Lairson Costa | -2/+2
The stop_tracing global variable is accessed from both the signal handler context and the main program flow without synchronization. This creates a potential race condition where compiler optimizations could cache the variable value in registers, preventing the signal handler's updates from being visible to other parts of the program. Add the volatile qualifier to stop_tracing in both common.c and common.h to ensure all accesses to this variable bypass compiler optimizations and read directly from memory. This guarantees that when the signal handler sets stop_tracing, the change is immediately visible to the main program loop, preventing potential hangs or delayed shutdown when termination signals are received. Signed-off-by: Wander Lairson Costa <wander@redhat.com> Link: https://lore.kernel.org/r/20260106133655.249887-16-wander@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
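The handler/main-loop interaction can be sketched as below. The names `stop_requested`, `on_signal()` and `wait_for_stop()` are hypothetical, and the flag is declared as `volatile sig_atomic_t`, the type the C standard specifies for objects written from a signal handler; the commit message only states that the volatile qualifier was added to stop_tracing.

```c
#include <signal.h>

/* Flag shared between the signal handler and the main loop. */
static volatile sig_atomic_t stop_requested;

static void on_signal(int sig)
{
	(void)sig;
	stop_requested = 1;
}

/*
 * Spin until the handler sets the flag. Because the flag is volatile,
 * the compiler must reload it on every iteration instead of hoisting
 * the read out of the loop. Returns the number of iterations spent.
 */
static unsigned long wait_for_stop(void)
{
	unsigned long spins = 0;

	while (!stop_requested)
		spins++;
	return spins;
}
```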
2026-01-07 | rtla: Fix NULL pointer dereference in actions_parse | Wander Lairson Costa | -0/+2
The actions_parse() function uses strtok() to tokenize the trigger string, but does not check if the returned token is NULL before passing it to strcmp(). If the trigger parameter is an empty string or contains only delimiter characters, strtok() returns NULL, causing strcmp() to dereference a NULL pointer and crash the program. This issue can be triggered by malformed user input or edge cases in trigger string parsing. Add a NULL check immediately after the strtok() call to validate that a token was successfully extracted before using it. If no token is found, the function now returns -1 to indicate a parsing error. Signed-off-by: Wander Lairson Costa <wander@redhat.com> Link: https://lore.kernel.org/r/20260106133655.249887-13-wander@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
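The check described above might look like the following sketch. `parse_trigger_type()` and its return codes are hypothetical and only mirror the shape of the fix: test the strtok() result before handing it to strcmp().

```c
#include <string.h>

/*
 * Hypothetical parser: returns 0 for "trace", 1 for "signal",
 * and -1 for anything else, including an empty trigger string.
 * Note that strtok() modifies its argument.
 */
static int parse_trigger_type(char *trigger)
{
	char *token = strtok(trigger, ",");

	/* NULL for "" or for a string made only of delimiters. */
	if (!token)
		return -1;

	if (strcmp(token, "trace") == 0)
		return 0;
	if (strcmp(token, "signal") == 0)
		return 1;
	return -1;
}
```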
2026-01-07 | rtla: Remove unused headers | Wander Lairson Costa | -4/+0
Remove unused includes for <errno.h> and <signal.h> to clean up the code and reduce unnecessary dependencies. Signed-off-by: Wander Lairson Costa <wander@redhat.com> Link: https://lore.kernel.org/r/20260106133655.249887-12-wander@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 | rtla: Remove redundant memset after calloc | Wander Lairson Costa | -2/+0
The actions struct is allocated using calloc, which already returns zeroed memory. The subsequent memset call to zero the 'present' member is therefore redundant. Signed-off-by: Wander Lairson Costa <wander@redhat.com> Link: https://lore.kernel.org/r/20260106133655.249887-10-wander@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 | rtla: Use standard exit codes for result enum | Wander Lairson Costa | -3/+4
The result enum defines custom values for PASSED, ERROR, and FAILED. These values correspond to standard exit codes EXIT_SUCCESS and EXIT_FAILURE. Update the enum to use the standard macros EXIT_SUCCESS and EXIT_FAILURE to improve readability and adherence to standard C practices. The FAILED value is implicitly assigned EXIT_FAILURE + 1, so there is no need to assign an explicit value. Signed-off-by: Wander Lairson Costa <wander@redhat.com> Link: https://lore.kernel.org/r/20260106133655.249887-9-wander@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
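Going by the description above, the resulting enum might look roughly like this sketch. The enumerator names come from the commit message; the exact declaration in rtla is an assumption.

```c
#include <stdlib.h>

/* Sketch of the enum as described; the layout in rtla may differ. */
enum result {
	PASSED = EXIT_SUCCESS,
	ERROR = EXIT_FAILURE,
	FAILED,		/* implicitly EXIT_FAILURE + 1 */
};
```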
2026-01-07 | rtla: Replace atoi() with a robust strtoi() | Wander Lairson Costa | -8/+41
The atoi() function does not perform error checking, which can lead to undefined behavior when parsing invalid or out-of-range strings. This can cause issues when parsing user-provided numerical inputs, such as signal numbers, PIDs, or CPU lists. To address this, introduce a new strtoi() helper function that safely converts a string to an integer. This function validates the input and checks for overflows, returning a negative value on failure. Replace all calls to atoi() with the new strtoi() function and add proper error handling to make the parsing more robust and prevent potential issues. Signed-off-by: Wander Lairson Costa <wander@redhat.com> Link: https://lore.kernel.org/r/20260106133655.249887-5-wander@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
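A strict converter of the kind described can be built on strtol(). The sketch below uses a hypothetical name and signature, `parse_int_strict()`, because the commit message does not show the actual strtoi() prototype; it only states that a negative value signals failure.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical strict string-to-int conversion.
 * Returns 0 and stores the value in *res on success, -1 on failure.
 */
static int parse_int_strict(const char *s, int *res)
{
	char *end;
	long val;

	if (!s || !*s)
		return -1;

	errno = 0;
	val = strtol(s, &end, 10);

	/* Out of range for long, or for int on LP64 targets. */
	if (errno == ERANGE || val < INT_MIN || val > INT_MAX)
		return -1;
	/* No digits consumed, or trailing characters after the number. */
	if (end == s || *end != '\0')
		return -1;

	*res = (int)val;
	return 0;
}
```

Rejecting trailing characters is what makes such a helper unsuitable for callers that parse a string piece by piece.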
2026-01-07 | rtla: Introduce for_each_action() helper | Wander Lairson Costa | -2/+9
The for loop to iterate over the list of actions is used in more than one place. To avoid code duplication and improve readability, introduce a for_each_action() helper macro. Replace the open-coded for loops with the new helper. Signed-off-by: Wander Lairson Costa <wander@redhat.com> Link: https://lore.kernel.org/r/20260106133655.249887-4-wander@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
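An iteration macro of this kind could be written as in the sketch below. The struct layout and the macro parameters are assumptions made for illustration; only the macro name comes from the commit message.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the rtla action types. */
struct action {
	int type;
};

struct actions {
	struct action *list;
	int len;
};

/* Visit every element of acts->list in order. */
#define for_each_action(acts, act) \
	for ((act) = (acts)->list; \
	     (act) < (acts)->list + (acts)->len; \
	     (act)++)

/* Example user: count the actions of a given type. */
static int count_type(const struct actions *acts, int type)
{
	const struct action *act;
	int n = 0;

	for_each_action(acts, act)
		if (act->type == type)
			n++;
	return n;
}
```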
2026-01-07 | tools/rtla: Deduplicate cgroup path opening code | Costa Shulyupin | -33/+32
Both set_pid_cgroup() and set_comm_cgroup() functions contain identical code for opening the cgroup.procs file. Extract this common code into a new helper function open_cgroup_procs() to reduce code duplication and improve maintainability. Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Link: https://lore.kernel.org/r/20251224125058.1771519-1-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 | tools/rtla: Consolidate -H/--house-keeping option parsing | Costa Shulyupin | -33/+11
Each rtla tool duplicates parsing of -H/--house-keeping. Migrate the option parsing from individual tools to the common_parse_options(). Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Link: https://lore.kernel.org/r/20251209100047.2692515-8-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 | tools/rtla: Consolidate -P/--priority option parsing | Costa Shulyupin | -33/+11
Each rtla tool duplicates parsing of -P/--priority. Migrate the option parsing from individual tools to the common_parse_options(). Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Link: https://lore.kernel.org/r/20251209100047.2692515-7-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 | tools/rtla: Consolidate -e/--event option parsing | Costa Shulyupin | -52/+16
Each rtla tool duplicates parsing of -e/--event. Migrate the option parsing from individual tools to the common_parse_options(). Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Link: https://lore.kernel.org/r/20251209100047.2692515-6-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 | tools/rtla: Consolidate -d/--duration option parsing | Costa Shulyupin | -29/+11
Each rtla tool duplicates parsing of -d/--duration. Migrate the option parsing from individual tools to the common_parse_options(). Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Link: https://lore.kernel.org/r/20251209100047.2692515-5-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 | tools/rtla: Consolidate -D/--debug option parsing | Costa Shulyupin | -21/+9
Each rtla tool duplicates parsing of -D/--debug. Migrate the option parsing from individual tools to the common_parse_options(). Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Link: https://lore.kernel.org/r/20251209100047.2692515-4-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 | tools/rtla: Consolidate -C/--cgroup option parsing | Costa Shulyupin | -25/+10
Each rtla tool duplicates parsing of -C/--cgroup. Migrate the option parsing from individual tools to the common_parse_options(). Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Link: https://lore.kernel.org/r/20251209100047.2692515-3-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 | tools/rtla: Consolidate -c/--cpus option parsing | Costa Shulyupin | -33/+11
Each rtla tool duplicates parsing of -c/--cpus. Migrate the option parsing from individual tools to the common_parse_options(). Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Link: https://lore.kernel.org/r/20251209100047.2692515-2-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 | tools/rtla: Add common_parse_options() | Costa Shulyupin | -0/+48
Each rtla tool duplicates parsing of many common options. This creates maintenance overhead and risks inconsistencies when updating these options. Add common_parse_options() to centralize parsing of options used across all tools. Common options to be migrated in future patches. Changes since v1: - restore opterr Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Link: https://lore.kernel.org/r/20251209100047.2692515-1-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 | rtla/timerlat: Add --bpf-action option | Tomas Glozar | -2/+80
Add option --bpf-action that allows the user to attach an external BPF program that will be executed via BPF tail call on latency threshold overflow. Executing additional BPF code on latency threshold overflow allows doing low-latency and in-kernel troubleshooting of the cause of the overflow. The option takes an argument, which is a path to a BPF ELF file expected to contain a function named "action_handler" in a section named "tp/timerlat_action" (the section is necessary for libbpf to assign the correct BPF program type to it). Link: https://lore.kernel.org/r/20251126144205.331954-3-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 | rtla/timerlat: Support tail call from BPF program | Tomas Glozar | -4/+35
Add a map to the rtla-timerlat BPF program that holds a file descriptor of another BPF program, to be executed on threshold overflow. timerlat_bpf_set_action() is added as an interface to set the program. Link: https://lore.kernel.org/r/20251126144205.331954-2-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 | tools/rtla: Add common_usage() | Costa Shulyupin | -64/+81
The rtla tools have significant code quadruplication in their usage functions. Each tool implements its own version of the same help text formatting and option descriptions, leading to maintenance overhead and inconsistencies. Documentation/tools/rtla/common_options.rst lists 14 common options. Add common_usage() infrastructure to consolidate help formatting. Subsequent patches will extend this to handle other common options. The refactored output is almost identical to the original, with the following changes: - add square brackets to specify optionality: `usage: [rtla] ...` - remove `-q` from timerlat hist because hist tools don't support it - minor spacing Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Link: https://lore.kernel.org/r/20251124063204.845425-1-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 | rtla: Set stop threshold after all instances are enabled | Crystal Wood | -37/+38
This avoids startup races where one of the instances hit a threshold before all instances were enabled, and thus tracing stops without the relevant event. In particular, this is not uncommon with the tests that set a very tight threshold and then complain if there's no analysis. This also ensures that we don't stop tracing during a warmup. The downside is a small chance of having an event over the threshold early in the output, without stopping on it, which could cause user confusion. This should be less likely if the warmup feature is used, but that doesn't eliminate the race window, just the odds of an unusual spike right at that moment. Signed-off-by: Crystal Wood <crwood@redhat.com> Link: https://lore.kernel.org/r/20251112152529.956778-6-crwood@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-06 | tools/rtla: Remove unused function declarations | Costa Shulyupin | -4/+0
Historically four function declarations remain orphaned or duplicated. Remove them to keep the source clean. Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Link: https://lore.kernel.org/r/20251012071133.290225-1-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 | rtla/timerlat: Exit top main loop on any non-zero wait_retval | Crystal Wood | -1/+1
Comparing to exactly 1 will fail if more than one ring buffer event was seen since the last call to timerlat_bpf_wait(), which can happen in some race scenarios. Signed-off-by: Crystal Wood <crwood@redhat.com> Link: https://lore.kernel.org/r/20251112152529.956778-5-crwood@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 | rtla: Fix -a overriding -t argument | Ivan Pravdin | -4/+8
When rtla is run as `rtla <timerlat|osnoise> <top|hist> -t custom_file.txt -a 100`, the -a option overrides the trace output filename specified by the -t option. Running the command above creates the file <timerlat|osnoise>_trace.txt instead of custom_file.txt. Fix this by making sure that the -a option does not override the trace output filename, even if it is passed after the trace output filename is specified. Fixes: 173a3b014827 ("rtla/timerlat: Add the automatic trace option") Signed-off-by: Ivan Pravdin <ipravdin.official@gmail.com> Reviewed-by: Tomas Glozar <tglozar@redhat.com> Link: https://lore.kernel.org/r/b6ae60424050b2c1c8709e18759adead6012b971.1762186418.git.ipravdin.official@gmail.com [ use capital letter in subject, as required by tracing subsystem ] Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 | rtla: Fix -C/--cgroup interface | Ivan Pravdin | -76/+55
Currently, the user can only specify the cgroup for the tracer's threads in the following ways: `-C[cgroup]`, `-C[=cgroup]`, `--cgroup[=cgroup]`. If the user tries to specify the cgroup as `-C [cgroup]` or `--cgroup [cgroup]`, the parser silently fails and rtla's cgroup is used for the tracer threads. To make the interface more user-friendly, also accept the space-separated forms `-C [cgroup]` and `--cgroup [cgroup]`. Refactor the identical logic of -t/--trace and -C/--cgroup into a common function. Change the documentation to reflect this user interface change. Fixes: a957cbc02531 ("rtla: Add -C cgroup support") Signed-off-by: Ivan Pravdin <ipravdin.official@gmail.com> Reviewed-by: Tomas Glozar <tglozar@redhat.com> Link: https://lore.kernel.org/r/16132f1565cf5142b5fbd179975be370b529ced7.1762186418.git.ipravdin.official@gmail.com [ use capital letter in subject, as required by tracing subsystem ] Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 | tools/rtla: Replace osnoise_hist_usage("...") with fatal("...") | Costa Shulyupin | -19/+13
A long time ago, when the usage help was short, it was a favor to the user to show it on error. Now that the usage help has become very long, it is too noisy to dump the complete help text for each typo after the error message itself. Replace osnoise_hist_usage("...") with fatal("...") on errors. Remove the already unused 'usage' argument from osnoise_hist_usage(). Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Reviewed-by: Tomas Glozar <tglozar@redhat.com> Link: https://lore.kernel.org/r/20251011082738.173670-6-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 | tools/rtla: Replace osnoise_top_usage("...") with fatal("...") | Costa Shulyupin | -16/+10
A long time ago, when the usage help was short, it was a favor to the user to show it on error. Now that the usage help has become very long, it is too noisy to dump the complete help text for each typo after the error message itself. Replace osnoise_top_usage("...") with fatal("...") on errors. Remove the already unused 'usage' argument from osnoise_top_usage(). Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Reviewed-by: Tomas Glozar <tglozar@redhat.com> Link: https://lore.kernel.org/r/20251011082738.173670-5-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 | tools/rtla: Replace timerlat_hist_usage("...") with fatal("...") | Costa Shulyupin | -19/+13
A long time ago, when the usage help was short, it was a favor to the user to show it on error. Now that the usage help has become very long, it is too noisy to dump the complete help text for each typo after the error message itself. Replace timerlat_hist_usage("...\n") with fatal("...") on errors. Remove the already unused 'usage' argument from timerlat_hist_usage(). Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Reviewed-by: Tomas Glozar <tglozar@redhat.com> Link: https://lore.kernel.org/r/20251011082738.173670-4-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 | tools/rtla: Replace timerlat_top_usage("...") with fatal("...") | Costa Shulyupin | -17/+11
A long time ago, when the usage help was short, it was a favor to the user to show it on error. Now that the usage help has become very long, it is too noisy to dump the complete help text for each typo after the error message itself. Replace timerlat_top_usage("...\n") with fatal("...") on errors. Remove the already unused 'usage' argument from timerlat_top_usage(). Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Reviewed-by: Tomas Glozar <tglozar@redhat.com> Link: https://lore.kernel.org/r/20251011082738.173670-3-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 | tools/rtla: Add fatal() and replace error handling pattern | Costa Shulyupin | -129/+81
The code contains some technical debt in error handling, which complicates the consolidation of duplicated code. Introduce a fatal() function to replace the common pattern of err_msg() followed by exit(EXIT_FAILURE), reducing the length of an already long function. Further patches using fatal() follow. Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Reviewed-by: Tomas Glozar <tglozar@redhat.com> Link: https://lore.kernel.org/r/20251011082738.173670-2-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-20 | tools/rtla: Fix --on-threshold always triggering | Tomas Glozar | -9/+15
Commit 8d933d5c89e8 ("rtla/timerlat: Add continue action") moved the code performing on-threshold actions (enabled through --on-threshold option) to inside the RTLA main loop. The condition in the loop does not check whether the threshold was actually exceeded or if stop tracing was requested by the user through SIGINT or duration. This leads to a bug where on-threshold actions are always performed, even when the threshold was not hit. (BPF mode is not affected, since it uses a different condition in the while loop.) Add a condition that checks for !stop_tracing before executing the actions. Also, fix incorrect brackets in hist_main_loop to match the semantics of top_main_loop. Fixes: 8d933d5c89e8 ("rtla/timerlat: Add continue action") Fixes: 2f3172f9dd58 ("tools/rtla: Consolidate code between osnoise/timerlat and hist/top") Reviewed-by: Crystal Wood <crwood@redhat.com> Reviewed-by: Wander Lairson Costa <wander@redhat.com> Link: https://lore.kernel.org/r/20251007095341.186923-1-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-20 | rtla/timerlat_bpf: Stop tracing on user latency | Tomas Glozar | -0/+3
rtla-timerlat allows a *thread* latency threshold to be set via the -T/--thread option. However, the timerlat tracer calls this *total* latency (stop_tracing_total_us), and stops tracing also when the return-to-user latency is over the threshold. Change the behavior of the timerlat BPF program to reflect what the timerlat tracer is doing, to avoid discrepancy between stopping collecting data in the BPF program and stopping tracing in the timerlat tracer. Cc: stable@vger.kernel.org Fixes: e34293ddcebd ("rtla/timerlat: Add BPF skeleton to collect samples") Reviewed-by: Wander Lairson Costa <wander@redhat.com> Link: https://lore.kernel.org/r/20251006143100.137255-1-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-20 | tools/rtla: Fix unassigned nr_cpus | Costa Shulyupin | -1/+2
In recently introduced timerlat_free(), the variable 'nr_cpus' is not assigned. Assign it with sysconf(_SC_NPROCESSORS_CONF) as done elsewhere. Remove the culprit: -Wno-maybe-uninitialized. The rest of the code is clean. Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Reviewed-by: Tomas Glozar <tglozar@redhat.com> Fixes: 2f3172f9dd58 ("tools/rtla: Consolidate code between osnoise/timerlat and hist/top") Link: https://lore.kernel.org/r/20251002170846.437888-1-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-20 | tools/rtla: Remove unused optional option_index | Costa Shulyupin | -16/+4
The longindex argument of getopt_long() is optional and tied to the unused local variable option_index. Remove it to shorten the four longest functions and make the code neater. Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Reviewed-by: Tomas Glozar <tglozar@redhat.com> Link: https://lore.kernel.org/r/20251002123553.389467-2-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-20 | tools/rtla: Add for_each_monitored_cpu() helper | Costa Shulyupin | -58/+23
The rtla tools have many instances of iterating over CPUs while checking if they are monitored. Add a for_each_monitored_cpu() helper macro to make the code more readable and reduce code duplication. Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Reviewed-by: Tomas Glozar <tglozar@redhat.com> Link: https://lore.kernel.org/r/20251002123553.389467-1-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>
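The shape of such a filtering iterator can be sketched with a `for` followed by an `if`. The parameters below (a plain boolean mask, with NULL meaning "all CPUs") are assumptions for illustration and do not reflect the actual rtla macro arguments.

```c
#include <stdbool.h>

/*
 * Hypothetical variant: run the loop body only for CPUs whose entry in
 * `mask` is true. A NULL mask means every CPU is monitored.
 * Because the macro ends in an `if`, avoid attaching an `else` to it.
 */
#define for_each_monitored_cpu(cpu, nr_cpus, mask) \
	for ((cpu) = 0; (cpu) < (nr_cpus); (cpu)++) \
		if (!(mask) || (mask)[(cpu)])

/* Example user: sum a per-CPU counter over the monitored CPUs. */
static int sum_monitored(const int *values, int nr_cpus, const bool *mask)
{
	int cpu, sum = 0;

	for_each_monitored_cpu(cpu, nr_cpus, mask)
		sum += values[cpu];
	return sum;
}
```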
2025-10-05 | Merge tag 'trace-tools-v6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace | Linus Torvalds | -1636/+1260
Pull tracing tools updates from Steven Rostedt:
 - This is mostly just consolidating code between osnoise/timerlat and top/hist for easier maintenance and less future divergence
* tag 'trace-tools-v6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tools/rtla: Add remaining support for osnoise actions
  tools/rtla: Add test engine support for unexpected output
  tools/rtla: Fix -A option name in test comment
  tools/rtla: Consolidate code between osnoise/timerlat and hist/top
  tools/rtla: Create common_apply_config()
  tools/rtla: Move top/hist params into common struct
  tools/rtla: Consolidate common parameters into shared structure
2025-09-27 | rtla/actions: Fix condition for buffer reallocation | Wander Lairson Costa | -1/+1
The condition to check if the actions buffer needs to be resized was incorrect. The check `self->size >= self->len` would evaluate to true on almost every call to `actions_new()`, causing the buffer to be reallocated unnecessarily each time an action was added. Fix the condition to `self->len >= self->size`, ensuring that the buffer is only resized when it is actually full. Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Chang Yin <cyin@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Cc: Crystal Wood <crwood@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250915181101.52513-1-wander@redhat.com Fixes: 6ea082b171e00 ("rtla/timerlat: Add action on threshold feature") Signed-off-by: Wander Lairson Costa <wander@redhat.com> Reviewed-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
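The corrected growth check can be illustrated with a generic growable array. `struct int_vec`, `vec_push()` and the doubling policy are hypothetical and not taken from rtla; only the `len >= size` comparison mirrors the fix.

```c
#include <stdlib.h>

/* Hypothetical growable array: len = used slots, size = capacity. */
struct int_vec {
	int *data;
	int len;
	int size;
};

/* Append a value, growing the buffer only when it is actually full. */
static int vec_push(struct int_vec *v, int value)
{
	if (v->len >= v->size) {
		int new_size = v->size ? v->size * 2 : 4;
		int *tmp = realloc(v->data, new_size * sizeof(*tmp));

		if (!tmp)
			return -1;
		v->data = tmp;
		v->size = new_size;
	}

	v->data[v->len++] = value;
	return 0;
}
```

With the inverted comparison (`size >= len`), the branch would be taken on nearly every call, since the capacity is at least the length by construction.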
2025-09-27 | rtla: Fix buffer overflow in actions_parse | Ivan Pravdin | -1/+1
Currently, tests 3 and 13-22 in tests/timerlat.t fail with error:

*** buffer overflow detected ***: terminated
timeout: the monitored command dumped core

The result of running `sudo make check` is

tests/timerlat.t (Wstat: 0 Tests: 22 Failed: 11)
  Failed tests: 3, 13-22
Files=3, Tests=34, 140 wallclock secs ( 0.07 usr 0.01 sys + 27.63 cusr 27.96 csys = 55.67 CPU)
Result: FAIL

Fix buffer overflow in actions_parse to avoid this error. After this change, the tests results are

tests/hwnoise.t ... ok
tests/osnoise.t ... ok
tests/timerlat.t .. ok
All tests successful.
Files=3, Tests=34, 186 wallclock secs ( 0.06 usr 0.01 sys + 41.10 cusr 44.38 csys = 85.55 CPU)
Result: PASS

Link: https://lore.kernel.org/164ffc2ec8edacaf1295789dad82a07817b6263d.1757034919.git.ipravdin.official@gmail.com Fixes: 6ea082b171e0 ("rtla/timerlat: Add action on threshold feature") Signed-off-by: Ivan Pravdin <ipravdin.official@gmail.com> Reviewed-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-09-27 | tools/rtla: Add remaining support for osnoise actions | Crystal Wood | -9/+53
The basic functionality came with the consolidation; now hook up the command line options, and add documentation and tests. Cc: John Kacur <jkacur@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Link: https://lore.kernel.org/20250907022325.243930-8-crwood@redhat.com Reviewed-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Crystal Wood <crwood@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-09-27 | tools/rtla: Consolidate code between osnoise/timerlat and hist/top | Crystal Wood | -1194/+792
Currently a lot of code is duplicated between the different rtla tools, making maintenance more difficult, and encouraging divergence such as features that are only implemented for certain tools even though they could be more broadly applicable. Merge the various main() functions into a common run_tool() with an ops struct for tool-specific details. Implement enough support for actions on osnoise to not need to keep the old params->trace_output path. Cc: John Kacur <jkacur@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Link: https://lore.kernel.org/20250907022325.243930-5-crwood@redhat.com Reviewed-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Crystal Wood <crwood@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-09-27 | tools/rtla: Create common_apply_config() | Crystal Wood | -150/+142
Merge the common bits of osnoise_apply_config() and timerlat_apply_config(). Put the result in a new common.c, and move enough things to common.h so that common.c does not need to include osnoise.h. Cc: John Kacur <jkacur@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Link: https://lore.kernel.org/20250907022325.243930-4-crwood@redhat.com Reviewed-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Crystal Wood <crwood@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-09-27 | tools/rtla: Move top/hist params into common struct | Crystal Wood | -163/+152
The hist members were very similar between timerlat and top, so just use one common hist struct. output_divisor, quiet, and pretty printing are pretty generic concepts that can go in the main struct even if not every specific tool (currently) uses them. Cc: John Kacur <jkacur@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Link: https://lore.kernel.org/20250907022325.243930-3-crwood@redhat.com Reviewed-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Crystal Wood <crwood@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-09-27 | tools/rtla: Consolidate common parameters into shared structure | Costa Shulyupin | -252/+253
timerlat_params and osnoise_params structures contain 15 identical fields. Introduce a new header common.h and define a common_params structure to consolidate shared fields, reduce code duplication, and enhance maintainability. Cc: John Kacur <jkacur@redhat.com> Link: https://lore.kernel.org/20250907022325.243930-2-crwood@redhat.com Reviewed-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Signed-off-by: Crystal Wood <crwood@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-25 | rtla/timerlat: Add action on end feature | Tomas Glozar | -29/+65
Implement actions on end next to actions on threshold. A new option, --on-end is added, parallel to --on-threshold. Instead of being executed whenever a latency threshold is reached, it is executed at the end of the measurement. For example: $ rtla timerlat hist -d 5s --on-end trace will save the trace output at the end. All actions supported by --on-threshold are also supported by --on-end, except for continue, which does nothing with --on-end. Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Chang Yin <cyin@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Cc: Crystal Wood <crwood@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250626123405.1496931-6-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-25 | rtla/timerlat: Add continue action | Tomas Glozar | -29/+100
Introduce option to resume tracing after a latency threshold overflow. The option is implemented as an action named "continue". Example: $ rtla timerlat top -q -T 200 -d 1s --on-threshold \ exec,command="echo Threshold" --on-threshold continue Threshold Threshold Threshold Timer Latency ... The feature is supported for both hist and top. After the continue action is executed, processing of the list of actions is stopped and tracing is resumed. Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Chang Yin <cyin@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Cc: Crystal Wood <crwood@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250626123405.1496931-5-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-25rtla/timerlat_bpf: Allow resuming tracingTomas Glozar-4/+25
Currently, the rtla-timerlat BPF program uses a global variable stored in a .bss section to store whether tracing has been stopped. Move the information to a separate map, so that it is easily writable from userspace, and add a function that clears the value, resuming tracing after it has been stopped.

Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Chang Yin <cyin@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Cc: Crystal Wood <crwood@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250626123405.1496931-4-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-25rtla/timerlat: Add action on threshold featureTomas Glozar-22/+341
Extend the functionality provided by the -t/--trace option, which triggers saving the contents of a tracefs buffer after tracing is stopped, to support implementing arbitrary actions.

A new option, --on-threshold, is added, taking an argument that further specifies the action. Actions added in this patch are:

- trace[,file=<filename>]: Saves tracefs buffer, optionally taking a filename.
- signal,num=<sig>,pid=<pid>: Sends signal to process. "parent" might be specified instead of number to send signal to parent process.
- shell,command=<command>: Execute shell command.

Multiple actions may be specified and will be executed in order, including multiple actions of the same type. Trace output requested via -t and -a now adds a trace action to the end of the list. If an action fails, the following actions are not executed.

For example, this command:

  $ rtla timerlat -T 20 --on-threshold trace \
      --on-threshold shell,command="grep ipi_send timerlat_trace.txt" \
      --on-threshold signal,num=2,pid=parent

will send signal 2 (SIGINT) to parent process, but only if saved trace contains the text "ipi_send". This way, the feature can be used for flexible reactions on latency spikes, and allows combining rtla with other tooling like perf.

Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Chang Yin <cyin@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Cc: Crystal Wood <crwood@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250626123405.1496931-3-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
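The ordering rule above (actions run in the order given, and a failure prevents the following ones from running) can be sketched as a short loop over callbacks. The names `action_fn`, `struct action`, `actions_perform()` and the two toy actions are assumptions made for illustration and do not mirror the code added by the patch.

```c
#include <assert.h>
#include <stddef.h>

/* An action returns 0 on success and nonzero on failure. */
typedef int (*action_fn)(void *arg);

struct action {
	action_fn fn;
	void *arg;
};

/*
 * Execute actions in the order they were specified.
 * Returns 0 if all succeeded; otherwise returns the status of the
 * first failing action and leaves the remaining ones unexecuted.
 */
static int actions_perform(const struct action *list, size_t len)
{
	assert(list != NULL || len == 0);
	for (size_t i = 0; i < len; i++) {
		int ret = list[i].fn(list[i].arg);

		if (ret)
			return ret;
	}
	return 0;
}

/* Toy actions that count how often they ran. */
static int count_ok(void *arg)
{
	++*(int *)arg;
	return 0;
}

static int count_fail(void *arg)
{
	++*(int *)arg;
	return 1;
}
```

This is what makes the grep example work as a filter: a shell action exiting nonzero stops the chain, so the signal action after it never runs.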
2025-07-25rtla/timerlat: Introduce enum timerlat_tracing_modeTomas Glozar-53/+97
After the introduction of BPF-based sample collection, rtla-timerlat effectively runs in one of three modes:

- Pure BPF mode, with tracefs only being used to set up the timerlat tracer. Sample processing and stop on threshold are handled by BPF.
- tracefs mode. BPF is unsupported or kernel is lacking the necessary trace event (osnoise:timerlat_sample). Stop on threshold is handled by timerlat tracer stopping tracing in all instances.
- BPF/tracefs mixed mode - BPF is used for sample collection for top or histogram, tracefs is used for trace output and/or auto-analysis. Stop on threshold is handled both through BPF program, which stops sample collection for top/histogram and wakes up rtla, and by timerlat tracer, which stops tracing for trace output/auto-analysis instances.

Add enum timerlat_tracing_mode, with three values:

- TRACING_MODE_BPF
- TRACING_MODE_TRACEFS
- TRACING_MODE_MIXED

Those represent the modes described above. A field of this type is added to struct timerlat_params, named "mode", replacing the no_bpf variable.

params->mode is set in timerlat_{top,hist}_parse_args to TRACING_MODE_BPF or TRACING_MODE_MIXED based on whether trace output and/or auto-analysis is requested. timerlat_{top,hist}_main then checks whether BPF is unavailable or disabled; in that case, it sets params->mode to TRACING_MODE_TRACEFS.

A condition is added to timerlat_apply_config that skips setting timerlat tracer thresholds if params->mode is TRACING_MODE_BPF (those are unnecessary, since they only turn off tracing, which is already turned off in that case, since BPF is used to collect samples).

Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Chang Yin <cyin@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Cc: Crystal Wood <crwood@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250626123405.1496931-2-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
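The mode selection described in this commit can be summarized in a few lines of C. The enum values are the ones named in the message; the helpers `select_mode()` and `needs_tracer_thresholds()` and their parameters are hypothetical and fold into single functions what the patch spreads across argument parsing, the main function, and timerlat_apply_config.

```c
#include <assert.h>
#include <stdbool.h>

/* Values as named in the commit message. */
enum timerlat_tracing_mode {
	TRACING_MODE_BPF,
	TRACING_MODE_TRACEFS,
	TRACING_MODE_MIXED,
};

/*
 * Hypothetical helper: pick BPF or MIXED depending on whether trace
 * output or auto-analysis was requested, then fall back to TRACEFS
 * when BPF is unavailable or disabled.
 */
static enum timerlat_tracing_mode
select_mode(bool wants_trace_output, bool wants_auto_analysis, bool bpf_usable)
{
	enum timerlat_tracing_mode mode;

	mode = (wants_trace_output || wants_auto_analysis) ?
		TRACING_MODE_MIXED : TRACING_MODE_BPF;
	if (!bpf_usable)
		mode = TRACING_MODE_TRACEFS;
	return mode;
}

/* Tracer thresholds are skipped only in pure BPF mode. */
static bool needs_tracer_thresholds(enum timerlat_tracing_mode mode)
{
	switch (mode) {
	case TRACING_MODE_BPF:
		return false;
	case TRACING_MODE_TRACEFS:
	case TRACING_MODE_MIXED:
		return true;
	}
	assert(!"unknown tracing mode");
	return true;
}
```

Using a three-valued enum instead of the former no_bpf boolean lets the mixed case be expressed directly, which a single flag could not distinguish from pure BPF.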