|
Merge tag 'hyperv-next-signed-20251006' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv updates from Wei Liu:
- Unify guest entry code for KVM and MSHV (Sean Christopherson)
- Switch Hyper-V MSI domain to use msi_create_parent_irq_domain()
(Nam Cao)
- Add CONFIG_HYPERV_VMBUS and limit the semantics of CONFIG_HYPERV
(Mukesh Rathor)
- Add kexec/kdump support on Azure CVMs (Vitaly Kuznetsov)
- Deprecate hyperv_fb in favor of Hyper-V DRM driver (Prasanna
Kumar T S M)
- Miscellaneous enhancements, fixes and cleanups (Abhishek Tiwari,
Alok Tiwari, Nuno Das Neves, Wei Liu, Roman Kisel, Michael Kelley)
* tag 'hyperv-next-signed-20251006' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
hyperv: Remove the spurious null directive line
MAINTAINERS: Mark hyperv_fb driver Obsolete
fbdev/hyperv_fb: deprecate this in favor of Hyper-V DRM driver
Drivers: hv: Make CONFIG_HYPERV bool
Drivers: hv: Add CONFIG_HYPERV_VMBUS option
Drivers: hv: vmbus: Fix typos in vmbus_drv.c
Drivers: hv: vmbus: Fix sysfs output format for ring buffer index
Drivers: hv: vmbus: Clean up sscanf format specifier in target_cpu_store()
x86/hyperv: Switch to msi_create_parent_irq_domain()
mshv: Use common "entry virt" APIs to do work in root before running guest
entry: Rename "kvm" entry code assets to "virt" to genericize APIs
entry/kvm: KVM: Move KVM details related to signal/-EINTR into KVM proper
mshv: Handle NEED_RESCHED_LAZY before transferring to guest
x86/hyperv: Add kexec/kdump support on Azure CVMs
Drivers: hv: Simplify data structures for VMBus channel close message
Drivers: hv: util: Cosmetic changes for hv_utils_transport.c
mshv: Add support for a new parent partition configuration
clocksource: hyper-v: Skip unnecessary checks for the root partition
hyperv: Add missing field to hv_output_map_device_interrupt
|
|
Merge tag 'rcu.2025.09.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux
Pull RCU updates from Paul McKenney:
"Documentation updates:
- Update whatisRCU.rst and checklist.rst for recent RCU API additions
- Fix RCU documentation formatting and typos
- Replace dead Ottawa Linux Symposium links in RTFP.txt
Miscellaneous RCU updates:
- Document that rcu_barrier() hurries RCU_LAZY callbacks
- Remove redundant interrupt disabling from
rcu_preempt_deferred_qs_handler()
- Move list_for_each_rcu from list.h to rculist.h, and adjust the
include directive in kernel/cgroup/dmem.c accordingly
- Make initial set of changes to accommodate upcoming
system_percpu_wq changes
SRCU updates:
- Create an srcu_read_lock_fast_notrace() for eventual use in
tracing, including adding guards
- Document the reliance on per-CPU operations as implicit RCU readers
in __srcu_read_{,un}lock_fast()
- Document the srcu_flip() function's memory-barrier D's relationship
to SRCU-fast readers
- Remove a redundant preempt_disable() and preempt_enable() pair from
srcu_gp_start_if_needed()
Torture-test updates:
- Fix jitter.sh spin time so that it actually varies as advertised.
It is still quite coarse-grained, but at least it does now vary
- Update torture.sh help text to include the not-so-new --do-normal
parameter, which permits (for example) testing KCSAN kernels
without doing non-debug kernels
- Fix a number of false-positive diagnostics that were being
triggered by rcutorture starting before boot completed. Running
multiple near-CPU-bound rcutorture processes when there is only the
boot CPU is after all a bit excessive
- Substitute kcalloc() for kzalloc()
- Remove a redundant kfree() and NULL out kfree()ed objects"
* tag 'rcu.2025.09.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (31 commits)
rcu: WQ_UNBOUND added to sync_wq workqueue
rcu: WQ_PERCPU added to alloc_workqueue users
rcu: replace use of system_wq with system_percpu_wq
refperf: Set reader_tasks to NULL after kfree()
refperf: Remove redundant kfree() after torture_stop_kthread()
srcu/tiny: Remove preempt_disable/enable() in srcu_gp_start_if_needed()
srcu: Document srcu_flip() memory-barrier D relation to SRCU-fast
srcu: Document __srcu_read_{,un}lock_fast() implicit RCU readers
rculist: move list_for_each_rcu() to where it belongs
refscale: Use kcalloc() instead of kzalloc()
rcutorture: Use kcalloc() instead of kzalloc()
docs: rcu: Replace multiple dead OLS links in RTFP.txt
doc: Fix typo in RCU's torture.rst documentation
Documentation: RCU: Retitle toctree index
Documentation: RCU: Reduce toctree depth
Documentation: RCU: Wrap kvm-remote.sh rerun snippet in literal code block
rcu: docs: Requirements.rst: Abide by conventions of kernel documentation
doc: Add RCU guards to checklist.rst
doc: Update whatisRCU.rst for recent RCU API additions
rcutorture: Delay forward-progress testing until boot completes
...
|
|
Rename the "kvm" entry code files and Kconfigs to use generic "virt"
nomenclature so that the code can be reused by other hypervisors (or
rather, their root/dom0 partition drivers), without incorrectly suggesting
the code somehow relies on and/or involves KVM.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
Torture-test updates:
* rcutorture: Fix jitter.sh spin time
* torture: Add --do-normal parameter to torture.sh help text
* torture: Announce kernel boot status at torture-test startup
* rcutorture: Suppress "Writer stall state" reports during boot
* rcutorture: Delay rcutorture readers and writers until boot completes
* torture: Delay CPU-hotplug operations until boot completes
* rcutorture: Delay forward-progress testing until boot completes
* rcutorture,refscale: Use kcalloc() instead of kzalloc()
* refperf: Remove redundant kfree() after torture_stop_kthread()
* refperf: Set reader_tasks to NULL after kfree()
|
|
SRCU updates:
* Create srcu_read_{,un}lock_fast_notrace()
* Add srcu_read_lock_fast_notrace() and srcu_read_unlock_fast_notrace()
* Add guards for notrace variants of SRCU-fast readers
* Document srcu_read_{,un}lock_fast() use of implicit RCU readers
* Document srcu_flip() memory-barrier D relation to SRCU-fast
* Remove preempt_disable/enable() in Tiny SRCU srcu_gp_start_if_needed()
|
|
Currently, if a user enqueues a work item using schedule_delayed_work(), the
workqueue used is "system_wq" (a per-CPU wq), while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when a CPU is not specified). The same applies to
schedule_work(), which uses system_wq, while queue_work() again makes use of
WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.
alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.
This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.
This change adds the WQ_UNBOUND flag to sync_wq to make explicit that this
workqueue can be unbound and that it does not benefit from per-CPU work.
Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.
With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.
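For illustration only (the exact sync_wq call site and any companion flags are
assumed here, not taken from this patch), the change amounts to something like:
	static struct workqueue_struct *sync_wq;
	/* Before: per-CPU behavior is the implicit alloc_workqueue() default. */
	sync_wq = alloc_workqueue("sync_wq", 0, 0);
	/* After: the unbound intent is stated explicitly. */
	sync_wq = alloc_workqueue("sync_wq", WQ_UNBOUND, 0);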
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
Currently, if a user enqueues a work item using schedule_delayed_work(), the
workqueue used is "system_wq" (a per-CPU wq), while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when a CPU is not specified). The same applies to
schedule_work(), which uses system_wq, while queue_work() again makes use of
WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.
alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.
This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.
This patch adds a new WQ_PERCPU flag to explicitly request the use of
the per-CPU behavior. Both flags coexist for one release cycle to allow
callers to transition their calls.
Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.
With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.
All existing users have been updated accordingly.
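A minimal sketch of an updated caller (the workqueue name and companion flags
are illustrative, not copied from this patch):
	static struct workqueue_struct *example_wq;
	/* Explicitly request the historical per-CPU behavior instead of
	 * relying on the implicit default. */
	example_wq = alloc_workqueue("example_wq", WQ_PERCPU | WQ_MEM_RECLAIM, 0);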
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
Currently, if a user enqueues a work item using schedule_delayed_work(), the
workqueue used is "system_wq" (a per-CPU wq), while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when a CPU is not specified). The same applies to
schedule_work(), which uses system_wq, while queue_work() again makes use of
WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.
system_wq is a per-CPU workqueue, yet nothing in its name conveys that CPU
affinity constraint, which is very often not required by users. Make it
clear by adding a system_percpu_wq.
The old wq will be kept for a few release cycles.
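A hedged usage sketch (my_work and its handler are hypothetical, used only to
show the call site):
	static void my_work_fn(struct work_struct *w) { /* hypothetical handler */ }
	static DECLARE_WORK(my_work, my_work_fn);
	/* Same semantics as queueing on the old system_wq, but the per-CPU
	 * affinity constraint is now visible at the call site. */
	queue_work(system_percpu_wq, &my_work);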
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
Set reader_tasks to NULL after kfree() in ref_scale_cleanup() to
improve debugging experience with kernel debugging tools. This
follows the common pattern of NULLing pointers after freeing to
avoid dangling pointer issues during debugging sessions.
Setting pointers to NULL after freeing helps debugging tools such as
kgdb, drgn, and other kernel debuggers by providing a clear indication
that the memory has been freed and the pointer is no longer valid.
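A minimal sketch of the pattern in ref_scale_cleanup() (surrounding code
omitted):
	kfree(reader_tasks);
	reader_tasks = NULL;  /* make the freed pointer obvious to kgdb/drgn */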
Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
Remove unnecessary kfree(main_task) call in ref_scale_cleanup() as
torture_stop_kthread() already handles the memory cleanup for the
task structure internally.
The additional kfree(main_task) call after torture_stop_kthread()
is redundant and confusing since torture_stop_kthread() sets the
pointer to NULL, making this a no-op.
This pattern is consistent with other torture test modules where
torture_stop_kthread() is called without explicit kfree() of the
task pointer, as the torture framework manages the task lifecycle
internally.
Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
Currently, srcu_gp_start_if_needed() is always invoked within a
preemption-disabled critical section. This commit therefore removes the
redundant preempt_disable()/preempt_enable() pair from
srcu_gp_start_if_needed() and adds a call to
lockdep_assert_preemption_disabled() in order to enable lockdep to diagnose
mistaken invocations of this function from preemption-enabled code.
Fixes: 65b4a59557f6 ("srcu: Make Tiny SRCU explicitly disable preemption")
Signed-off-by: Zqiang <qiang.zhang@linux.dev>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
The smp_mb() memory barrier at the end of srcu_flip() has a comment,
but that comment does not make it clear that this memory barrier is an
optimization, as opposed to being needed for correctness. This commit
therefore adds this information and points out that it is omitted
for SRCU-fast, where a much heavier weight synchronize_srcu() would
be required.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: <bpf@vger.kernel.org>
|
|
Replace repeated (20 - PAGE_SHIFT) calculations with standard macros:
- MB_TO_PAGES(mb) converts MB to page count
- PAGES_TO_MB(pages) converts pages to MB
No functional change.
[akpm@linux-foundation.org: remove arc's private PAGES_TO_MB, remove its unused PAGES_TO_KB]
[akpm@linux-foundation.org: don't include mm.h due to include file ordering mess]
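As a hedged sketch (the exact definitions live in the headers touched by this
patch), the macros described above amount to:
	#define MB_TO_PAGES(mb)    ((mb) << (20 - PAGE_SHIFT))
	#define PAGES_TO_MB(pages) ((pages) >> (20 - PAGE_SHIFT))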
Link: https://lkml.kernel.org/r/20250718024134.1304745-1-ye.liu@linux.dev
Signed-off-by: Ye Liu <liuye@kylinos.cn>
Acked-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Chris Li <chrisl@kernel.org>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Lai jiangshan <jiangshanlai@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Use kcalloc() in main_func() to gain built-in overflow protection, making
memory allocation safer when calculating allocation size compared to
explicit multiplication.
Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
Use kcalloc() in rcu_torture_writer() to gain built-in overflow protection,
making memory allocation safer when calculating allocation size compared to
explicit multiplication.
Change sizeof(ulo[0]) and sizeof(rgo[0]) to sizeof(*ulo) and sizeof(*rgo),
as this is more consistent with coding conventions.
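A hedged sketch of the resulting allocation (the array length and GFP flags
are illustrative):
	/* Was: ulo = kzalloc(n * sizeof(ulo[0]), GFP_KERNEL); */
	ulo = kcalloc(n, sizeof(*ulo), GFP_KERNEL);  /* overflow-checked n * size */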
Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
Forward-progress testing can hog CPUs, which is not a great thing to do
before boot has completed. This commit therefore makes the CPU-hotplug
operations hold off until boot has completed.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
The rcutorture writers and (especially) readers are the biggest CPU
hogs of the bunch, so this commit makes them wait until boot has
completed.
This makes the current setting of the boot_ended local variable dead code,
so while in the area, this commit removes that as well.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
When rcutorture is running on only the one boot-time CPU while that CPU
is busy invoking initcall() functions, the added load is quite likely to
unduly delay the RCU grace-period kthread, rcutorture readers, and much
else besides. This can result in rcu_torture_stats_print() reporting
rcutorture writer stalls, which are not really a bug in that environment.
After all, one CPU can only do so much.
This commit therefore suppresses rcutorture writer stalls while the
kernel is booting, that is, while rcu_inkernel_boot_has_ended() continues
returning false.
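A minimal sketch of the gating check, assuming it sits at the top of the
stall-reporting path:
	if (!rcu_inkernel_boot_has_ended())
		return;  /* still booting: the lone boot CPU can only do so much */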
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
The per-CPU rcu_data structure's ->defer_qs_iw field is initialized
by IRQ_WORK_INIT_HARD(), which means that the subsequent invocation of
rcu_preempt_deferred_qs_handler() will always be executed with interrupts
disabled. This commit therefore removes the local_irq_save/restore()
operations from rcu_preempt_deferred_qs_handler() and adds a call to
lockdep_assert_irqs_disabled() in order to enable lockdep to diagnose
mistaken invocations of this function from interrupts-enabled code.
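A hedged sketch of the handler after the change (body elided):
	static void rcu_preempt_deferred_qs_handler(struct irq_work *iwp)
	{
		lockdep_assert_irqs_disabled();  /* guaranteed by IRQ_WORK_INIT_HARD() */
		/* ... report the deferred quiescent state ... */
	}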
Signed-off-by: Zqiang <qiang.zhang@linux.dev>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
This commit adds to the rcu_barrier() kerneldoc header stating that this
function hurries lazy callbacks and that it does not normally result in
additional RCU grace periods.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
RCU re-initializes the deferred QS irq work every time before attempting
to queue it. However, there are situations where the irq work is
queued again even though it is already queued. In that case,
re-initializing messes up the irq work that is about to be handled.
The chances of this happening are higher when the architecture doesn't
support self-IPIs and all irq work is therefore lazy, such as with the
following sequence:
1) rcu_read_unlock() is called when IRQs are disabled and there is a
grace period involving blocked tasks on the node. The irq work
is then initialized and queued.
2) The related tasks are unblocked and the CPU quiescent state
is reported. rdp->defer_qs_iw_pending is reset to DEFER_QS_IDLE,
allowing the irq work to be requeued in the future (note the previous
one hasn't fired yet).
3) A new grace period starts and the node has blocked tasks.
4) rcu_read_unlock() is called when IRQs are disabled again. The irq work
is re-initialized (but it's queued! and its node is cleared) and
requeued. Which means it's requeued to itself.
5) The irq work finally fires with the tick. But since it was requeued
to itself, it loops and hangs.
Fix this by initializing the irq work only once, before the CPU boots.
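A hedged sketch of the fix, with the initialization moved to per-CPU
boot-time setup:
	/* Done once as the CPU comes up, never again before each queue attempt. */
	rdp->defer_qs_iw = IRQ_WORK_INIT_HARD(rcu_preempt_deferred_qs_handler);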
Fixes: b41642c87716 ("rcu: Fix rcu_read_unlock() deadloop due to IRQ work")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202508071303.c1134cce-lkp@intel.com
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|
|
Merge tag 'sched_ext-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext updates from Tejun Heo:
- Add support for cgroup "cpu.max" interface
- Code organization cleanup so that ext_idle.c doesn't depend on the
source-file-inclusion build method of sched/
- Drop UP paths in accordance with sched core changes
- Documentation and other misc changes
* tag 'sched_ext-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext: Fix scx_bpf_reenqueue_local() reference
sched_ext: Drop kfuncs marked for removal in 6.15
sched_ext, rcu: Eject BPF scheduler on RCU CPU stall panic
kernel/sched/ext.c: fix typo "occured" -> "occurred" in comments
sched_ext: Add support for cgroup bandwidth control interface
sched_ext, sched/core: Factor out struct scx_task_group
sched_ext: Return NULL in llc_span
sched_ext: Always use SMP versions in kernel/sched/ext_idle.h
sched_ext: Always use SMP versions in kernel/sched/ext_idle.c
sched_ext: Always use SMP versions in kernel/sched/ext.h
sched_ext: Always use SMP versions in kernel/sched/ext.c
sched_ext: Documentation: Clarify time slice handling in task lifecycle
sched_ext: Make scx_locked_rq() inline
sched_ext: Make scx_rq_bypassing() inline
sched_ext: idle: Make local functions static in ext_idle.c
sched_ext: idle: Remove unnecessary ifdef in scx_bpf_cpu_node()
|
|
Merge tag 'rcu.release.v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux
Pull RCU updates from Neeraj Upadhyay:
"Expedited grace period updates:
- Protect against early RCU exp quiescent state reporting during exp
grace period initialization
- Remove superfluous barrier in task unblock path
- Remove the CPU online quiescent state report optimization, which is
error prone for certain scenarios
- Add warning for unexpected pending requested expedited quiescent
state on dying CPU
Core:
- Robustify rcu_is_cpu_rrupt_from_idle() by using more accurate
indicators of the actual context tracking state of a CPU
- Handle ->defer_qs_iw_pending field data race
- Enable rcu_normal_wake_from_gp by default on systems with <= 16
CPUs
- Fix lockup in rcu_read_unlock() due to recursive irq_exit() calls
- Refactor expedited handling condition in rcu_read_unlock_special()
- Documentation updates for hotplug and GP init scan ordering,
separation of rcu_state and rnp's gp_seq states, quiescent state
reporting for offline CPUs
torture-scripts:
- Cleanup and improve scripts : remove superfluous warnings for
disabled tests; better handling of kvm.sh --kconfig arg; suppress
some confusing diagnostics; tolerate bad kvm.sh args; add new
diagnostic for build output; fail allmodconfig testing on warnings
- Include RCU_TORTURE_TEST_CHK_RDR_STATE config for KCSAN kernels
- Disable default RCU-tasks and clocksource-wdog testing on arm64
- Add EXPERT Kconfig option for arm64 KCSAN runs
- Remove SRCU-lite testing
rcutorture:
- Start torture writer threads creation after reader threads to
handle race in SRCU-P scenario
- Add SRCU down_read()/up_read() test
- Add diagnostics for delayed SRCU up_read(), unmatched up_read(),
print number of up/down readers and the number of such readers
which migrated to other CPU
- Ignore certain unsupported configurations for trivial RCU test
- Fix splats in RT kernels due to inaccurate checks for BH-disabled
context
- Enable checks and logs to capture intentionally exercised
unexpected scenarios (too short readers) for BUSTED test
- Remove SRCU-lite testing
srcu:
- Expedite SRCU-fast grace periods
- Remove SRCU-lite implementation
- Add guards for SRCU-fast readers
rcu nocb:
- Dump NOCB group leader state on stall detection
- Robustify nocb_cb_kthread pointer accesses
- Fix delayed execution of hurry callbacks when LAZY_RCU is enabled
refscale:
- Fix multiplication overflow in "loops" and "nreaders" calculations"
* tag 'rcu.release.v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (49 commits)
rcu: Document concurrent quiescent state reporting for offline CPUs
rcu: Document separation of rcu_state and rnp's gp_seq
rcu: Document GP init vs hotplug-scan ordering requirements
srcu: Add guards for SRCU-fast readers
rcu: Fix delayed execution of hurry callbacks
rcu: Refactor expedited handling check in rcu_read_unlock_special()
checkpatch: Remove SRCU-lite deprecation
srcu: Remove SRCU-lite implementation
srcu: Expedite SRCU-fast grace periods
rcutorture: Remove support for SRCU-lite
rcutorture: Remove SRCU-lite scenarios
torture: Remove support for SRCU-lite
torture: Make torture.sh --allmodconfig testing fail on warnings
torture: Add "ERROR" diagnostic for testing kernel-build output
torture: Make torture.sh tolerate runs having bad kvm.sh arguments
torture: Add textid.txt file to --do-allmodconfig and --do-rcu-rust runs
torture: Extract testid.txt generation to separate script
torture: Suppress "find" diagnostics from torture.sh --do-none run
torture: Provide EXPERT Kconfig option for arm64 KCSAN torture.sh runs
rcu: Fix rcu_read_unlock() deadloop due to IRQ work
...
|
|
Merge tag 'sysctl-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl
Pull sysctl updates from Joel Granados:
- Move sysctls out of the kern_table array
This is the final move of ctl_tables into their respective
subsystems. Only 5 (out of the original 50) will remain in
kernel/sysctl.c file; these handle either sysctl or common arch
variables.
By decentralizing sysctl registrations, subsystem maintainers regain
control over their sysctl interfaces, improving maintainability and
reducing the likelihood of merge conflicts.
- docs: Remove false positives from check-sysctl-docs
Stopped falsely identifying sysctls as undocumented or unimplemented
in the check-sysctl-docs script. This script can now be used to
automatically identify if documentation is missing.
* tag 'sysctl-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: (23 commits)
docs: Downgrade arm64 & riscv from titles to comment
docs: Replace spaces with tabs in check-sysctl-docs
docs: Remove colon from ctltable title in vm.rst
docs: Add awk section for ucount sysctl entries
docs: Use skiplist when checking sysctl admin-guide
docs: nixify check-sysctl-docs
sysctl: rename kern_table -> sysctl_subsys_table
kernel/sys.c: Move overflow{uid,gid} sysctl into kernel/sys.c
uevent: mv uevent_helper into kobject_uevent.c
sysctl: Removed unused variable
sysctl: Nixify sysctl.sh
sysctl: Remove superfluous includes from kernel/sysctl.c
sysctl: Remove (very) old file changelog
sysctl: Move sysctl_panic_on_stackoverflow to kernel/panic.c
sysctl: move cad_pid into kernel/pid.c
sysctl: Move tainted ctl_table into kernel/panic.c
Input: sysrq: mv sysrq into drivers/tty/sysrq.c
fork: mv threads-max into kernel/fork.c
parisc/power: Move soft-power into power.c
mm: move randomize_va_space into memory.c
...
|
|
'torture-scripts.16.07.2025', 'srcu.19.07.2025', 'rcu.nocb.18.07.2025' and 'refscale.07.07.2025' into rcu.merge.23.07.2025
|
|
Move sysctl_panic_on_rcu_stall and sysctl_max_rcu_stall_to_panic into
the kernel/rcu subdirectory. Make these static in tree_stall.h and
remove their extern declarations from panic.h, as their scope is now
confined to one file.
This is part of a greater effort to move ctl tables into their
respective subsystems which will reduce the merge conflicts in
kernel/sysctl.c.
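A hedged sketch of the per-subsystem registration pattern this series uses
(the entry details are illustrative, not copied from the patch):
	static struct ctl_table rcu_stall_sysctl_table[] = {
		{
			.procname	= "panic_on_rcu_stall",
			.data		= &sysctl_panic_on_rcu_stall,
			.maxlen		= sizeof(int),
			.mode		= 0644,
			.proc_handler	= proc_dointvec_minmax,
			.extra1		= SYSCTL_ZERO,
			.extra2		= SYSCTL_ONE,
		},
	};
	/* Registered from kernel/rcu/ instead of the central kernel/sysctl.c. */
	register_sysctl_init("kernel", rcu_stall_sysctl_table);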
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
|
|
The synchronization of CPU offlining with GP initialization is confusing
to put it mildly (rightfully so as the issue it deals with is complex).
Recent discussions brought up a question -- what prevents
rcu_implicit_dyntick_qs() from warning about QS reports for offline
CPUs (missing QS reports for offline CPUs would cause indefinite hangs).
QS reporting for now-offline CPUs should only happen from:
- gp_init()
- rcutree_cpu_report_dead()
Add some documentation on this and refer to it from comments in the code
explaining how QS reporting is not missed when these functions are
concurrently running.
I referred heavily to this post [1] about the need for the ofl_lock.
[1] https://lore.kernel.org/all/20180924164443.GF4222@linux.ibm.com/
[ Applied paulmck feedback on moving documentation to Requirements.rst ]
Link: https://lore.kernel.org/all/01b4d228-9416-43f8-a62e-124b92e8741a@paulmck-laptop/
Co-developed-by: "Paul E. McKenney" <paulmck@kernel.org>
Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|
|
The details of this are subtle and were discussed recently. Add a
quick-quiz about this and refer to it from the code, for more clarity.
Reviewed-by: "Paul E. McKenney" <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|
|
Add detailed comments explaining the critical ordering constraints
during RCU grace period initialization, based on discussions with
Frederic.
Reviewed-by: "Paul E. McKenney" <paulmck@kernel.org>
Co-developed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|
|
We observed a regression in our customer’s environment after enabling
CONFIG_LAZY_RCU. In the Android Update Engine scenario, where ioctl() is
used heavily, we found that callbacks queued via call_rcu_hurry (such as
percpu_ref_switch_to_atomic_rcu) can sometimes be delayed by up to 5
seconds before execution. This occurs because the new grace period does
not start immediately after the previous one completes.
The root cause is that the wake_nocb_gp_defer() function now checks
"rdp->nocb_defer_wakeup" instead of "rdp_gp->nocb_defer_wakeup". On CPUs
that are not rcuog, "rdp->nocb_defer_wakeup" may always be
RCU_NOCB_WAKE_NOT. This can cause "rdp_gp->nocb_defer_wakeup" to be
downgraded and the "rdp_gp->nocb_timer" to be postponed by up to 10
seconds, delaying the execution of hurry RCU callbacks.
The trace log of one scenario we encountered is as follows:
// previous GP ends at this point
rcu_preempt [000] d..1. 137.240210: rcu_grace_period: rcu_preempt 8369 end
rcu_preempt [000] ..... 137.240212: rcu_grace_period: rcu_preempt 8372 reqwait
// call_rcu_hurry enqueues "percpu_ref_switch_to_atomic_rcu", the callback waited on by UpdateEngine
update_engine [002] d..1. 137.301593: __call_rcu_common: wyy: unlikely p_ref = 00000000********. lazy = 0
// FirstQ on cpu 2 rdp_gp->nocb_timer is set to fire after 1 jiffy (4ms)
// and the rdp_gp->nocb_defer_wakeup is set to RCU_NOCB_WAKE
update_engine [002] d..2. 137.301595: rcu_nocb_wake: rcu_preempt 2 FirstQ on cpu2 with rdp_gp (cpu0).
// A FirstBQ event on cpu2 during that 1 jiffy makes the timer get postponed 10 seconds.
// Also, rdp_gp->nocb_defer_wakeup is overwritten to RCU_NOCB_WAKE_LAZY.
update_engine [002] d..1. 137.301601: rcu_nocb_wake: rcu_preempt 2 WakeEmptyIsDeferred
...
...
...
// before the 10 seconds timeout, cpu0 received another call_rcu_hurry
// reset the timer to jiffies+1 and set the waketype = RCU_NOCB_WAKE.
kworker/u32:0 [000] d..2. 142.557564: rcu_nocb_wake: rcu_preempt 0 FirstQ
kworker/u32:0 [000] d..1. 142.557576: rcu_nocb_wake: rcu_preempt 0 WakeEmptyIsDeferred
kworker/u32:0 [000] d..1. 142.558296: rcu_nocb_wake: rcu_preempt 0 WakeNot
kworker/u32:0 [000] d..1. 142.558562: rcu_nocb_wake: rcu_preempt 0 WakeNot
// idle(do_nocb_deferred_wakeup) wake rcuog due to waketype == RCU_NOCB_WAKE
<idle> [000] d..1. 142.558786: rcu_nocb_wake: rcu_preempt 0 DoWake
<idle> [000] dN.1. 142.558839: rcu_nocb_wake: rcu_preempt 0 DeferredWake
rcuog/0 [000] ..... 142.558871: rcu_nocb_wake: rcu_preempt 0 EndSleep
rcuog/0 [000] ..... 142.558877: rcu_nocb_wake: rcu_preempt 0 Check
// finally rcuog request a new GP at this point (5 seconds after the FirstQ event)
rcuog/0 [000] d..2. 142.558886: rcu_grace_period: rcu_preempt 8372 newreq
rcu_preempt [001] d..1. 142.559458: rcu_grace_period: rcu_preempt 8373 start
...
rcu_preempt [000] d..1. 142.564258: rcu_grace_period: rcu_preempt 8373 end
rcuop/2 [000] D..1. 142.566337: rcu_batch_start: rcu_preempt CBs=219 bl=10
// the hurry CB is invoked at this point
rcuop/2 [000] b.... 142.566352: blk_queue_usage_counter_release: wyy: wakeup. p_ref = 00000000********.
This patch changes the condition to check "rdp_gp->nocb_defer_wakeup" in
the lazy path. This prevents an already scheduled "rdp_gp->nocb_timer"
from being postponed and avoids overwriting "rdp_gp->nocb_defer_wakeup"
when it is not RCU_NOCB_WAKE_NOT.
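A hedged sketch of the corrected condition in wake_nocb_gp_defer() (exact
surrounding code omitted):
	/* Consult the rcuog leader's deferred-wakeup state, not this CPU's, so an
	 * already-armed rdp_gp->nocb_timer is neither postponed nor downgraded. */
	if (waketype == RCU_NOCB_WAKE_LAZY &&
	    READ_ONCE(rdp_gp->nocb_defer_wakeup) == RCU_NOCB_WAKE_NOT) {
		/* ... arm the lazy timer ... */
	}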
Fixes: 3cb278e73be5 ("rcu: Make call_rcu() lazy to save power")
Co-developed-by: Cheng-jui Wang <cheng-jui.wang@mediatek.com>
Signed-off-by: Cheng-jui Wang <cheng-jui.wang@mediatek.com>
Co-developed-by: Lorry.Luo@mediatek.com
Signed-off-by: Lorry.Luo@mediatek.com
Tested-by: weiyangyang@vivo.com
Signed-off-by: weiyangyang@vivo.com
Signed-off-by: Tze-nan Wu <Tze-nan.Wu@mediatek.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|
|
Extract the complex expedited handling condition in rcu_read_unlock_special()
into a separate function rcu_unlock_needs_exp_handling() with detailed
comments explaining each condition.
This improves code readability. No functional change intended.
Reviewed-by: "Paul E. McKenney" <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|
|
Currently, SRCU-fast grace periods use synchronize_rcu() to provide the
needed ordering with readers, even given an expedited SRCU-fast grace
period, which isn't all that expedited. This commit therefore instead
uses synchronize_rcu_expedited() if there is an expedited SRCU-fast
grace period in flight.
Of course, given a non-expedited SRCU-fast grace period blocked in
synchronize_rcu(), a later request for an expedited SRCU-fast grace
period will wait for that synchronize_rcu() to return before switching
to use of synchronize_rcu_expedited(). If this turns out to be a real
problem for a production workload, we can increase the complexity (but
likely also degrade the energy efficiency) to speed things up further.
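A hedged sketch of the idea (the predicate is hypothetical; the real code keys
off the SRCU-fast grace-period state):
	if (expedited_srcu_fast_gp_in_flight)	/* hypothetical condition */
		synchronize_rcu_expedited();	/* hurry the ordering with readers */
	else
		synchronize_rcu();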
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|
|
Because SRCU-lite is being replaced by SRCU-fast, this commit removes
support for SRCU-lite from rcutorture.c.
Both SRCU-lite and SRCU-fast provide faster readers by dropping the
smp_mb() call from their lock and unlock primitives, but incur a pair
of added RCU grace periods during the SRCU grace period. There is a
trivial mapping from the SRCU-lite API to that of SRCU-fast, so there
should be no transition issues.
[ paulmck: Apply Christoph Hellwig feedback. ]
Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|
|
Because SRCU-lite is being replaced by SRCU-fast, this commit removes
support for SRCU-lite from refscale.c.
Both SRCU-lite and SRCU-fast provide faster readers by dropping the
smp_mb() call from their lock and unlock primitives, but incur a pair
of added RCU grace periods during the SRCU grace period. There is a
trivial mapping from the SRCU-lite API to that of SRCU-fast, so there
should be no transition issues.
[ paulmck: Apply Christoph Hellwig feedback. ]
Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|
|
If rcu_read_unlock_special() is invoked during irq_exit(), we can
lock up if an IPI is issued, because the IPI itself triggers
the irq_exit() path again, causing a recursive lockup.
This is precisely what Xiongfeng found when invoking a BPF program on
the trace_tick_stop() tracepoint, as shown in the trace below. Fix by
managing the irq_work state correctly.
irq_exit()
__irq_exit_rcu()
/* in_hardirq() returns false after this */
preempt_count_sub(HARDIRQ_OFFSET)
tick_irq_exit()
tick_nohz_irq_exit()
tick_nohz_stop_sched_tick()
trace_tick_stop() /* a bpf prog is hooked on this trace point */
__bpf_trace_tick_stop()
bpf_trace_run2()
rcu_read_unlock_special()
/* will send a IPI to itself */
irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
A simple reproducer can also be obtained by doing the following in
tick_irq_exit(). It will hang on boot without the patch:
static inline void tick_irq_exit(void)
{
+ rcu_read_lock();
+ WRITE_ONCE(current->rcu_read_unlock_special.b.need_qs, true);
+ rcu_read_unlock();
+
Reported-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Closes: https://lore.kernel.org/all/9acd5f9f-6732-7701-6880-4b51190aa070@huawei.com/
Tested-by: Qi Xi <xiqi2@huawei.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Reviewed-by: "Paul E. McKenney" <paulmck@kernel.org>
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
[neeraj: Apply Frederic's suggested fix for PREEMPT_RT]
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|
|
Automatically enable the rcu_normal_wake_from_gp parameter on
systems with a small number of CPUs. The activation threshold
is set to 16 CPUs.
This helps to reduce the latency of the normal synchronize_rcu() API
by waking up GP waiters earlier and decoupling synchronize_rcu()
callers from regular callback handling.
A benchmark running 64 parallel jobs (on a system with 64 CPUs), each
invoking synchronize_rcu(), demonstrates a notable latency reduction
with the setting enabled.
Latency distribution (microseconds):
<default>
0 - 9999 : 1
10000 - 19999 : 4
20000 - 29999 : 399
30000 - 39999 : 3197
40000 - 49999 : 10428
50000 - 59999 : 17363
60000 - 69999 : 15529
70000 - 79999 : 9287
80000 - 89999 : 4249
90000 - 99999 : 1915
100000 - 109999 : 922
110000 - 119999 : 390
120000 - 129999 : 187
...
<default>
<rcu_normal_wake_from_gp>
0 - 9999 : 1
10000 - 19999 : 234
20000 - 29999 : 6678
30000 - 39999 : 33463
40000 - 49999 : 20669
50000 - 59999 : 2766
60000 - 69999 : 183
...
<rcu_normal_wake_from_gp>
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|
|
On kernels built with CONFIG_IRQ_WORK=y, when rcu_read_unlock() is
invoked within an interrupts-disabled region of code [1], it will invoke
rcu_read_unlock_special(), which uses an irq-work handler to force the
system to notice when the RCU read-side critical section actually ends.
That end won't happen until interrupts are enabled at the soonest.
In some kernels, such as those booted with rcutree.use_softirq=y, the
irq-work handler is used unconditionally.
The per-CPU rcu_data structure's ->defer_qs_iw_pending field is
updated by the irq-work handler and is both read and updated by
rcu_read_unlock_special(). This resulted in the following KCSAN splat:
------------------------------------------------------------------------
BUG: KCSAN: data-race in rcu_preempt_deferred_qs_handler / rcu_read_unlock_special
read to 0xffff96b95f42d8d8 of 1 bytes by task 90 on cpu 8:
rcu_read_unlock_special+0x175/0x260
__rcu_read_unlock+0x92/0xa0
rt_spin_unlock+0x9b/0xc0
__local_bh_enable+0x10d/0x170
__local_bh_enable_ip+0xfb/0x150
rcu_do_batch+0x595/0xc40
rcu_cpu_kthread+0x4e9/0x830
smpboot_thread_fn+0x24d/0x3b0
kthread+0x3bd/0x410
ret_from_fork+0x35/0x40
ret_from_fork_asm+0x1a/0x30
write to 0xffff96b95f42d8d8 of 1 bytes by task 88 on cpu 8:
rcu_preempt_deferred_qs_handler+0x1e/0x30
irq_work_single+0xaf/0x160
run_irq_workd+0x91/0xc0
smpboot_thread_fn+0x24d/0x3b0
kthread+0x3bd/0x410
ret_from_fork+0x35/0x40
ret_from_fork_asm+0x1a/0x30
no locks held by irq_work/8/88.
irq event stamp: 200272
hardirqs last enabled at (200272): [<ffffffffb0f56121>] finish_task_switch+0x131/0x320
hardirqs last disabled at (200271): [<ffffffffb25c7859>] __schedule+0x129/0xd70
softirqs last enabled at (0): [<ffffffffb0ee093f>] copy_process+0x4df/0x1cc0
softirqs last disabled at (0): [<0000000000000000>] 0x0
------------------------------------------------------------------------
The problem is that irq-work handlers run with interrupts enabled, which
means that rcu_preempt_deferred_qs_handler() could be interrupted,
and that interrupt handler might contain an RCU read-side critical
section, which might invoke rcu_read_unlock_special(). In the strict
KCSAN mode of operation used by RCU, this constitutes a data race on
the ->defer_qs_iw_pending field.
This commit therefore disables interrupts across the portion of the
rcu_preempt_deferred_qs_handler() that updates the ->defer_qs_iw_pending
field. This suffices because this handler is not a fast path.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|
|
In the preparation stage of CPU online, the CPU's rdp->nocb_cb_kthread
is created if it does not already exist. There is a situation where
creation of the rdp's rcuop kthread fails and this CPU's rdp is then
de-offloaded without its rdp->nocb_cb_kthread pointer being assigned,
while this rdp's ->nocb_gp_rdp and rdp->rdp_gp->nocb_gp_kthread remain
valid.
A subsequent re-offload operation on this offline CPU will then pass
the conditional check, and kthread_unpark() will access the invalid
rdp->nocb_cb_kthread pointer.
This commit therefore uses rdp->nocb_gp_kthread instead of
rdp_gp->nocb_gp_kthread for the safety check.
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|
|
It is not possible to send an IPI to a dying CPU that has passed the
CPUHP_TEARDOWN_CPU stage. Remaining unhandled IPIs are handled later at
CPUHP_AP_SMPCFD_DYING stage by stop machine. This is the last
opportunity for RCU exp handler to request an expedited quiescent state.
And the upcoming final context switch between stop machine and idle must
have reported the requested context switch.
Therefore, it should not be possible to observe a pending requested
expedited quiescent state when RCU finally stops watching the outgoing
CPU. Once IPIs aren't possible anymore, the QS for the target CPU will
be reported on its behalf by the RCU exp kworker.
Provide an assertion to verify those expectations.
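A hedged sketch of such an assertion, using the per-CPU field named elsewhere
in this log:
	/* No expedited QS request should still be pending once IPIs are impossible. */
	WARN_ON_ONCE(READ_ONCE(rdp->cpu_no_qs.b.exp));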
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|
|
A CPU coming online checks for an ongoing grace period and reports
a quiescent state accordingly if needed. This special treatment that
shortcuts the expedited IPI finds its origin as an optimization purpose
on the following commit:
338b0f760e84 ("rcu: Better hotplug handling for synchronize_sched_expedited()")
The point is to avoid an IPI while waiting for a CPU to become online
or failing to become offline.
However this is pointless and even error prone for several reasons:
* If the CPU has been seen offline in the first round scanning offline
and idle CPUs, no IPI is even tried and the quiescent state is
reported on behalf of the CPU.
* This means that if the IPI fails, the CPU just became offline. So
it's unlikely to become online right away, unless the cpu hotplug
operation failed and rolled back, which is a rare event that can
wait a jiffy for a new IPI to be issued.
* But then the "optimization" applying on failing CPU hotplug down only
applies to !PREEMPT_RCU.
* This forcibly reports a quiescent state even if ->cpu_no_qs.b.exp is not
set. As a result it can race with remote QS reports on the same rdp.
Fortunately it happens to be OK but an accident is waiting to happen.
For all those reasons, remove this optimization that doesn't look worthy
to keep around.
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|
|
A full memory barrier in the RCU-PREEMPT task unblock path is advertised
as ordering the context switch (or rather the accesses prior to
rcu_read_unlock()) with the expedited grace period fastpath.
However the grace period cannot complete without rcu_report_exp_rnp()
being called with the rnp node locked. This reports the quiescent
state in a fully ordered fashion against updater's accesses thanks to:
1) The READ-SIDE smp_mb__after_unlock_lock() barrier across nodes
locking while propagating QS up to the root.
2) The UPDATE-SIDE smp_mb__after_unlock_lock() barrier while holding the
root rnp to wait/check for the GP completion.
3) The (perhaps redundant given step 1) and 2)) smp_mb() in rcu_seq_end()
before the grace period completes.
This makes the explicit barrier in this place superfluous. Therefore
remove it as it is confusing.
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|
|
The nreaders and loops variables are exposed as module parameters, which,
in certain combinations, can lead to multiplication overflow.
Besides, the loops parameter is defined as long, while throughout the code
it is used as an int, which can cause truncation on 64-bit kernels and
possible zeroes where they shouldn't appear.
Since the code uses the result of the multiplication as an int anyway, it
only makes sense to make loops an int. A multiplication overflow check is
also added, given the possible multiplication of two very big numbers.
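A hedged sketch of the added guard (the error handling is illustrative):
	int product;

	if (check_mul_overflow(nreaders, loops, &product))
		return -EINVAL;  /* reject module-parameter combinations that overflow */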
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 653ed64b01dc ("refperf: Add a test to measure performance of read-side synchronization")
Signed-off-by: Artem Sadovnikov <a.sadovnikov@ispras.ru>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|
|
When a stall is detected, the state of each NOCB CPU is dumped along
with the state of each NOCB group. The latter part however is
incidentally ignored if the NOCB group leader happens not to be
offloaded itself.
Fix this to make sure this precious information isn't lost from a
stall report.
Reported-by: "Paul E. McKenney" <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: "Paul E. McKenney" <paulmck@kernel.org>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|
|
For kernels built with CONFIG_PREEMPT_RT=y, running rcutorture
tests resulted in the following splat:
[ 68.797425] rcutorture_one_extend_check during change: Current 0x1 To add 0x1 To remove 0x0 preempt_count() 0x0
[ 68.797533] WARNING: CPU: 2 PID: 512 at kernel/rcu/rcutorture.c:1993 rcutorture_one_extend_check+0x419/0x560 [rcutorture]
[ 68.797601] Call Trace:
[ 68.797602] <TASK>
[ 68.797619] ? lockdep_softirqs_off+0xa5/0x160
[ 68.797631] rcutorture_one_extend+0x18e/0xcc0 [rcutorture 2466dbd2ff34dbaa36049cb323a80c3306ac997c]
[ 68.797646] ? local_clock+0x19/0x40
[ 68.797659] rcu_torture_one_read+0xf0/0x280 [rcutorture 2466dbd2ff34dbaa36049cb323a80c3306ac997c]
[ 68.797678] ? __pfx_rcu_torture_one_read+0x10/0x10 [rcutorture 2466dbd2ff34dbaa36049cb323a80c3306ac997c]
[ 68.797804] ? __pfx_rcu_torture_timer+0x10/0x10 [rcutorture 2466dbd2ff34dbaa36049cb323a80c3306ac997c]
[ 68.797815] rcu-torture: rcu_torture_reader task started
[ 68.797824] rcu-torture: Creating rcu_torture_reader task
[ 68.797824] rcu_torture_reader+0x238/0x580 [rcutorture 2466dbd2ff34dbaa36049cb323a80c3306ac997c]
[ 68.797836] ? kvm_sched_clock_read+0x15/0x30
Disabling BH does not change the corresponding SOFTIRQ bits in
preempt_count() on RT kernels, so this commit uses
softirq_count() to check whether BH is disabled.
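A hedged sketch of the check that behaves the same on RT and non-RT kernels:
	/* On PREEMPT_RT, local_bh_disable() does not touch the SOFTIRQ bits of
	 * preempt_count(), but softirq_count() still reflects BH-disabled state. */
	WARN_ONCE(!softirq_count(), "expected BH to be disabled here");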
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|
|
Trivial RCU is a textbook implementation that is not used in the
Linux kernel, but tested to keep textbooks (and presentations) honest.
It is so trivial that it cannot deal with either CPU hotplug or external
migration from one CPU to another. This commit therefore splats whenever
onoff_interval or shuffle_interval are non-zero, and then sets them to
zero in order to avoid false-positive failures.
Those wishing to set these module parameters in order to force failures
in Trivial RCU are free to revert this commit. Just don't expect me to
be sympathetic to any resulting bug reports!
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202505131651.af6e81d7-lkp@intel.com
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|
|
Given that the rcutorture_one_extend_check() function now uses
in_serving_softirq() and in_hardirq(), it is no longer necessary to pass
insoftirq flags down the function-call stack. This commit therefore
removes those flags, and, while in the area, does a bit of whitespace
cleanup.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|
|
This commit prints the number of RCU up/down readers and the number
of such readers that migrated from one CPU to another, along
with the rest of the periodic rcu_torture_stats_print() output.
These statistics are currently used only by srcu_down_read{,_fast}()
and srcu_up_read{,_fast}().
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|
|
The design of testing of up/down readers such as srcu_down_read()
and srcu_up_read() assumes that these are tested only by the
rcu_torture_updown() kthread, and never by the rcu_torture_reader()
kthread. Because we all know which road is paved with good intentions,
this commit adds WARN_ON_ONCE() to verify that things are going to plan.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|
|
This commit creates counters in the rcu_torture_one_read_state_updown
structure that check for a call to ->up_read() that lacks a matching
call to ->down_read().
While in the area, add end-of-run cleanup code that prevents calls to
rcu_torture_updown_hrt() from happening after the test has moved on. Yes,
the srcu_barrier() at the end of the test will wait for them, but this
could result in confusing states, statistics, and diagnostic information.
So explicitly wait for them before we get to the end-of-test output.
[ paulmck: Apply kernel test robot feedback. ]
[ joel: Apply Boqun's fix for counter increment ordering. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|
|
The down/up SRCU reader testing uses an hrtimer handler to exit the SRCU
read-side critical section. This might be delayed, and if delayed for
too long, it can prevent the rcutorture run from completing. This commit
therefore complains if the hrtimer handler is delayed for more than
ten seconds.
[ paulmck, joel: Apply kernel test robot feedback to avoid
false-positive complaint of excessive ->up_read() delays by using
HRTIMER_MODE_HARD ]
Tested-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|