path: root/kernel/time
2024-11-02  timekeeping: Always check for negative motion  (Thomas Gleixner, 2 files, -12/+0)
clocksource_delta() has two variants. One with a check for negative motion, which is only selected by x86. This is a historic leftover as this function was previously used in the time getter hot paths. Since 135225a363ae timekeeping_cycles_to_ns() has unconditional protection against this as a by-product of the protection against 64bit math overflow. clocksource_delta() is only used in the clocksource watchdog and in timekeeping_advance(). The extra conditional there is not hurting anyone. Remove the config option and unconditionally prevent negative motion of the readout. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241031120328.599430157@linutronix.de
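The unconditional negative-motion protection amounts to masking the delta and treating a readout behind the last value as zero; a minimal sketch (close to, but not necessarily, the exact helper):

    static inline u64 clocksource_delta(u64 now, u64 last, u64 mask)
    {
            u64 ret = (now - last) & mask;

            /*
             * If the MSB of the masked delta is set, 'now' is behind 'last':
             * report no motion instead of a huge bogus forward jump.
             */
            return ret & ~(mask >> 1) ? 0 : ret;
    }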
2024-11-02  timekeeping: Remove CONFIG_DEBUG_TIMEKEEPING  (Thomas Gleixner, 1 file, -105/+3)
Since commit 135225a363ae, timekeeping_cycles_to_ns() correctly handles large offsets which would lead to 64bit multiplication overflows. It is also unconditionally protected against negative motion of the clocksource, which was exclusive to x86 before. timekeeping_advance() already handles large offsets correctly. That means the value of CONFIG_DEBUG_TIMEKEEPING, which analyzed these cases, is very close to zero. Remove all of it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241031120328.536010148@linutronix.de
2024-10-31  timers: Add missing READ_ONCE() in __run_timer_base()  (Thomas Gleixner, 1 file, -1/+2)
__run_timer_base() checks base::next_expiry without holding base::lock. That can race with a remote CPU updating next_expiry under the lock. This is an intentional and harmless data race, but lacks a READ_ONCE(), so KCSAN complains about this. Add the missing READ_ONCE(). All other places are covered already. Fixes: 79f8b28e85f8 ("timers: Annotate possible non critical data race of next_expiry") Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/all/87a5emyqk0.ffs@tglx Closes: https://lore.kernel.org/oe-lkp/202410301205.ef8e9743-lkp@intel.com
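The pattern in question, roughly (a simplified sketch, not the verbatim function body):

    static void __run_timer_base(struct timer_base *base)
    {
            /*
             * Lockless peek: a remote CPU may update next_expiry while holding
             * base->lock. The race is harmless; READ_ONCE() documents it and
             * keeps KCSAN quiet.
             */
            if (time_before(jiffies, READ_ONCE(base->next_expiry)))
                    return;

            raw_spin_lock_irq(&base->lock);
            __run_timers(base);             /* re-evaluates under the lock */
            raw_spin_unlock_irq(&base->lock);
    }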
2024-10-31  tick: Remove now unneeded low-res tick stop on CPUHP_AP_TICK_DYING  (Frederic Weisbecker, 1 file, -19/+6)
The generic clockevent layer now detaches and stops the underlying clockevent from the dying CPU, unifying the tick behaviour for both periodic and oneshot mode on offline CPUs. There is no more need for the tick layer to care about that. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241029125451.54574-4-frederic@kernel.org
2024-10-31  clockevents: Shutdown and unregister current clockevents at CPUHP_AP_TICK_DYING  (Frederic Weisbecker, 2 files, -21/+12)
The way the clockevent devices are finally stopped while a CPU is offlining is currently chaotic. The order of events is:

 1) tick_sched_timer_dying() stops the tick and the underlying clockevent, but only for the oneshot case. The periodic tick and its related clockevent still run.

 2) tick_broadcast_offline() detaches and stops the per-cpu oneshot broadcast and appends it to the released list.

 3) Some individual clockevent drivers stop the clockevents (a second time if the tick is oneshot).

 4) Once the CPU is dead, a control CPU remotely detaches and stops (a third time if in oneshot mode) the CPU clockevent and adds it to the released list.

 5) The devices on the released list, i.e. the broadcast device released in step 2) and the remotely detached clockevent from step 4), are unregistered.

These scattered events can be factorized if the current clockevent is detached and stopped by the dying CPU at the generic layer, that is from the dying CPU:

 a) Stop the tick
 b) Stop/detach the underlying per-cpu oneshot broadcast clockevent
 c) Stop/detach the underlying clockevent
 d) Release / unregister the clockevents from b) and c)
 e) Release / unregister the remaining clockevents from the dying CPU. This part could be performed by the dying CPU

This way the drivers and the tick layer don't need to care about clockevent operations during cpuhotplug down. This also unifies the tick behaviour on offline CPUs between oneshot and periodic modes, avoiding offline ticks altogether for sanity. Adopt the simplification.
[ tglx: Remove the WARN_ON() in clockevents_register_device() as that is called from an upcoming CPU before the CPU is marked online ]
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20241029125451.54574-3-frederic@kernel.org
2024-10-31  clockevents: Improve clockevents_notify_released() comment  (Frederic Weisbecker, 1 file, -2/+10)
When a new clockevent device is added and replaces a previous device, the latter is put into the released list. Then the released list is added back. This may look counter-intuitive but the reason is that a released device might be suitable for other uses. For example a released CPU regular clockevent can be a better replacement for the current broadcast event. Similarly a released broadcast clockevent can be a better replacement for the current regular clockevent of a given CPU. Improve the comments to spell out these subtleties. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241029125451.54574-2-frederic@kernel.org
2024-10-29  posix-timers: Add proper state tracking  (Thomas Gleixner, 4 files, -17/+28)
Right now the state tracking is done by two struct members:

 - it_active: a boolean which tracks the armed/disarmed state
 - it_signal_seq: a sequence counter which is used to invalidate settings and prevent rearming

Replace it_active with it_status and properly track the states in one place. This allows it_signal_seq to be reused to track reprogramming, disarm and delete operations, so that signals related to the state prior to those operations can be dropped.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20241001083835.670337048@linutronix.de
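Conceptually, the boolean turns into a small state enum along these lines (the names below are illustrative assumptions, not the exact definitions):

    enum posix_timer_state {
            POSIX_TIMER_DISARMED,           /* not armed, no signal expected */
            POSIX_TIMER_ARMED,              /* armed, expiry will queue a signal */
            POSIX_TIMER_REQUEUE_PENDING,    /* fired, signal delivery pending */
    };

    struct k_itimer_state_sketch {
            enum posix_timer_state  it_status;      /* replaces it_active */
            unsigned int            it_signal_seq;  /* invalidates stale signals */
    };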
2024-10-29  posix-timers: Rename k_itimer::it_requeue_pending  (Thomas Gleixner, 3 files, -9/+9)
Prepare for using this struct member to do a proper reprogramming and deletion accounting so that stale signals can be dropped. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/all/20241001083835.611997737@linutronix.de
2024-10-29  posix-timers: Drop signal if timer has been deleted or reprogrammed  (Thomas Gleixner, 1 file, -4/+5)
No point in delivering a signal from the past. POSIX does not specify the behaviour here:

 - "The effect of disarming or resetting a timer with pending expiration notifications is unspecified."
 - "The disposition of pending signals for the deleted timer is unspecified."

In both cases it is reasonable to expect that pending signals are discarded. Especially in the reprogramming case it does not make sense to account for previous overruns or to deliver a signal for a timer which has been disarmed. Drop the signal as that is consistent and understandable behaviour.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20241001083835.553646280@linutronix.de
2024-10-29  signal: Allow POSIX timer signals to be dropped  (Thomas Gleixner, 1 file, -1/+2)
In case a timer was reprogrammed or deleted, an already pending signal is obsolete. Right now such signals are kept around and eventually delivered. While POSIX is blurry about this:

 - "The effect of disarming or resetting a timer with pending expiration notifications is unspecified."
 - "The disposition of pending signals for the deleted timer is unspecified."

it is reasonable in both cases to expect that pending signals are discarded as they have no meaning anymore. Prepare the signal code to allow dropping posix timer signals.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20241001083835.494416923@linutronix.de
2024-10-29  posix-timers: Cure si_sys_private race  (Thomas Gleixner, 1 file, -14/+1)
The si_sys_private member of the siginfo which is embedded in the preallocated sigqueue is used by the posix timer code to decide whether a timer must be reprogrammed on signal delivery. The handling of this is racy, as a long standing comment in that code documents. It is modified with the timer lock held, but without the sighand lock being held. The actual signal delivery code checks for it under the sighand lock without holding the timer lock. Hand the new value to send_sigqueue() as an argument and store it with the sighand lock held. This is an intermediate change to address the issue. The arguments to this function will be cleaned up in subsequent changes. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/all/20241001083835.434338954@linutronix.de
2024-10-29  signal: Confine POSIX_TIMERS properly  (Thomas Gleixner, 2 files, -3/+34)
Move the itimer rearming out of the signal code and consolidate all posix timer related functions in the signal code under one ifdef. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/all/20241001083835.314100569@linutronix.de
2024-10-25  time: Fix references to _msecs_to_jiffies() handling of values  (Miguel Ojeda, 1 file, -1/+1)
The details about the handling of the "normal" values were moved to the _msecs_to_jiffies() helpers in commit ca42aaf0c861 ("time: Refactor msecs_to_jiffies"). However, the same commit still mentioned __msecs_to_jiffies() in the added documentation. Thus point to _msecs_to_jiffies() instead. Fixes: ca42aaf0c861 ("time: Refactor msecs_to_jiffies") Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241025110141.157205-2-ojeda@kernel.org
2024-10-25  time: Partially revert cleanup on msecs_to_jiffies() documentation  (Miguel Ojeda, 1 file, -1/+1)
The documentation's intention is to compare msecs_to_jiffies() (first sentence) with __msecs_to_jiffies() (second sentence), which is what the original documentation did. One of the cleanups in commit f3cb80804b82 ("time: Fix various kernel-doc problems") may have assumed the paragraph was talking about the latter, since that is what is being documented. Thus revert that part of the change. Fixes: f3cb80804b82 ("time: Fix various kernel-doc problems") Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241025110141.157205-1-ojeda@kernel.org
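For orientation, the split the two documentation fixes above refer to looks roughly like this (a simplified sketch of the constant/non-constant dispatch, not the verbatim header):

    static __always_inline unsigned long msecs_to_jiffies(const unsigned int m)
    {
            if (__builtin_constant_p(m)) {
                    if ((int)m < 0)
                            return MAX_JIFFY_OFFSET;
                    return _msecs_to_jiffies(m);    /* compile-time arithmetic */
            }
            return __msecs_to_jiffies(m);           /* out-of-line runtime path */
    }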
2024-10-25  timekeeping: Merge timekeeping_update_staged() and timekeeping_update()  (Anna-Maria Behnsen, 1 file, -17/+14)
timekeeping_update_staged() is the only call site of timekeeping_update(). Merge those functions. No functional change. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241009-devel-anna-maria-b4-timers-ptp-timekeeping-v2-25-554456a44a15@linutronix.de
2024-10-25  timekeeping: Remove TK_MIRROR timekeeping_update() action  (Anna-Maria Behnsen, 1 file, -9/+1)
All call sites using the TK_MIRROR flag in timekeeping_update() are gone. The TK_MIRROR dependent code path is therefore dead code. Remove it along with the TK_MIRROR define. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241009-devel-anna-maria-b4-timers-ptp-timekeeping-v2-24-554456a44a15@linutronix.de
2024-10-25  timekeeping: Rework do_adjtimex() to use shadow_timekeeper  (Anna-Maria Behnsen, 1 file, -16/+25)
Updates of the timekeeper can be done by operating on the shadow timekeeper and afterwards copying the result into the real timekeeper. This has the advantage that the sequence count write protected region is kept as small as possible. Convert do_adjtimex() to use this scheme and take the opportunity to use a scoped_guard() for locking. That requires a separate function for updating the leap state so that the update is protected by the sequence count. This also brings the timekeeper and the shadow timekeeper in sync for this state, which was not the case so far. That's not a correctness problem as the state is only used at the read sides which use the real timekeeper, but it's inconsistent nevertheless. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241009-devel-anna-maria-b4-timers-ptp-timekeeping-v2-23-554456a44a15@linutronix.de
2024-10-25  timekeeping: Rework timekeeping_suspend() to use shadow_timekeeper  (Anna-Maria Behnsen, 1 file, -12/+10)
Updates of the timekeeper can be done by operating on the shadow timekeeper and afterwards copying the result into the real timekeeper. This has the advantage that the sequence count write protected region is kept as small as possible. While the sequence count held time is not relevant for the suspend path as there is no concurrency, there is no reason to have this function different from all the other update sites. Convert timekeeping_suspend() to use this scheme and clean up the variable declarations while at it. As halt_fast_timekeeper() does not need to be protected by the sequence counter, it is no problem to move it outside of the sequence counter protected area with this change. But it still needs to be executed while holding the lock. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241009-devel-anna-maria-b4-timers-ptp-timekeeping-v2-22-554456a44a15@linutronix.de
2024-10-25  timekeeping: Rework timekeeping_resume() to use shadow_timekeeper  (Anna-Maria Behnsen, 1 file, -12/+10)
Updates of the timekeeper can be done by operating on the shadow timekeeper and afterwards copying the result into the real timekeeper. This has the advantage that the sequence count write protected region is kept as small as possible. While the sequence count held time is not relevant for the resume path as there is no concurrency, there is no reason to have this function different from all the other update sites. Convert timekeeping_resume() to use this scheme and clean up the variable declaration while at it. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241009-devel-anna-maria-b4-timers-ptp-timekeeping-v2-21-554456a44a15@linutronix.de
2024-10-25  timekeeping: Rework timekeeping_inject_sleeptime64() to use shadow_timekeeper  (Anna-Maria Behnsen, 1 file, -15/+7)
Updates of the timekeeper can be done by operating on the shadow timekeeper and afterwards copying the result into the real timekeeper. This has the advantage, that the sequence count write protected region is kept as small as possible. Convert timekeeping_inject_sleeptime64() to use this scheme. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241009-devel-anna-maria-b4-timers-ptp-timekeeping-v2-20-554456a44a15@linutronix.de
2024-10-25  timekeeping: Rework timekeeping_init() to use shadow_timekeeper  (Anna-Maria Behnsen, 1 file, -9/+7)
For timekeeping_init() the sequence count write held time is not relevant and it could keep working on the real timekeeper, but there is no reason to make it different from other timekeeper updates. Convert it to operate on the shadow timekeeper. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241009-devel-anna-maria-b4-timers-ptp-timekeeping-v2-19-554456a44a15@linutronix.de
2024-10-25  timekeeping: Rework change_clocksource() to use shadow_timekeeper  (Anna-Maria Behnsen, 1 file, -11/+7)
Updates of the timekeeper can be done by operating on the shadow timekeeper and afterwards copying the result into the real timekeeper. This has the advantage, that the sequence count write protected region is kept as small as possible. Convert change_clocksource() to use this scheme. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241009-devel-anna-maria-b4-timers-ptp-timekeeping-v2-18-554456a44a15@linutronix.de
2024-10-25  timekeeping: Rework timekeeping_inject_offset() to use shadow_timekeeper  (Anna-Maria Behnsen, 1 file, -25/+16)
Updates of the timekeeper can be done by operating on the shadow timekeeper and afterwards copying the result into the real timekeeper. This has the advantage that the sequence count write protected region is kept as small as possible. Convert timekeeping_inject_offset() to use this scheme. That allows a scoped_guard() to be used for locking the timekeeper lock, as the use of the shadow timekeeper allows a rollback in the error case instead of the full timekeeper update of the original code. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241009-devel-anna-maria-b4-timers-ptp-timekeeping-v2-17-554456a44a15@linutronix.de
2024-10-25  timekeeping: Rework do_settimeofday64() to use shadow_timekeeper  (Anna-Maria Behnsen, 1 file, -26/+16)
Updates of the timekeeper can be done by operating on the shadow timekeeper and afterwards copying the result into the real timekeeper. This has the advantage that the sequence count write protected region is kept as small as possible. Convert do_settimeofday64() to use this scheme. That allows a scoped_guard() to be used for locking the timekeeper lock, as the use of the shadow timekeeper allows a rollback in the error case instead of the full timekeeper update of the original code. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241009-devel-anna-maria-b4-timers-ptp-timekeeping-v2-16-554456a44a15@linutronix.de
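The conversion pattern used throughout this series, sketched with hypothetical helpers where the changelogs do not spell them out (tk_set_xtime(), shadow_state_invalid() and the TK_UPDATE_ALL flag name are placeholders/assumptions):

    static int settime_sketch(const struct timespec64 *ts)
    {
            struct timekeeper *tks = &tk_core.shadow_timekeeper;

            if (!timespec64_valid_settod(ts))
                    return -EINVAL;

            scoped_guard(raw_spinlock_irqsave, &tk_core.lock) {
                    /* Modify only the shadow copy of the timekeeper. */
                    tk_set_xtime(tks, ts);

                    if (shadow_state_invalid(tks)) {        /* placeholder check */
                            /* Roll back: copy the live timekeeper over the shadow. */
                            timekeeping_restore_shadow(&tk_core);
                            return -EINVAL;
                    }

                    /*
                     * Commit: a short seqcount write section copies the shadow
                     * into the real timekeeper.
                     */
                    timekeeping_update_staged(&tk_core, TK_UPDATE_ALL);
            }
            return 0;
    }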
2024-10-25  timekeeping: Provide timekeeping_restore_shadow()  (Thomas Gleixner, 1 file, -1/+10)
Functions which operate on the real timekeeper, e.g. do_settimeofday(), have error conditions. If they are hit, a full timekeeping update is still required because the already committed operations modified the timekeeper. When switching these functions to operate on the shadow timekeeper, the full update can be avoided in the error case, but the modified shadow timekeeper has to be restored. Provide a helper function for that. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241009-devel-anna-maria-b4-timers-ptp-timekeeping-v2-15-554456a44a15@linutronix.de
2024-10-25  timekeeping: Introduce combined timekeeping action flag  (Anna-Maria Behnsen, 1 file, -4/+6)
Instead of explicitly listing all the separate timekeeping action flags, introduce a new one which covers all actions except the TK_MIRROR action. No functional change. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241009-devel-anna-maria-b4-timers-ptp-timekeeping-v2-14-554456a44a15@linutronix.de
2024-10-25  timekeeping: Split out timekeeper update of timekeeping_advance()  (Anna-Maria Behnsen, 1 file, -16/+27)
timekeeping_advance() is the only optimized function which uses shadow_timekeeper for updating the real timekeeper to keep the sequence counter protected region as small as possible. To be able to transform timekeeper updates in other functions to use the same logic, split out the functionality into a separate function timekeeping_update_staged(). While at it, document the reason why the sequence counter must be write held over the call to timekeeping_update() and the copying to the real timekeeper, and why using a pointer based update is suboptimal. No functional change. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241009-devel-anna-maria-b4-timers-ptp-timekeeping-v2-13-554456a44a15@linutronix.de
2024-10-25  timekeeping: Add struct tk_data as argument to timekeeping_update()  (Anna-Maria Behnsen, 1 file, -12/+13)
Updates of the timekeeper are done in two ways:

 1. Updating the timekeeper and afterwards memcpy()'ing the result into shadow_timekeeper using timekeeping_update(). Used everywhere for updates except in timekeeping_advance(); the sequence counter protected region starts before the first change to the timekeeper is done.

 2. Updating shadow_timekeeper and then memcpy()'ing the result into the timekeeper. Used only in timekeeping_advance(); the sequence counter protected region is only around timekeeping_update() and the memcpy() from shadow to timekeeper.

The second option is fast path optimized: the sequence counter protected region is as short as possible. As this behaviour is mainly documented by commit messages, but not in code, it makes the already non-trivial timekeeping code harder to read. There is no reason why updates to the timekeeper can't use the optimized version everywhere. With this, the code will be cleaner, as code is reused instead of duplicated. To be able to access tk_data, which contains all required information, add a pointer to tk_data as an argument to timekeeping_update(). With that, convert the comment about holding the lock into a lockdep assert. No functional change.
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <jstultz@google.com>
Link: https://lore.kernel.org/all/20241009-devel-anna-maria-b4-timers-ptp-timekeeping-v2-12-554456a44a15@linutronix.de
2024-10-25  timekeeping: Introduce tkd_basic_setup() to make lock and seqcount init reusable  (Anna-Maria Behnsen, 1 file, -2/+7)
Initialization of the lock and the seqcount needs to be done for every instance of the timekeeper struct. To be able to easily reuse it, create a separate function for it. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241009-devel-anna-maria-b4-timers-ptp-timekeeping-v2-11-554456a44a15@linutronix.de
2024-10-25  timekeeping: Define a struct type for tk_core to make it reusable  (Anna-Maria Behnsen, 1 file, -2/+4)
The struct type tk_core uses is not reusable. As long as there is only a single timekeeper, this is not a problem. But when the timekeeper infrastructure will be reused for per ptp clock timekeepers, an explicit struct type is required. Define struct tk_data as the explicit struct type for tk_core. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241009-devel-anna-maria-b4-timers-ptp-timekeeping-v2-10-554456a44a15@linutronix.de
2024-10-25  timekeeping: Move timekeeper_lock into tk_core  (Anna-Maria Behnsen, 1 file, -43/+29)
timekeeper_lock protects updates to struct tk_core but is not part of struct tk_core. As long as there is only a single timekeeper, this is not a problem. But when the timekeeper infrastructure will be reused for per ptp clock timekeepers, timekeeper_lock needs to be part of tk_core. Move the lock into tk_core, move initialisation of the lock and sequence counter into timekeeping_init() and update all users of timekeeper_lock. As this is touching all lock sites, convert them to use: guard(raw_spinlock_irqsave)(&tk_core.lock); instead of lock/unlock functions whenever possible. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241009-devel-anna-maria-b4-timers-ptp-timekeeping-v2-9-554456a44a15@linutronix.de
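The guard() conversion replaces explicit lock/unlock pairs with scope-based cleanup from <linux/cleanup.h>; a schematic example (illustrative, not a real call site):

    static void tk_update_example(void)
    {
            /*
             * The lock is acquired here and released automatically at the end
             * of the scope, on every return path.
             */
            guard(raw_spinlock_irqsave)(&tk_core.lock);

            write_seqcount_begin(&tk_core.seq);
            /* ... modify tk_core.timekeeper ... */
            write_seqcount_end(&tk_core.seq);
    }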
2024-10-25  timekeeping: Encapsulate locking/unlocking of timekeeper_lock  (Thomas Gleixner, 3 files, -5/+18)
timekeeper_lock protects updates of the timekeeper (tk_core). It is also used by vdso_update_begin/end() and not only internally by the timekeeper code. As long as there is only a single timekeeper, this works fine. But when the timekeeper infrastructure will be reused for per ptp clock timekeepers, timekeeper_lock needs to be part of tk_core. Therefore encapsulate locking/unlocking of timekeeper_lock and make the lock static. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241009-devel-anna-maria-b4-timers-ptp-timekeeping-v2-8-554456a44a15@linutronix.de
2024-10-25  timekeeping: Move shadow_timekeeper into tk_core  (Thomas Gleixner, 1 file, -4/+3)
tk_core requires shadow_timekeeper to allow timekeeping_advance() to update without holding the timekeeper sequence count write locked. This allows the readers to make progress up to the actual update, where the shadow timekeeper is copied over to the real timekeeper. As long as there is only a single timekeeper, having them separate is fine. But when the timekeeper infrastructure will be reused for per ptp clock timekeepers, shadow_timekeeper needs to be part of tk_core. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241009-devel-anna-maria-b4-timers-ptp-timekeeping-v2-7-554456a44a15@linutronix.de
2024-10-25  timekeeping: Simplify code in timekeeping_advance()  (Thomas Gleixner, 1 file, -10/+6)
timekeeping_advance() takes the timekeeper_lock and releases it before returning. When an early return is required, goto statements are used to make sure the lock is released properly. When the code was written the locking guard() was not yet available. Use the guard() to simplify the code and, while at it, clean up the ordering of function variables. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241009-devel-anna-maria-b4-timers-ptp-timekeeping-v2-5-554456a44a15@linutronix.de
2024-10-25  timekeeping: Abort clocksource change in case of failure  (Thomas Gleixner, 1 file, -18/+13)
There is no point in going through a full timekeeping update when acquiring a module reference or enabling the new clocksource fails. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241009-devel-anna-maria-b4-timers-ptp-timekeeping-v2-4-554456a44a15@linutronix.de
2024-10-25  timekeeping: Avoid duplicate leap state update  (Anna-Maria Behnsen, 1 file, -1/+2)
do_adjtimex() invokes tk_update_leap_state() unconditionally even when a previous invocation of timekeeping_update() already did that update. Put it into the else path which is invoked when timekeeping_update() is not called. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241009-devel-anna-maria-b4-timers-ptp-timekeeping-v2-3-554456a44a15@linutronix.de
2024-10-25  timekeeping: Don't stop time readers across hard_pps() update  (Thomas Gleixner, 1 file, -4/+0)
hard_pps() update does not modify anything which might be required by time readers so forcing readers out of the way during the update is a pointless exercise. The interaction with adjtimex() and timekeeper updates which call into the NTP code is properly serialized by timekeeper_lock. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241009-devel-anna-maria-b4-timers-ptp-timekeeping-v2-2-554456a44a15@linutronix.de
2024-10-25  timekeeping: Read NTP tick length only once  (Thomas Gleixner, 1 file, -2/+3)
No point in reading it a second time when the comparison fails. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241009-devel-anna-maria-b4-timers-ptp-timekeeping-v2-1-554456a44a15@linutronix.de
2024-10-24  Merge tag 'net-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  (Linus Torvalds, 1 file, -3/+3)
Pull networking fixes from Paolo Abeni:
 "Including fixes from netfilter, xfrm and bluetooth.

  Oddly this includes a fix for a posix clock regression; in our previous PR we included a change there as a pre-requisite for a networking one. That fix proved to be buggy and requires the follow-up included here. Thomas suggested we should send it, given we sent the buggy patch.

  Current release - regressions:
   - posix-clock: Fix unbalanced locking in pc_clock_settime()
   - netfilter: fix typo causing some targets not to load on IPv6

  Current release - new code bugs:
   - xfrm: policy: remove last remnants of pernet inexact list

  Previous releases - regressions:
   - core: fix races in netdev_tx_sent_queue()/dev_watchdog()
   - bluetooth: fix UAF on sco_sock_timeout
   - eth: hv_netvsc: fix VF namespace also in synthetic NIC NETDEV_REGISTER event
   - eth: usbnet: fix name regression
   - eth: be2net: fix potential memory leak in be_xmit()
   - eth: plip: fix transmit path breakage

  Previous releases - always broken:
   - sched: deny mismatched skip_sw/skip_hw flags for actions created by classifiers
   - netfilter: bpf: must hold reference on net namespace
   - eth: virtio_net: fix integer overflow in stats
   - eth: bnxt_en: replace ptp_lock with irqsave variant
   - eth: octeon_ep: add SKB allocation failures handling in __octep_oq_process_rx()

  Misc:
   - MAINTAINERS: add Simon as an official reviewer"

* tag 'net-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (40 commits)
  net: dsa: mv88e6xxx: support 4000ps cycle counter period
  net: dsa: mv88e6xxx: read cycle counter period from hardware
  net: dsa: mv88e6xxx: group cycle counter coefficients
  net: usb: qmi_wwan: add Fibocom FG132 0x0112 composition
  hv_netvsc: Fix VF namespace also in synthetic NIC NETDEV_REGISTER event
  net: dsa: microchip: disable EEE for KSZ879x/KSZ877x/KSZ876x
  Bluetooth: ISO: Fix UAF on iso_sock_timeout
  Bluetooth: SCO: Fix UAF on sco_sock_timeout
  Bluetooth: hci_core: Disable works on hci_unregister_dev
  posix-clock: posix-clock: Fix unbalanced locking in pc_clock_settime()
  r8169: avoid unsolicited interrupts
  net: sched: use RCU read-side critical section in taprio_dump()
  net: sched: fix use-after-free in taprio_change()
  net/sched: act_api: deny mismatched skip_sw/skip_hw flags for actions created by classifiers
  net: usb: usbnet: fix name regression
  mlxsw: spectrum_router: fix xa_store() error checking
  virtio_net: fix integer overflow in stats
  net: fix races in netdev_tx_sent_queue()/dev_watchdog()
  net: wwan: fix global oob in wwan_rtnl_policy
  netfilter: xtables: fix typo causing some targets not to load on IPv6
  ...
2024-10-24  posix-timers: Replace call_rcu() by kfree_rcu() for simple kmem_cache_free() callback  (Julia Lawall, 1 file, -8/+1)
Since SLOB was removed and since commit 6c6c47b063b5 ("mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()"), it is no longer necessary to use call_rcu() when the callback only performs kmem_cache_free(). Use kfree_rcu() directly. The changes were made using Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Link: https://lore.kernel.org/all/20241013201704.49576-12-Julia.Lawall@inria.fr
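The conversion pattern, before and after (a sketch; the wrapper function name is illustrative):

    /* Before: a dedicated RCU callback whose only job is to free the object. */
    static void k_itimer_rcu_free(struct rcu_head *head)
    {
            struct k_itimer *tmr = container_of(head, struct k_itimer, rcu);

            kmem_cache_free(posix_timers_cache, tmr);
    }

    static void posix_timer_free(struct k_itimer *tmr)
    {
            call_rcu(&tmr->rcu, k_itimer_rcu_free);
    }

    /* After: the RCU core frees the object, the callback goes away. */
    static void posix_timer_free(struct k_itimer *tmr)
    {
            kfree_rcu(tmr, rcu);
    }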
2024-10-23  posix-clock: posix-clock: Fix unbalanced locking in pc_clock_settime()  (Jinjie Ruan, 1 file, -3/+3)
If get_clock_desc() succeeds, it calls fget() for the clockid's fd and takes the clk->rwsem read lock, so the error path should release the lock to keep the locking balanced and fput() the clockid's fd to keep the refcount balanced and release the fd related resources. However the commit below left the error path locked, resulting in unbalanced locking. Check timespec64_valid_strict() before get_clock_desc() to fix it, because "ts" is not changed after that. Fixes: d8794ac20a29 ("posix-clock: Fix missing timespec64 check in pc_clock_settime()") Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Acked-by: Anna-Maria Behnsen <anna-maria@linutronix.de> [pabeni@redhat.com: fixed commit message typo] Signed-off-by: Paolo Abeni <pabeni@redhat.com>
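The resulting shape of the function, roughly (a condensed sketch of the error handling, not the verbatim code):

    static int pc_clock_settime(clockid_t id, const struct timespec64 *ts)
    {
            struct posix_clock_desc cd;
            int err;

            /* Validate the plain input first: nothing to unwind if it fails. */
            if (!timespec64_valid_strict(ts))
                    return -EINVAL;

            err = get_clock_desc(id, &cd);          /* takes fd ref + clk->rwsem */
            if (err)
                    return err;

            if (cd.clk->ops.clock_settime)
                    err = cd.clk->ops.clock_settime(cd.clk, ts);
            else
                    err = -EOPNOTSUPP;

            put_clock_desc(&cd);                    /* drops lock and fd reference */
            return err;
    }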
2024-10-20  Merge tag 'sched_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds, 1 file, -0/+6)
Pull scheduling fixes from Borislav Petkov:

 - Add PREEMPT_RT maintainers

 - Fix another aspect of delayed dequeued tasks wrt determining their state, i.e., whether they're runnable or blocked

 - Handle delayed dequeued tasks and their migration wrt PSI properly

 - Fix the situation where a delayed dequeue task gets enqueued into a new class, which should not happen

 - Fix a case where memory allocation would happen while the runqueue lock is held, which is a no-no

 - Do not over-schedule when tasks with shorter slices preempt the currently running task

 - Make sure delayed-dequeue entities are properly handled before unthrottling

 - Other smaller cleanups and improvements

* tag 'sched_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  MAINTAINERS: Add an entry for PREEMPT_RT.
  sched/fair: Fix external p->on_rq users
  sched/psi: Fix mistaken CPU pressure indication after corrupted task state bug
  sched/core: Dequeue PSI signals for blocked tasks that are delayed
  sched: Fix delayed_dequeue vs switched_from_fair()
  sched/core: Disable page allocation in task_tick_mm_cid()
  sched/deadline: Use hrtick_enabled_dl() before start_hrtick_dl()
  sched/eevdf: Fix wakeup-preempt by checking cfs_rq->nr_running
  sched: Fix sched_delayed vs cfs_bandwidth
2024-10-16  timers: Add a warning to usleep_range_state() for wrong order of arguments  (Anna-Maria Behnsen, 1 file, -0/+3)
There is a checkpatch warning which triggers when the min and max arguments of usleep_range_state() are in reverse order. That check only covers call sites which use constants. Add the check to the code as a WARN_ON_ONCE() to also cover call sites not using constants, and fix up the misuse by resetting the delta to 0. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/all/20241014-devel-anna-maria-b4-timers-flseep-v3-9-dc8b907cb62f@linutronix.de
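The added check, approximately (a sketch; the surrounding loop is abridged from the real function):

    void usleep_range_state(unsigned long min, unsigned long max, unsigned int state)
    {
            ktime_t exp = ktime_add_us(ktime_get(), min);
            u64 delta = (u64)(max - min) * NSEC_PER_USEC;

            /* Catch swapped min/max at runtime, not only via checkpatch. */
            if (WARN_ON_ONCE(max < min))
                    delta = 0;

            for (;;) {
                    __set_current_state(state);
                    /* Do not return before the requested sleep time has elapsed. */
                    if (!schedule_hrtimeout_range(&exp, delta, HRTIMER_MODE_ABS))
                            break;
            }
    }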
2024-10-16  timers: Update function descriptions of sleep/delay related functions  (Anna-Maria Behnsen, 1 file, -6/+47)
A lot of commonly used functions for inserting a sleep or delay lack a proper function description. Add function descriptions to all of them to have important information in a central place close to the code. No functional change. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/all/20241014-devel-anna-maria-b4-timers-flseep-v3-5-dc8b907cb62f@linutronix.de
2024-10-16  timers: Update schedule_[hr]timeout*() related function descriptions  (Anna-Maria Behnsen, 1 file, -25/+41)
The schedule_timeout*() functions do not have proper kernel-doc formatted function descriptions. schedule_hrtimeout() and schedule_hrtimeout_range() have an almost identical description. Add the missing function descriptions. Remove the duplicated description and add a pointer to the existing description instead. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/all/20241014-devel-anna-maria-b4-timers-flseep-v3-3-dc8b907cb62f@linutronix.de
2024-10-16  timers: Move *sleep*() and timeout functions into a separate file  (Anna-Maria Behnsen, 4 files, -313/+318)
All schedule_timeout() and *sleep*() related functions are interfaces on top of timer list timers and hrtimers to add a sleep to the code. As they are built on top of the timer list timers and hrtimers, the [hr]timer interfaces are already used except when queuing the timer in schedule_timeout(). But there exists the appropriate interface add_timer() which does the same job with an extra check for an already pending timer. Split all those functions as they are into a separate file and use add_timer() instead of __mod_timer() in schedule_timeout(). While at it fix minor formatting issues and a multi line printk function call in schedule_timeout(). Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/all/20241014-devel-anna-maria-b4-timers-flseep-v3-2-dc8b907cb62f@linutronix.de
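A condensed sketch of the timer path in schedule_timeout() after the change (MAX_SCHEDULE_TIMEOUT and error handling omitted; the function name is suffixed to mark it as a sketch):

    struct process_timer {
            struct timer_list timer;
            struct task_struct *task;
    };

    static void process_timeout(struct timer_list *t)
    {
            struct process_timer *timeout = from_timer(timeout, t, timer);

            wake_up_process(timeout->task);
    }

    signed long __sched schedule_timeout_sketch(signed long timeout)
    {
            struct process_timer timer;
            unsigned long expire = timeout + jiffies;

            timer.task = current;
            timer_setup_on_stack(&timer.timer, process_timeout, 0);
            timer.timer.expires = expire;
            add_timer(&timer.timer);        /* was __mod_timer() */

            schedule();

            del_timer_sync(&timer.timer);
            destroy_timer_on_stack(&timer.timer);

            timeout = expire - jiffies;
            return timeout < 0 ? 0 : timeout;
    }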
2024-10-16  time: Remove '%' from numeric constant in kernel-doc comment  (Wang Jinchao, 1 file, -8/+8)
Change %0 to 0 in kernel-doc comments. %0 is not valid. Signed-off-by: Wang Jinchao <wangjinchao@xfusion.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241009022135.92400-2-wangjinchao@xfusion.com
2024-10-15  vdso: Remove timekeeper argument of __arch_update_vsyscall()  (Thomas Weißschuh, 1 file, -1/+1)
No implementation of this hook uses the passed in timekeeper anymore. This avoids including a non-VDSO header while building the VDSO, which can lead to compilation errors. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/all/20241010-vdso-generic-arch_update_vsyscall-v1-1-7fe5a3ea4382@linutronix.de
2024-10-14  posix-clock: Fix missing timespec64 check in pc_clock_settime()  (Jinjie Ruan, 1 file, -0/+3)
As Andrew pointed out, it makes sense for the PTP core to check the timespec64 struct's tv_sec and tv_nsec range before calling ptp->info->settime64(). As the clock_settime() man page states, if tp.tv_sec is negative or tp.tv_nsec is outside the range [0..999,999,999], it should return EINVAL; that includes dynamic clocks, which handle the PTP clock, and the condition is consistent with timespec64_valid(). As Thomas suggested, timespec64_valid() only checks that the timespec is valid, but does not ensure that the time is in a valid range, so check it ahead of time using timespec64_valid_strict() in pc_clock_settime() and return -EINVAL if it is not valid. There are some drivers that use tp->tv_sec and tp->tv_nsec directly to write registers without validity checks and assume that the higher layer has checked them, which is dangerous and will benefit from this, such as hclge_ptp_settime(), igb_ptp_settime_i210() and _rcar_gen4_ptp_settime(); some drivers can now remove their own checks. Cc: stable@vger.kernel.org Fixes: 0606f422b453 ("posix clocks: Introduce dynamic clocks") Acked-by: Richard Cochran <richardcochran@gmail.com> Suggested-by: Andrew Lunn <andrew@lunn.ch> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Link: https://patch.msgid.link/20241009072302.1754567-2-ruanjinjie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-14  sched/fair: Fix external p->on_rq users  (Peter Zijlstra, 1 file, -0/+6)
Sean noted that ever since commit 152e11f6df29 ("sched/fair: Implement delayed dequeue") KVM's preemption notifiers have started mis-classifying preemption vs blocking. Notably, p->on_rq is no longer sufficient to determine if a task is runnable or blocked -- the aforementioned commit introduces tasks that remain on the runqueue even though they will not run again, and should be considered blocked for many cases. Add the task_is_runnable() helper to classify things and audit all external users of the p->on_rq state. Also add a few comments. Fixes: 152e11f6df29 ("sched/fair: Implement delayed dequeue") Reported-by: Sean Christopherson <seanjc@google.com> Tested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20241010091843.GK33184@noisy.programming.kicks-ass.net
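Based on the description above, the helper boils down to checking both flags (a sketch; the actual definition may differ in detail):

    static inline bool task_is_runnable(struct task_struct *p)
    {
            /*
             * A delayed-dequeue task is still on the runqueue but will not
             * run again; external users must treat it as blocked.
             */
            return p->on_rq && !p->se.sched_delayed;
    }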