aboutsummaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2025-11-06drm/amd/display: Enable mst when it's detected but yet to be initializedWayne Lin1-1/+9
[Why] drm_dp_mst_topology_queue_probe() is used under the assumption that mst is already initialized. If we connect system with SST first then switch to the mst branch during suspend, we will fail probing topology by calling the wrong API since the mst manager is yet to be initialized. [How] At dm_resume(), once it's detected as mst branc connected, check if the mst is initialized already. If not, call dm_helpers_dp_mst_start_top_mgr() instead to initialize mst V2: Adjust the commit msg a bit Fixes: bc068194f548 ("drm/amd/display: Don't write DP_MSTM_CTRL after LT") Cc: Fangzhi Zuo <jerry.zuo@amd.com> Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 62320fb8d91a0bddc44a228203cfa9bfbb5395bd) Cc: stable@vger.kernel.org
2025-11-06drm/amdgpu: Fix wait after reset sequence in S3Lijo Lazar2-3/+32
For a mode-1 reset done at the end of S3 on PSPv11 dGPUs, only check if TOS is unloaded. Fixes: 32f73741d6ee ("drm/amdgpu: Wait for bootloader after PSPv11 reset") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4649 Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1ad25fd272753db14c5d1cc8c68e20ce01f3f888)
2025-11-06drm/amd: Fix suspend failure with secure display TAMario Limonciello1-1/+4
commit c760bcda83571 ("drm/amd: Check whether secure display TA loaded successfully") attempted to fix extra messages, but failed to port the cleanup that was in commit 5c6d52ff4b61e ("drm/amd: Don't try to enable secure display TA multiple times") to prevent multiple tries. Add that to the failure handling path even on a quick failure. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4679 Fixes: c760bcda8357 ("drm/amd: Check whether secure display TA loaded successfully") Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 4104c0a454f6a4d1e0d14895d03c0e7bdd0c8240)
2025-11-06drm/amdgpu: fix gpu page fault after hibernation on PF passthroughSamuel Zhang2-2/+5
On PF passthrough environment, after hibernate and then resume, coralgemm will cause gpu page fault. Mode1 reset happens during hibernate, but partition mode is not restored on resume, register mmCP_HYP_XCP_CTL and mmCP_PSP_XCP_CTL is not right after resume. When CP access the MQD BO, wrong stride size is used, this will cause out of bound access on the MQD BO, resulting page fault. The fix is to ensure gfx_v9_4_3_switch_compute_partition() is called when resume from a hibernation. KFD resume is called separately during a reset recovery or resume from suspend sequence. Hence it's not required to be called as part of partition switch. Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 5d1b32cfe4a676fe552416cb5ae847b215463a1a)
2025-11-06Merge tag 'net-6.18-rc5' of ↵Linus Torvalds72-333/+876
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: Including fixes from bluetooth and wireless. Current release - new code bugs: - ptp: expose raw cycles only for clocks with free-running counter - bonding: fix null-deref in actor_port_prio setting - mdio: ERR_PTR-check regmap pointer returned by device_node_to_regmap() - eth: libie: depend on DEBUG_FS when building LIBIE_FWLOG Previous releases - regressions: - virtio_net: fix perf regression due to bad alignment of virtio_net_hdr_v1_hash - Revert "wifi: ath10k: avoid unnecessary wait for service ready message" caused regressions for QCA988x and QCA9984 - Revert "wifi: ath12k: Fix missing station power save configuration" caused regressions for WCN7850 - eth: bnxt_en: shutdown FW DMA in bnxt_shutdown(), fix memory corruptions after kexec Previous releases - always broken: - virtio-net: fix received packet length check for big packets - sctp: fix races in socket diag handling - wifi: add an hrtimer-based delayed work item to avoid low granularity of timers set relatively far in the future, and use it where it matters (e.g. when performing AP-scheduled channel switch) - eth: mlx5e: - correctly propagate error in case of module EEPROM read failure - fix HW-GRO on systems with PAGE_SIZE == 64kB - dsa: b53: fixes for tagging, link configuration / RMII, FDB, multicast - phy: lan8842: implement latest errata" * tag 'net-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (63 commits) selftests/vsock: avoid false-positives when checking dmesg net: bridge: fix MST static key usage net: bridge: fix use-after-free due to MST port state bypass lan966x: Fix sleeping in atomic context bonding: fix NULL pointer dereference in actor_port_prio setting net: dsa: microchip: Fix reserved multicast address table programming net: wan: framer: pef2256: Switch to devm_mfd_add_devices() net: libwx: fix device bus LAN ID net/mlx5e: SHAMPO, Fix header formulas for higher MTUs and 64K pages net/mlx5e: SHAMPO, Fix skb size check for 64K pages net/mlx5e: SHAMPO, Fix header mapping for 64K pages net: ti: icssg-prueth: Fix fdb hash size configuration net/mlx5e: Fix return value in case of module EEPROM read error net: gro_cells: Reduce lock scope in gro_cell_poll libie: depend on DEBUG_FS when building LIBIE_FWLOG wifi: mac80211_hwsim: Limit destroy_on_close radio removal to netgroup netpoll: Fix deadlock in memory allocation under spinlock net: ethernet: ti: netcp: Standardize knav_dma_open_channel to return NULL on error virtio-net: fix received length check in big packets bnxt_en: Fix warning in bnxt_dl_reload_down() ...
2025-11-06kbuild: Strip trailing padding bytes from modules.builtin.modinfoNathan Chancellor1-1/+14
After commit d50f21091358 ("kbuild: align modinfo section for Secureboot Authenticode EDK2 compat"), running modules_install with certain versions of kmod (such as 29.1 in Ubuntu Jammy) in certain configurations may fail with: depmod: ERROR: kmod_builtin_iter_next: unexpected string without modname prefix The additional padding bytes to ensure .modinfo is aligned within vmlinux.unstripped are unexpected by kmod, as this section has always just been null-terminated strings. Strip the trailing padding bytes from modules.builtin.modinfo after it has been extracted from vmlinux.unstripped to restore the format that kmod expects while keeping .modinfo aligned within vmlinux.unstripped to avoid regressing the Authenticode calculation fix for EDK2. Cc: stable@vger.kernel.org Fixes: d50f21091358 ("kbuild: align modinfo section for Secureboot Authenticode EDK2 compat") Reported-by: Omar Sandoval <osandov@fb.com> Reported-by: Samir M <samir@linux.ibm.com> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Closes: https://lore.kernel.org/7fef7507-ad64-4e51-9bb8-c9fb6532e51e@linux.ibm.com/ Tested-by: Omar Sandoval <osandov@fb.com> Tested-by: Samir M <samir@linux.ibm.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Reviewed-by: Nicolas Schier <nsc@kernel.org> Link: https://patch.msgid.link/20251105-kbuild-fix-builtin-modinfo-for-kmod-v1-1-b419d8ad4606@kernel.org Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2025-11-06selftests/vsock: avoid false-positives when checking dmesgBobby Eshleman1-4/+4
Sometimes VMs will have some intermittent dmesg warnings that are unrelated to vsock. Change the dmesg parsing to filter on strings containing 'vsock' to avoid false positive failures that are unrelated to vsock. The downside is that it is possible for some vsock related warnings to not contain the substring 'vsock', so those will be missed. Fixes: a4a65c6fe08b ("selftests/vsock: add initial vmtest.sh for vsock") Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20251105-vsock-vmtest-dmesg-fix-v2-1-1a042a14892c@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-06Merge branch 'net-bridge-fix-two-mst-bugs'Jakub Kicinski5-8/+22
Nikolay Aleksandrov says: ==================== net: bridge: fix two MST bugs Patch 01 fixes a race condition that exists between expired fdb deletion and port deletion when MST is enabled. Learning can happen after the port's state has been changed to disabled which could lead to that port's memory being used after it's been freed. The issue was reported by syzbot, more information in patch 01. Patch 02 fixes an issue with MST's static key which Ido spotted, we can have multiple bridges with MST and a single bridge can erroneously disable it for all. ==================== Link: https://patch.msgid.link/20251105111919.1499702-1-razor@blackwall.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-06net: bridge: fix MST static key usageNikolay Aleksandrov3-2/+14
As Ido pointed out, the static key usage in MST is buggy and should use inc/dec instead of enable/disable because we can have multiple bridges with MST enabled which means a single bridge can disable MST for all. Use static_branch_inc/dec to avoid that. When destroying a bridge decrement the key if MST was enabled. Fixes: ec7328b59176 ("net: bridge: mst: Multiple Spanning Tree (MST) mode") Reported-by: Ido Schimmel <idosch@nvidia.com> Closes: https://lore.kernel.org/netdev/20251104120313.1306566-1-razor@blackwall.org/T/#m6888d87658f94ed1725433940f4f4ebb00b5a68b Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20251105111919.1499702-3-razor@blackwall.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-06net: bridge: fix use-after-free due to MST port state bypassNikolay Aleksandrov3-6/+8
syzbot reported[1] a use-after-free when deleting an expired fdb. It is due to a race condition between learning still happening and a port being deleted, after all its fdbs have been flushed. The port's state has been toggled to disabled so no learning should happen at that time, but if we have MST enabled, it will bypass the port's state, that together with VLAN filtering disabled can lead to fdb learning at a time when it shouldn't happen while the port is being deleted. VLAN filtering must be disabled because we flush the port VLANs when it's being deleted which will stop learning. This fix adds a check for the port's vlan group which is initialized to NULL when the port is getting deleted, that avoids the port state bypass. When MST is enabled there would be a minimal new overhead in the fast-path because the port's vlan group pointer is cache-hot. [1] https://syzkaller.appspot.com/bug?extid=dd280197f0f7ab3917be Fixes: ec7328b59176 ("net: bridge: mst: Multiple Spanning Tree (MST) mode") Reported-by: syzbot+dd280197f0f7ab3917be@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/69088ffa.050a0220.29fc44.003d.GAE@google.com/ Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20251105111919.1499702-2-razor@blackwall.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-06lan966x: Fix sleeping in atomic contextHoratiu Vultur4-17/+15
The following warning was seen when we try to connect using ssh to the device. BUG: sleeping function called from invalid context at kernel/locking/mutex.c:575 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 104, name: dropbear preempt_count: 1, expected: 0 INFO: lockdep is turned off. CPU: 0 UID: 0 PID: 104 Comm: dropbear Tainted: G W 6.18.0-rc2-00399-g6f1ab1b109b9-dirty #530 NONE Tainted: [W]=WARN Hardware name: Generic DT based system Call trace: unwind_backtrace from show_stack+0x10/0x14 show_stack from dump_stack_lvl+0x7c/0xac dump_stack_lvl from __might_resched+0x16c/0x2b0 __might_resched from __mutex_lock+0x64/0xd34 __mutex_lock from mutex_lock_nested+0x1c/0x24 mutex_lock_nested from lan966x_stats_get+0x5c/0x558 lan966x_stats_get from dev_get_stats+0x40/0x43c dev_get_stats from dev_seq_printf_stats+0x3c/0x184 dev_seq_printf_stats from dev_seq_show+0x10/0x30 dev_seq_show from seq_read_iter+0x350/0x4ec seq_read_iter from seq_read+0xfc/0x194 seq_read from proc_reg_read+0xac/0x100 proc_reg_read from vfs_read+0xb0/0x2b0 vfs_read from ksys_read+0x6c/0xec ksys_read from ret_fast_syscall+0x0/0x1c Exception stack(0xf0b11fa8 to 0xf0b11ff0) 1fa0: 00000001 00001000 00000008 be9048d8 00001000 00000001 1fc0: 00000001 00001000 00000008 00000003 be905920 0000001e 00000000 00000001 1fe0: 0005404c be9048c0 00018684 b6ec2cd8 It seems that we are using a mutex in a atomic context which is wrong. Change the mutex with a spinlock. Fixes: 12c2d0a5b8e2 ("net: lan966x: add ethtool configuration and statistics") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20251105074955.1766792-1-horatiu.vultur@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-06bonding: fix NULL pointer dereference in actor_port_prio settingHangbin Liu1-8/+1
Liang reported an issue where setting a slave’s actor_port_prio to predefined values such as 0, 255, or 65535 would cause a system crash. The problem occurs because in bond_opt_parse(), when the provided value matches a predefined table entry, the function returns that table entry, which does not contain slave information. Later, in bond_option_actor_port_prio_set(), calling bond_slave_get_rtnl() leads to a NULL pointer dereference. Since actor_port_prio is defined as a u16 and initialized to the default value of 255 in ad_initialize_port(), there is no need for the bond_actor_port_prio_tbl. Using the BOND_OPTFLAG_RAWVAL flag is sufficient. Fixes: 6b6dc81ee7e8 ("bonding: add support for per-port LACP actor priority") Reported-by: Liang Li <liali@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20251105072620.164841-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-06net: dsa: microchip: Fix reserved multicast address table programmingTristram Ha4-16/+91
KSZ9477/KSZ9897 and LAN937X families of switches use a reserved multicast address table for some specific forwarding with some multicast addresses, like the one used in STP. The hardware assumes the host port is the last port in KSZ9897 family and port 5 in LAN937X family. Most of the time this assumption is correct but not in other cases like KSZ9477. Originally the function just setups the first entry, but the others still need update, especially for one common multicast address that is used by PTP operation. LAN937x also uses different register bits when accessing the reserved table. Fixes: 457c182af597 ("net: dsa: microchip: generic access to ksz9477 static and reserved table") Signed-off-by: Tristram Ha <tristram.ha@microchip.com> Tested-by: Łukasz Majewski <lukma@nabladev.com> Link: https://patch.msgid.link/20251105033741.6455-1-Tristram.Ha@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-06drm/tiny: pixpaper: add explicit dependency on MMULiangCheng Wang1-0/+1
The DRM_GEM_SHMEM_HELPER helper requires MMU enabled because it uses vmf_insert_pfn() in its mmap implementation. On NOMMU configurations (e.g. some RISC-V randconfig builds), this symbol is unavailable and selecting DRM_GEM_SHMEM_HELPER causes a modpost undefined reference: ERROR: modpost: "vmf_insert_pfn" [drivers/gpu/drm/drm_shmem_helper.ko] undefined! Normally, Kconfig prevents this helper from being selected when CONFIG_MMU=n. However, in some randconfig builds (such as those used by 0day CI), select statements can override unmet dependencies, triggering the issue. Add an explicit dependency on MMU to DRM_PIXPAPER to prevent this. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202510280213.0rlYA4T3-lkp@intel.com/ Fixes: 0c4932f6ddf8 ("drm/tiny: pixpaper: Fix missing dependency on DRM_GEM_SHMEM_HELPER") Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: LiangCheng Wang <zaq14760@gmail.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patch.msgid.link/20251028-bar-v1-1-edfbd13fafff@gmail.com
2025-11-06futex: Optimize per-cpu reference countingPeter Zijlstra1-6/+6
Shrikanth noted that the per-cpu reference counter was still some 10% slower than the old immutable option (which removes the reference counting entirely). Further optimize the per-cpu reference counter by: - switching from RCU to preempt; - using __this_cpu_*() since we now have preempt disabled; - switching from smp_load_acquire() to READ_ONCE(). This is all safe because disabling preemption inhibits the RCU grace period exactly like rcu_read_lock(). Having preemption disabled allows using __this_cpu_*() provided the only access to the variable is in task context -- which is the case here. Furthermore, since we know changing fph->state to FR_ATOMIC demands a full RCU grace period we can rely on the implied smp_mb() from that to replace the acquire barrier(). This is very similar to the percpu_down_read_internal() fast-path. The reason this is significant for PowerPC is that it uses the generic this_cpu_*() implementation which relies on local_irq_disable() (the x86 implementation relies on it being a single memop instruction to be IRQ-safe). Switching to preempt_disable() and __this_cpu*() avoids this IRQ state swizzling. Also, PowerPC needs LWSYNC for the ACQUIRE barrier, not having to use explicit barriers safes a bunch. Combined this reduces the performance gap by half, down to some 5%. Fixes: 760e6f7befba ("futex: Remove support for IMMUTABLE") Reported-by: Shrikanth Hegde <sshegde@linux.ibm.com> Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20251106092929.GR4067720@noisy.programming.kicks-ass.net
2025-11-06sched/fair: Prevent cfs_rq from being unthrottled with zero runtime_remainingAaron Lu2-10/+7
When a cfs_rq is to be throttled, its limbo list should be empty and that's why there is a warn in tg_throttle_down() for non empty cfs_rq->throttled_limbo_list. When running a test with the following hierarchy: root / \ A* ... / | \ ... B / \ C* where both A and C have quota settings, that warn on non empty limbo list is triggered for a cfs_rq of C, let's call it cfs_rq_c(and ignore the cpu part of the cfs_rq for the sake of simpler representation). Debug showed it happened like this: Task group C is created and quota is set, so in tg_set_cfs_bandwidth(), cfs_rq_c is initialized with runtime_enabled set, runtime_remaining equals to 0 and *unthrottled*. Before any tasks are enqueued to cfs_rq_c, *multiple* throttled tasks can migrate to cfs_rq_c (e.g., due to task group changes). When enqueue_task_fair(cfs_rq_c, throttled_task) is called and cfs_rq_c is in a throttled hierarchy (e.g., A is throttled), these throttled tasks are directly placed into cfs_rq_c's limbo list by enqueue_throttled_task(). Later, when A is unthrottled, tg_unthrottle_up(cfs_rq_c) enqueues these tasks. The first enqueue triggers check_enqueue_throttle(), and with zero runtime_remaining, cfs_rq_c can be throttled in throttle_cfs_rq() if it can't get more runtime and enters tg_throttle_down(), where the warning is hit due to remaining tasks in the limbo list. I think it's a chaos to trigger throttle on unthrottle path, the status of a being unthrottled cfs_rq can be in a mixed state in the end, so fix this by granting 1ns to cfs_rq in tg_set_cfs_bandwidth(). This ensures cfs_rq_c has a positive runtime_remaining when initialized as unthrottled and cannot enter tg_unthrottle_up() with zero runtime_remaining. Also, update outdated comments in tg_throttle_down() since unthrottle_cfs_rq() is no longer called with zero runtime_remaining. While at it, remove a redundant assignment to se in tg_throttle_down(). Fixes: e1fad12dcb66 ("sched/fair: Switch to task based throttle model") Reviewed-By: Benjamin Segall <bsegall@google.com> Suggested-by: Benjamin Segall <bsegall@google.com> Signed-off-by: Aaron Lu <ziqianlu@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Tested-by: Hao Jia <jiahao1@lixiang.com> Link: https://patch.msgid.link/20251030032755.560-1-ziqianlu@bytedance.com
2025-11-06xfs: free xfs_busy_extents structure when no RT extents are queuedChristoph Hellwig1-1/+3
kmemleak occasionally reports leaking xfs_busy_extents structure from xfs_scrub calls after running xfs/528 (but attributed to following tests), which seems to be caused by not freeing the xfs_busy_extents structure when tr.queued is 0 and xfs_trim_rtgroup_extents breaks out of the main loop. Free the structure in this case. Fixes: a3315d11305f ("xfs: use rtgroup busy extent list for FITRIM") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-11-06slab: prevent infinite loop in kmalloc_nolock() with debuggingVlastimil Babka1-1/+5
In review of a followup work, Harry noticed a potential infinite loop. Upon closed inspection, it already exists for kmalloc_nolock() on a cache with debugging enabled, since commit af92793e52c3 ("slab: Introduce kmalloc_nolock() and kfree_nolock().") When alloc_single_from_new_slab() fails to trylock node list_lock, we keep retrying to get partial slab or allocate a new slab. If we indeed interrupted somebody holding the list_lock, the trylock fill fail deterministically and we end up allocating and defer-freeing slabs indefinitely with no progress. To fix it, fail the allocation if spinning is not allowed. This is acceptable in the restricted context of kmalloc_nolock(), especially with debugging enabled. Reported-by: Harry Yoo <harry.yoo@oracle.com> Closes: https://lore.kernel.org/all/aQLqZjjq1SPD3Fml@hyeyoo/ Fixes: af92793e52c3 ("slab: Introduce kmalloc_nolock() and kfree_nolock().") Acked-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Link: https://patch.msgid.link/20251103-fix-nolock-loop-v1-1-6e2b3e82b9da@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-05Merge tag 'wireless-2025-11-05' of ↵Jakub Kicinski2-70/+59
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Just two small fixes: - ath12k: revert a change that caused performance regressions - hwsim: don't ignore netns on netlink socket matching * tag 'wireless-2025-11-05' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: mac80211_hwsim: Limit destroy_on_close radio removal to netgroup Revert "wifi: ath12k: Fix missing station power save configuration" ==================== Link: https://patch.msgid.link/20251105152827.53254-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-05net: wan: framer: pef2256: Switch to devm_mfd_add_devices()Haotian Zhang1-3/+4
The driver calls mfd_add_devices() but fails to call mfd_remove_devices() in error paths after successful MFD device registration and in the remove function. This leads to resource leaks where MFD child devices are not properly unregistered. Replace mfd_add_devices with devm_mfd_add_devices to automatically manage the device resources. Fixes: c96e976d9a05 ("net: wan: framer: Add support for the Lantiq PEF2256 framer") Suggested-by: Herve Codina <herve.codina@bootlin.com> Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn> Acked-by: Herve Codina <herve.codina@bootlin.com> Link: https://patch.msgid.link/20251105034716.662-1-vulab@iscas.ac.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-05net: libwx: fix device bus LAN IDJiawen Wu2-3/+4
The device bus LAN ID was obtained from PCI_FUNC(), but when a PF port is passthrough to a virtual machine, the function number may not match the actual port index on the device. This could cause the driver to perform operations such as LAN reset on the wrong port. Fix this by reading the LAN ID from port status register. Fixes: a34b3e6ed8fb ("net: txgbe: Store PCI info") Cc: stable@vger.kernel.org Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/B60A670C1F52CB8E+20251104062321.40059-1-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-05Merge branch 'net-mlx5e-shampo-fixes-for-64kb-page-size'Jakub Kicinski3-38/+61
Tariq Toukan says: ==================== net/mlx5e: SHAMPO fixes for 64KB page size This series by Dragos contains fixes for HW-GRO issues found on systems with 64KB page size. ==================== Link: https://patch.msgid.link/1762238915-1027590-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-05net/mlx5e: SHAMPO, Fix header formulas for higher MTUs and 64K pagesDragos Tatulea3-19/+41
The MLX5E_SHAMPO_WQ_HEADER_PER_PAGE and MLX5E_SHAMPO_LOG_MAX_HEADER_ENTRY_SIZE macros are used directly in several places under the assumption that there will always be more headers per WQE than headers per page. However, this assumption doesn't hold for 64K page sizes and higher MTUs (> 4K). This can be first observed during header page allocation: ksm_entries will become 0 during alignment to MLX5E_SHAMPO_WQ_HEADER_PER_PAGE. This patch introduces 2 additional members to the mlx5e_shampo_hd struct which are meant to be used instead of the macrose mentioned above. When the number of headers per WQE goes below MLX5E_SHAMPO_WQ_HEADER_PER_PAGE, clamp the number of headers per page and expand the header size accordingly so that the headers for one WQE cover a full page. All the formulas are adapted to use these two new members. Fixes: 945ca432bfd0 ("net/mlx5e: SHAMPO, Drop info array") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1762238915-1027590-4-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-05net/mlx5e: SHAMPO, Fix skb size check for 64K pagesDragos Tatulea1-1/+4
mlx5e_hw_gro_skb_has_enough_space() uses a formula to check if there is enough space in the skb frags to store more data. This formula is incorrect for 64K page sizes and it triggers early GRO session termination because the first fragment will blow up beyond GRO_LEGACY_MAX_SIZE. This patch adds a special case for page sizes >= GRO_LEGACY_MAX_SIZE (64K) which uses the skb->len instead. Within this context, the check is safe from fragment overflow because the hardware will continuously fill the data up to the reservation size of 64K and the driver will coalesce all data from the same page to the same fragment. This means that the data will span one fragment or at most two for such a large page size. It is expected that the if statement will be optimized out as the check is done with constants. Fixes: 92552d3abd32 ("net/mlx5e: HW_GRO cqe handler implementation") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1762238915-1027590-3-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-05net/mlx5e: SHAMPO, Fix header mapping for 64K pagesDragos Tatulea1-19/+17
HW-GRO is broken on mlx5 for 64K page sizes. The patch in the fixes tag didn't take into account larger page sizes when doing an align down of max_ksm_entries. For 64K page size, max_ksm_entries is 0 which will skip mapping header pages via WQE UMR. This breaks header-data split and will result in the following syndrome: mlx5_core 0000:00:08.0 eth2: Error cqe on cqn 0x4c9, ci 0x0, qn 0x1133, opcode 0xe, syndrome 0x4, vendor syndrome 0x32 00000000: 00 00 00 00 04 4a 00 00 00 00 00 00 20 00 93 32 00000010: 55 00 00 00 fb cc 00 00 00 00 00 00 07 18 00 00 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 4a 00000030: 00 00 3b c7 93 01 32 04 00 00 00 00 00 00 bf e0 mlx5_core 0000:00:08.0 eth2: ERR CQE on RQ: 0x1133 Furthermore, the function that fills in WQE UMRs for the headers (mlx5e_build_shampo_hd_umr()) only supports mapping page sizes that fit in a single UMR WQE. This patch goes back to the old non-aligned max_ksm_entries value and it changes mlx5e_build_shampo_hd_umr() to support mapping a large page over multiple UMR WQEs. This means that mlx5e_build_shampo_hd_umr() can now leave a page only partially mapped. The caller, mlx5e_alloc_rx_hd_mpwqe(), ensures that there are enough UMR WQEs to cover complete pages by working on ksm_entries that are multiples of MLX5E_SHAMPO_WQ_HEADER_PER_PAGE. Fixes: 8a0ee54027b1 ("net/mlx5e: SHAMPO, Simplify UMR allocation for headers") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1762238915-1027590-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-05net: ti: icssg-prueth: Fix fdb hash size configurationMeghana Malladi1-0/+7
The ICSSG driver does the initial FDB configuration which includes setting the control registers. Other run time management like learning is managed by the PRU's. The default FDB hash size used by the firmware is 512 slots, which is currently missing in the current driver. Update the driver FDB config to include FDB hash size as well. Please refer trm [1] 6.4.14.12.17 section on how the FDB config register gets configured. From the table 6-1404, there is a reset field for FDB_HAS_SIZE which is 4, meaning 1024 slots. Currently the driver is not updating this reset value from 4(1024 slots) to 3(512 slots). This patch fixes this by updating the reset value to 512 slots. [1]: https://www.ti.com/lit/pdf/spruim2 Fixes: abd5576b9c57f ("net: ti: icssg-prueth: Add support for ICSSG switch firmware") Signed-off-by: Meghana Malladi <m-malladi@ti.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20251104104415.3110537-1-m-malladi@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-05net/mlx5e: Fix return value in case of module EEPROM read errorGal Pressman1-3/+1
mlx5e_get_module_eeprom_by_page() has weird error handling. First, it is treating -EINVAL as a special case, but it is unclear why. Second, it tries to fail "gracefully" by returning the number of bytes read even in case of an error. This results in wrongly returning success (0 return value) if the error occurs before any bytes were read. Simplify the error handling by returning an error when such occurs. This also aligns with the error handling we have in mlx5e_get_module_eeprom() for the old API. This fixes the following case where the query fails, but userspace ethtool wrongly treats it as success and dumps an output: # ethtool -m eth2 netlink warning: mlx5_core: Query module eeprom by page failed, read 0 bytes, err -5 netlink warning: mlx5_core: Query module eeprom by page failed, read 0 bytes, err -5 Offset Values ------ ------ 0x0000: 00 00 00 00 05 00 04 00 00 00 00 00 05 00 05 00 0x0010: 00 00 00 00 05 00 06 00 50 00 00 00 67 65 20 66 0x0020: 61 69 6c 65 64 2c 20 72 65 61 64 20 30 20 62 79 0x0030: 74 65 73 2c 20 65 72 72 20 2d 35 00 14 00 03 00 0x0040: 08 00 01 00 03 00 00 00 08 00 02 00 1a 00 00 00 0x0050: 14 00 04 00 08 00 01 00 04 00 00 00 08 00 02 00 0x0060: 0e 00 00 00 14 00 05 00 08 00 01 00 05 00 00 00 0x0070: 08 00 02 00 1a 00 00 00 14 00 06 00 08 00 01 00 Fixes: e109d2b204da ("net/mlx5: Implement get_module_eeprom_by_page()") Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Alex Lazar <alazar@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1762265736-1028868-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-05net: gro_cells: Reduce lock scope in gro_cell_pollSebastian Andrzej Siewior1-2/+2
One GRO-cell device's NAPI callback can nest into the GRO-cell of another device if the underlying device is also using GRO-cell. This is the case for IPsec over vxlan. These two GRO-cells are separate devices. From lockdep's point of view it is the same because each device is sharing the same lock class and so it reports a possible deadlock assuming one device is nesting into itself. Hold the bh_lock only while accessing gro_cell::napi_skbs in gro_cell_poll(). This reduces the locking scope and avoids acquiring the same lock class multiple times. Fixes: 25718fdcbdd2 ("net: gro_cells: Use nested-BH locking for gro_cell") Reported-by: Gal Pressman <gal@nvidia.com> Closes: https://lore.kernel.org/all/66664116-edb8-48dc-ad72-d5223696dd19@nvidia.com/ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20251104153435.ty88xDQt@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-05libie: depend on DEBUG_FS when building LIBIE_FWLOGMichal Swiatkowski3-4/+14
LIBIE_FWLOG is unusable without DEBUG_FS. Mark it in Kconfig. Fix build error on ixgbe when DEBUG_FS is not set. To not add another layer of #if IS_ENABLED(LIBIE_FWLOG) in ixgbe fwlog code define debugfs dentry even when DEBUG_FS isn't enabled. In this case the dummy functions of LIBIE_FWLOG will be used, so not initialized dentry isn't a problem. Fixes: 641585bc978e ("ixgbe: fwlog support for e610") Reported-by: Guenter Roeck <linux@roeck-us.net> Closes: https://lore.kernel.org/lkml/f594c621-f9e1-49f2-af31-23fbcb176058@roeck-us.net/ Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20251104172333.752445-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-06drm/nouveau: Advertise correct modifiers on GB20xJames Jones4-3/+59
8 and 16 bit formats use a different layout on GB20x than they did on prior chips. Add the corresponding DRM format modifiers to the list of modifiers supported by the display engine on such chips, and filter the supported modifiers for each format based on its bytes per pixel in nv50_plane_format_mod_supported(). Note this logic will need to be updated when GB10 support is added, since it is a GB20x chip that uses the pre-GB20x sector layout for all formats. Fixes: 6cc6e08d4542 ("drm/nouveau/kms: add support for GB20x") Signed-off-by: James Jones <jajones@nvidia.com> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20251030181153.1208-3-jajones@nvidia.com
2025-11-06drm: define NVIDIA DRM format modifiers for GB20xJames Jones1-9/+16
The layout of bits within the individual tiles (referred to as sectors in the DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D() macro) changed for 8 and 16-bit surfaces starting in Blackwell 2 GPUs (With the exception of GB10). To denote the difference, extend the sector field in the parametric format modifier definition used to generate modifier values for NVIDIA hardware. Without this change, it would be impossible to differentiate the two layouts based on modifiers, and as a result software could attempt to share surfaces directly between pre-GB20x and GB20x cards, resulting in corruption when the surface was accessed on one of the GPUs after being populated with content by the other. Of note: This change causes the DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D() macro to evaluate its "s" parameter twice, with the side effects that entails. I surveyed all usage of the modifier in the kernel and Mesa code, and that does not appear to be problematic in any current usage, but I thought it was worth calling out. Fixes: 6cc6e08d4542 ("drm/nouveau/kms: add support for GB20x") Signed-off-by: James Jones <jajones@nvidia.com> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20251030181153.1208-2-jajones@nvidia.com
2025-11-06drm/nouveau: set DMA mask before creating the flush pageTimur Tabi1-12/+12
Set the DMA mask before calling nvkm_device_ctor(), so that when the flush page is created in nvkm_fb_ctor(), the allocation will not fail if the page is outside of DMA address space, which can easily happen if IOMMU is disable. In such situations, you will get an error like this: nouveau 0000:65:00.0: DMA addr 0x0000000107c56000+4096 overflow (mask ffffffff, bus limit 0). Commit 38f5359354d4 ("rm/nouveau/pci: set streaming DMA mask early") set the mask after calling nvkm_device_ctor(), but back then there was no flush page being created, which might explain why the mask wasn't set earlier. Flush page allocation was added in commit 5728d064190e ("drm/nouveau/fb: handle sysmem flush page from common code"). nvkm_fb_ctor() calls alloc_page(), which can allocate a page anywhere in system memory, but then calls dma_map_page() on that page. But since the DMA mask is still set to 32, the map can fail if the page is allocated above 4GB. This is easy to reproduce on systems with a lot of memory and IOMMU disabled. An alternative approach would be to force the allocation of the flush page to low memory, by specifying __GFP_DMA32. However, this would always allocate the page in low memory, even though the hardware can access high memory. Signed-off-by: Timur Tabi <ttabi@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patch.msgid.link/20251014174512.3172102-1-ttabi@nvidia.com
2025-11-05Merge tag 'rust-fixes-6.18' of ↵Linus Torvalds5-5/+18
git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux Pull rust fixes from Miguel Ojeda: - Fix/workaround a couple Rust 1.91.0 build issues when sanitizers are enabled due to extra checking performed by the compiler and an upstream issue already fixed for Rust 1.93.0 - Fix future Rust 1.93.0 builds by supporting the stabilized name for the 'no-jump-tables' flag - Fix a couple private/broken intra-doc links uncovered by the future move of pin-init to 'syn' * tag 'rust-fixes-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: rust: kbuild: support `-Cjump-tables=n` for Rust 1.93.0 rust: kbuild: workaround `rustdoc` doctests modifier bug rust: kbuild: treat `build_error` and `rustdoc` as kernel objects rust: condvar: fix broken intra-doc link rust: devres: fix private intra-doc link
2025-11-05iommufd: Make vfio_compat's unmap succeed if the range is already emptyJason Gunthorpe3-9/+9
iommufd returns ENOENT when attempting to unmap a range that is already empty, while vfio type1 returns success. Fix vfio_compat to match. Fixes: d624d6652a65 ("iommufd: vfio container FD ioctl compatibility") Link: https://patch.msgid.link/r/0-v1-76be45eff0be+5d-iommufd_unmap_compat_jgg@nvidia.com Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Alex Mastro <amastro@fb.com> Reported-by: Alex Mastro <amastro@fb.com> Closes: https://lore.kernel.org/r/aP0S5ZF9l3sWkJ1G@devgpu012.nha5.facebook.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-11-05Merge tag 'platform-drivers-x86-v6.18-3' of ↵Linus Torvalds6-7/+28
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Ilpo Järvinen: "Fixes and New Hotkey Support: - input + dell-wmi-base: Electronic privacy screen on/off hotkey support - int3472: Fix unregister double free - wireless-hotkey: Fix Kconfig typo" * tag 'platform-drivers-x86-v6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform: x86: Kconfig: fix minor typo in help for WIRELESS_HOTKEY platform/x86: dell-wmi-base: Handle electronic privacy screen on/off events Input: Add keycodes for electronic privacy screen on/off hotkeys MAINTAINERS: Update int3472 maintainers platform/x86: int3472: Fix double free of GPIO device during unregister
2025-11-05io_uring: fix types for region size calulationPavel Begunkov1-1/+1
->nr_pages is int, it needs type extension before calculating the region size. Fixes: a90558b36ccee ("io_uring/memmap: helper for pinning region pages") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> [axboe: style fixup] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-11-05xfs: fix zone selection in xfs_select_open_zone_mruChristoph Hellwig1-1/+1
xfs_select_open_zone_mru needs to pass XFS_ZONE_ALLOC_OK to xfs_try_use_zone because we only want to tightly pack into zones of the same or a compatible temperature instead of any available zone. This got broken in commit 0301dae732a5 ("xfs: refactor hint based zone allocation"), which failed to update this particular caller when switching to an enum. xfs/638 sometimes, but not reliably fails due to this change. Fixes: 0301dae732a5 ("xfs: refactor hint based zone allocation") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-11-05xfs: fix a rtgroup leak when xfs_init_zone failsChristoph Hellwig1-1/+3
Drop the rtgrop reference when xfs_init_zone fails for a conventional device. Fixes: 4e4d52075577 ("xfs: add the zoned space allocator") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-11-05xfs: fix various problems in xfs_atomic_write_cow_iomap_beginDarrick J. Wong1-11/+50
I think there are several things wrong with this function: A) xfs_bmapi_write can return a much larger unwritten mapping than what the caller asked for. We convert part of that range to written, but return the entire written mapping to iomap even though that's inaccurate. B) The arguments to xfs_reflink_convert_cow_locked are wrong -- an unwritten mapping could be *smaller* than the write range (or even the hole range). In this case, we convert too much file range to written state because we then return a smaller mapping to iomap. C) It doesn't handle delalloc mappings. This I covered in the patch that I already sent to the list. D) Reassigning count_fsb to handle the hole means that if the second cmap lookup attempt succeeds (due to racing with someone else) we trim the mapping more than is strictly necessary. The changing meaning of count_fsb makes this harder to notice. E) The tracepoint is kinda wrong because @length is mutated. That makes it harder to chase the data flows through this function because you can't just grep on the pos/bytecount strings. F) We don't actually check that the br_state = XFS_EXT_NORM assignment is accurate, i.e that the cow fork actually contains a written mapping for the range we're interested in G) Somewhat inadequate documentation of why we need to xfs_trim_extent so aggressively in this function. H) Not sure why xfs_iomap_end_fsb is used here, the vfs already clamped the write range to s_maxbytes. Fix these issues, and then the atomic writes regressions in generic/760, generic/617, generic/091, generic/263, and generic/521 all go away for me. Cc: stable@vger.kernel.org # v6.16 Fixes: bd1d2c21d5d249 ("xfs: add xfs_atomic_write_cow_iomap_begin()") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-11-05xfs: fix delalloc write failures in software-provided atomic writesDarrick J. Wong1-2/+19
With the 20 Oct 2025 release of fstests, generic/521 fails for me on regular (aka non-block-atomic-writes) storage: QA output created by 521 dowrite: write: Input/output error LOG DUMP (8553 total operations): 1( 1 mod 256): SKIPPED (no operation) 2( 2 mod 256): WRITE 0x7e000 thru 0x8dfff (0x10000 bytes) HOLE 3( 3 mod 256): READ 0x69000 thru 0x79fff (0x11000 bytes) 4( 4 mod 256): FALLOC 0x53c38 thru 0x5e853 (0xac1b bytes) INTERIOR 5( 5 mod 256): COPY 0x55000 thru 0x59fff (0x5000 bytes) to 0x25000 thru 0x29fff 6( 6 mod 256): WRITE 0x74000 thru 0x88fff (0x15000 bytes) 7( 7 mod 256): ZERO 0xedb1 thru 0x11693 (0x28e3 bytes) with a warning in dmesg from iomap about XFS trying to give it a delalloc mapping for a directio write. Fix the software atomic write iomap_begin code to convert the reservation into a written mapping. This doesn't fix the data corruption problems reported by generic/760, but it's a start. Cc: stable@vger.kernel.org # v6.16 Fixes: bd1d2c21d5d249 ("xfs: add xfs_atomic_write_cow_iomap_begin()") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-11-05Merge tag 'ath-current-20251103' of ↵Johannes Berg1-67/+55
git://git.kernel.org/pub/scm/linux/kernel/git/ath/ath Jeff Johnson says: ================== ath.git update for v6.18-rc5 Revert an ath12k change which resulted in a significance performance impact on WCN7850. ================== Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-11-05wifi: mac80211_hwsim: Limit destroy_on_close radio removal to netgroupMartin Willi1-3/+4
hwsim radios marked destroy_on_close are removed when the Netlink socket that created them is closed. As the portid is not unique across network namespaces, closing a socket in one namespace may remove radios in another if it has the destroy_on_close flag set. Instead of matching the network namespace, match the netgroup of the radio to limit radio removal to those that have been created by the closing Netlink socket. The netgroup of a radio identifies the network namespace it was created in, and matching on it removes a destroy_on_close radio even if it has been moved to another namespace. Fixes: 100cb9ff40e0 ("mac80211_hwsim: Allow managing radios from non-initial namespaces") Signed-off-by: Martin Willi <martin@strongswan.org> Link: https://patch.msgid.link/20251103082436.30483-1-martin@strongswan.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-11-05drm/sched: Fix deadlock in drm_sched_entity_kill_jobs_cbPierre-Eric Pelloux-Prayer1-15/+19
The Mesa issue referenced below pointed out a possible deadlock: [ 1231.611031] Possible interrupt unsafe locking scenario: [ 1231.611033] CPU0 CPU1 [ 1231.611034] ---- ---- [ 1231.611035] lock(&xa->xa_lock#17); [ 1231.611038] local_irq_disable(); [ 1231.611039] lock(&fence->lock); [ 1231.611041] lock(&xa->xa_lock#17); [ 1231.611044] <Interrupt> [ 1231.611045] lock(&fence->lock); [ 1231.611047] *** DEADLOCK *** In this example, CPU0 would be any function accessing job->dependencies through the xa_* functions that don't disable interrupts (eg: drm_sched_job_add_dependency(), drm_sched_entity_kill_jobs_cb()). CPU1 is executing drm_sched_entity_kill_jobs_cb() as a fence signalling callback so in an interrupt context. It will deadlock when trying to grab the xa_lock which is already held by CPU0. Replacing all xa_* usage by their xa_*_irq counterparts would fix this issue, but Christian pointed out another issue: dma_fence_signal takes fence.lock and so does dma_fence_add_callback. dma_fence_signal() // locks f1.lock -> drm_sched_entity_kill_jobs_cb() -> foreach dependencies -> dma_fence_add_callback() // locks f2.lock This will deadlock if f1 and f2 share the same spinlock. To fix both issues, the code iterating on dependencies and re-arming them is moved out to drm_sched_entity_kill_jobs_work(). Cc: stable@vger.kernel.org # v6.2+ Fixes: 2fdb8a8f07c2 ("drm/scheduler: rework entity flush, kill and fini") Link: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13908 Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> [phasta: commit message nits] Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patch.msgid.link/20251104095358.15092-1-pierre-eric.pelloux-prayer@amd.com
2025-11-05gpio: aggregator: restore the set_config operationThomas Richard1-0/+1
Restore the set_config operation, as it was lost during the refactoring of the gpio-aggregator driver while creating the gpio forwarder library. Fixes: b31c68fd851e7 ("gpio: aggregator: handle runtime registration of gpio_desc in gpiochip_fwd") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202509281206.a7334ae8-lkp@intel.com Signed-off-by: Thomas Richard <thomas.richard@bootlin.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20250929-gpio-aggregator-fix-set-config-callback-v1-1-39046e1da609@bootlin.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-11-05Merge tag 'media/v6.18-2' of ↵Linus Torvalds10-37/+73
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: - honour privacy led with pdx86/int3472 - fix invalid file access on cx18 and ivtv - forbid remove_bufs when legacy fileio is active on videbuf2 - add an heuristic to find stream entity on uvcvideo * tag 'media/v6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: media: videobuf2: forbid remove_bufs when legacy fileio is active media: uvcvideo: Use heuristic to find stream entity media: v4l2-subdev / pdx86: int3472: Use "privacy" as con_id for the privacy LED media: ivtv: Fix invalid access to file * media: cx18: Fix invalid access to file *
2025-11-04netpoll: Fix deadlock in memory allocation under spinlockBreno Leitao1-5/+2
Fix a AA deadlock in refill_skbs() where memory allocation while holding skb_pool->lock can trigger a recursive lock acquisition attempt. The deadlock scenario occurs when the system is under severe memory pressure: 1. refill_skbs() acquires skb_pool->lock (spinlock) 2. alloc_skb() is called while holding the lock 3. Memory allocator fails and calls slab_out_of_memory() 4. This triggers printk() for the OOM warning 5. The console output path calls netpoll_send_udp() 6. netpoll_send_udp() attempts to acquire the same skb_pool->lock 7. Deadlock: the lock is already held by the same CPU Call stack: refill_skbs() spin_lock_irqsave(&skb_pool->lock) <- lock acquired __alloc_skb() kmem_cache_alloc_node_noprof() slab_out_of_memory() printk() console_flush_all() netpoll_send_udp() skb_dequeue() spin_lock_irqsave(&skb_pool->lock) <- deadlock attempt This bug was exposed by commit 248f6571fd4c51 ("netpoll: Optimize skb refilling on critical path") which removed refill_skbs() from the critical path (where nested printk was being deferred), letting nested printk being called from inside refill_skbs() Refactor refill_skbs() to never allocate memory while holding the spinlock. Another possible solution to fix this problem is protecting the refill_skbs() from nested printks, basically calling printk_deferred_{enter,exit}() in refill_skbs(), then, any nested pr_warn() would be deferred. I prefer this approach, given I _think_ it might be a good idea to move the alloc_skb() from GFP_ATOMIC to GFP_KERNEL in the future, so, having the alloc_skb() outside of the lock will be necessary step. There is a possible TOCTOU issue when checking for the pool length, and queueing the new allocated skb, but, this is not an issue, given that an extra SKB in the pool is harmless and it will be eventually used. Signed-off-by: Breno Leitao <leitao@debian.org> Fixes: 248f6571fd4c51 ("netpoll: Optimize skb refilling on critical path") Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20251103-fix_netpoll_aa-v4-1-4cfecdf6da7c@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04net: ethernet: ti: netcp: Standardize knav_dma_open_channel to return NULL ↵Nishanth Menon2-12/+12
on error Make knav_dma_open_channel consistently return NULL on error instead of ERR_PTR. Currently the header include/linux/soc/ti/knav_dma.h returns NULL when the driver is disabled, but the driver implementation does not even return NULL or ERR_PTR on failure, causing inconsistency in the users. This results in a crash in netcp_free_navigator_resources as followed (trimmed): Unhandled fault: alignment exception (0x221) at 0xfffffff2 [fffffff2] *pgd=80000800207003, *pmd=82ffda003, *pte=00000000 Internal error: : 221 [#1] SMP ARM Modules linked in: CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.17.0-rc7 #1 NONE Hardware name: Keystone PC is at knav_dma_close_channel+0x30/0x19c LR is at netcp_free_navigator_resources+0x2c/0x28c [... TRIM...] Call trace: knav_dma_close_channel from netcp_free_navigator_resources+0x2c/0x28c netcp_free_navigator_resources from netcp_ndo_open+0x430/0x46c netcp_ndo_open from __dev_open+0x114/0x29c __dev_open from __dev_change_flags+0x190/0x208 __dev_change_flags from netif_change_flags+0x1c/0x58 netif_change_flags from dev_change_flags+0x38/0xa0 dev_change_flags from ip_auto_config+0x2c4/0x11f0 ip_auto_config from do_one_initcall+0x58/0x200 do_one_initcall from kernel_init_freeable+0x1cc/0x238 kernel_init_freeable from kernel_init+0x1c/0x12c kernel_init from ret_from_fork+0x14/0x38 [... TRIM...] Standardize the error handling by making the function return NULL on all error conditions. The API is used in just the netcp_core.c so the impact is limited. Note, this change, in effect reverts commit 5b6cb43b4d62 ("net: ethernet: ti: netcp_core: return error while dma channel open issue"), but provides a less error prone implementation. Suggested-by: Simon Horman <horms@kernel.org> Suggested-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Nishanth Menon <nm@ti.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20251103162811.3730055-1-nm@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04virtio-net: fix received length check in big packetsBui Quang Minh1-13/+12
Since commit 4959aebba8c0 ("virtio-net: use mtu size as buffer length for big packets"), when guest gso is off, the allocated size for big packets is not MAX_SKB_FRAGS * PAGE_SIZE anymore but depends on negotiated MTU. The number of allocated frags for big packets is stored in vi->big_packets_num_skbfrags. Because the host announced buffer length can be malicious (e.g. the host vhost_net driver's get_rx_bufs is modified to announce incorrect length), we need a check in virtio_net receive path. Currently, the check is not adapted to the new change which can lead to NULL page pointer dereference in the below while loop when receiving length that is larger than the allocated one. This commit fixes the received length check corresponding to the new change. Fixes: 4959aebba8c0 ("virtio-net: use mtu size as buffer length for big packets") Cc: stable@vger.kernel.org Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Tested-by: Lei Yang <leiyang@redhat.com> Link: https://patch.msgid.link/20251030144438.7582-1-minhquangbui99@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04Merge branch 'bnxt_en-bug-fixes'Jakub Kicinski5-7/+13
Michael Chan says: ==================== bnxt_en: Bug fixes Patches 1, 3, and 4 are bug fixes related to the FW log tracing driver coredump feature recently added in 6.13. Patch #1 adds the necessary call to shutdown the FW logging DMA during PCI shutdown. Patch #3 fixes a possible null pointer derefernce when using early versions of the FW with this feature. Patch #4 adds the coredump header information unconditionally to make it more robust. Patch #2 fixes a possible memory leak during PTP shutdown. Patch #5 eliminates a dmesg warning when doing devlink reload. ==================== Link: https://patch.msgid.link/20251104005700.542174-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04bnxt_en: Fix warning in bnxt_dl_reload_down()Shantiprasad Shettar3-2/+3
The existing code calls bnxt_cancel_reservations() after bnxt_hwrm_func_drv_unrgtr() in bnxt_dl_reload_down(). bnxt_cancel_reservations() calls the FW and it will always fail since the driver has already unregistered, triggering this warning: bnxt_en 0000:0a:00.0 ens2np0: resc_qcaps failed Fix it by calling bnxt_clear_reservations() which will skip the unnecessary FW call since we have unregistered. Fixes: 228ea8c187d8 ("bnxt_en: implement devlink dev reload driver_reinit") Reviewed-by: Mohammad Shuab Siddique <mohammad-shuab.siddique@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Shantiprasad Shettar <shantiprasad.shettar@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20251104005700.542174-6-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>