summaryrefslogtreecommitdiffstats
path: root/include/net
AgeCommit message (Collapse)AuthorLines
2026-02-26Merge tag 'net-7.0-rc2' of ↵Linus Torvalds-8/+26
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from IPsec, Bluetooth and netfilter Current release - regressions: - wifi: fix dev_alloc_name() return value check - rds: fix recursive lock in rds_tcp_conn_slots_available Current release - new code bugs: - vsock: lock down child_ns_mode as write-once Previous releases - regressions: - core: - do not pass flow_id to set_rps_cpu() - consume xmit errors of GSO frames - netconsole: avoid OOB reads, msg is not nul-terminated - netfilter: h323: fix OOB read in decode_choice() - tcp: re-enable acceptance of FIN packets when RWIN is 0 - udplite: fix null-ptr-deref in __udp_enqueue_schedule_skb(). - wifi: brcmfmac: fix potential kernel oops when probe fails - phy: register phy led_triggers during probe to avoid AB-BA deadlock - eth: - bnxt_en: fix deleting of Ntuple filters - wan: farsync: fix use-after-free bugs caused by unfinished tasklets - xscale: check for PTP support properly Previous releases - always broken: - tcp: fix potential race in tcp_v6_syn_recv_sock() - kcm: fix zero-frag skb in frag_list on partial sendmsg error - xfrm: - fix race condition in espintcp_close() - always flush state and policy upon NETDEV_UNREGISTER event - bluetooth: - purge error queues in socket destructors - fix response to L2CAP_ECRED_CONN_REQ - eth: - mlx5: - fix circular locking dependency in dump - fix "scheduling while atomic" in IPsec MAC address query - gve: fix incorrect buffer cleanup for QPL - team: avoid NETDEV_CHANGEMTU event when unregistering slave - usb: validate USB endpoints" * tag 'net-7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits) netfilter: nf_conntrack_h323: fix OOB read in decode_choice() dpaa2-switch: validate num_ifs to prevent out-of-bounds write net: consume xmit errors of GSO frames vsock: document write-once behavior of the child_ns_mode sysctl vsock: lock down child_ns_mode as write-once selftests/vsock: change tests to respect write-once child ns mode net/mlx5e: Fix "scheduling while atomic" in IPsec MAC address query net/mlx5: Fix missing devlink lock in SRIOV enable error path net/mlx5: E-switch, Clear legacy flag when moving to switchdev net/mlx5: LAG, disable MPESW in lag_disable_change() net/mlx5: DR, Fix circular locking dependency in dump selftests: team: Add a reference count leak test team: avoid NETDEV_CHANGEMTU event when unregistering slave net: mana: Fix double destroy_workqueue on service rescan PCI path MAINTAINERS: Update maintainer entry for QUALCOMM ETHQOS ETHERNET DRIVER dpll: zl3073x: Remove redundant cleanup in devm_dpll_init() selftests/net: packetdrill: Verify acceptance of FIN packets when RWIN is 0 tcp: re-enable acceptance of FIN packets when RWIN is 0 vsock: Use container_of() to get net namespace in sysctl handlers net: usb: kaweth: validate USB endpoints ...
2026-02-26vsock: lock down child_ns_mode as write-onceBobby Eshleman-2/+14
Two administrator processes may race when setting child_ns_mode as one process sets child_ns_mode to "local" and then creates a namespace, but another process changes child_ns_mode to "global" between the write and the namespace creation. The first process ends up with a namespace in "global" mode instead of "local". While this can be detected after the fact by reading ns_mode and retrying, it is fragile and error-prone. Make child_ns_mode write-once so that a namespace manager can set it once and be sure it won't change. Writing a different value after the first write returns -EBUSY. This applies to all namespaces, including init_net, where an init process can write "local" to lock all future namespaces into local mode. Fixes: eafb64f40ca4 ("vsock: add netns to vsock core") Suggested-by: Daan De Meyer <daan.j.demeyer@gmail.com> Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Co-developed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20260223-vsock-ns-write-once-v3-2-c0cde6959923@meta.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-24Merge tag 'for-net-2026-02-23' of ↵Paolo Abeni-3/+5
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - purge error queues in socket destructors - hci_sync: Fix CIS host feature condition - L2CAP: Fix invalid response to L2CAP_ECRED_RECONF_REQ - L2CAP: Fix result of L2CAP_ECRED_CONN_RSP when MTU is too short - L2CAP: Fix response to L2CAP_ECRED_CONN_REQ - L2CAP: Fix not checking output MTU is acceptable on L2CAP_ECRED_CONN_REQ - L2CAP: Fix missing key size check for L2CAP_LE_CONN_REQ - hci_qca: Cleanup on all setup failures * tag 'for-net-2026-02-23' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: L2CAP: Fix missing key size check for L2CAP_LE_CONN_REQ Bluetooth: L2CAP: Fix not checking output MTU is acceptable on L2CAP_ECRED_CONN_REQ Bluetooth: Fix CIS host feature condition Bluetooth: L2CAP: Fix response to L2CAP_ECRED_CONN_REQ Bluetooth: hci_qca: Cleanup on all setup failures Bluetooth: purge error queues in socket destructors Bluetooth: L2CAP: Fix result of L2CAP_ECRED_CONN_RSP when MTU is too short Bluetooth: L2CAP: Fix invalid response to L2CAP_ECRED_RECONF_REQ ==================== Link: https://patch.msgid.link/20260223211634.3800315-1-luiz.dentz@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-24net: Drop the lock in skb_may_tx_timestamp()Sebastian Andrzej Siewior-1/+1
skb_may_tx_timestamp() may acquire sock::sk_callback_lock. The lock must not be taken in IRQ context, only softirq is okay. A few drivers receive the timestamp via a dedicated interrupt and complete the TX timestamp from that handler. This will lead to a deadlock if the lock is already write-locked on the same CPU. Taking the lock can be avoided. The socket (pointed by the skb) will remain valid until the skb is released. The ->sk_socket and ->file member will be set to NULL once the user closes the socket which may happen before the timestamp arrives. If we happen to observe the pointer while the socket is closing but before the pointer is set to NULL then we may use it because both pointer (and the file's cred member) are RCU freed. Drop the lock. Use READ_ONCE() to obtain the individual pointer. Add a matching WRITE_ONCE() where the pointer are cleared. Link: https://lore.kernel.org/all/20260205145104.iWinkXHv@linutronix.de Fixes: b245be1f4db1a ("net-timestamp: no-payload only sysctl") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260220183858.N4ERjFW6@linutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-23Bluetooth: L2CAP: Fix result of L2CAP_ECRED_CONN_RSP when MTU is too shortLuiz Augusto von Dentz-3/+3
Test L2CAP/ECFC/BV-26-C expect the response to L2CAP_ECRED_CONN_REQ with and MTU value < L2CAP_ECRED_MIN_MTU (64) to be L2CAP_CR_LE_INVALID_PARAMS rather than L2CAP_CR_LE_UNACCEPT_PARAMS. Also fix not including the correct number of CIDs in the response since the spec requires all CIDs being rejected to be included in the response. Link: https://github.com/bluez/bluez/issues/1868 Fixes: 15f02b910562 ("Bluetooth: L2CAP: Add initial code for Enhanced Credit Based Mode") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-02-23Bluetooth: L2CAP: Fix invalid response to L2CAP_ECRED_RECONF_REQLuiz Augusto von Dentz-0/+2
This fixes responding with an invalid result caused by checking the wrong size of CID which should have been (cmd_len - sizeof(*req)) and on top of it the wrong result was use L2CAP_CR_LE_INVALID_PARAMS which is invalid/reserved for reconf when running test like L2CAP/ECFC/BI-03-C: > ACL Data RX: Handle 64 flags 0x02 dlen 14 LE L2CAP: Enhanced Credit Reconfigure Request (0x19) ident 2 len 6 MTU: 64 MPS: 64 Source CID: 64 < ACL Data TX: Handle 64 flags 0x00 dlen 10 LE L2CAP: Enhanced Credit Reconfigure Respond (0x1a) ident 2 len 2 ! Result: Reserved (0x000c) Result: Reconfiguration failed - one or more Destination CIDs invalid (0x0003) Fiix L2CAP/ECFC/BI-04-C which expects L2CAP_RECONF_INVALID_MPS (0x0002) when more than one channel gets its MPS reduced: > ACL Data RX: Handle 64 flags 0x02 dlen 16 LE L2CAP: Enhanced Credit Reconfigure Request (0x19) ident 2 len 8 MTU: 264 MPS: 99 Source CID: 64 ! Source CID: 65 < ACL Data TX: Handle 64 flags 0x00 dlen 10 LE L2CAP: Enhanced Credit Reconfigure Respond (0x1a) ident 2 len 2 ! Result: Reconfiguration successful (0x0000) Result: Reconfiguration failed - reduction in size of MPS not allowed for more than one channel at a time (0x0002) Fix L2CAP/ECFC/BI-05-C when SCID is invalid (85 unconnected): > ACL Data RX: Handle 64 flags 0x02 dlen 14 LE L2CAP: Enhanced Credit Reconfigure Request (0x19) ident 2 len 6 MTU: 65 MPS: 64 ! Source CID: 85 < ACL Data TX: Handle 64 flags 0x00 dlen 10 LE L2CAP: Enhanced Credit Reconfigure Respond (0x1a) ident 2 len 2 ! Result: Reconfiguration successful (0x0000) Result: Reconfiguration failed - one or more Destination CIDs invalid (0x0003) Fix L2CAP/ECFC/BI-06-C when MPS < L2CAP_ECRED_MIN_MPS (64): > ACL Data RX: Handle 64 flags 0x02 dlen 14 LE L2CAP: Enhanced Credit Reconfigure Request (0x19) ident 2 len 6 MTU: 672 ! MPS: 63 Source CID: 64 < ACL Data TX: Handle 64 flags 0x00 dlen 10 LE L2CAP: Enhanced Credit Reconfigure Respond (0x1a) ident 2 len 2 ! Result: Reconfiguration failed - reduction in size of MPS not allowed for more than one channel at a time (0x0002) Result: Reconfiguration failed - other unacceptable parameters (0x0004) Fix L2CAP/ECFC/BI-07-C when MPS reduced for more than one channel: > ACL Data RX: Handle 64 flags 0x02 dlen 16 LE L2CAP: Enhanced Credit Reconfigure Request (0x19) ident 3 len 8 MTU: 84 ! MPS: 71 Source CID: 64 ! Source CID: 65 < ACL Data TX: Handle 64 flags 0x00 dlen 10 LE L2CAP: Enhanced Credit Reconfigure Respond (0x1a) ident 2 len 2 ! Result: Reconfiguration successful (0x0000) Result: Reconfiguration failed - reduction in size of MPS not allowed for more than one channel at a time (0x0002) Link: https://github.com/bluez/bluez/issues/1865 Fixes: 15f02b910562 ("Bluetooth: L2CAP: Add initial code for Enhanced Credit Based Mode") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-02-21Convert more 'alloc_obj' cases to default GFP_KERNEL argumentsLinus Torvalds-2/+1
This converts some of the visually simpler cases that have been split over multiple lines. I only did the ones that are easy to verify the resulting diff by having just that final GFP_KERNEL argument on the next line. Somebody should probably do a proper coccinelle script for this, but for me the trivial script actually resulted in an assertion failure in the middle of the script. I probably had made it a bit _too_ trivial. So after fighting that far a while I decided to just do some of the syntactically simpler cases with variations of the previous 'sed' scripts. The more syntactically complex multi-line cases would mostly really want whitespace cleanup anyway. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21Convert 'alloc_obj' family to use the new default GFP_KERNEL argumentLinus Torvalds-2/+2
This was done entirely with mindless brute force, using git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' | xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21treewide: Replace kmalloc with kmalloc_obj for non-scalar typesKees Cook-6/+6
This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-19tcp: fix potential race in tcp_v6_syn_recv_sock()Eric Dumazet-2/+6
Code in tcp_v6_syn_recv_sock() after the call to tcp_v4_syn_recv_sock() is done too late. After tcp_v4_syn_recv_sock(), the child socket is already visible from TCP ehash table and other cpus might use it. Since newinet->pinet6 is still pointing to the listener ipv6_pinfo bad things can happen as syzbot found. Move the problematic code in tcp_v6_mapped_child_init() and call this new helper from tcp_v4_syn_recv_sock() before the ehash insertion. This allows the removal of one tcp_sync_mss(), since tcp_v4_syn_recv_sock() will call it with the correct context. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+937b5bbb6a815b3e5d0b@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/69949275.050a0220.2eeac1.0145.GAE@google.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260217161205.2079883-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-19Merge tag 'net-7.0-rc1' of ↵Linus Torvalds-7/+18
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from Netfilter. Current release - new code bugs: - net: fix backlog_unlock_irq_restore() vs CONFIG_PREEMPT_RT - eth: mlx5e: XSK, Fix unintended ICOSQ change - phy_port: correctly recompute the port's linkmodes - vsock: prevent child netns mode switch from local to global - couple of kconfig fixes for new symbols Previous releases - regressions: - nfc: nci: fix false-positive parameter validation for packet data - net: do not delay zero-copy skbs in skb_attempt_defer_free() Previous releases - always broken: - mctp: ensure our nlmsg responses to user space are zero-initialised - ipv6: ioam: fix heap buffer overflow in __ioam6_fill_trace_data() - fixes for ICMP rate limiting Misc: - intel: fix PCI device ID conflict between i40e and ipw2200" * tag 'net-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (85 commits) net: nfc: nci: Fix parameter validation for packet data net/mlx5e: Use unsigned for mlx5e_get_max_num_channels net/mlx5e: Fix deadlocks between devlink and netdev instance locks net/mlx5e: MACsec, add ASO poll loop in macsec_aso_set_arm_event net/mlx5: Fix misidentification of write combining CQE during poll loop net/mlx5e: Fix misidentification of ASO CQE during poll loop net/mlx5: Fix multiport device check over light SFs bonding: alb: fix UAF in rlb_arp_recv during bond up/down bnge: fix reserving resources from FW eth: fbnic: Advertise supported XDP features. rds: tcp: fix uninit-value in __inet_bind net/rds: Fix NULL pointer dereference in rds_tcp_accept_one octeontx2-af: Fix default entries mcam entry action net/mlx5e: XSK, Fix unintended ICOSQ change ipv6: icmp: icmpv6_xrlim_allow() optimization if net.ipv6.icmp.ratelimit is zero ipv4: icmp: icmpv4_xrlim_allow() optimization if net.ipv4.icmp_ratelimit is zero ipv6: icmp: remove obsolete code in icmpv6_xrlim_allow() inet: move icmp_global_{credit,stamp} to a separate cache line icmp: prevent possible overflow in icmp_global_allow() selftests/net: packetdrill: add ipv4-mapped-ipv6 tests ...
2026-02-18inet: move icmp_global_{credit,stamp} to a separate cache lineEric Dumazet-2/+7
icmp_global_credit was meant to be changed ~1000 times per second, but if an admin sets net.ipv4.icmp_msgs_per_sec to a very high value, icmp_global_credit changes can inflict false sharing to surrounding fields that are read mostly. Move icmp_global_credit and icmp_global_stamp to a separate cacheline aligned group. Fixes: b056b4cd9178 ("icmp: move icmp_global.credit and icmp_global.stamp to per netns storage") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260216142832.3834174-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-18Merge tag 'sysctl-7.00-rc1' of ↵Linus Torvalds-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl Pull sysctl updates from Joel Granados: - Remove macros from proc handler converters Replace the proc converter macros with "regular" functions. Though it is more verbose than the macro version, it helps when debugging and better aligns with coding-style.rst. - General cleanup Remove superfluous ctl_table forward declarations. Const qualify the memory_allocation_profiling_sysctl and loadpin_sysctl_table arrays. Add missing kernel doc to proc_dointvec_conv. - Testing This series was run through sysctl selftests/kunit test suite in x86_64. And went into linux-next after rc4, giving it a good 3 weeks of testing * tag 'sysctl-7.00-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: sysctl: replace SYSCTL_INT_CONV_CUSTOM macro with functions sysctl: Replace unidirectional INT converter macros with functions sysctl: Add kernel doc to proc_douintvec_conv sysctl: Replace UINT converter macros with functions sysctl: Add CONFIG_PROC_SYSCTL guards for converter macros sysctl: clarify proc_douintvec_minmax doc sysctl: Return -ENOSYS from proc_douintvec_conv when CONFIG_PROC_SYSCTL=n sysctl: Remove unused ctl_table forward declarations loadpin: Implement custom proc_handler for enforce alloc_tag: move memory_allocation_profiling_sysctls into .rodata sysctl: Add missing kernel-doc for proc_dointvec_conv
2026-02-17ipv6: addrconf: reduce default temp_valid_lft to 2 daysFernando Fernandez Mancera-1/+2
This is a recommendation from RFC 8981 and it was intended to be changed by commit 969c54646af0 ("ipv6: Implement draft-ietf-6man-rfc4941bis") but it only changed the sysctl documentation. Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260214172543.5783-1-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-17ipv6: fix a race in ip6_sock_set_v6only()Eric Dumazet-4/+7
It is unlikely that this function will be ever called with isk->inet_num being not zero. Perform the check on isk->inet_num inside the locked section for complete safety. Fixes: 9b115749acb24 ("ipv6: add ip6_sock_set_v6only") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de> Link: https://patch.msgid.link/20260216102202.3343588-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-13ipv6: ioam: fix heap buffer overflow in __ioam6_fill_trace_data()Qanux-0/+2
On the receive path, __ioam6_fill_trace_data() uses trace->nodelen to decide how much data to write for each node. It trusts this field as-is from the incoming packet, with no consistency check against trace->type (the 24-bit field that tells which data items are present). A crafted packet can set nodelen=0 while setting type bits 0-21, causing the function to write ~100 bytes past the allocated region (into skb_shared_info), which corrupts adjacent heap memory and leads to a kernel panic. Add a shared helper ioam6_trace_compute_nodelen() in ioam6.c to derive the expected nodelen from the type field, and use it: - in ioam6_iptunnel.c (send path, existing validation) to replace the open-coded computation; - in exthdrs.c (receive path, ipv6_hop_ioam) to drop packets whose nodelen is inconsistent with the type field, before any data is written. Per RFC 9197, bits 12-21 are each short (4-octet) fields, so they are included in IOAM6_MASK_SHORT_FIELDS (changed from 0xff100000 to 0xff1ffc00). Fixes: 9ee11f0fff20 ("ipv6: ioam: Data plane support for Pre-allocated Trace") Cc: stable@vger.kernel.org Signed-off-by: Junxi Qian <qjx1298677004@gmail.com> Reviewed-by: Justin Iurman <justin.iurman@gmail.com> Link: https://patch.msgid.link/20260211040412.86195-1-qjx1298677004@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-12Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds-3/+44
Pull rdma updates from Jason Gunthorpe: "Usual smallish cycle. The NFS biovec work to push it down into RDMA instead of indirecting through a scatterlist is pretty nice to see, been talked about for a long time now. - Various code improvements in irdma, rtrs, qedr, ocrdma, irdma, rxe - Small driver improvements and minor bug fixes to hns, mlx5, rxe, mana, mlx5, irdma - Robusness improvements in completion processing for EFA - New query_port_speed() verb to move past limited IBA defined speed steps - Support for SG_GAPS in rts and many other small improvements - Rare list corruption fix in iwcm - Better support different page sizes in rxe - Device memory support for mana - Direct bio vec to kernel MR for use by NFS-RDMA - QP rate limiting for bnxt_re - Remote triggerable NULL pointer crash in siw - DMA-buf exporter support for RDMA mmaps like doorbells" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (66 commits) RDMA/mlx5: Implement DMABUF export ops RDMA/uverbs: Add DMABUF object type and operations RDMA/uverbs: Support external FD uobjects RDMA/siw: Fix potential NULL pointer dereference in header processing RDMA/umad: Reject negative data_len in ib_umad_write IB/core: Extend rate limit support for RC QPs RDMA/mlx5: Support rate limit only for Raw Packet QP RDMA/bnxt_re: Report QP rate limit in debugfs RDMA/bnxt_re: Report packet pacing capabilities when querying device RDMA/bnxt_re: Add support for QP rate limiting MAINTAINERS: Drop RDMA files from Hyper-V section RDMA/uverbs: Add __GFP_NOWARN to ib_uverbs_unmarshall_recv() kmalloc svcrdma: use bvec-based RDMA read/write API RDMA/core: add rdma_rw_max_sge() helper for SQ sizing RDMA/core: add MR support for bvec-based RDMA operations RDMA/core: use IOVA-based DMA mapping for bvec RDMA operations RDMA/core: add bio_vec based RDMA read/write API RDMA/irdma: Use kvzalloc for paged memory DMA address array RDMA/rxe: Fix race condition in QP timer handlers RDMA/mana_ib: Add device‑memory support ...
2026-02-11net: dsa: add tag format for MxL862xx switchesDaniel Golle-0/+2
Add proprietary special tag format for the MaxLinear MXL862xx family of switches. While using the same Ethertype as MaxLinear's GSW1xx switches, the actual tag format differs significantly, hence we need a dedicated tag driver for that. Signed-off-by: Daniel Golle <daniel@makrotopia.org> Link: https://patch.msgid.link/c64e6ddb6c93a4fac39f9ab9b2d8bf551a2b118d.1770433307.git.daniel@makrotopia.org Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-10tcp: populate inet->cork.fl.u.ip6 in tcp_v6_syn_recv_sock()Eric Dumazet-1/+3
As explained in commit 85d05e281712 ("ipv6: change inet6_sk_rebuild_header() to use inet->cork.fl.u.ip6"): TCP v6 spends a good amount of time rebuilding a fresh fl6 at each transmit in inet6_csk_xmit()/inet6_csk_route_socket(). TCP v4 caches the information in inet->cork.fl.u.ip4 instead. After this patch, passive TCP ipv6 flows have correctly initialized inet->cork.fl.u.ip6 structure. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260206173426.1638518-7-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-10Merge tag 'nf-next-26-02-06' of ↵Jakub Kicinski-0/+5
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Florian Westphal says: ==================== netfilter: updates for net-next The following patchset contains Netfilter updates for *net-next*: 1) Fix net-next-only use-after-free bug in nf_tables rbtree set: Expired elements cannot be released right away after unlink anymore because there is no guarantee that the binary-search blob is going to be updated. Spotted by syzkaller. 2) Fix esoteric bug in nf_queue with udp fraglist gro, broken since 6.11. Patch 3 adds extends the nfqueue selftest for this. 4) Use dedicated slab for flowtable entries, currently the -512 cache is used, which is wasteful. From Qingfang Deng. 5) Recent net-next update extended existing test for ip6ip6 tunnels, add the required /config entry. Test still passed by accident because the previous tests network setup gets re-used, so also update the test so it will fail in case the ip6ip6 tunnel interface cannot be added. 6) Fix 'nft get element mytable myset { 1.2.3.4 }' on big endian platforms, this was broken since code was added in v5.1. 7) Fix nf_tables counter reset support on 32bit platforms, where counter reset may cause huge values to appear due to wraparound. Broken since reset feature was added in v6.11. From Anders Grahn. 8-11) update nf_tables rbtree set type to detect partial operlaps. This will eventually speed up nftables userspace: at this time userspace does a netlink dump of the set content which slows down incremental updates on interval sets. From Pablo Neira Ayuso. * tag 'nf-next-26-02-06' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: nft_set_rbtree: validate open interval overlap netfilter: nft_set_rbtree: validate element belonging to interval netfilter: nft_set_rbtree: check for partial overlaps in anonymous sets netfilter: nft_set_rbtree: fix bogus EEXIST with NLM_F_CREATE with null interval netfilter: nft_counter: fix reset of counters on 32bit archs netfilter: nft_set_hash: fix get operation on big endian selftests: netfilter: add IPV6_TUNNEL to config netfilter: flowtable: dedicated slab for flow entry selftests: netfilter: nft_queue.sh: add udp fraglist gro test case netfilter: nfnetlink_queue: do shared-unconfirmed check before segmentation netfilter: nft_set_rbtree: don't gc elements on insert ==================== Link: https://patch.msgid.link/20260206153048.17570-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-10xfrm: reduce struct sec_path sizePaolo Abeni-5/+5
The mentioned struct has an hole and uses unnecessary wide type to store MAC length and indexes of very small arrays. It's also embedded into the skb_extensions, and the latter, due to recent CAN changes, may exceeds the 192 bytes mark (3 cachelines on x86_64 arch) on some reasonable configurations. Reordering and the sec_path fields, shrinking xfrm_offload.orig_mac_len to 16 bits and xfrm_offload.{len,olen,verified_cnt} to u8, we can save 16 bytes and keep skb_extensions size under control. Before: struct sec_path { int len; int olen; int verified_cnt; /* XXX 4 bytes hole, try to pack */$ struct xfrm_state * xvec[6]; struct xfrm_offload ovec[1]; /* size: 88, cachelines: 2, members: 5 */ /* sum members: 84, holes: 1, sum holes: 4 */ /* last cacheline: 24 bytes */ }; After: struct sec_path { struct xfrm_state * xvec[6]; struct xfrm_offload ovec[1]; /* typedef u8 -> __u8 */ unsigned char len; /* typedef u8 -> __u8 */ unsigned char olen; /* typedef u8 -> __u8 */ unsigned char verified_cnt; /* size: 72, cachelines: 2, members: 5 */ /* padding: 1 */ /* last cacheline: 8 bytes */ }; Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Florian Westphal <fw@strlen.de> Reviewed-by: Steffen Klassert <steffen.klassert@secunet.com> Link: https://patch.msgid.link/83846bd2e3fa08899bd0162e41bfadfec95e82ef.1770398071.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-10net: dsa: eliminate local type for tc policersVladimir Oltean-22/+18
David Yang is saying that struct flow_action_entry in include/net/flow_offload.h has gained new fields and DSA's struct dsa_mall_policer_tc_entry, derived from that, isn't keeping up. This structure is passed to drivers and they are completely oblivious to the values of fields they don't see. This has happened before, and almost always the solution was to make the DSA layer thinner and use the upstream data structures. Here, the reason why we didn't do that is because struct flow_action_entry :: police is an anonymous structure. That is easily enough fixable, just name those fields "struct flow_action_police" and reference them from DSA. Make the according transformations to the two users (sja1105 and felix): "rate_bytes_per_sec" -> "rate_bytes_ps". Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Co-developed-by: David Yang <mmyangfl@gmail.com> Signed-off-by: David Yang <mmyangfl@gmail.com> Link: https://patch.msgid.link/20260206075427.44733-1-mmyangfl@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-06net/ipv6: Remove HBH helpersAlice Mikityanska-77/+0
Now that the HBH jumbo helpers are not used by any driver or GSO, remove them altogether. Signed-off-by: Alice Mikityanska <alice@isovalent.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260205133925.526371-13-alice.kernel@fastmail.im Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-06net/ipv6: Introduce payload_len helpersAlice Mikityanska-4/+2
The next commits will transition away from using the hop-by-hop extension header to encode packet length for BIG TCP. Add wrappers around ip6->payload_len that return the actual value if it's non-zero, and calculate it from skb->len if payload_len is set to zero (and a symmetrical setter). The new helpers are used wherever the surrounding code supports the hop-by-hop jumbo header for BIG TCP IPv6, or the corresponding IPv4 code uses skb_ip_totlen (e.g., in include/net/netfilter/nf_tables_ipv6.h). No behavioral change in this commit. Signed-off-by: Alice Mikityanska <alice@isovalent.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260205133925.526371-2-alice.kernel@fastmail.im Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-06tcp: inline tcp_filter()Eric Dumazet-1/+7
This helper is already (auto)inlined from IPv4 TCP stack. Make it an inline function to benefit IPv6 as well. $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new add/remove: 0/2 grow/shrink: 1/0 up/down: 30/-49 (-19) Function old new delta tcp_v6_rcv 3448 3478 +30 __pfx_tcp_filter 16 - -16 tcp_filter 33 - -33 Total: Before=24891904, After=24891885, chg -0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260205164329.3401481-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-06netns: optimize netns cleaning by batching unhash_nsid callsQiliang Yuan-0/+1
Currently, unhash_nsid() scans the entire system for each netns being killed, leading to O(L_dying_net * M_alive_net * N_id) complexity, as __peernet2id() also performs a linear search in the IDR. Optimize this to O(M_alive_net * N_id) by batching unhash operations. Move unhash_nsid() out of the per-netns loop in cleanup_net() to perform a single-pass traversal over survivor namespaces. Identify dying peers by an 'is_dying' flag, which is set under net_rwsem write lock after the netns is removed from the global list. This batches the unhashing work and eliminates the O(L_dying_net) multiplier. To minimize the impact on struct net size, 'is_dying' is placed in an existing hole after 'hash_mix' in struct net. Use a restartable idr_get_next() loop for iteration. This avoids the unsafe modification issue inherent to idr_for_each() callbacks and allows dropping the nsid_lock to safely call sleepy rtnl_net_notifyid(). Clean up redundant nsid_lock and simplify the destruction loop now that unhashing is centralized. Signed-off-by: Qiliang Yuan <yuanql9@chinatelecom.cn> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260204074854.3506916-1-realwujing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-06netfilter: nft_set_rbtree: validate open interval overlapPablo Neira Ayuso-0/+4
Open intervals do not have an end element, in particular an open interval at the end of the set is hard to validate because of it is lacking the end element, and interval validation relies on such end element to perform the checks. This patch adds a new flag field to struct nft_set_elem, this is not an issue because this is a temporary object that is allocated in the stack from the insert/deactivate path. This flag field is used to specify that this is the last element in this add/delete command. The last flag is used, in combination with the start element cookie, to check if there is a partial overlap, eg. Already exists: 255.255.255.0-255.255.255.254 Add interval: 255.255.255.0-255.255.255.255 ~~~~~~~~~~~~~ start element overlap Basically, the idea is to check for an existing end element in the set if there is an overlap with an existing start element. However, the last open interval can come in any position in the add command, the corner case can get a bit more complicated: Already exists: 255.255.255.0-255.255.255.254 Add intervals: 255.255.255.0-255.255.255.255,255.255.255.0-255.255.255.254 ~~~~~~~~~~~~~ start element overlap To catch this overlap, annotate that the new start element is a possible overlap, then report the overlap if the next element is another start element that confirms that previous element in an open interval at the end of the set. For deletions, do not update the start cookie when deleting an open interval, otherwise this can trigger spurious EEXIST when adding new elements. Unfortunately, there is no NFT_SET_ELEM_INTERVAL_OPEN flag which would make easier to detect open interval overlaps. Fixes: 7c84d41416d8 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2026-02-06netfilter: nfnetlink_queue: do shared-unconfirmed check before segmentationFlorian Westphal-0/+1
Ulrich reports a regression with nfqueue: If an application did not set the 'F_GSO' capability flag and a gso packet with an unconfirmed nf_conn entry is received all packets are now dropped instead of queued, because the check happens after skb_gso_segment(). In that case, we did have exclusive ownership of the skb and its associated conntrack entry. The elevated use count is due to skb_clone happening via skb_gso_segment(). Move the check so that its peformed vs. the aggregated packet. Then, annotate the individual segments except the first one so we can do a 2nd check at reinject time. For the normal case, where userspace does in-order reinjects, this avoids packet drops: first reinjected segment continues traversal and confirms entry, remaining segments observe the confirmed entry. While at it, simplify nf_ct_drop_unconfirmed(): We only care about unconfirmed entries with a refcnt > 1, there is no need to special-case dying entries. This only happens with UDP. With TCP, the only unconfirmed packet will be the TCP SYN, those aren't aggregated by GRO. Next patch adds a udpgro test case to cover this scenario. Reported-by: Ulrich Weber <ulrich.weber@gmail.com> Fixes: 7d8dc1c7be8d ("netfilter: nf_queue: drop packets with cloned unconfirmed conntracks") Signed-off-by: Florian Westphal <fw@strlen.de>
2026-02-05net/sched: don't use dynamic lockdep keys with clsact/ingress/noqueueDavide Caratti-0/+24
Currently we are registering one dynamic lockdep key for each allocated qdisc, to avoid false deadlock reports when mirred (or TC eBPF) redirects packets to another device while the root lock is acquired [1]. Since dynamic keys are a limited resource, we can save them at least for qdiscs that are not meant to acquire the root lock in the traffic path, or to carry traffic at all, like: - clsact - ingress - noqueue Don't register dynamic keys for the above schedulers, so that we hit MAX_LOCKDEP_KEYS later in our tests. [1] https://github.com/multipath-tcp/mptcp_net-next/issues/451 Changes in v2: - change ordering of spin_lock_init() vs. lockdep_register_key() (Jakub Kicinski) Signed-off-by: Davide Caratti <dcaratti@redhat.com> Link: https://patch.msgid.link/94448f7fa7c4f52d2ce416a4895ec87d456d7417.1770220576.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-05tcp: move __reqsk_free() out of lineEric Dumazet-8/+1
Inlining __reqsk_free() is overkill, let's reclaim 2 Kbytes of text. $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new add/remove: 2/4 grow/shrink: 2/14 up/down: 225/-2338 (-2113) Function old new delta __reqsk_free - 114 +114 sock_edemux 18 82 +64 inet_csk_listen_start 233 264 +31 __pfx___reqsk_free - 16 +16 __pfx_reqsk_queue_alloc 16 - -16 __pfx_reqsk_free 16 - -16 reqsk_queue_alloc 46 - -46 tcp_req_err 272 177 -95 reqsk_fastopen_remove 348 253 -95 cookie_bpf_check 157 62 -95 cookie_tcp_reqsk_alloc 387 290 -97 cookie_v4_check 1568 1465 -103 reqsk_free 105 - -105 cookie_v6_check 1519 1412 -107 sock_gen_put 187 78 -109 sock_pfree 212 82 -130 tcp_try_fastopen 1818 1683 -135 tcp_v4_rcv 3478 3294 -184 reqsk_put 306 90 -216 tcp_get_cookie_sock 551 318 -233 tcp_v6_rcv 3404 3141 -263 tcp_conn_request 2677 2384 -293 Total: Before=24887415, After=24885302, chg -0.01% Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260204055147.1682705-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-05inet: move reqsk_queue_alloc() to net/ipv4/inet_connection_sock.cEric Dumazet-2/+0
Only called once from inet_csk_listen_start(), it can be static. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260204055147.1682705-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-05flow_offload: add const qualifiers to function argumentsDavid Yang-2/+2
Some functions do not modify the pointed-to data, but lack const qualifiers. Add const qualifiers to the arguments of flow_rule_match_has_control_flags() and flow_cls_offload_flow_rule(). Signed-off-by: David Yang <mmyangfl@gmail.com> Link: https://patch.msgid.link/20260204052839.198602-1-mmyangfl@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-05can: add CAN skb extension infrastructureOliver Hartkopp-0/+28
To remove the private CAN bus skb headroom infrastructure 8 bytes need to be stored in the skb. The skb extensions are a common pattern and an easy and efficient way to hold private data travelling along with the skb. We only need the skb_ext_add() and skb_ext_find() functions to allocate and access CAN specific content as the skb helpers to copy/clone/free skbs automatically take care of skb extensions and their final removal. This patch introduces the complete CAN skb extensions infrastructure: - add struct can_skb_ext in new file include/net/can.h - add include/net/can.h in MAINTAINERS - add SKB_EXT_CAN to skbuff.c and skbuff.h - select SKB_EXTENSIONS in Kconfig when CONFIG_CAN is enabled - check for existing CAN skb extensions in can_rcv() in af_can.c - add CAN skb extensions allocation at every skb_alloc() location - duplicate the skb extensions if cloning outgoing skbs (framelen/gw_hops) - introduce can_skb_ext_add() and can_skb_ext_find() helpers The patch also corrects an indention issue in the original code from 2018: Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202602010426.PnGrYAk3-lkp@intel.com/ Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://patch.msgid.link/20260201-can_skb_ext-v8-2-3635d790fe8b@hartkopp.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-04net/iucv: clean up iucv kernel-doc warningsRandy Dunlap-204/+3
Fix numerous (many) kernel-doc warnings in iucv.[ch]: - convert function documentation comments to a common (kernel-doc) look, even for static functions (without "/**") - use matching parameter and parameter description names - use better wording in function descriptions (Jakub & AI) - remove duplicate kernel-doc comments from the header file (Jakub) Examples: Warning: include/net/iucv/iucv.h:210 missing initial short description on line: * iucv_unregister Warning: include/net/iucv/iucv.h:216 function parameter 'handle' not described in 'iucv_unregister' Warning: include/net/iucv/iucv.h:467 function parameter 'answer' not described in 'iucv_message_send2way' Warning: net/iucv/iucv.c:727 missing initial short description on line: * iucv_cleanup_queue Build-tested with both "make htmldocs" and "make ARCH=s390 defconfig all". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Link: https://patch.msgid.link/20260203075248.1177869-1-rdunlap@infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-04tcp: split tcp_check_space() in two partsEric Dumazet-1/+9
tcp_check_space() is fat and not inlined. Move its slow path in (out of line) __tcp_check_space() and make tcp_check_space() an inline function for better TCP performance. $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new add/remove: 2/2 grow/shrink: 4/0 up/down: 708/-582 (126) Function old new delta __tcp_check_space - 521 +521 tcp_rcv_established 1860 1916 +56 tcp_rcv_state_process 3342 3384 +42 tcp_event_new_data_sent 248 286 +38 tcp_data_snd_check 71 106 +35 __pfx___tcp_check_space - 16 +16 __pfx_tcp_check_space 16 - -16 tcp_check_space 566 - -566 Total: Before=24896373, After=24896499, chg +0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260203050932.3522221-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-04Merge tag 'wireless-next-2026-02-04' of ↵Jakub Kicinski-5/+120
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== Some more changes, including pulls from drivers: - ath drivers: small features/cleanups - rtw drivers: mostly refactoring for rtw89 RTL8922DE support - mac80211: use hrtimers for CAC to avoid too long delays - cfg80211/mac80211: some initial UHR (Wi-Fi 8) support * tag 'wireless-next-2026-02-04' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (59 commits) wifi: brcmsmac: phy: Remove unreachable error handling code wifi: mac80211: Add eMLSR/eMLMR action frame parsing support wifi: mac80211: add initial UHR support wifi: cfg80211: add initial UHR support wifi: ieee80211: add some initial UHR definitions wifi: mac80211: use wiphy_hrtimer_work for CAC timeout wifi: mac80211: correct ieee80211-{s1g/eht}.h include guard comments wifi: ath12k: clear stale link mapping of ahvif->links_map wifi: ath12k: Add support TX hardware queue stats wifi: ath12k: Add support RX PDEV stats wifi: ath12k: Fix index decrement when array_len is zero wifi: ath12k: support OBSS PD configuration for AP mode wifi: ath12k: add WMI support for spatial reuse parameter configuration dt-bindings: net: wireless: ath11k-pci: deprecate 'firmware-name' property wifi: ath11k: add usecase firmware handling based on device compatible wifi: ath10k: sdio: add missing lock protection in ath10k_sdio_fw_crashed_dump() wifi: ath10k: fix lock protection in ath10k_wmi_event_peer_sta_ps_state_chg() wifi: ath10k: snoc: support powering on the device via pwrseq wifi: rtw89: pci: warn if SPS OCP happens for RTL8922DE wifi: rtw89: pci: restore LDO setting after device resume ... ==================== Link: https://patch.msgid.link/20260204121143.181112-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-03tcp: accecn: add tcpi_ecn_mode and tcpi_option2 in tcp_infoChia-Yu Chang-11/+0
Add 2-bit tcpi_ecn_mode feild within tcp_info to indicate which ECN mode is negotiated: ECN_MODE_DISABLED, ECN_MODE_RFC3168, ECN_MODE_ACCECN, or ECN_MODE_PENDING. This is done by utilizing available bits from tcpi_accecn_opt_seen (reduced from 16 bits to 2 bits) and tcpi_accecn_fail_mode (reduced from 16 bits to 4 bits). Also, an extra 24-bit tcpi_options2 field is identified to represent newer options and connection features, as all 8 bits of tcpi_options field have been used. Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Co-developed-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260131222515.8485-14-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-03tcp: accecn: detect loss ACK w/ AccECN option and add TCP_ACCECN_OPTION_PERSISTChia-Yu Chang-0/+2
Detect spurious retransmission of a previously sent ACK carrying the AccECN option after the second retransmission. Since this might be caused by the middlebox dropping ACK with options it does not recognize, disable the sending of the AccECN option in all subsequent ACKs. This patch follows Section 3.2.3.2.2 of AccECN spec (RFC9768), and a new field (accecn_opt_sent_w_dsack) is added to indicate that an AccECN option was sent with duplicate SACK info. Also, a new AccECN option sending mode is added to tcp_ecn_option sysctl: (TCP_ECN_OPTION_PERSIST), which ignores the AccECN fallback policy and persistently sends AccECN option once it fits into TCP option space. Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260131222515.8485-13-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-03tcp: accecn: fallback outgoing half link to non-AccECNChia-Yu Chang-1/+3
According to Section 3.2.2.1 of AccECN spec (RFC9768), if the Server is in AccECN mode and in SYN-RCVD state, and if it receives a value of zero on a pure ACK with SYN=0 and no SACK blocks, for the rest of the connection the Server MUST NOT set ECT on outgoing packets and MUST NOT respond to AccECN feedback. Nonetheless, as a Data Receiver it MUST NOT disable AccECN feedback. Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260131222515.8485-12-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-03tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACKChia-Yu Chang-5/+15
For Accurate ECN, the first SYN/ACK sent by the TCP server shall set the ACE flag (Table 1 of RFC9768) and the AccECN option to complete the capability negotiation. However, if the TCP server needs to retransmit such a SYN/ACK (for example, because it did not receive an ACK acknowledging its SYN/ACK, or received a second SYN requesting AccECN support), the TCP server retransmits the SYN/ACK without the AccECN option. This is because the SYN/ACK may be lost due to congestion, or a middlebox may block the AccECN option. Furthermore, if this retransmission also times out, to expedite connection establishment, the TCP server should retransmit the SYN/ACK with (AE,CWR,ECE) = (0,0,0) and without the AccECN option, while maintaining AccECN feedback mode. This complies with Section 3.2.3.2.2 of the AccECN spec RFC9768. Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260131222515.8485-10-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-03tcp: add TCP_SYNACK_RETRANS synack_typeChia-Yu Chang-0/+1
Before this patch, retransmitted SYN/ACK did not have a specific synack_type; however, the upcoming patch needs to distinguish between retransmitted and non-retransmitted SYN/ACK for AccECN negotiation to transmit the fallback SYN/ACK during AccECN negotiation. Therefore, this patch introduces a new synack_type (TCP_SYNACK_RETRANS). Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260131222515.8485-9-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-03tcp: accecn: handle unexpected AccECN negotiation feedbackChia-Yu Chang-13/+31
According to Sections 3.1.2 and 3.1.3 of AccECN spec (RFC9768). In Section 3.1.2, it says an AccECN implementation has no need to recognize or support the Server response labelled 'Nonce' or ECN-nonce feedback more generally, as RFC 3540 has been reclassified as Historic. AccECN is compatible with alternative ECN feedback integrity approaches to the nonce. The SYN/ACK labelled 'Nonce' with (AE,CWR,ECE) = (1,0,1) is reserved for future use. A TCP Client (A) that receives such a SYN/ACK follows the procedure for forward compatibility given in Section 3.1.3. Then in Section 3.1.3, it says if a TCP Client has sent a SYN requesting AccECN feedback with (AE,CWR,ECE) = (1,1,1) then receives a SYN/ACK with the currently reserved combination (AE,CWR,ECE) = (1,0,1) but it does not have logic specific to such a combination, the Client MUST enable AccECN mode as if the SYN/ACK onfirmed that the Server supported AccECN and as if it fed back that the IP-ECN field on the SYN had arrived unchanged. Fixes: 3cae34274c79 ("tcp: accecn: AccECN negotiation"). Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260131222515.8485-7-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-03tcp: disable RFC3168 fallback identifier for CC modulesChia-Yu Chang-4/+19
When AccECN is not successfully negociated for a TCP flow, it defaults fallback to classic ECN (RFC3168). However, L4S service will fallback to non-ECN. This patch enables congestion control module to control whether it should not fallback to classic ECN after unsuccessful AccECN negotiation. A new CA module flag (TCP_CONG_NO_FALLBACK_RFC3168) identifies this behavior expected by the CA. Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260131222515.8485-6-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-03tcp: ECT_1_NEGOTIATION and NEEDS_ACCECN identifiersChia-Yu Chang-7/+47
Two flags for congestion control (CC) module are added in this patch related to AccECN negotiation. First, a new flag (TCP_CONG_NEEDS_ACCECN) defines that the CC expects to negotiate AccECN functionality using the ECE, CWR and AE flags in the TCP header. Second, during ECN negotiation, ECT(0) in the IP header is used. This patch enables CC to control whether ECT(0) or ECT(1) should be used on a per-segment basis. A new flag (TCP_CONG_ECT_1_NEGOTIATION) defines the expected ECT value in the IP header by the CA when not-yet initialized for the connection. The detailed AccECN negotiaotn can be found in IETF RFC9768. Co-developed-by: Olivier Tilmans <olivier.tilmans@nokia.com> Signed-off-by: Olivier Tilmans <olivier.tilmans@nokia.com> Signed-off-by: Ilpo Järvinen <ij@kernel.org> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260131222515.8485-5-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-02tcp: export tcp_splice_stateGeliang Tang-0/+11
Export struct tcp_splice_state and tcp_splice_data_recv() in net/tcp.h so that they can be used by MPTCP in the next patch. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Acked-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-3-31332ba70d7f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02ipv6: colocate inet6_cork in inet_cork_fullEric Dumazet-4/+12
All inet6_cork users also use one inet_cork_full. Reduce number of parameters and increase data locality. This saves ~275 bytes of code on x86_64. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260130210303.3888261-9-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02inet: add dst4_mtu() and dst6_mtu() helpersEric Dumazet-0/+12
With CONFIG_MITIGATION_RETPOLINE=y dst_mtu() is a bit fat, because it is generic. Indeed, clang does not always inline it. Add dst4_mtu() and dst6_mtu() helpers for callers that expect either ipv4_mtu() or ip6_mtu() to be called. These helpers are always inlined. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260130210303.3888261-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02ipv6: pass proto by value to ipv6_push_nfrag_opts() and ipv6_push_frag_opts()Eric Dumazet-5/+5
With CONFIG_STACKPROTECTOR_STRONG=y, it is better to avoid passing a pointer to an automatic variable. Change these exported functions to return 'u8 proto' instead of void. - ipv6_push_nfrag_opts() - ipv6_push_frag_opts() For instance, replace ipv6_push_frag_opts(skb, opt, &proto); with: proto = ipv6_push_frag_opts(skb, opt, proto); Note that even after this change, ip6_xmit() has to use a stack canary because of @first_hop variable. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260130210303.3888261-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02net: l3mdev: use skb_dst_dev_rcu() in l3mdev_l3_out()Eric Dumazet-3/+4
Extend the RCU section a bit so that we can use the safer skb_dst_dev_rcu() helper. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260130191906.3781856-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02wifi: mac80211: Add eMLSR/eMLMR action frame parsing supportLorenzo Bianconi-0/+32
Introduce support in AP mode for parsing of the Operating Mode Notification frame sent by the client to enable/disable MLO eMLSR or eMLMR if supported by both the AP and the client. Add drv_set_eml_op_mode mac80211 callback in order to configure underlay driver with eMLSR/eMLMR info. Tested-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20260129-mac80211-emlsr-v4-1-14bdadf57380@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>