linux - Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/

Age	Commit message (Collapse)	Author	Files	Lines
2022-09-08	fs: only do a memory barrier for the first set_buffer_uptodate()	Linus Torvalds	1	-0/+11
	Commit d4252071b97d ("add barriers to buffer_uptodate and set_buffer_uptodate") added proper memory barriers to the buffer head BH_Uptodate bit, so that anybody who tests a buffer for being up-to-date will be guaranteed to actually see initialized state. However, that commit didn't _just_ add the memory barrier, it also ended up dropping the "was it already set" logic that the BUFFER_FNS() macro had. That's conceptually the right thing for a generic "this is a memory barrier" operation, but in the case of the buffer contents, we really only care about the memory barrier for the _first_ time we set the bit, in that the only memory ordering protection we need is to avoid anybody seeing uninitialized memory contents. Any other access ordering wouldn't be about the BH_Uptodate bit anyway, and would require some other proper lock (typically BH_Lock or the folio lock). A reader that races with somebody invalidating the buffer head isn't an issue wrt the memory ordering, it's a serialization issue. Now, you'd think that the buffer head operations don't matter in this day and age (and I certainly thought so), but apparently some loads still end up being heavy users of buffer heads. In particular, the kernel test robot reported that not having this bit access optimization in place caused a noticeable direct IO performance regression on ext4: fxmark.ssd_ext4_no_jnl_DWTL_54_directio.works/sec -26.5% regression although you presumably need a fast disk and a lot of cores to actually notice. Link: https://lore.kernel.org/all/Yw8L7HTZ%2FdE2%2Fo9C@xsang-OptiPlex-9020/ Reported-by: kernel test robot <oliver.sang@intel.com> Tested-by: Fengwei Yin <fengwei.yin@intel.com> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-09-08	r8169: merge support for chip versions 10, 13, 16	Heiner Kallweit	3	-13/+4
	These chip versions are closely related and all of them have no chip-specific MAC/PHY initialization. Therefore merge support for the three chip versions. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/469d27e0-1d06-9b15-6c96-6098b3a52e35@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-08	sch_sfb: Also store skb len before calling child enqueue	Toke Høiland-Jørgensen	1	-1/+2
	Cong Wang noticed that the previous fix for sch_sfb accessing the queued skb after enqueueing it to a child qdisc was incomplete: the SFB enqueue function was also calling qdisc_qstats_backlog_inc() after enqueue, which reads the pkt len from the skb cb field. Fix this by also storing the skb len, and using the stored value to increment the backlog after enqueueing. Fixes: 9efd23297cca ("sch_sfb: Don't assume the skb is still around after enqueueing to child") Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Acked-by: Cong Wang <cong.wang@bytedance.com> Link: https://lore.kernel.org/r/20220905192137.965549-1-toke@toke.dk Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-08	net: phy: lan87xx: change interrupt src of link_up to comm_ready	Arun Ramadoss	1	-4/+54
	Currently phy link up/down interrupt is enabled using the LAN87xx_INTERRUPT_MASK register. In the lan87xx_read_status function, phy link is determined using the T1_MODE_STAT_REG register comm_ready bit. comm_ready bit is set using the loc_rcvr_status & rem_rcvr_status. Whenever the phy link is up, LAN87xx_INTERRUPT_SOURCE link_up bit is set first but comm_ready bit takes some time to set based on local and remote receiver status. As per the current implementation, interrupt is triggered using link_up but the comm_ready bit is still cleared in the read_status function. So, link is always down. Initially tested with the shared interrupt mechanism with switch and internal phy which is working, but after implementing interrupt controller it is not working. It can fixed either by updating the read_status function to read from LAN87XX_INTERRUPT_SOURCE register or enable the interrupt mask for comm_ready bit. But the validation team recommends the use of comm_ready for link detection. This patch fixes by enabling the comm_ready bit for link_up in the LAN87XX_INTERRUPT_MASK_2 register (MISC Bank) and link_down in LAN87xx_INTERRUPT_MASK register. Fixes: 8a1b415d70b7 ("net: phy: added ethtool master-slave configuration support") Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220905152750.5079-1-arun.ramadoss@microchip.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-08	net: stmmac: Disable automatic FCS/Pad stripping	Kurt Kanzenbach	6	-40/+6
	The stmmac has the possibility to automatically strip the padding/FCS for IEEE 802.3 type frames. This feature is enabled conditionally. Therefore, the stmmac receive path has to have a determination logic whether the FCS has to be stripped in software or not. In fact, for DSA this ACS feature is disabled and the determination logic doesn't check for it properly. For instance, when using DSA in combination with an older stmmac (pre version 4), the FCS is not stripped by hardware or software which is problematic. So either add another check for DSA to the fast path or simply disable ACS feature completely. The latter approach has been chosen, because most of the time the FCS is stripped in software anyway and it removes conditionals from the receive fast path. Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/87v8q8jjgh.fsf@kurt/ Link: https://lore.kernel.org/r/20220905130155.193640-1-kurt@linutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-07	efi: capsule-loader: Fix use-after-free in efi_capsule_write	Hyunwoo Kim	1	-24/+7
	A race condition may occur if the user calls close() on another thread during a write() operation on the device node of the efi capsule. This is a race condition that occurs between the efi_capsule_write() and efi_capsule_flush() functions of efi_capsule_fops, which ultimately results in UAF. So, the page freeing process is modified to be done in efi_capsule_release() instead of efi_capsule_flush(). Cc: <stable@vger.kernel.org> # v4.9+ Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com> Link: https://lore.kernel.org/all/20220907102920.GA88602@ubuntu/ Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-09-07	net: hns3: add support to query and set lane number by ethtool	Hao Chen	10	-23/+50
	When serdes lane support setting 25Gb/s or 50Gb/s speed and user wants to set port speed as 50Gb/s, it can be setted as one 50Gb/s serdes lane or two 25Gb/s serdes lanes. So, this patch adds support to query and set lane number by ethtool to satisfy this scenario. Signed-off-by: Hao Chen <chenhao418@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	net: hns3: add querying fec statistics	Hao Lan	8	-0/+218
	FEC statistics can be used to check the transmission quality of links. This patch implements the get_fec_stats callback of ethtool_ops to support querying FEC statistics by command "ethtool -I --show-fec eth0". Signed-off-by: Hao Lan <lanhao@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	net: hns3: debugfs add dump dscp map info	Guangbin Huang	3	-1/+67
	This patch add dump the map relation for dscp, priority and TC, and the current tc map mode. Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	net: hns3: support ndo_select_queue()	Guangbin Huang	1	-0/+46
	To support tx packets to select queue according to its dscp field after setting dscp and tc map relationship, this patch implements ndo_select_queue() to set skb->priority according to the user's setting dscp and priority map relationship. Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	net: hns3: add support config dscp map to tc	Guangbin Huang	7	-1/+204
	This patch add support config dscp map to tc by implementing ieee_setapp and ieee_delapp of struct dcbnl_rtnl_ops. Driver will convert mapping relationship from dscp-prio to dscp-tc. Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	net/smc: Fix possible access to freed memory in link clear	Yacan Liu	4	-0/+13
	After modifying the QP to the Error state, all RX WR would be completed with WC in IB_WC_WR_FLUSH_ERR status. Current implementation does not wait for it is done, but destroy the QP and free the link group directly. So there is a risk that accessing the freed memory in tasklet context. Here is a crash example: BUG: unable to handle page fault for address: ffffffff8f220860 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD f7300e067 P4D f7300e067 PUD f7300f063 PMD 8c4e45063 PTE 800ffff08c9df060 Oops: 0002 [#1] SMP PTI CPU: 1 PID: 0 Comm: swapper/1 Kdump: loaded Tainted: G S OE 5.10.0-0607+ #23 Hardware name: Inspur NF5280M4/YZMB-00689-101, BIOS 4.1.20 07/09/2018 RIP: 0010:native_queued_spin_lock_slowpath+0x176/0x1b0 Code: f3 90 48 8b 32 48 85 f6 74 f6 eb d5 c1 ee 12 83 e0 03 83 ee 01 48 c1 e0 05 48 63 f6 48 05 00 c8 02 00 48 03 04 f5 00 09 98 8e <48> 89 10 8b 42 08 85 c0 75 09 f3 90 8b 42 08 85 c0 74 f7 48 8b 32 RSP: 0018:ffffb3b6c001ebd8 EFLAGS: 00010086 RAX: ffffffff8f220860 RBX: 0000000000000246 RCX: 0000000000080000 RDX: ffff91db1f86c800 RSI: 000000000000173c RDI: ffff91db62bace00 RBP: ffff91db62bacc00 R08: 0000000000000000 R09: c00000010000028b R10: 0000000000055198 R11: ffffb3b6c001ea58 R12: ffff91db80e05010 R13: 000000000000000a R14: 0000000000000006 R15: 0000000000000040 FS: 0000000000000000(0000) GS:ffff91db1f840000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffff8f220860 CR3: 00000001f9580004 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> _raw_spin_lock_irqsave+0x30/0x40 mlx5_ib_poll_cq+0x4c/0xc50 [mlx5_ib] smc_wr_rx_tasklet_fn+0x56/0xa0 [smc] tasklet_action_common.isra.21+0x66/0x100 __do_softirq+0xd5/0x29c asm_call_irq_on_stack+0x12/0x20 </IRQ> do_softirq_own_stack+0x37/0x40 irq_exit_rcu+0x9d/0xa0 sysvec_call_function_single+0x34/0x80 asm_sysvec_call_function_single+0x12/0x20 Fixes: bd4ad57718cc ("smc: initialize IB transport incl. PD, MR, QP, CQ, event, WR") Signed-off-by: Yacan Liu <liuyacan@corp.netease.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	net: sysctl: remove unused variable long_max	Liu Shixin	1	-1/+0
	The variable long_max is replaced by bpf_jit_limit_max and no longer be used. So remove it. No functional change. Signed-off-by: Liu Shixin <liushixin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	net: ethernet: mtk_eth_soc: remove mtk_foe_entry_timestamp	Lorenzo Bianconi	1	-11/+0
	Get rid of mtk_foe_entry_timestamp routine since it is no longer used. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	net: ethernet: mtk_eth_soc: check max allowed hash in mtk_ppe_check_skb	Lorenzo Bianconi	1	-0/+3
	Even if max hash configured in hw in mtk_ppe_hash_entry is MTK_PPE_ENTRIES - 1, check theoretical OOB accesses in mtk_ppe_check_skb routine Fixes: c4f033d9e03e9 ("net: ethernet: mtk_eth_soc: rework hardware flow table management") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	net: skb: export skb drop reaons to user by TRACE_DEFINE_ENUM	Menglong Dong	5	-24/+87
	As Eric reported, the 'reason' field is not presented when trace the kfree_skb event by perf: $ perf record -e skb:kfree_skb -a sleep 10 $ perf script ip_defrag 14605 [021] 221.614303: skb:kfree_skb: skbaddr=0xffff9d2851242700 protocol=34525 location=0xffffffffa39346b1 reason: The cause seems to be passing kernel address directly to TP_printk(), which is not right. As the enum 'skb_drop_reason' is not exported to user space through TRACE_DEFINE_ENUM(), perf can't get the drop reason string from the 'reason' field, which is a number. Therefore, we introduce the macro DEFINE_DROP_REASON(), which is used to define the trace enum by TRACE_DEFINE_ENUM(). With the help of DEFINE_DROP_REASON(), now we can remove the auto-generate that we introduced in the commit ec43908dd556 ("net: skb: use auto-generation to convert skb drop reason to string"), and define the string array 'drop_reasons'. Hmmmm...now we come back to the situation that have to maintain drop reasons in both enum skb_drop_reason and DEFINE_DROP_REASON. But they are both in dropreason.h, which makes it easier. After this commit, now the format of kfree_skb is like this: $ cat /tracing/events/skb/kfree_skb/format name: kfree_skb ID: 1524 format: field:unsigned short common_type; offset:0; size:2; signed:0; field:unsigned char common_flags; offset:2; size:1; signed:0; field:unsigned char common_preempt_count; offset:3; size:1; signed:0; field:int common_pid; offset:4; size:4; signed:1; field:void * skbaddr; offset:8; size:8; signed:0; field:void * location; offset:16; size:8; signed:0; field:unsigned short protocol; offset:24; size:2; signed:0; field:enum skb_drop_reason reason; offset:28; size:4; signed:0; print fmt: "skbaddr=%p protocol=%u location=%p reason: %s", REC->skbaddr, REC->protocol, REC->location, __print_symbolic(REC->reason, { 1, "NOT_SPECIFIED" }, { 2, "NO_SOCKET" } ...... Fixes: ec43908dd556 ("net: skb: use auto-generation to convert skb drop reason to string") Link: https://lore.kernel.org/netdev/CANn89i+bx0ybvE55iMYf5GJM48WwV1HNpdm9Q6t-HaEstqpCSA@mail.gmail.com/ Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	net: ethernet: mtk_eth_soc: fix typo in __mtk_foe_entry_clear	Lorenzo Bianconi	1	-1/+1
	Set ib1 state to MTK_FOE_STATE_UNBIND in __mtk_foe_entry_clear routine. Fixes: 33fc42de33278 ("net: ethernet: mtk_eth_soc: support creating mac address based offload entries") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	net/mlx5e: Add support to configure more than one macsec offload device	Lior Nahmanson	1	-46/+175
	Add the ability to add up to 16 MACsec offload interfaces over the same physical interface Signed-off-by: Lior Nahmanson <liorna@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	net/mlx5e: Add MACsec stats support for Rx/Tx flows	Lior Nahmanson	8	-2/+136
	Add the following statistics: RX successfully decrypted MACsec packets: macsec_rx_pkts : Number of packets decrypted successfully macsec_rx_bytes : Number of bytes decrypted successfully Rx dropped MACsec packets: macsec_rx_pkts_drop : Number of MACsec packets dropped macsec_rx_bytes_drop : Number of MACsec bytes dropped TX successfully encrypted MACsec packets: macsec_tx_pkts : Number of packets encrypted/authenticated successfully macsec_tx_bytes : Number of bytes encrypted/authenticated successfully Tx dropped MACsec packets: macsec_tx_pkts_drop : Number of MACsec packets dropped macsec_tx_bytes_drop : Number of MACsec bytes dropped The above can be seen using: ethtool -S <ifc> \|grep macsec Signed-off-by: Lior Nahmanson <liorna@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	net/mlx5e: Add MACsec offload SecY support	Lior Nahmanson	1	-0/+229
	Add offload support for MACsec SecY callbacks - add/update/delete. add_secy is called when need to create a new MACsec interface. upd_secy is called when source MAC address or tx SC was changed. del_secy is called when need to destroy the MACsec interface. Signed-off-by: Lior Nahmanson <liorna@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	net/mlx5e: Implement MACsec Rx data path using MACsec skb_metadata_dst	Lior Nahmanson	4	-3/+68
	MACsec driver need to distinguish to which offload device the MACsec is target to, in order to handle them correctly. This can be done by attaching a metadata_dst to a SKB with a SCI, when there is a match on MACsec rule. To achieve that, there is a map between fs_id to SCI, so for each RX SC, there is a unique fs_id allocated when creating RX SC. fs_id passed to device driver as metadata for packets that passed Rx MACsec offload to aid the driver to retrieve the matching SCI. Signed-off-by: Lior Nahmanson <liorna@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	net/mlx5e: Add MACsec RX steering rules	Lior Nahmanson	3	-95/+823
	Rx flow steering consists of two flow tables (FTs). The first FT (crypto table) have one default miss rule so non MACsec offloaded packets bypass the MACSec tables. All others flow table entries (FTEs) are divided to two equal groups size, both of them are for MACsec packets: The first group is for MACsec packets which contains SCI field in the SecTAG header. The second group is for MACsec packets which doesn't contain SCI, where need to match on the source MAC address (only if the SCI is built from default MACsec port). Destination MAC address, ethertype and some of SecTAG fields are also matched for both groups. In case of match, invoke decrypt action on the packet. For each MACsec Rx offloaded SA two rules are created: one with SCI and one without SCI. The second FT (check table) has two fixed rules: One rule is for verifying that the previous offload actions were finished successfully. In this case, need to decap the SecTAG header and forward the packet for further processing. Another default rule for dropping packets that failed in the previous decrypt actions. The MACsec FTs are created on demand when the first MACsec rule is added and destroyed when the last MACsec rule is deleted. Signed-off-by: Lior Nahmanson <liorna@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	net/mlx5: Add MACsec Rx tables support to fs_core	Lior Nahmanson	3	-2/+13
	Add new namespace for MACsec RX flows. Encrypted MACsec packets should be first decrypted and stripped from MACsec header and then continues with the kernel's steering pipeline. Signed-off-by: Lior Nahmanson <liorna@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	net/mlx5e: Add MACsec offload Rx command support	Lior Nahmanson	1	-6/+377
	Add a support for Connect-X MACsec offload Rx SA & SC commands: add, update and delete. SCs are created on demend and aren't limited by number and unique by SCI. Each Rx SA must be associated with Rx SC according to SCI. Follow-up patches will implement the Rx steering. Signed-off-by: Lior Nahmanson <liorna@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	net/mlx5e: Implement MACsec Tx data path using MACsec skb_metadata_dst	Lior Nahmanson	6	-8/+119
	MACsec driver marks Tx packets for device offload using a dedicated skb_metadata_dst which holds a 64 bits SCI number. A previously set rule will match on this number so the correct SA is used for the MACsec operation. As device driver can only provide 32 bits of metadata to flow tables, need to used a mapping from 64 bit to 32 bits marker or id, which is can be achieved by provide a 32 bit unique flow id in the control path, and used a hash table to map 64 bit to the unique id in the datapath. Signed-off-by: Lior Nahmanson <liorna@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	net/mlx5e: Add MACsec TX steering rules	Lior Nahmanson	5	-15/+770
	Tx flow steering consists of two flow tables (FTs). The first FT (crypto table) has two fixed rules: One default miss rule so non MACsec offloaded packets bypass the MACSec tables, another rule to make sure that MACsec key exchange (MKE) traffic passes unencrypted as expected (matched of ethertype). On each new MACsec offload flow, a new MACsec rule is added. This rule is matched on metadata_reg_a (which contains the id of the flow) and invokes the MACsec offload action on match. The second FT (check table) has two fixed rules: One rule for verifying that the previous offload actions were finished successfully and packet need to be transmitted. Another default rule for dropping packets that were failed in the offload actions. The MACsec FTs should be created on demand when the first MACsec rule is added and destroyed when the last MACsec rule is deleted. Signed-off-by: Lior Nahmanson <liorna@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	net/mlx5: Add MACsec Tx tables support to fs_core	Lior Nahmanson	4	-7/+19
	Changed EGRESS_KERNEL namespace to EGRESS_IPSEC and add new namespace for MACsec TX. This namespace should be the last namespace for transmitted packets. Signed-off-by: Lior Nahmanson <liorna@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	net/mlx5: Add MACsec offload Tx command support	Lior Nahmanson	9	-0/+440
	This patch adds support for Connect-X MACsec offload Tx SA commands: add, update and delete. In Connect-X MACsec, a Security Association (SA) is added or deleted via allocating a HW context of an encryption/decryption key and a HW context of a matching SA (MACsec object). When new SA is added: - Use a separate crypto key HW context. - Create a separate MACsec context in HW to include the SA properties. Introduce a new compilation flag MLX5_EN_MACSEC for it. Follow-up patches will implement the Tx steering. Signed-off-by: Lior Nahmanson <liorna@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	net/mlx5: Introduce MACsec Connect-X offload hardware bits and structures	Lior Nahmanson	2	-2/+101
	Add MACsec offload related IFC structs, layouts and enumerations. Signed-off-by: Lior Nahmanson <liorna@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	net/mlx5: Generalize Flow Context for new crypto fields	Lior Nahmanson	4	-11/+20
	In order to support MACsec offload (and maybe some other crypto features in the future), generalize flow action parameters / defines to be used by crypto offlaods other than IPsec. The following changes made: ipsec_obj_id field at flow action context was changed to crypto_obj_id, intreduced a new crypto_type field where IPsec is the default zero type for backward compatibility. Action ipsec_decrypt was changed to crypto_decrypt. Action ipsec_encrypt was changed to crypto_encrypt. IPsec offload code was updated accordingly for backward compatibility. Signed-off-by: Lior Nahmanson <liorna@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	net/mlx5: Removed esp_id from struct mlx5_flow_act	Lior Nahmanson	1	-1/+0
	esp_id is no longer in used Signed-off-by: Lior Nahmanson <liorna@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	net/macsec: Move some code for sharing with various drivers that implements ↵	Lior Nahmanson	2	-27/+27
	offload Move some MACsec infrastructure like defines and functions, in order to avoid code duplication for future drivers which implements MACsec offload. Signed-off-by: Lior Nahmanson <liorna@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Ben Ben-Ishay <benishay@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	net/macsec: Add MACsec skb_metadata_dst Rx Data path support	Lior Nahmanson	1	-0/+6
	Like in the Tx changes, if there are more than one MACsec device with the same MAC address as in the packet's destination MAC, the packet will be forward only to this device and not neccessarly to the desired one. Offloading device drivers will mark offloaded MACsec SKBs with the corresponding SCI in the skb_metadata_dst so the macsec rx handler will know to which port to divert those skbs, instead of wrongly solely relaying on dst MAC address comparison. Signed-off-by: Lior Nahmanson <liorna@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	net/macsec: Add MACsec skb_metadata_dst Tx Data path support	Lior Nahmanson	3	-0/+29
	In the current MACsec offload implementation, MACsec interfaces shares the same MAC address by default. Therefore, HW can't distinguish from which MACsec interface the traffic originated from. MACsec stack will use skb_metadata_dst to store the SCI value, which is unique per Macsec interface, skb_metadat_dst will be used by the offloading device driver to associate the SKB with the corresponding offloaded interface (SCI). Signed-off-by: Lior Nahmanson <liorna@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	net: dsa: felix: access QSYS_TAG_CONFIG under tas_lock in ↵	Vladimir Oltean	1	-2/+2
	vsc9959_sched_speed_set The read-modify-write of QSYS_TAG_CONFIG from vsc9959_sched_speed_set() runs unlocked with respect to the other functions that access it, which are vsc9959_tas_guard_bands_update(), vsc9959_qos_port_tas_set() and vsc9959_tas_clock_adjust(). All the others are under ocelot->tas_lock, so move the vsc9959_sched_speed_set() access under that lock as well, to resolve the concurrency. Fixes: 55a515b1f5a9 ("net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	net: dsa: felix: disable cut-through forwarding for frames oversized for ↵	Vladimir Oltean	1	-43/+79
	tc-taprio Experimentally, it looks like when QSYS_QMAXSDU_CFG_7 is set to 605, frames even way larger than 601 octets are transmitted even though these should be considered as oversized, according to the documentation, and dropped. Since oversized frame dropping depends on frame size, which is only known at the EOF stage, and therefore not at SOF when cut-through forwarding begins, it means that the switch cannot take QSYS_QMAXSDU_CFG_* into consideration for traffic classes that are cut-through. Since cut-through forwarding has no UAPI to control it, and the driver enables it based on the mantra "if we can, then why not", the strategy is to alter vsc9959_cut_through_fwd() to take into consideration which tc's have oversize frame dropping enabled, and disable cut-through for them. Then, from vsc9959_tas_guard_bands_update(), we re-trigger the cut-through determination process. There are 2 strategies for vsc9959_cut_through_fwd() to determine whether a tc has oversized dropping enabled or not. One is to keep a bit mask of traffic classes per port, and the other is to read back from the hardware registers (a non-zero value of QSYS_QMAXSDU_CFG_* means the feature is enabled). We choose reading back from registers, because struct ocelot_port is shared with drivers (ocelot, seville) that don't support either cut-through nor tc-taprio, and we don't have a felix specific extension of struct ocelot_port. Furthermore, reading registers from the Felix hardware is quite cheap, since they are memory-mapped. Fixes: 55a515b1f5a9 ("net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	net: dsa: felix: tc-taprio intervals smaller than MTU should send at least ↵	Vladimir Oltean	1	-4/+31
	one packet The blamed commit broke tc-taprio schedules such as this one: tc qdisc replace dev $swp1 root taprio \ num_tc 8 \ map 0 1 2 3 4 5 6 7 \ queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \ base-time 0 \ sched-entry S 0x7f 990000 \ sched-entry S 0x80 10000 \ flags 0x2 because the gate entry for TC 7 (S 0x80 10000 ns) now has a static guard band added earlier than its 'gate close' event, such that packet overruns won't occur in the worst case of the largest packet possible. Since guard bands are statically determined based on the per-tc QSYS_QMAXSDU_CFG_* with a fallback on the port-based QSYS_PORT_MAX_SDU, we need to discuss what happens with TC 7 depending on kernel version, since the driver, prior to commit 55a515b1f5a9 ("net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port"), did not touch QSYS_QMAXSDU_CFG_, and therefore relied on QSYS_PORT_MAX_SDU. 1 (before vsc9959_tas_guard_bands_update): QSYS_PORT_MAX_SDU defaults to 1518, and at gigabit this introduces a static guard band (independent of packet sizes) of 12144 ns, plus QSYS::HSCH_MISC_CFG.FRM_ADJ (bit time of 20 octets => 160 ns). But this is larger than the time window itself, of 10000 ns. So, the queue system never considers a frame with TC 7 as eligible for transmission, since the gate practically never opens, and these frames are forever stuck in the TX queues and hang the port. 2 (after vsc9959_tas_guard_bands_update): Under the sole goal of enabling oversized frame dropping, we make an effort to set QSYS_QMAXSDU_CFG_7 to 1230 bytes. But QSYS_QMAXSDU_CFG_7 plays one more role, which we did not take into account: per-tc static guard band, expressed in L2 byte time (auto-adjusted for FCS and L1 overhead). There is a discrepancy between what the driver thinks (that there is no guard band, and 100% of min_gate_len[tc] is available for egress scheduling) and what the hardware actually does (crops the equivalent of QSYS_QMAXSDU_CFG_7 ns out of min_gate_len[tc]). In practice, this means that the hardware thinks it has exactly 0 ns for scheduling tc 7. In both cases, even minimum sized Ethernet frames are stuck on egress rather than being considered for scheduling on TC 7, even if they would fit given a proper configuration. Considering the current situation, with vsc9959_tas_guard_bands_update(), frames between 60 octets and 1230 octets in size are not eligible for oversized dropping (because they are smaller than QSYS_QMAXSDU_CFG_7), but won't be considered as eligible for scheduling either, because the min_gate_len[7] (10000 ns) minus the guard band determined by QSYS_QMAXSDU_CFG_7 (1230 octets 8 ns per octet == 9840 ns) minus the guard band auto-added for L1 overhead by QSYS::HSCH_MISC_CFG.FRM_ADJ (20 octets * 8 ns per octet == 160 octets) leaves 0 ns for scheduling in the queue system proper. Investigating the hardware behavior, it becomes apparent that the queue system needs precisely 33 ns of 'gate open' time in order to consider a frame as eligible for scheduling to a tc. So the solution to this problem is to amend vsc9959_tas_guard_bands_update(), by giving the per-tc guard bands less space by exactly 33 ns, just enough for one frame to be scheduled in that interval. This allows the queue system to make forward progress for that port-tc, and prevents it from hanging. Fixes: 297c4de6f780 ("net: dsa: felix: re-enable TAS guard band mode") Reported-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	netfilter: nft_payload: reject out-of-range attributes via policy	Florian Westphal	1	-3/+3
	Now that nla_policy allows range checks for bigendian data make use of this to reject such attributes. At this time, reject happens later from the init or select_ops callbacks, but its prone to errors. In the future, new attributes can be handled via NLA_POLICY_MAX_BE and exiting ones can be converted one by one. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	netlink: introduce NLA_POLICY_MAX_BE	Florian Westphal	2	-4/+36
	netlink allows to specify allowed ranges for integer types. Unfortunately, nfnetlink passes integers in big endian, so the existing NLA_POLICY_MAX() cannot be used. At the moment, nfnetlink users, such as nf_tables, need to resort to programmatic checking via helpers such as nft_parse_u32_check(). This is both cumbersome and error prone. This adds NLA_POLICY_MAX_BE which adds range check support for BE16, BE32 and BE64 integers. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	sfc: support PTP over Ethernet	Íñigo Huguet	1	-2/+19
	The previous patch add support for PTP over IPv6/UDP (only for 8000 series and newer) and this one add support for PTP over 802.3. Tested: sync as master and as slave is correct with ptp4l. PTP over IPv4 and IPv6 still works fine. Suggested-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	sfc: support PTP over IPv6/UDP	Íñigo Huguet	2	-12/+71
	commit bd4a2697e5e2 ("sfc: use hardware tx timestamps for more than PTP") added support for hardware timestamping on TX for cards of the 8000 series and newer, in an effort to provide support for other transports other than IPv4/UDP. However, timestamping was still not working on RX for these other transports. This patch add support for PTP over IPv6/UDP. Tested: sync as master and as slave is correct using ptp4l from linuxptp package, both with IPv4 and IPv6. Suggested-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	sfc: allow more flexible way of adding filters for PTP	Íñigo Huguet	1	-36/+32
	In preparation for the support of PTP over IPv6/UDP and Ethernet in next patches, allow a more flexible way of adding and removing RX filters for PTP. Right now, only 2 filters are allowed, which are the ones needed for PTP over IPv4/UDP. Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	net: dsa: LAN9303: Add basic support for LAN9354	Jerry Ray	3	-6/+12
	Adding support for the LAN9354 device by allowing it to use the LAN9303 DSA driver. These devices have the same underlying access and control methods and from a feature set point of view the LAN9354 is a superset of the LAN9303. The MDIO access method has been tested on a SAMA5D3-EDS board with a LAN9354 RMII daughter card. While the SPI access method should also be the same, it has not been tested and as such is not included at this time. Signed-off-by: Jerry Ray <jerry.ray@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	net: dsa: LAN9303: Add early read to sync	Jerry Ray	1	-4/+15
	Add initial BYTE_ORDER read to sync the 32-bit accesses over the 16-bit mdio bus to improve driver robustness. The lan9303 expects two mdio read transactions back-to-back to read a 32-bit register. The first read transaction causes the other half of the 32-bit register to get latched. The subsequent read returns the latched second half of the 32-bit read. The BYTE_ORDER register is an exception to this rule. As it is a constant value, there is no need to latch the second half. We read this register first in case there were reads during the boot loader process that might have occurred prior to this driver taking over ownership of accessing this device. This patch has been tested on the SAMA5D3-EDS with a LAN9303 RMII daughter card. Signed-off-by: Jerry Ray <jerry.ray@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	net: dsa: microchip: add regmap_range for KSZ9896 chip	Romain Naour	1	-0/+215
	Add register validation for KSZ9896. Signed-off-by: Romain Naour <romain.naour@skf.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	net: dsa: microchip: ksz9477: remove 0x033C and 0x033D addresses from ↵	Romain Naour	1	-1/+2
	regmap_access_tables According to the KSZ9477S datasheet, there is no global register at 0x033C and 0x033D addresses. Signed-off-by: Romain Naour <romain.naour@skf.com> Cc: Oleksij Rempel <o.rempel@pengutronix.de> Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	net: dsa: microchip: add KSZ9896 to KSZ9477 I2C driver	Romain Naour	1	-0/+4
	Add support for the KSZ9896 6-port Gigabit Ethernet Switch to the ksz9477 driver. The KSZ9896 supports both SPI (already in) and I2C. Signed-off-by: Romain Naour <romain.naour@skf.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	net: dsa: microchip: add KSZ9896 switch support	Romain Naour	3	-0/+39
	Add support for the KSZ9896 6-port Gigabit Ethernet Switch to the ksz9477 driver. Although the KSZ9896 is already listed in the device tree binding documentation since a1c0ed24fe9b (dt-bindings: net: dsa: document additional Microchip KSZ9477 family switches) the chip id (0x00989600) is not recognized by ksz_switch_detect() and rejected by the driver. The KSZ9896 is similar to KSZ9897 but has only one configurable MII/RMII/RGMII/GMII cpu port. Signed-off-by: Romain Naour <romain.naour@skf.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07	efi/x86: libstub: remove unused variable	chen zhang	1	-1/+0
	The variable "has_system_memory" is unused in function ‘adjust_memory_range_protection’, remove it. Signed-off-by: chen zhang <chenzhang@kylinos.cn> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-09-06	afs: Return -EAGAIN, not -EREMOTEIO, when a file already locked	David Howells	1	-0/+1
	When trying to get a file lock on an AFS file, the server may return UAEAGAIN to indicate that the lock is already held. This is currently translated by the default path to -EREMOTEIO. Translate it instead to -EAGAIN so that we know we can retry it. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey E Altman <jaltman@auristor.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/166075761334.3533338.2591992675160918098.stgit@warthog.procyon.org.uk/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>