diff options
| author | Linus Torvalds <torvalds@linux-foundation.org> | 2025-03-26 21:48:21 -0700 |
|---|---|---|
| committer | Linus Torvalds <torvalds@linux-foundation.org> | 2025-03-26 21:48:21 -0700 |
| commit | 1a9239bb4253f9076b5b4b2a1a4e8d7defd77a95 (patch) | |
| tree | 286dda5e84757594218e684b94b01b3a3cac15a2 /tools/testing/selftests/bpf/prog_tests/net_timestamping.c | |
| parent | Merge tag 'zstd-linus-v6.15-rc1' of https://github.com/terrelln/linux (diff) | |
| parent | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net (diff) | |
| download | linux-1a9239bb4253f9076b5b4b2a1a4e8d7defd77a95.tar.gz linux-1a9239bb4253f9076b5b4b2a1a4e8d7defd77a95.zip | |
Merge tag 'net-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Continue Netlink conversions to per-namespace RTNL lock
(IPv4 routing, routing rules, routing next hops, ARP ioctls)
- Continue extending the use of netdev instance locks. As a driver
opt-in protect queue operations and (in due course) ethtool
operations with the instance lock and not RTNL lock.
- Support collecting TCP timestamps (data submitted, sent, acked) in
BPF, allowing for transparent (to the application) and lower
overhead tracking of TCP RPC performance.
- Tweak existing networking Rx zero-copy infra to support zero-copy
Rx via io_uring.
- Optimize MPTCP performance in single subflow mode by 29%.
- Enable GRO on packets which went thru XDP CPU redirect (were queued
for processing on a different CPU). Improving TCP stream
performance up to 2x.
- Improve performance of contended connect() by 200% by searching for
an available 4-tuple under RCU rather than a spin lock. Bring an
additional 229% improvement by tweaking hash distribution.
- Avoid unconditionally touching sk_tsflags on RX, improving
performance under UDP flood by as much as 10%.
- Avoid skb_clone() dance in ping_rcv() to improve performance under
ping flood.
- Avoid FIB lookup in netfilter if socket is available, 20% perf win.
- Rework network device creation (in-kernel) API to more clearly
identify network namespaces and their roles. There are up to 4
namespace roles but we used to have just 2 netns pointer arguments,
interpreted differently based on context.
- Use sysfs_break_active_protection() instead of trylock to avoid
deadlocks between unregistering objects and sysfs access.
- Add a new sysctl and sockopt for capping max retransmit timeout in
TCP.
- Support masking port and DSCP in routing rule matches.
- Support dumping IPv4 multicast addresses with RTM_GETMULTICAST.
- Support specifying at what time packet should be sent on AF_XDP
sockets.
- Expose TCP ULP diagnostic info (for TLS and MPTCP) to non-admin
users.
- Add Netlink YAML spec for WiFi (nl80211) and conntrack.
- Introduce EXPORT_IPV6_MOD() and EXPORT_IPV6_MOD_GPL() for symbols
which only need to be exported when IPv6 support is built as a
module.
- Age FDB entries based on Rx not Tx traffic in VxLAN, similar to
normal bridging.
- Allow users to specify source port range for GENEVE tunnels.
- netconsole: allow attaching kernel release, CPU ID and task name to
messages as metadata
Driver API:
- Continue rework / fixing of Energy Efficient Ethernet (EEE) across
the SW layers. Delegate the responsibilities to phylink where
possible. Improve its handling in phylib.
- Support symmetric OR-XOR RSS hashing algorithm.
- Support tracking and preserving IRQ affinity by NAPI itself.
- Support loopback mode speed selection for interface selftests.
Device drivers:
- Remove the IBM LCS driver for s390
- Remove the sb1000 cable modem driver
- Add support for SFP module access over SMBus
- Add MCTP transport driver for MCTP-over-USB
- Enable XDP metadata support in multiple drivers
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- add PCIe TLP Processing Hints (TPH) support for new AMD
platforms
- support dumping RoCE queue state for debug
- opt into instance locking
- Intel (100G, ice, idpf):
- ice: rework MSI-X IRQ management and distribution
- ice: support for E830 devices
- iavf: add support for Rx timestamping
- iavf: opt into instance locking
- nVidia/Mellanox:
- mlx4: use page pool memory allocator for Rx
- mlx5: support for one PTP device per hardware clock
- mlx5: support for 200Gbps per-lane link modes
- mlx5: move IPSec policy check after decryption
- AMD/Solarflare:
- support FW flashing via devlink
- Cisco (enic):
- use page pool memory allocator for Rx
- enable 32, 64 byte CQEs
- get max rx/tx ring size from the device
- Meta (fbnic):
- support flow steering and RSS configuration
- report queue stats
- support TCP segmentation
- support IRQ coalescing
- support ring size configuration
- Marvell/Cavium:
- support AF_XDP
- Wangxun:
- support for PTP clock and timestamping
- Huawei (hibmcge):
- checksum offload
- add more statistics
- Ethernet virtual:
- VirtIO net:
- aggressively suppress Tx completions, improve perf by 96%
with 1 CPU and 55% with 2 CPUs
- expose NAPI to IRQ mapping and persist NAPI settings
- Google (gve):
- support XDP in DQO RDA Queue Format
- opt into instance locking
- Microsoft vNIC:
- support BIG TCP
- Ethernet NICs consumer, and embedded:
- Synopsys (stmmac):
- cleanup Tx and Tx clock setting and other link-focused
cleanups
- enable SGMII and 2500BASEX mode switching for Intel platforms
- support Sophgo SG2044
- Broadcom switches (b53):
- support for BCM53101
- TI:
- iep: add perout configuration support
- icssg: support XDP
- Cadence (macb):
- implement BQL
- Xilinx (axinet):
- support dynamic IRQ moderation and changing coalescing at
runtime
- implement BQL
- report standard stats
- MediaTek:
- support phylink managed EEE
- Intel:
- igc: don't restart the interface on every XDP program change
- RealTek (r8169):
- support reading registers of internal PHYs directly
- increase max jumbo packet size on RTL8125/RTL8126
- Airoha:
- support for RISC-V NPU packet processing unit
- enable scatter-gather and support MTU up to 9kB
- Tehuti (tn40xx):
- support cards with TN4010 MAC and an Aquantia AQR105 PHY
- Ethernet PHYs:
- support for TJA1102S, TJA1121
- dp83tg720: add randomized polling intervals for link detection
- dp83822: support changing the transmit amplitude voltage
- support for LEDs on 88q2xxx
- CAN:
- canxl: support Remote Request Substitution bit access
- flexcan: add S32G2/S32G3 SoC
- WiFi:
- remove cooked monitor support
- strict mode for better AP testing
- basic EPCS support
- OMI RX bandwidth reduction support
- batman-adv: add support for jumbo frames
- WiFi drivers:
- RealTek (rtw88):
- support RTL8814AE and RTL8814AU
- RealTek (rtw89):
- switch using wiphy_lock and wiphy_work
- add BB context to manipulate two PHY as preparation of MLO
- improve BT-coexistence mechanism to play A2DP smoothly
- Intel (iwlwifi):
- add new iwlmld sub-driver for latest HW/FW combinations
- MediaTek (mt76):
- preparation for mt7996 Multi-Link Operation (MLO) support
- Qualcomm/Atheros (ath12k):
- continued work on MLO
- Silabs (wfx):
- Wake-on-WLAN support
- Bluetooth:
- add support for skb TX SND/COMPLETION timestamping
- hci_core: enable buffer flow control for SCO/eSCO
- coredump: log devcd dumps into the monitor
- Bluetooth drivers:
- intel: add support to configure TX power
- nxp: handle bootloader error during cmd5 and cmd7"
* tag 'net-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1681 commits)
unix: fix up for "apparmor: add fine grained af_unix mediation"
mctp: Fix incorrect tx flow invalidation condition in mctp-i2c
net: usb: asix: ax88772: Increase phy_name size
net: phy: Introduce PHY_ID_SIZE — minimum size for PHY ID string
net: libwx: fix Tx L4 checksum
net: libwx: fix Tx descriptor content for some tunnel packets
atm: Fix NULL pointer dereference
net: tn40xx: add pci-id of the aqr105-based Tehuti TN4010 cards
net: tn40xx: prepare tn40xx driver to find phy of the TN9510 card
net: tn40xx: create swnode for mdio and aqr105 phy and add to mdiobus
net: phy: aquantia: add essential functions to aqr105 driver
net: phy: aquantia: search for firmware-name in fwnode
net: phy: aquantia: add probe function to aqr105 for firmware loading
net: phy: Add swnode support to mdiobus_scan
gve: add XDP DROP and PASS support for DQ
gve: update XDP allocation path support RX buffer posting
gve: merge packet buffer size fields
gve: update GQ RX to use buf_size
gve: introduce config-based allocation for XDP
gve: remove xdp_xsk_done and xdp_xsk_wakeup statistics
...
Diffstat (limited to 'tools/testing/selftests/bpf/prog_tests/net_timestamping.c')
| -rw-r--r-- | tools/testing/selftests/bpf/prog_tests/net_timestamping.c | 239 |
1 files changed, 239 insertions, 0 deletions
diff --git a/tools/testing/selftests/bpf/prog_tests/net_timestamping.c b/tools/testing/selftests/bpf/prog_tests/net_timestamping.c new file mode 100644 index 000000000000..dbfd87499b6b --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/net_timestamping.c @@ -0,0 +1,239 @@ +#include <linux/net_tstamp.h> +#include <sys/time.h> +#include <linux/errqueue.h> +#include "test_progs.h" +#include "network_helpers.h" +#include "net_timestamping.skel.h" + +#define CG_NAME "/net-timestamping-test" +#define NSEC_PER_SEC 1000000000LL + +static const char addr4_str[] = "127.0.0.1"; +static const char addr6_str[] = "::1"; +static struct net_timestamping *skel; +static const int cfg_payload_len = 30; +static struct timespec usr_ts; +static u64 delay_tolerance_nsec = 10000000000; /* 10 seconds */ +int SK_TS_SCHED; +int SK_TS_TXSW; +int SK_TS_ACK; + +static int64_t timespec_to_ns64(struct timespec *ts) +{ + return ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec; +} + +static void validate_key(int tskey, int tstype) +{ + static int expected_tskey = -1; + + if (tstype == SCM_TSTAMP_SCHED) + expected_tskey = cfg_payload_len - 1; + + ASSERT_EQ(expected_tskey, tskey, "tskey mismatch"); + + expected_tskey = tskey; +} + +static void validate_timestamp(struct timespec *cur, struct timespec *prev) +{ + int64_t cur_ns, prev_ns; + + cur_ns = timespec_to_ns64(cur); + prev_ns = timespec_to_ns64(prev); + + ASSERT_LT(cur_ns - prev_ns, delay_tolerance_nsec, "latency"); +} + +static void test_socket_timestamp(struct scm_timestamping *tss, int tstype, + int tskey) +{ + static struct timespec prev_ts; + + validate_key(tskey, tstype); + + switch (tstype) { + case SCM_TSTAMP_SCHED: + validate_timestamp(&tss->ts[0], &usr_ts); + SK_TS_SCHED += 1; + break; + case SCM_TSTAMP_SND: + validate_timestamp(&tss->ts[0], &prev_ts); + SK_TS_TXSW += 1; + break; + case SCM_TSTAMP_ACK: + validate_timestamp(&tss->ts[0], &prev_ts); + SK_TS_ACK += 1; + break; + } + + prev_ts = tss->ts[0]; +} + +static void test_recv_errmsg_cmsg(struct msghdr *msg) +{ + struct sock_extended_err *serr = NULL; + struct scm_timestamping *tss = NULL; + struct cmsghdr *cm; + + for (cm = CMSG_FIRSTHDR(msg); + cm && cm->cmsg_len; + cm = CMSG_NXTHDR(msg, cm)) { + if (cm->cmsg_level == SOL_SOCKET && + cm->cmsg_type == SCM_TIMESTAMPING) { + tss = (void *)CMSG_DATA(cm); + } else if ((cm->cmsg_level == SOL_IP && + cm->cmsg_type == IP_RECVERR) || + (cm->cmsg_level == SOL_IPV6 && + cm->cmsg_type == IPV6_RECVERR) || + (cm->cmsg_level == SOL_PACKET && + cm->cmsg_type == PACKET_TX_TIMESTAMP)) { + serr = (void *)CMSG_DATA(cm); + ASSERT_EQ(serr->ee_origin, SO_EE_ORIGIN_TIMESTAMPING, + "cmsg type"); + } + + if (serr && tss) + test_socket_timestamp(tss, serr->ee_info, + serr->ee_data); + } +} + +static bool socket_recv_errmsg(int fd) +{ + static char ctrl[1024 /* overprovision*/]; + char data[cfg_payload_len]; + static struct msghdr msg; + struct iovec entry; + int n = 0; + + memset(&msg, 0, sizeof(msg)); + memset(&entry, 0, sizeof(entry)); + memset(ctrl, 0, sizeof(ctrl)); + + entry.iov_base = data; + entry.iov_len = cfg_payload_len; + msg.msg_iov = &entry; + msg.msg_iovlen = 1; + msg.msg_name = NULL; + msg.msg_namelen = 0; + msg.msg_control = ctrl; + msg.msg_controllen = sizeof(ctrl); + + n = recvmsg(fd, &msg, MSG_ERRQUEUE); + if (n == -1) + ASSERT_EQ(errno, EAGAIN, "recvmsg MSG_ERRQUEUE"); + + if (n >= 0) + test_recv_errmsg_cmsg(&msg); + + return n == -1; +} + +static void test_socket_timestamping(int fd) +{ + while (!socket_recv_errmsg(fd)); + + ASSERT_EQ(SK_TS_SCHED, 1, "SCM_TSTAMP_SCHED"); + ASSERT_EQ(SK_TS_TXSW, 1, "SCM_TSTAMP_SND"); + ASSERT_EQ(SK_TS_ACK, 1, "SCM_TSTAMP_ACK"); + + SK_TS_SCHED = 0; + SK_TS_TXSW = 0; + SK_TS_ACK = 0; +} + +static void test_tcp(int family, bool enable_socket_timestamping) +{ + struct net_timestamping__bss *bss; + char buf[cfg_payload_len]; + int sfd = -1, cfd = -1; + unsigned int sock_opt; + struct netns_obj *ns; + int cg_fd; + int ret; + + cg_fd = test__join_cgroup(CG_NAME); + if (!ASSERT_OK_FD(cg_fd, "join cgroup")) + return; + + ns = netns_new("net_timestamping_ns", true); + if (!ASSERT_OK_PTR(ns, "create ns")) + goto out; + + skel = net_timestamping__open_and_load(); + if (!ASSERT_OK_PTR(skel, "open and load skel")) + goto out; + + if (!ASSERT_OK(net_timestamping__attach(skel), "attach skel")) + goto out; + + skel->links.skops_sockopt = + bpf_program__attach_cgroup(skel->progs.skops_sockopt, cg_fd); + if (!ASSERT_OK_PTR(skel->links.skops_sockopt, "attach cgroup")) + goto out; + + bss = skel->bss; + memset(bss, 0, sizeof(*bss)); + + skel->bss->monitored_pid = getpid(); + + sfd = start_server(family, SOCK_STREAM, + family == AF_INET6 ? addr6_str : addr4_str, 0, 0); + if (!ASSERT_OK_FD(sfd, "start_server")) + goto out; + + cfd = connect_to_fd(sfd, 0); + if (!ASSERT_OK_FD(cfd, "connect_to_fd_server")) + goto out; + + if (enable_socket_timestamping) { + sock_opt = SOF_TIMESTAMPING_SOFTWARE | + SOF_TIMESTAMPING_OPT_ID | + SOF_TIMESTAMPING_TX_SCHED | + SOF_TIMESTAMPING_TX_SOFTWARE | + SOF_TIMESTAMPING_TX_ACK; + ret = setsockopt(cfd, SOL_SOCKET, SO_TIMESTAMPING, + (char *) &sock_opt, sizeof(sock_opt)); + if (!ASSERT_OK(ret, "setsockopt SO_TIMESTAMPING")) + goto out; + + ret = clock_gettime(CLOCK_REALTIME, &usr_ts); + if (!ASSERT_OK(ret, "get user time")) + goto out; + } + + ret = write(cfd, buf, sizeof(buf)); + if (!ASSERT_EQ(ret, sizeof(buf), "send to server")) + goto out; + + if (enable_socket_timestamping) + test_socket_timestamping(cfd); + + ASSERT_EQ(bss->nr_active, 1, "nr_active"); + ASSERT_EQ(bss->nr_snd, 2, "nr_snd"); + ASSERT_EQ(bss->nr_sched, 1, "nr_sched"); + ASSERT_EQ(bss->nr_txsw, 1, "nr_txsw"); + ASSERT_EQ(bss->nr_ack, 1, "nr_ack"); + +out: + if (sfd >= 0) + close(sfd); + if (cfd >= 0) + close(cfd); + net_timestamping__destroy(skel); + netns_free(ns); + close(cg_fd); +} + +void test_net_timestamping(void) +{ + if (test__start_subtest("INET4: bpf timestamping")) + test_tcp(AF_INET, false); + if (test__start_subtest("INET4: bpf and socket timestamping")) + test_tcp(AF_INET, true); + if (test__start_subtest("INET6: bpf timestamping")) + test_tcp(AF_INET6, false); + if (test__start_subtest("INET6: bpf and socket timestamping")) + test_tcp(AF_INET6, true); +} |
