diff options
| author | Jakub Kicinski <kuba@kernel.org> | 2025-06-17 18:50:57 -0700 |
|---|---|---|
| committer | Jakub Kicinski <kuba@kernel.org> | 2025-06-17 18:50:57 -0700 |
| commit | 189bd9c873f0e49f3fb20372407ccda64114d68a (patch) | |
| tree | d3820b13c41be9e9402d8546920cabbc19528852 /drivers/net/ethernet/intel | |
| parent | Merge branch 'net-mlx5e-add-support-for-devmem-and-io_uring-tcp-zero-copy' (diff) | |
| parent | libeth: xdp, xsk: access adjacent u32s as u64 where applicable (diff) | |
| download | linux-189bd9c873f0e49f3fb20372407ccda64114d68a.tar.gz linux-189bd9c873f0e49f3fb20372407ccda64114d68a.zip | |
Merge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
libeth: add libeth_xdp helper lib
Alexander Lobakin says:
Time to add XDP helpers infra to libeth to greatly simplify adding
XDP to idpf and iavf, as well as improve and extend XDP in ice and
i40e. Any vendor is free to reuse helpers. If this happens, I'm fine
with moving the folder of out intel/.
The helpers greatly simplify building xdp_buff, running a prog,
handling the verdict, implement XDP_TX, .ndo_xdp_xmit, XDP buffer
completion. Same applies to XSk (with XSk xmit instead of
.ndo_xdp_xmit, plus stuff like XSk wakeup).
They are entirely generic with no HW definitions or assumptions.
HW-specific stuff like parsing Rx desc / filling Tx desc is passed
from the driver as inline callbacks.
For now, key assumptions that optimize performance / avoid code
bloat, but might not fit every driver in driver/net/:
* netmem holding the buffers are always order-0;
* driver has separate XDP Tx queues, doesn't use stack queues for
that. For best efficiency, you may want to have nr_cpu_ids XDP
queues, but less (queue sharing) is also supported;
* XDP Tx queues are interrupt-less and use "lazy" cleaning only
when there are less than 1/4 free Tx descriptors of the queue
size;
* main target platforms are 64-bit, although 32-bit is also fully
supported, but the code might be not as optimized for them.
Library code already supports multi-buffer for all kinds of Tx and
both header split and no split for Rx and Tx. Frags can come from
devmem/io_uring etc., direct `struct page *` is used only for header
buffers for which it's always true.
Drivers are free to pass their own Rx hints and XSK xmit hints ops.
XDP_TX and ndo_xdp_xmit use onstack bulk for the frames to be sent
and send them by batches of 16 buffers. This eats ~280 bytes on the
stack, but gives good boosts and allow to greatly optimize the main
sending function leaving it without any error/exception paths.
XSk xmit fills Tx descriptors in the loop unrolled by 8. This was
proven to improve perf on ice and i40e. XDP_TX and ndo_xdp_xmit
doesn't use unrolling as I wasn't able to get any improvements in
those scenenarios from this, while +1 Kb for their sending functions
for nothing doesn't sound reasonable.
XSk wakeup, instead of traditionally used "SW interrupts" provided
by NICs, uses IPI to schedule NAPI on the CPU corresponding to the
given queue pair. It gives better control over CPU distribution and
in general performs way better than "SW interrupts", plus allows us
to not pass any HW-specific callbacks there.
The code is built the way that all callbacks passed from drivers
get inlined; in general, most of hotpath gets inlined. Everything
slow/exception lands to .c files in the libeth folder, doesn't
create copies in the drivers themselves and doesn't overloat
hotpath.
Sure, inlining means that hotpath will be compiled into every driver
that uses the lib, but the core code is written in one place, so no
copying of bugs happens. Fixed once -- works everywhere.
The last commit might look like sorta hack, but it gives really good
boosts and decreases object code size, plus there are checks that
all those wider accesses are fully safe, so I don't feel anything
bad about it.
An example of using libeth_xdp can be found either on my GitHub or
on the mailing lists here ("XDP for idpf"). Macros for building
driver XDP functions lead to that some implementations (XDP_TX,
ndo_xdp_xmit etc.) consist of really only a few lines.
* '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
libeth: xdp, xsk: access adjacent u32s as u64 where applicable
libeth: xsk: add XSkFQ refill and XSk wakeup helpers
libeth: xsk: add XSk Rx processing support
libeth: xsk: add XSk xmit functions
libeth: xsk: add XSk XDP_TX sending helpers
libeth: xdp: add RSS hash hint and XDP features setup helpers
libeth: xdp: add templates for building driver-side callbacks
libeth: xdp: add XDP prog run and verdict result handling
libeth: xdp: add helpers for preparing/processing &libeth_xdp_buff
libeth: xdp: add XDPSQ cleanup timers
libeth: xdp: add XDPSQ locking helpers
libeth: xdp: add XDPSQE completion helpers
libeth: xdp: add .ndo_xdp_xmit() helpers
libeth: xdp: add XDP_TX buffers sending
libeth: support native XDP and register memory model
libeth: convert to netmem
libeth, libie: clean symbol exports up a little
====================
Link: https://patch.msgid.link/20250616201639.710420-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Diffstat (limited to 'drivers/net/ethernet/intel')
| -rw-r--r-- | drivers/net/ethernet/intel/iavf/iavf_txrx.c | 14 | ||||
| -rw-r--r-- | drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c | 2 | ||||
| -rw-r--r-- | drivers/net/ethernet/intel/idpf/idpf_txrx.c | 36 | ||||
| -rw-r--r-- | drivers/net/ethernet/intel/libeth/Kconfig | 10 | ||||
| -rw-r--r-- | drivers/net/ethernet/intel/libeth/Makefile | 8 | ||||
| -rw-r--r-- | drivers/net/ethernet/intel/libeth/priv.h | 37 | ||||
| -rw-r--r-- | drivers/net/ethernet/intel/libeth/rx.c | 42 | ||||
| -rw-r--r-- | drivers/net/ethernet/intel/libeth/tx.c | 41 | ||||
| -rw-r--r-- | drivers/net/ethernet/intel/libeth/xdp.c | 451 | ||||
| -rw-r--r-- | drivers/net/ethernet/intel/libeth/xsk.c | 271 | ||||
| -rw-r--r-- | drivers/net/ethernet/intel/libie/rx.c | 7 |
11 files changed, 878 insertions, 41 deletions
diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.c b/drivers/net/ethernet/intel/iavf/iavf_txrx.c index 23e786b9793d..aaf70c625655 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_txrx.c +++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.c @@ -723,7 +723,7 @@ static void iavf_clean_rx_ring(struct iavf_ring *rx_ring) for (u32 i = rx_ring->next_to_clean; i != rx_ring->next_to_use; ) { const struct libeth_fqe *rx_fqes = &rx_ring->rx_fqes[i]; - page_pool_put_full_page(rx_ring->pp, rx_fqes->page, false); + libeth_rx_recycle_slow(rx_fqes->netmem); if (unlikely(++i == rx_ring->count)) i = 0; @@ -1197,10 +1197,11 @@ static void iavf_add_rx_frag(struct sk_buff *skb, const struct libeth_fqe *rx_buffer, unsigned int size) { - u32 hr = rx_buffer->page->pp->p.offset; + u32 hr = netmem_get_pp(rx_buffer->netmem)->p.offset; - skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, rx_buffer->page, - rx_buffer->offset + hr, size, rx_buffer->truesize); + skb_add_rx_frag_netmem(skb, skb_shinfo(skb)->nr_frags, + rx_buffer->netmem, rx_buffer->offset + hr, + size, rx_buffer->truesize); } /** @@ -1214,12 +1215,13 @@ static void iavf_add_rx_frag(struct sk_buff *skb, static struct sk_buff *iavf_build_skb(const struct libeth_fqe *rx_buffer, unsigned int size) { - u32 hr = rx_buffer->page->pp->p.offset; + struct page *buf_page = __netmem_to_page(rx_buffer->netmem); + u32 hr = buf_page->pp->p.offset; struct sk_buff *skb; void *va; /* prefetch first cache line of first page */ - va = page_address(rx_buffer->page) + rx_buffer->offset; + va = page_address(buf_page) + rx_buffer->offset; net_prefetch(va + hr); /* build an skb around the page buffer */ diff --git a/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c index 993c354aa27a..555879b1248d 100644 --- a/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c +++ b/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c @@ -1006,7 +1006,7 @@ static int idpf_rx_singleq_clean(struct idpf_rx_queue *rx_q, int budget) break; skip_data: - rx_buf->page = NULL; + rx_buf->netmem = 0; IDPF_SINGLEQ_BUMP_RING_IDX(rx_q, ntc); cleaned_count++; diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_txrx.c index 5cf440e09d0a..cef9dfb877e8 100644 --- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c +++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c @@ -383,12 +383,12 @@ err_out: */ static void idpf_rx_page_rel(struct libeth_fqe *rx_buf) { - if (unlikely(!rx_buf->page)) + if (unlikely(!rx_buf->netmem)) return; - page_pool_put_full_page(rx_buf->page->pp, rx_buf->page, false); + libeth_rx_recycle_slow(rx_buf->netmem); - rx_buf->page = NULL; + rx_buf->netmem = 0; rx_buf->offset = 0; } @@ -3240,10 +3240,10 @@ idpf_rx_process_skb_fields(struct idpf_rx_queue *rxq, struct sk_buff *skb, void idpf_rx_add_frag(struct idpf_rx_buf *rx_buf, struct sk_buff *skb, unsigned int size) { - u32 hr = rx_buf->page->pp->p.offset; + u32 hr = netmem_get_pp(rx_buf->netmem)->p.offset; - skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, rx_buf->page, - rx_buf->offset + hr, size, rx_buf->truesize); + skb_add_rx_frag_netmem(skb, skb_shinfo(skb)->nr_frags, rx_buf->netmem, + rx_buf->offset + hr, size, rx_buf->truesize); } /** @@ -3266,16 +3266,20 @@ static u32 idpf_rx_hsplit_wa(const struct libeth_fqe *hdr, struct libeth_fqe *buf, u32 data_len) { u32 copy = data_len <= L1_CACHE_BYTES ? data_len : ETH_HLEN; + struct page *hdr_page, *buf_page; const void *src; void *dst; - if (!libeth_rx_sync_for_cpu(buf, copy)) + if (unlikely(netmem_is_net_iov(buf->netmem)) || + !libeth_rx_sync_for_cpu(buf, copy)) return 0; - dst = page_address(hdr->page) + hdr->offset + hdr->page->pp->p.offset; - src = page_address(buf->page) + buf->offset + buf->page->pp->p.offset; - memcpy(dst, src, LARGEST_ALIGN(copy)); + hdr_page = __netmem_to_page(hdr->netmem); + buf_page = __netmem_to_page(buf->netmem); + dst = page_address(hdr_page) + hdr->offset + hdr_page->pp->p.offset; + src = page_address(buf_page) + buf->offset + buf_page->pp->p.offset; + memcpy(dst, src, LARGEST_ALIGN(copy)); buf->offset += copy; return copy; @@ -3291,11 +3295,12 @@ static u32 idpf_rx_hsplit_wa(const struct libeth_fqe *hdr, */ struct sk_buff *idpf_rx_build_skb(const struct libeth_fqe *buf, u32 size) { - u32 hr = buf->page->pp->p.offset; + struct page *buf_page = __netmem_to_page(buf->netmem); + u32 hr = buf_page->pp->p.offset; struct sk_buff *skb; void *va; - va = page_address(buf->page) + buf->offset; + va = page_address(buf_page) + buf->offset; prefetch(va + hr); skb = napi_build_skb(va, buf->truesize); @@ -3429,7 +3434,8 @@ static int idpf_rx_splitq_clean(struct idpf_rx_queue *rxq, int budget) if (unlikely(!hdr_len && !skb)) { hdr_len = idpf_rx_hsplit_wa(hdr, rx_buf, pkt_len); - pkt_len -= hdr_len; + /* If failed, drop both buffers by setting len to 0 */ + pkt_len -= hdr_len ? : pkt_len; u64_stats_update_begin(&rxq->stats_sync); u64_stats_inc(&rxq->q_stats.hsplit_buf_ovf); @@ -3446,7 +3452,7 @@ static int idpf_rx_splitq_clean(struct idpf_rx_queue *rxq, int budget) u64_stats_update_end(&rxq->stats_sync); } - hdr->page = NULL; + hdr->netmem = 0; payload: if (!libeth_rx_sync_for_cpu(rx_buf, pkt_len)) @@ -3462,7 +3468,7 @@ payload: break; skip_data: - rx_buf->page = NULL; + rx_buf->netmem = 0; idpf_rx_post_buf_refill(refillq, buf_id); IDPF_RX_BUMP_NTC(rxq, ntc); diff --git a/drivers/net/ethernet/intel/libeth/Kconfig b/drivers/net/ethernet/intel/libeth/Kconfig index 480293b71dbc..2445b979c499 100644 --- a/drivers/net/ethernet/intel/libeth/Kconfig +++ b/drivers/net/ethernet/intel/libeth/Kconfig @@ -1,9 +1,15 @@ # SPDX-License-Identifier: GPL-2.0-only -# Copyright (C) 2024 Intel Corporation +# Copyright (C) 2024-2025 Intel Corporation config LIBETH - tristate + tristate "Common Ethernet library (libeth)" if COMPILE_TEST select PAGE_POOL help libeth is a common library containing routines shared between several drivers, but not yet promoted to the generic kernel API. + +config LIBETH_XDP + tristate "Common XDP library (libeth_xdp)" if COMPILE_TEST + select LIBETH + help + XDP and XSk helpers based on libeth hotpath management. diff --git a/drivers/net/ethernet/intel/libeth/Makefile b/drivers/net/ethernet/intel/libeth/Makefile index 52492b081132..350bc0b38bad 100644 --- a/drivers/net/ethernet/intel/libeth/Makefile +++ b/drivers/net/ethernet/intel/libeth/Makefile @@ -1,6 +1,12 @@ # SPDX-License-Identifier: GPL-2.0-only -# Copyright (C) 2024 Intel Corporation +# Copyright (C) 2024-2025 Intel Corporation obj-$(CONFIG_LIBETH) += libeth.o libeth-y := rx.o +libeth-y += tx.o + +obj-$(CONFIG_LIBETH_XDP) += libeth_xdp.o + +libeth_xdp-y += xdp.o +libeth_xdp-y += xsk.o diff --git a/drivers/net/ethernet/intel/libeth/priv.h b/drivers/net/ethernet/intel/libeth/priv.h new file mode 100644 index 000000000000..9b811d31015c --- /dev/null +++ b/drivers/net/ethernet/intel/libeth/priv.h @@ -0,0 +1,37 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* Copyright (C) 2025 Intel Corporation */ + +#ifndef __LIBETH_PRIV_H +#define __LIBETH_PRIV_H + +#include <linux/types.h> + +/* XDP */ + +enum xdp_action; +struct libeth_xdp_buff; +struct libeth_xdp_tx_frame; +struct skb_shared_info; +struct xdp_frame_bulk; + +extern const struct xsk_tx_metadata_ops libeth_xsktmo_slow; + +void libeth_xsk_tx_return_bulk(const struct libeth_xdp_tx_frame *bq, + u32 count); +u32 libeth_xsk_prog_exception(struct libeth_xdp_buff *xdp, enum xdp_action act, + int ret); + +struct libeth_xdp_ops { + void (*bulk)(const struct skb_shared_info *sinfo, + struct xdp_frame_bulk *bq, bool frags); + void (*xsk)(struct libeth_xdp_buff *xdp); +}; + +void libeth_attach_xdp(const struct libeth_xdp_ops *ops); + +static inline void libeth_detach_xdp(void) +{ + libeth_attach_xdp(NULL); +} + +#endif /* __LIBETH_PRIV_H */ diff --git a/drivers/net/ethernet/intel/libeth/rx.c b/drivers/net/ethernet/intel/libeth/rx.c index 66d1d23b8ad2..62521a1f4ec9 100644 --- a/drivers/net/ethernet/intel/libeth/rx.c +++ b/drivers/net/ethernet/intel/libeth/rx.c @@ -1,5 +1,9 @@ // SPDX-License-Identifier: GPL-2.0-only -/* Copyright (C) 2024 Intel Corporation */ +/* Copyright (C) 2024-2025 Intel Corporation */ + +#define DEFAULT_SYMBOL_NAMESPACE "LIBETH" + +#include <linux/export.h> #include <net/libeth/rx.h> @@ -68,7 +72,7 @@ static u32 libeth_rx_hw_len_truesize(const struct page_pool_params *pp, static bool libeth_rx_page_pool_params(struct libeth_fq *fq, struct page_pool_params *pp) { - pp->offset = LIBETH_SKB_HEADROOM; + pp->offset = fq->xdp ? LIBETH_XDP_HEADROOM : LIBETH_SKB_HEADROOM; /* HW-writeable / syncable length per one page */ pp->max_len = LIBETH_RX_PAGE_LEN(pp->offset); @@ -155,11 +159,12 @@ int libeth_rx_fq_create(struct libeth_fq *fq, struct napi_struct *napi) .dev = napi->dev->dev.parent, .netdev = napi->dev, .napi = napi, - .dma_dir = DMA_FROM_DEVICE, }; struct libeth_fqe *fqes; struct page_pool *pool; - bool ret; + int ret; + + pp.dma_dir = fq->xdp ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE; if (!fq->hsplit) ret = libeth_rx_page_pool_params(fq, &pp); @@ -173,20 +178,28 @@ int libeth_rx_fq_create(struct libeth_fq *fq, struct napi_struct *napi) return PTR_ERR(pool); fqes = kvcalloc_node(fq->count, sizeof(*fqes), GFP_KERNEL, fq->nid); - if (!fqes) + if (!fqes) { + ret = -ENOMEM; goto err_buf; + } + + ret = xdp_reg_page_pool(pool); + if (ret) + goto err_mem; fq->fqes = fqes; fq->pp = pool; return 0; +err_mem: + kvfree(fqes); err_buf: page_pool_destroy(pool); - return -ENOMEM; + return ret; } -EXPORT_SYMBOL_NS_GPL(libeth_rx_fq_create, "LIBETH"); +EXPORT_SYMBOL_GPL(libeth_rx_fq_create); /** * libeth_rx_fq_destroy - destroy a &page_pool created by libeth @@ -194,22 +207,23 @@ EXPORT_SYMBOL_NS_GPL(libeth_rx_fq_create, "LIBETH"); */ void libeth_rx_fq_destroy(struct libeth_fq *fq) { + xdp_unreg_page_pool(fq->pp); kvfree(fq->fqes); page_pool_destroy(fq->pp); } -EXPORT_SYMBOL_NS_GPL(libeth_rx_fq_destroy, "LIBETH"); +EXPORT_SYMBOL_GPL(libeth_rx_fq_destroy); /** - * libeth_rx_recycle_slow - recycle a libeth page from the NAPI context - * @page: page to recycle + * libeth_rx_recycle_slow - recycle libeth netmem + * @netmem: network memory to recycle * * To be used on exceptions or rare cases not requiring fast inline recycling. */ -void libeth_rx_recycle_slow(struct page *page) +void __cold libeth_rx_recycle_slow(netmem_ref netmem) { - page_pool_recycle_direct(page->pp, page); + page_pool_put_full_netmem(netmem_get_pp(netmem), netmem, false); } -EXPORT_SYMBOL_NS_GPL(libeth_rx_recycle_slow, "LIBETH"); +EXPORT_SYMBOL_GPL(libeth_rx_recycle_slow); /* Converting abstract packet type numbers into a software structure with * the packet parameters to do O(1) lookup on Rx. @@ -251,7 +265,7 @@ void libeth_rx_pt_gen_hash_type(struct libeth_rx_pt *pt) pt->hash_type |= libeth_rx_pt_xdp_iprot[pt->inner_prot]; pt->hash_type |= libeth_rx_pt_xdp_pl[pt->payload_layer]; } -EXPORT_SYMBOL_NS_GPL(libeth_rx_pt_gen_hash_type, "LIBETH"); +EXPORT_SYMBOL_GPL(libeth_rx_pt_gen_hash_type); /* Module */ diff --git a/drivers/net/ethernet/intel/libeth/tx.c b/drivers/net/ethernet/intel/libeth/tx.c new file mode 100644 index 000000000000..e0167f43d2a8 --- /dev/null +++ b/drivers/net/ethernet/intel/libeth/tx.c @@ -0,0 +1,41 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Copyright (C) 2025 Intel Corporation */ + +#define DEFAULT_SYMBOL_NAMESPACE "LIBETH" + +#include <net/libeth/xdp.h> + +#include "priv.h" + +/* Tx buffer completion */ + +DEFINE_STATIC_CALL_NULL(bulk, libeth_xdp_return_buff_bulk); +DEFINE_STATIC_CALL_NULL(xsk, libeth_xsk_buff_free_slow); + +/** + * libeth_tx_complete_any - perform Tx completion for one SQE of any type + * @sqe: Tx buffer to complete + * @cp: polling params + * + * Can be used to complete both regular and XDP SQEs, for example when + * destroying queues. + * When libeth_xdp is not loaded, XDPSQEs won't be handled. + */ +void libeth_tx_complete_any(struct libeth_sqe *sqe, struct libeth_cq_pp *cp) +{ + if (sqe->type >= __LIBETH_SQE_XDP_START) + __libeth_xdp_complete_tx(sqe, cp, static_call(bulk), + static_call(xsk)); + else + libeth_tx_complete(sqe, cp); +} +EXPORT_SYMBOL_GPL(libeth_tx_complete_any); + +/* Module */ + +void libeth_attach_xdp(const struct libeth_xdp_ops *ops) +{ + static_call_update(bulk, ops ? ops->bulk : NULL); + static_call_update(xsk, ops ? ops->xsk : NULL); +} +EXPORT_SYMBOL_GPL(libeth_attach_xdp); diff --git a/drivers/net/ethernet/intel/libeth/xdp.c b/drivers/net/ethernet/intel/libeth/xdp.c new file mode 100644 index 000000000000..d4ac027d9584 --- /dev/null +++ b/drivers/net/ethernet/intel/libeth/xdp.c @@ -0,0 +1,451 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Copyright (C) 2025 Intel Corporation */ + +#define DEFAULT_SYMBOL_NAMESPACE "LIBETH_XDP" + +#include <linux/export.h> + +#include <net/libeth/xdp.h> + +#include "priv.h" + +/* XDPSQ sharing */ + +DEFINE_STATIC_KEY_FALSE(libeth_xdpsq_share); +EXPORT_SYMBOL_GPL(libeth_xdpsq_share); + +void __libeth_xdpsq_get(struct libeth_xdpsq_lock *lock, + const struct net_device *dev) +{ + bool warn; + + spin_lock_init(&lock->lock); + lock->share = true; + + warn = !static_key_enabled(&libeth_xdpsq_share); + static_branch_inc(&libeth_xdpsq_share); + + if (warn && net_ratelimit()) + netdev_warn(dev, "XDPSQ sharing enabled, possible XDP Tx slowdown\n"); +} +EXPORT_SYMBOL_GPL(__libeth_xdpsq_get); + +void __libeth_xdpsq_put(struct libeth_xdpsq_lock *lock, + const struct net_device *dev) +{ + static_branch_dec(&libeth_xdpsq_share); + + if (!static_key_enabled(&libeth_xdpsq_share) && net_ratelimit()) + netdev_notice(dev, "XDPSQ sharing disabled\n"); + + lock->share = false; +} +EXPORT_SYMBOL_GPL(__libeth_xdpsq_put); + +void __acquires(&lock->lock) +__libeth_xdpsq_lock(struct libeth_xdpsq_lock *lock) +{ + spin_lock(&lock->lock); +} +EXPORT_SYMBOL_GPL(__libeth_xdpsq_lock); + +void __releases(&lock->lock) +__libeth_xdpsq_unlock(struct libeth_xdpsq_lock *lock) +{ + spin_unlock(&lock->lock); +} +EXPORT_SYMBOL_GPL(__libeth_xdpsq_unlock); + +/* XDPSQ clean-up timers */ + +/** + * libeth_xdpsq_init_timer - initialize an XDPSQ clean-up timer + * @timer: timer to initialize + * @xdpsq: queue this timer belongs to + * @lock: corresponding XDPSQ lock + * @poll: queue polling/completion function + * + * XDPSQ clean-up timers must be set up before using at the queue configuration + * time. Set the required pointers and the cleaning callback. + */ +void libeth_xdpsq_init_timer(struct libeth_xdpsq_timer *timer, void *xdpsq, + struct libeth_xdpsq_lock *lock, + void (*poll)(struct work_struct *work)) +{ + timer->xdpsq = xdpsq; + timer->lock = lock; + + INIT_DELAYED_WORK(&timer->dwork, poll); +} +EXPORT_SYMBOL_GPL(libeth_xdpsq_init_timer); + +/* ``XDP_TX`` bulking */ + +static void __cold +libeth_xdp_tx_return_one(const struct libeth_xdp_tx_frame *frm) +{ + if (frm->len_fl & LIBETH_XDP_TX_MULTI) + libeth_xdp_return_frags(frm->data + frm->soff, true); + + libeth_xdp_return_va(frm->data, true); +} + +static void __cold +libeth_xdp_tx_return_bulk(const struct libeth_xdp_tx_frame *bq, u32 count) +{ + for (u32 i = 0; i < count; i++) { + const struct libeth_xdp_tx_frame *frm = &bq[i]; + + if (!(frm->len_fl & LIBETH_XDP_TX_FIRST)) + continue; + + libeth_xdp_tx_return_one(frm); + } +} + +static void __cold libeth_trace_xdp_exception(const struct net_device *dev, + const struct bpf_prog *prog, + u32 act) +{ + trace_xdp_exception(dev, prog, act); +} + +/** + * libeth_xdp_tx_exception - handle Tx exceptions of XDP frames + * @bq: XDP Tx frame bulk + * @sent: number of frames sent successfully (from this bulk) + * @flags: internal libeth_xdp flags (XSk, .ndo_xdp_xmit etc.) + * + * Cold helper used by __libeth_xdp_tx_flush_bulk(), do not call directly. + * Reports XDP Tx exceptions, frees the frames that won't be sent or adjust + * the Tx bulk to try again later. + */ +void __cold libeth_xdp_tx_exception(struct libeth_xdp_tx_bulk *bq, u32 sent, + u32 flags) +{ + const struct libeth_xdp_tx_frame *pos = &bq->bulk[sent]; + u32 left = bq->count - sent; + + if (!(flags & LIBETH_XDP_TX_NDO)) + libeth_trace_xdp_exception(bq->dev, bq->prog, XDP_TX); + + if (!(flags & LIBETH_XDP_TX_DROP)) { + memmove(bq->bulk, pos, left * sizeof(*bq->bulk)); + bq->count = left; + + return; + } + + if (flags & LIBETH_XDP_TX_XSK) + libeth_xsk_tx_return_bulk(pos, left); + else if (!(flags & LIBETH_XDP_TX_NDO)) + libeth_xdp_tx_return_bulk(pos, left); + else + libeth_xdp_xmit_return_bulk(pos, left, bq->dev); + + bq->count = 0; +} +EXPORT_SYMBOL_GPL(libeth_xdp_tx_exception); + +/* .ndo_xdp_xmit() implementation */ + +u32 __cold libeth_xdp_xmit_return_bulk(const struct libeth_xdp_tx_frame *bq, + u32 count, const struct net_device *dev) +{ + u32 n = 0; + + for (u32 i = 0; i < count; i++) { + const struct libeth_xdp_tx_frame *frm = &bq[i]; + dma_addr_t dma; + + if (frm->flags & LIBETH_XDP_TX_FIRST) + dma = *libeth_xdp_xmit_frame_dma(frm->xdpf); + else + dma = dma_unmap_addr(frm, dma); + + dma_unmap_page(dev->dev.parent, dma, dma_unmap_len(frm, len), + DMA_TO_DEVICE); + + /* Actual xdp_frames are freed by the core */ + n += !!(frm->flags & LIBETH_XDP_TX_FIRST); + } + + return n; +} +EXPORT_SYMBOL_GPL(libeth_xdp_xmit_return_bulk); + +/* Rx polling path */ + +/** + * libeth_xdp_load_stash - recreate an &xdp_buff from libeth_xdp buffer stash + * @dst: target &libeth_xdp_buff to initialize + * @src: source stash + * + * External helper used by libeth_xdp_init_buff(), do not call directly. + * Recreate an onstack &libeth_xdp_buff using the stash saved earlier. + * The only field untouched (rxq) is initialized later in the + * abovementioned function. + */ +void libeth_xdp_load_stash(struct libeth_xdp_buff *dst, + const struct libeth_xdp_buff_stash *src) +{ + dst->data = src->data; + dst->base.data_end = src->data + src->len; + dst->base.data_meta = src->data; + dst->base.data_hard_start = src->data - src->headroom; + + dst->base.frame_sz = src->frame_sz; + dst->base.flags = src->flags; +} +EXPORT_SYMBOL_GPL(libeth_xdp_load_stash); + +/** + * libeth_xdp_save_stash - convert &xdp_buff to a libeth_xdp buffer stash + * @dst: target &libeth_xdp_buff_stash to initialize + * @src: source XDP buffer + * + * External helper used by libeth_xdp_save_buff(), do not call directly. + * Use the fields from the passed XDP buffer to initialize the stash on the + * queue, so that a partially received frame can be finished later during + * the next NAPI poll. + */ +void libeth_xdp_save_stash(struct libeth_xdp_buff_stash *dst, + const struct libeth_xdp_buff *src) +{ + dst->data = src->data; + dst->headroom = src->data - src->base.data_hard_start; + dst->len = src->base.data_end - src->data; + + dst->frame_sz = src->base.frame_sz; + dst->flags = src->base.flags; + + WARN_ON_ONCE(dst->flags != src->base.flags); +} +EXPORT_SYMBOL_GPL(libeth_xdp_save_stash); + +void __libeth_xdp_return_stash(struct libeth_xdp_buff_stash *stash) +{ + LIBETH_XDP_ONSTACK_BUFF(xdp); + + libeth_xdp_load_stash(xdp, stash); + libeth_xdp_return_buff_slow(xdp); + + stash->data = NULL; +} +EXPORT_SYMBOL_GPL(__libeth_xdp_return_stash); + +/** + * libeth_xdp_return_buff_slow - free &libeth_xdp_buff + * @xdp: buffer to free/return + * + * Slowpath version of libeth_xdp_return_buff() to be called on exceptions, + * queue clean-ups etc., without unwanted inlining. + */ +void __cold libeth_xdp_return_buff_slow(struct libeth_xdp_buff *xdp) +{ + __libeth_xdp_return_buff(xdp, false); +} +EXPORT_SYMBOL_GPL(libeth_xdp_return_buff_slow); + +/** + * libeth_xdp_buff_add_frag - add frag to XDP buffer + * @xdp: head XDP buffer + * @fqe: Rx buffer containing the frag + * @len: frag length reported by HW + * + * External helper used by libeth_xdp_process_buff(), do not call directly. + * Frees both head and frag buffers on error. + * + * Return: true success, false on error (no space for a new frag). + */ +bool libeth_xdp_buff_add_frag(struct libeth_xdp_buff *xdp, + const struct libeth_fqe *fqe, + u32 len) +{ + netmem_ref netmem = fqe->netmem; + + if (!xdp_buff_add_frag(&xdp->base, netmem, + fqe->offset + netmem_get_pp(netmem)->p.offset, + len, fqe->truesize)) + goto recycle; + + return true; + +recycle: + libeth_rx_recycle_slow(netmem); + libeth_xdp_return_buff_slow(xdp); + + return false; +} +EXPORT_SYMBOL_GPL(libeth_xdp_buff_add_frag); + +/** + * libeth_xdp_prog_exception - handle XDP prog exceptions + * @bq: XDP Tx bulk + * @xdp: buffer to process + * @act: original XDP prog verdict + * @ret: error code if redirect failed + * + * External helper used by __libeth_xdp_run_prog() and + * __libeth_xsk_run_prog_slow(), do not call directly. + * Reports invalid @act, XDP exception trace event and frees the buffer. + * + * Return: libeth_xdp XDP prog verdict. + */ +u32 __cold libeth_xdp_prog_exception(const struct libeth_xdp_tx_bulk *bq, + struct libeth_xdp_buff *xdp, + enum xdp_action act, int ret) +{ + if (act > XDP_REDIRECT) + bpf_warn_invalid_xdp_action(bq->dev, bq->prog, act); + + libeth_trace_xdp_exception(bq->dev, bq->prog, act); + + if (xdp->base.rxq->mem.type == MEM_TYPE_XSK_BUFF_POOL) + return libeth_xsk_prog_exception(xdp, act, ret); + + libeth_xdp_return_buff_slow(xdp); + + return LIBETH_XDP_DROP; +} +EXPORT_SYMBOL_GPL(libeth_xdp_prog_exception); + +/* Tx buffer completion */ + +static void libeth_xdp_put_netmem_bulk(netmem_ref netmem, + struct xdp_frame_bulk *bq) +{ + if (unlikely(bq->count == XDP_BULK_QUEUE_SIZE)) + xdp_flush_frame_bulk(bq); + + bq->q[bq->count++] = netmem; +} + +/** + * libeth_xdp_return_buff_bulk - free &xdp_buff as part of a bulk + * @sinfo: shared info corresponding to the buffer + * @bq: XDP frame bulk to store the buffer + * @frags: whether the buffer has frags + * + * Same as xdp_return_frame_bulk(), but for &libeth_xdp_buff, speeds up Tx + * completion of ``XDP_TX`` buffers and allows to free them in same bulks + * with &xdp_frame buffers. + */ +void libeth_xdp_return_buff_bulk(const struct skb_shared_info *sinfo, + struct xdp_frame_bulk *bq, bool frags) +{ + if (!frags) + goto head; + + for (u32 i = 0; i < sinfo->nr_frags; i++) + libeth_xdp_put_netmem_bulk(skb_frag_netmem(&sinfo->frags[i]), + bq); + +head: + libeth_xdp_put_netmem_bulk(virt_to_netmem(sinfo), bq); +} +EXPORT_SYMBOL_GPL(libeth_xdp_return_buff_bulk); + +/* Misc */ + +/** + * libeth_xdp_queue_threshold - calculate XDP queue clean/refill threshold + * @count: number of descriptors in the queue + * + * The threshold is the limit at which RQs start to refill (when the number of + * empty buffers exceeds it) and SQs get cleaned up (when the number of free + * descriptors goes below it). To speed up hotpath processing, threshold is + * always pow-2, closest to 1/4 of the queue length. + * Don't call it on hotpath, calculate and cache the threshold during the + * queue initialization. + * + * Return: the calculated threshold. + */ +u32 libeth_xdp_queue_threshold(u32 count) +{ + u32 quarter, low, high; + + if (likely(is_power_of_2(count))) + return count >> 2; + + quarter = DIV_ROUND_CLOSEST(count, 4); + low = rounddown_pow_of_two(quarter); + high = roundup_pow_of_two(quarter); + + return high - quarter <= quarter - low ? high : low; +} +EXPORT_SYMBOL_GPL(libeth_xdp_queue_threshold); + +/** + * __libeth_xdp_set_features - set XDP features for netdev + * @dev: &net_device to configure + * @xmo: XDP metadata ops (Rx hints) + * @zc_segs: maximum number of S/G frags the HW can transmit + * @tmo: XSk Tx metadata ops (Tx hints) + * + * Set all the features libeth_xdp supports. Only the first argument is + * necessary; without the third one (zero), XSk support won't be advertised. + * Use the non-underscored versions in drivers instead. + */ +void __libeth_xdp_set_features(struct net_device *dev, + const struct xdp_metadata_ops *xmo, + u32 zc_segs, + const struct xsk_tx_metadata_ops *tmo) +{ + xdp_set_features_flag(dev, + NETDEV_XDP_ACT_BASIC | + NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT | + (zc_segs ? NETDEV_XDP_ACT_XSK_ZEROCOPY : 0) | + NETDEV_XDP_ACT_RX_SG | + NETDEV_XDP_ACT_NDO_XMIT_SG); + dev->xdp_metadata_ops = xmo; + + tmo = tmo == libeth_xsktmo ? &libeth_xsktmo_slow : tmo; + + dev->xdp_zc_max_segs = zc_segs ? : 1; + dev->xsk_tx_metadata_ops = zc_segs ? tmo : NULL; +} +EXPORT_SYMBOL_GPL(__libeth_xdp_set_features); + +/** + * libeth_xdp_set_redirect - toggle the XDP redirect feature + * @dev: &net_device to configure + * @enable: whether XDP is enabled + * + * Use this when XDPSQs are not always available to dynamically enable + * and disable redirect feature. + */ +void libeth_xdp_set_redirect(struct net_device *dev, bool enable) +{ + if (enable) + xdp_features_set_redirect_target(dev, true); + else + xdp_features_clear_redirect_target(dev); +} +EXPORT_SYMBOL_GPL(libeth_xdp_set_redirect); + +/* Module */ + +static const struct libeth_xdp_ops xdp_ops __initconst = { + .bulk = libeth_xdp_return_buff_bulk, + .xsk = libeth_xsk_buff_free_slow, +}; + +static int __init libeth_xdp_module_init(void) +{ + libeth_attach_xdp(&xdp_ops); + + return 0; +} +module_init(libeth_xdp_module_init); + +static void __exit libeth_xdp_module_exit(void) +{ + libeth_detach_xdp(); +} +module_exit(libeth_xdp_module_exit); + +MODULE_DESCRIPTION("Common Ethernet library - XDP infra"); +MODULE_IMPORT_NS("LIBETH"); +MODULE_LICENSE("GPL"); diff --git a/drivers/net/ethernet/intel/libeth/xsk.c b/drivers/net/ethernet/intel/libeth/xsk.c new file mode 100644 index 000000000000..846e902e31b6 --- /dev/null +++ b/drivers/net/ethernet/intel/libeth/xsk.c @@ -0,0 +1,271 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Copyright (C) 2025 Intel Corporation */ + +#define DEFAULT_SYMBOL_NAMESPACE "LIBETH_XDP" + +#include <linux/export.h> + +#include <net/libeth/xsk.h> + +#include "priv.h" + +/* ``XDP_TX`` bulking */ + +void __cold libeth_xsk_tx_return_bulk(const struct libeth_xdp_tx_frame *bq, + u32 count) +{ + for (u32 i = 0; i < count; i++) + libeth_xsk_buff_free_slow(bq[i].xsk); +} + +/* XSk TMO */ + +const struct xsk_tx_metadata_ops libeth_xsktmo_slow = { + .tmo_request_checksum = libeth_xsktmo_req_csum, +}; + +/* Rx polling path */ + +/** + * libeth_xsk_buff_free_slow - free an XSk Rx buffer + * @xdp: buffer to free + * + * Slowpath version of xsk_buff_free() to be used on exceptions, cleanups etc. + * to avoid unwanted inlining. + */ +void libeth_xsk_buff_free_slow(struct libeth_xdp_buff *xdp) +{ + xsk_buff_free(&xdp->base); +} +EXPORT_SYMBOL_GPL(libeth_xsk_buff_free_slow); + +/** + * libeth_xsk_buff_add_frag - add frag to XSk Rx buffer + * @head: head buffer + * @xdp: frag buffer + * + * External helper used by libeth_xsk_process_buff(), do not call directly. + * Frees both main and frag buffers on error. + * + * Return: main buffer with attached frag on success, %NULL on error (no space + * for a new frag). + */ +struct libeth_xdp_buff *libeth_xsk_buff_add_frag(struct libeth_xdp_buff *head, + struct libeth_xdp_buff *xdp) +{ + if (!xsk_buff_add_frag(&head->base, &xdp->base)) + goto free; + + return head; + +free: + libeth_xsk_buff_free_slow(xdp); + libeth_xsk_buff_free_slow(head); + + return NULL; +} +EXPORT_SYMBOL_GPL(libeth_xsk_buff_add_frag); + +/** + * libeth_xsk_buff_stats_frags - update onstack RQ stats with XSk frags info + * @rs: onstack stats to update + * @xdp: buffer to account + * + * External helper used by __libeth_xsk_run_pass(), do not call directly. + * Adds buffer's frags count and total len to the onstack stats. + */ +void libeth_xsk_buff_stats_frags(struct libeth_rq_napi_stats *rs, + const struct libeth_xdp_buff *xdp) +{ + libeth_xdp_buff_stats_frags(rs, xdp); +} +EXPORT_SYMBOL_GPL(libeth_xsk_buff_stats_frags); + +/** + * __libeth_xsk_run_prog_slow - process the non-``XDP_REDIRECT`` verdicts + * @xdp: buffer to process + * @bq: Tx bulk for queueing on ``XDP_TX`` + * @act: verdict to process + * @ret: error code if ``XDP_REDIRECT`` failed + * + * External helper used by __libeth_xsk_run_prog(), do not call directly. + * ``XDP_REDIRECT`` is the most common and hottest verdict on XSk, thus + * it is processed inline. The rest goes here for out-of-line processing, + * together with redirect errors. + * + * Return: libeth_xdp XDP prog verdict. + */ +u32 __libeth_xsk_run_prog_slow(struct libeth_xdp_buff *xdp, + const struct libeth_xdp_tx_bulk *bq, + enum xdp_action act, int ret) +{ + switch (act) { + case XDP_DROP: + xsk_buff_free(&xdp->base); + + return LIBETH_XDP_DROP; + case XDP_TX: + return LIBETH_XDP_TX; + case XDP_PASS: + return LIBETH_XDP_PASS; + default: + break; + } + + return libeth_xdp_prog_exception(bq, xdp, act, ret); +} +EXPORT_SYMBOL_GPL(__libeth_xsk_run_prog_slow); + +/** + * libeth_xsk_prog_exception - handle XDP prog exceptions on XSk + * @xdp: buffer to process + * @act: verdict returned by the prog + * @ret: error code if ``XDP_REDIRECT`` failed + * + * Internal. Frees the buffer and, if the queue uses XSk wakeups, stop the + * current NAPI poll when there are no free buffers left. + * + * Return: libeth_xdp's XDP prog verdict. + */ +u32 __cold libeth_xsk_prog_exception(struct libeth_xdp_buff *xdp, + enum xdp_action act, int ret) +{ + const struct xdp_buff_xsk *xsk; + u32 __ret = LIBETH_XDP_DROP; + + if (act != XDP_REDIRECT) + goto drop; + + xsk = container_of(&xdp->base, typeof(*xsk), xdp); + if (xsk_uses_need_wakeup(xsk->pool) && ret == -ENOBUFS) + __ret = LIBETH_XDP_ABORTED; + +drop: + libeth_xsk_buff_free_slow(xdp); + + return __ret; +} + +/* Refill */ + +/** + * libeth_xskfq_create - create an XSkFQ + * @fq: fill queue to initialize + * + * Allocates the FQEs and initializes the fields used by libeth_xdp: number + * of buffers to refill, refill threshold and buffer len. + * + * Return: %0 on success, -errno otherwise. + */ +int libeth_xskfq_create(struct libeth_xskfq *fq) +{ + fq->fqes = kvcalloc_node(fq->count, sizeof(*fq->fqes), GFP_KERNEL, + fq->nid); + if (!fq->fqes) + return -ENOMEM; + + fq->pending = fq->count; + fq->thresh = libeth_xdp_queue_threshold(fq->count); + fq->buf_len = xsk_pool_get_rx_frame_size(fq->pool); + + return 0; +} +EXPORT_SYMBOL_GPL(libeth_xskfq_create); + +/** + * libeth_xskfq_destroy - destroy an XSkFQ + * @fq: fill queue to destroy + * + * Zeroes the used fields and frees the FQEs array. + */ +void libeth_xskfq_destroy(struct libeth_xskfq *fq) +{ + fq->buf_len = 0; + fq->thresh = 0; + fq->pending = 0; + + kvfree(fq->fqes); +} +EXPORT_SYMBOL_GPL(libeth_xskfq_destroy); + +/* .ndo_xsk_wakeup */ + +static void libeth_xsk_napi_sched(void *info) +{ + __napi_schedule_irqoff(info); +} + +/** + * libeth_xsk_init_wakeup - initialize libeth XSk wakeup structure + * @csd: struct to initialize + * @napi: NAPI corresponding to this queue + * + * libeth_xdp uses inter-processor interrupts to perform XSk wakeups. In order + * to do that, the corresponding CSDs must be initialized when creating the + * queues. + */ +void libeth_xsk_init_wakeup(call_single_data_t *csd, struct napi_struct *napi) +{ + INIT_CSD(csd, libeth_xsk_napi_sched, napi); +} +EXPORT_SYMBOL_GPL(libeth_xsk_init_wakeup); + +/** + * libeth_xsk_wakeup - perform an XSk wakeup + * @csd: CSD corresponding to the queue + * @qid: the stack queue index + * + * Try to mark the NAPI as missed first, so that it could be rescheduled. + * If it's not, schedule it on the corresponding CPU using IPIs (or directly + * if already running on it). + */ +void libeth_xsk_wakeup(call_single_data_t *csd, u32 qid) +{ + struct napi_struct *napi = csd->info; + + if (napi_if_scheduled_mark_missed(napi) || + unlikely(!napi_schedule_prep(napi))) + return; + + if (unlikely(qid >= nr_cpu_ids)) + qid %= nr_cpu_ids; + + if (qid != raw_smp_processor_id() && cpu_online(qid)) + smp_call_function_single_async(qid, csd); + else + __napi_schedule(napi); +} +EXPORT_SYMBOL_GPL(libeth_xsk_wakeup); + +/* Pool setup */ + +#define LIBETH_XSK_DMA_ATTR \ + (DMA_ATTR_WEAK_ORDERING | DMA_ATTR_SKIP_CPU_SYNC) + +/** + * libeth_xsk_setup_pool - setup or destroy an XSk pool for a queue + * @dev: target &net_device + * @qid: stack queue index to configure + * @enable: whether to enable or disable the pool + * + * Check that @qid is valid and then map or unmap the pool. + * + * Return: %0 on success, -errno otherwise. + */ +int libeth_xsk_setup_pool(struct net_device *dev, u32 qid, bool enable) +{ + struct xsk_buff_pool *pool; + + pool = xsk_get_pool_from_qid(dev, qid); + if (!pool) + return -EINVAL; + + if (enable) + return xsk_pool_dma_map(pool, dev->dev.parent, + LIBETH_XSK_DMA_ATTR); + else + xsk_pool_dma_unmap(pool, LIBETH_XSK_DMA_ATTR); + + return 0; +} +EXPORT_SYMBOL_GPL(libeth_xsk_setup_pool); diff --git a/drivers/net/ethernet/intel/libie/rx.c b/drivers/net/ethernet/intel/libie/rx.c index 66a9825fe11f..6fda656afa9c 100644 --- a/drivers/net/ethernet/intel/libie/rx.c +++ b/drivers/net/ethernet/intel/libie/rx.c @@ -1,6 +1,9 @@ // SPDX-License-Identifier: GPL-2.0-only -/* Copyright (C) 2024 Intel Corporation */ +/* Copyright (C) 2024-2025 Intel Corporation */ +#define DEFAULT_SYMBOL_NAMESPACE "LIBIE" + +#include <linux/export.h> #include <linux/net/intel/libie/rx.h> /* O(1) converting i40e/ice/iavf's 8/10-bit hardware packet type to a parsed @@ -116,7 +119,7 @@ const struct libeth_rx_pt libie_rx_pt_lut[LIBIE_RX_PT_NUM] = { LIBIE_RX_PT_IP(4), LIBIE_RX_PT_IP(6), }; -EXPORT_SYMBOL_NS_GPL(libie_rx_pt_lut, "LIBIE"); +EXPORT_SYMBOL_GPL(libie_rx_pt_lut); MODULE_DESCRIPTION("Intel(R) Ethernet common library"); MODULE_IMPORT_NS("LIBETH"); |
