diff options
Diffstat (limited to 'Documentation/userspace-api')
| -rw-r--r-- | Documentation/userspace-api/dma-buf-heaps.rst | 7 | ||||
| -rw-r--r-- | Documentation/userspace-api/fwctl/bnxt_fwctl.rst | 74 | ||||
| -rw-r--r-- | Documentation/userspace-api/fwctl/fwctl.rst | 1 | ||||
| -rw-r--r-- | Documentation/userspace-api/fwctl/index.rst | 1 | ||||
| -rw-r--r-- | Documentation/userspace-api/landlock.rst | 45 | ||||
| -rw-r--r-- | Documentation/userspace-api/media/dvb/legacy_dvb_audio.rst | 2 | ||||
| -rw-r--r-- | Documentation/userspace-api/media/v4l/subdev-formats.rst | 20 | ||||
| -rw-r--r-- | Documentation/userspace-api/rseq.rst | 94 |
8 files changed, 227 insertions, 17 deletions
diff --git a/Documentation/userspace-api/dma-buf-heaps.rst b/Documentation/userspace-api/dma-buf-heaps.rst index 05445c83b79a..f56b743cdb36 100644 --- a/Documentation/userspace-api/dma-buf-heaps.rst +++ b/Documentation/userspace-api/dma-buf-heaps.rst @@ -16,6 +16,13 @@ following heaps: - The ``system`` heap allocates virtually contiguous, cacheable, buffers. + - The ``system_cc_shared`` heap allocates virtually contiguous, cacheable, + buffers using shared (decrypted) memory. It is only present on + confidential computing (CoCo) VMs where memory encryption is active + (e.g., AMD SEV, Intel TDX). The allocated pages have the encryption + bit cleared, making them accessible for device DMA without TDISP + support. On non-CoCo VM configurations, this heap is not registered. + - The ``default_cma_region`` heap allocates physically contiguous, cacheable, buffers. Only present if a CMA region is present. Such a region is usually created either through the kernel commandline diff --git a/Documentation/userspace-api/fwctl/bnxt_fwctl.rst b/Documentation/userspace-api/fwctl/bnxt_fwctl.rst new file mode 100644 index 000000000000..97c9b095cf21 --- /dev/null +++ b/Documentation/userspace-api/fwctl/bnxt_fwctl.rst @@ -0,0 +1,74 @@ +.. SPDX-License-Identifier: GPL-2.0 + +================= +fwctl bnxt driver +================= + +:Author: Pavan Chebbi + +Overview +======== + +BNXT driver makes a fwctl service available through an auxiliary_device. +The bnxt_fwctl driver binds to this device and registers itself with the +fwctl subsystem. + +The bnxt_fwctl driver is agnostic to the device firmware internals. It +uses the Upper Layer Protocol (ULP) conduit provided by bnxt to send +HardWare Resource Manager (HWRM) commands to firmware. + +These commands can query or change firmware driven device configurations +and read/write registers that are useful for debugging. + +bnxt_fwctl User API +=================== + +Each RPC request contains the HWRM input structure in the fwctl_rpc +'in' buffer while 'out' will contain the response. + +A typical user application can send a FWCTL_INFO command using ioctl() +to discover bnxt_fwctl's RPC capabilities as shown below: + + ioctl(fd, FWCTL_INFO, &fwctl_info_msg); + +where fwctl_info_msg (of type struct fwctl_info) describes bnxt_info_msg +(of type struct fwctl_info_bnxt). fwctl_info_msg is set up as follows: + + size = sizeof(struct fwctl_info); + flags = 0; + device_data_len = sizeof(bnxt_info_msg); + out_device_data = (__aligned_u64)&bnxt_info_msg; + +The uctx_caps of bnxt_info_msg represents the capabilities as described +in fwctl_bnxt_commands of include/uapi/fwctl/bnxt.h + +The FW RPC itself, FWCTL_RPC can be sent using ioctl() as: + + ioctl(fd, FWCTL_RPC, &fwctl_rpc_msg); + +where fwctl_rpc_msg (of type struct fwctl_rpc) carries the HWRM command +in its 'in' buffer. The HWRM input structures are described in +include/linux/bnxt/hsi.h. An example for HWRM_VER_GET is shown below: + + struct hwrm_ver_get_output resp; + struct fwctl_rpc fwctl_rpc_msg; + struct hwrm_ver_get_input req; + + req.req_type = HWRM_VER_GET; + req.hwrm_intf_maj = HWRM_VERSION_MAJOR; + req.hwrm_intf_min = HWRM_VERSION_MINOR; + req.hwrm_intf_upd = HWRM_VERSION_UPDATE; + req.cmpl_ring = -1; + req.target_id = -1; + + fwctl_rpc_msg.size = sizeof(struct fwctl_rpc); + fwctl_rpc_msg.scope = FWCTL_RPC_DEBUG_READ_ONLY; + fwctl_rpc_msg.in_len = sizeof(req); + fwctl_rpc_msg.out_len = sizeof(resp); + fwctl_rpc_msg.in = (__aligned_u64)&req; + fwctl_rpc_msg.out = (__aligned_u64)&resp; + +An example python3 program that can exercise this interface can be found in +the following git repository: + +https://github.com/Broadcom/fwctl-tools diff --git a/Documentation/userspace-api/fwctl/fwctl.rst b/Documentation/userspace-api/fwctl/fwctl.rst index a74eab8d14c6..826817bfd54d 100644 --- a/Documentation/userspace-api/fwctl/fwctl.rst +++ b/Documentation/userspace-api/fwctl/fwctl.rst @@ -148,6 +148,7 @@ area resulting in clashes will be resolved in favour of a kernel implementation. fwctl User API ============== +.. kernel-doc:: include/uapi/fwctl/bnxt.h .. kernel-doc:: include/uapi/fwctl/fwctl.h .. kernel-doc:: include/uapi/fwctl/mlx5.h .. kernel-doc:: include/uapi/fwctl/pds.h diff --git a/Documentation/userspace-api/fwctl/index.rst b/Documentation/userspace-api/fwctl/index.rst index 316ac456ad3b..8062f7629654 100644 --- a/Documentation/userspace-api/fwctl/index.rst +++ b/Documentation/userspace-api/fwctl/index.rst @@ -10,5 +10,6 @@ to securely construct and execute RPCs inside device firmware. :maxdepth: 1 fwctl + bnxt_fwctl fwctl-cxl pds_fwctl diff --git a/Documentation/userspace-api/landlock.rst b/Documentation/userspace-api/landlock.rst index 13134bccdd39..fd8b78c31f2f 100644 --- a/Documentation/userspace-api/landlock.rst +++ b/Documentation/userspace-api/landlock.rst @@ -8,7 +8,7 @@ Landlock: unprivileged access control ===================================== :Author: Mickaël Salaün -:Date: January 2026 +:Date: March 2026 The goal of Landlock is to enable restriction of ambient rights (e.g. global filesystem or network access) for a set of processes. Because Landlock @@ -77,7 +77,8 @@ to be explicit about the denied-by-default access rights. LANDLOCK_ACCESS_FS_MAKE_SYM | LANDLOCK_ACCESS_FS_REFER | LANDLOCK_ACCESS_FS_TRUNCATE | - LANDLOCK_ACCESS_FS_IOCTL_DEV, + LANDLOCK_ACCESS_FS_IOCTL_DEV | + LANDLOCK_ACCESS_FS_RESOLVE_UNIX, .handled_access_net = LANDLOCK_ACCESS_NET_BIND_TCP | LANDLOCK_ACCESS_NET_CONNECT_TCP, @@ -127,6 +128,10 @@ version, and only use the available subset of access rights: /* Removes LANDLOCK_SCOPE_* for ABI < 6 */ ruleset_attr.scoped &= ~(LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET | LANDLOCK_SCOPE_SIGNAL); + __attribute__((fallthrough)); + case 6 ... 8: + /* Removes LANDLOCK_ACCESS_FS_RESOLVE_UNIX for ABI < 9 */ + ruleset_attr.handled_access_fs &= ~LANDLOCK_ACCESS_FS_RESOLVE_UNIX; } This enables the creation of an inclusive ruleset that will contain our rules. @@ -197,12 +202,27 @@ similar backwards compatibility check is needed for the restrict flags .. code-block:: c - __u32 restrict_flags = LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON; - if (abi < 7) { - /* Clear logging flags unsupported before ABI 7. */ + __u32 restrict_flags = + LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON | + LANDLOCK_RESTRICT_SELF_TSYNC; + switch (abi) { + case 1 ... 6: + /* Removes logging flags for ABI < 7 */ restrict_flags &= ~(LANDLOCK_RESTRICT_SELF_LOG_SAME_EXEC_OFF | LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON | LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF); + __attribute__((fallthrough)); + case 7: + /* + * Removes multithreaded enforcement flag for ABI < 8 + * + * WARNING: Without this flag, calling landlock_restrict_self(2) is + * only equivalent if the calling process is single-threaded. Below + * ABI v8 (and as of ABI v8, when not using this flag), a Landlock + * policy would only be enforced for the calling thread and its + * children (and not for all threads, including parents and siblings). + */ + restrict_flags &= ~LANDLOCK_RESTRICT_SELF_TSYNC; } The next step is to restrict the current thread from gaining more privileges @@ -363,8 +383,8 @@ Truncating files The operations covered by ``LANDLOCK_ACCESS_FS_WRITE_FILE`` and ``LANDLOCK_ACCESS_FS_TRUNCATE`` both change the contents of a file and sometimes -overlap in non-intuitive ways. It is recommended to always specify both of -these together. +overlap in non-intuitive ways. It is strongly recommended to always specify +both of these together (either granting both, or granting none). A particularly surprising example is :manpage:`creat(2)`. The name suggests that this system call requires the rights to create and write files. However, @@ -376,6 +396,10 @@ It should also be noted that truncating files does not require the system call, this can also be done through :manpage:`open(2)` with the flags ``O_RDONLY | O_TRUNC``. +At the same time, on some filesystems, :manpage:`fallocate(2)` offers a way to +shorten file contents with ``FALLOC_FL_COLLAPSE_RANGE`` when the file is opened +for writing, sidestepping the ``LANDLOCK_ACCESS_FS_TRUNCATE`` right. + The truncate right is associated with the opened file (see below). Rights associated with file descriptors @@ -685,6 +709,13 @@ enforce Landlock rulesets across all threads of the calling process using the ``LANDLOCK_RESTRICT_SELF_TSYNC`` flag passed to sys_landlock_restrict_self(). +Pathname UNIX sockets (ABI < 9) +------------------------------- + +Starting with the Landlock ABI version 9, it is possible to restrict +connections to pathname UNIX domain sockets (:manpage:`unix(7)`) using +the new ``LANDLOCK_ACCESS_FS_RESOLVE_UNIX`` right. + .. _kernel_support: Kernel support diff --git a/Documentation/userspace-api/media/dvb/legacy_dvb_audio.rst b/Documentation/userspace-api/media/dvb/legacy_dvb_audio.rst index 81b762ef17c4..99ffda355204 100644 --- a/Documentation/userspace-api/media/dvb/legacy_dvb_audio.rst +++ b/Documentation/userspace-api/media/dvb/legacy_dvb_audio.rst @@ -444,7 +444,7 @@ Description ~~~~~~~~~~~ A call to `AUDIO_GET_CAPABILITIES`_ returns an unsigned integer with the -following bits set according to the hardwares capabilities. +following bits set according to the hardware's capabilities. ----- diff --git a/Documentation/userspace-api/media/v4l/subdev-formats.rst b/Documentation/userspace-api/media/v4l/subdev-formats.rst index 896177c5334f..c9999b929773 100644 --- a/Documentation/userspace-api/media/v4l/subdev-formats.rst +++ b/Documentation/userspace-api/media/v4l/subdev-formats.rst @@ -159,14 +159,18 @@ formats in memory (a raw Bayer image won't be magically converted to JPEG just by storing it to memory), there is no one-to-one correspondence between them. -The media bus pixel codes document parallel formats. Should the pixel data be -transported over a serial bus, the media bus pixel code that describes a -parallel format that transfers a sample on a single clock cycle is used. For -instance, both MEDIA_BUS_FMT_BGR888_1X24 and MEDIA_BUS_FMT_BGR888_3X8 are used -on parallel busses for transferring an 8 bits per sample BGR data, whereas on -serial busses the data in this format is only referred to using -MEDIA_BUS_FMT_BGR888_1X24. This is because there is effectively only a single -way to transport that format on the serial busses. +While the media bus pixel codes are named based on how pixels are +transmitted on parallel buses, serial buses do not define separate +codes. By convention, they use the codes that transfer a sample on a +single clock cycle, and whose bit orders from LSB to MSB correspond to +the order in which colour components are transmitted on the serial bus. +For instance, the MIPI CSI-2 24-bit RGB (RGB888) format uses the +MEDIA_BUS_FMT_RGB888_1X24 media bus code because CSI-2 transmits the +blue colour component first, followed by green and red, and +MEDIA_BUS_FMT_RGB888_1X24 defines the first bit of blue at bit 0. +While used for 24-bit RGB data on parallel buses, the +MEDIA_BUS_FMT_RGB888_3X8 or MEDIA_BUS_FMT_BGR888_1X24 codes must not be +used for CSI-2. Packed RGB Formats ^^^^^^^^^^^^^^^^^^ diff --git a/Documentation/userspace-api/rseq.rst b/Documentation/userspace-api/rseq.rst index 3cd27a3c7c7e..8549a6c61531 100644 --- a/Documentation/userspace-api/rseq.rst +++ b/Documentation/userspace-api/rseq.rst @@ -24,6 +24,97 @@ Quick access to CPU number, node ID Allows to implement per CPU data efficiently. Documentation is in code and selftests. :( +Optimized RSEQ V2 +----------------- + +On architectures which utilize the generic entry code and generic TIF bits +the kernel supports runtime optimizations for RSEQ, which also enable +enhanced features like scheduler time slice extensions. + +To enable them a task has to register the RSEQ region with at least the +length advertised by getauxval(AT_RSEQ_FEATURE_SIZE). + +If existing binaries register with RSEQ_ORIG_SIZE (32 bytes), the kernel +keeps the legacy low performance mode enabled to fulfil the expectations +of existing users regarding the original RSEQ implementation behaviour. + +The following table documents the ABI and behavioral guarantees of the +legacy and the optimized V2 mode. + +.. list-table:: RSEQ modes + :header-rows: 1 + + * - Nr + - What + + - Legacy + - Optimized V2 + + * - 1 + - The cpu_id_start, cpu_id, node_id and mm_cid fields (User mode read + only) + .. Legacy + - Updated by the kernel unconditionally after each context switch and + before signal delivery + .. Optimized V2 + - Updated by the kernel if and only if they change, i.e. if the task + is migrated or mm_cid changes + + * - 2 + - The rseq_cs critical section field + .. Legacy + - Evaluated and handled unconditionally after each context switch and + before signal delivery + .. Optimized V2 + - Evaluated and handled conditionally only when user space was + interrupted and was scheduled out or before delivering a signal in + the interrupted context. + + * - 3 + - Read only fields + .. Legacy + - No strict enforcement except in debug mode + .. Optimized V2 + - Strict enforcement + + * - 4 + - membarrier(...RSEQ) + .. Legacy + - All running threads of the process are interrupted and the ID fields + are rewritten and eventually active critical sections are aborted + before they return to user space. All threads which are scheduled + out whether voluntary or not are covered by #1/#2 above. + .. Optimized V2 + - All running threads of the process are interrupted and eventually + active critical sections are aborted before these threads return to + user space. The ID fields are only updated if changed as a + consequence of the interrupt. All threads which are scheduled out + whether voluntary or not are covered by #1/#2 above. + + * - 5 + - Time slice extensions + .. Legacy + - Not supported + .. Optimized V2 + - Supported + +The legacy mode is obviously less performant as it does unconditional +updates and critical section checks even if not strictly required by the +ABI contract. That can't be changed anymore as some users depend on that +observed behavior, which in turn enables them to violate the ABI and +overwrite the cpu_id_start field for their own purposes. This is obviously +discouraged as it renders RSEQ incompatible with the intended usage and +breaks the expectation of other libraries in the same application. + +The ABI compliant optimized v2 mode, which respects the read only fields, +does not require unconditional updates and therefore is way more +performant. The kernel validates the read only fields for compliance. If +user space modifies them, the process is killed. Compliant usage allows +multiple libraries in the same application to benefit from the RSEQ +functionality without disturbing each other. The ABI compliant optimized v2 +mode also enables extended RSEQ features like time slice extensions. + + Scheduler time slice extensions ------------------------------- @@ -37,7 +128,8 @@ The prerequisites for this functionality are: * Enabled at boot time (default is enabled) - * A rseq userspace pointer has been registered for the thread + * A rseq userspace pointer has been registered for the thread in + optimized V2 mode The thread has to enable the functionality via prctl(2):: |
