summaryrefslogtreecommitdiffstats
path: root/Documentation/userspace-api
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/userspace-api')
-rw-r--r--Documentation/userspace-api/dma-buf-heaps.rst7
-rw-r--r--Documentation/userspace-api/fwctl/bnxt_fwctl.rst74
-rw-r--r--Documentation/userspace-api/fwctl/fwctl.rst1
-rw-r--r--Documentation/userspace-api/fwctl/index.rst1
-rw-r--r--Documentation/userspace-api/landlock.rst45
-rw-r--r--Documentation/userspace-api/media/dvb/legacy_dvb_audio.rst2
-rw-r--r--Documentation/userspace-api/media/v4l/subdev-formats.rst20
-rw-r--r--Documentation/userspace-api/rseq.rst94
8 files changed, 227 insertions, 17 deletions
diff --git a/Documentation/userspace-api/dma-buf-heaps.rst b/Documentation/userspace-api/dma-buf-heaps.rst
index 05445c83b79a..f56b743cdb36 100644
--- a/Documentation/userspace-api/dma-buf-heaps.rst
+++ b/Documentation/userspace-api/dma-buf-heaps.rst
@@ -16,6 +16,13 @@ following heaps:
- The ``system`` heap allocates virtually contiguous, cacheable, buffers.
+ - The ``system_cc_shared`` heap allocates virtually contiguous, cacheable,
+ buffers using shared (decrypted) memory. It is only present on
+ confidential computing (CoCo) VMs where memory encryption is active
+ (e.g., AMD SEV, Intel TDX). The allocated pages have the encryption
+ bit cleared, making them accessible for device DMA without TDISP
+ support. On non-CoCo VM configurations, this heap is not registered.
+
- The ``default_cma_region`` heap allocates physically contiguous,
cacheable, buffers. Only present if a CMA region is present. Such a
region is usually created either through the kernel commandline
diff --git a/Documentation/userspace-api/fwctl/bnxt_fwctl.rst b/Documentation/userspace-api/fwctl/bnxt_fwctl.rst
new file mode 100644
index 000000000000..97c9b095cf21
--- /dev/null
+++ b/Documentation/userspace-api/fwctl/bnxt_fwctl.rst
@@ -0,0 +1,74 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=================
+fwctl bnxt driver
+=================
+
+:Author: Pavan Chebbi
+
+Overview
+========
+
+BNXT driver makes a fwctl service available through an auxiliary_device.
+The bnxt_fwctl driver binds to this device and registers itself with the
+fwctl subsystem.
+
+The bnxt_fwctl driver is agnostic to the device firmware internals. It
+uses the Upper Layer Protocol (ULP) conduit provided by bnxt to send
+HardWare Resource Manager (HWRM) commands to firmware.
+
+These commands can query or change firmware driven device configurations
+and read/write registers that are useful for debugging.
+
+bnxt_fwctl User API
+===================
+
+Each RPC request contains the HWRM input structure in the fwctl_rpc
+'in' buffer while 'out' will contain the response.
+
+A typical user application can send a FWCTL_INFO command using ioctl()
+to discover bnxt_fwctl's RPC capabilities as shown below:
+
+ ioctl(fd, FWCTL_INFO, &fwctl_info_msg);
+
+where fwctl_info_msg (of type struct fwctl_info) describes bnxt_info_msg
+(of type struct fwctl_info_bnxt). fwctl_info_msg is set up as follows:
+
+ size = sizeof(struct fwctl_info);
+ flags = 0;
+ device_data_len = sizeof(bnxt_info_msg);
+ out_device_data = (__aligned_u64)&bnxt_info_msg;
+
+The uctx_caps of bnxt_info_msg represents the capabilities as described
+in fwctl_bnxt_commands of include/uapi/fwctl/bnxt.h
+
+The FW RPC itself, FWCTL_RPC can be sent using ioctl() as:
+
+ ioctl(fd, FWCTL_RPC, &fwctl_rpc_msg);
+
+where fwctl_rpc_msg (of type struct fwctl_rpc) carries the HWRM command
+in its 'in' buffer. The HWRM input structures are described in
+include/linux/bnxt/hsi.h. An example for HWRM_VER_GET is shown below:
+
+ struct hwrm_ver_get_output resp;
+ struct fwctl_rpc fwctl_rpc_msg;
+ struct hwrm_ver_get_input req;
+
+ req.req_type = HWRM_VER_GET;
+ req.hwrm_intf_maj = HWRM_VERSION_MAJOR;
+ req.hwrm_intf_min = HWRM_VERSION_MINOR;
+ req.hwrm_intf_upd = HWRM_VERSION_UPDATE;
+ req.cmpl_ring = -1;
+ req.target_id = -1;
+
+ fwctl_rpc_msg.size = sizeof(struct fwctl_rpc);
+ fwctl_rpc_msg.scope = FWCTL_RPC_DEBUG_READ_ONLY;
+ fwctl_rpc_msg.in_len = sizeof(req);
+ fwctl_rpc_msg.out_len = sizeof(resp);
+ fwctl_rpc_msg.in = (__aligned_u64)&req;
+ fwctl_rpc_msg.out = (__aligned_u64)&resp;
+
+An example python3 program that can exercise this interface can be found in
+the following git repository:
+
+https://github.com/Broadcom/fwctl-tools
diff --git a/Documentation/userspace-api/fwctl/fwctl.rst b/Documentation/userspace-api/fwctl/fwctl.rst
index a74eab8d14c6..826817bfd54d 100644
--- a/Documentation/userspace-api/fwctl/fwctl.rst
+++ b/Documentation/userspace-api/fwctl/fwctl.rst
@@ -148,6 +148,7 @@ area resulting in clashes will be resolved in favour of a kernel implementation.
fwctl User API
==============
+.. kernel-doc:: include/uapi/fwctl/bnxt.h
.. kernel-doc:: include/uapi/fwctl/fwctl.h
.. kernel-doc:: include/uapi/fwctl/mlx5.h
.. kernel-doc:: include/uapi/fwctl/pds.h
diff --git a/Documentation/userspace-api/fwctl/index.rst b/Documentation/userspace-api/fwctl/index.rst
index 316ac456ad3b..8062f7629654 100644
--- a/Documentation/userspace-api/fwctl/index.rst
+++ b/Documentation/userspace-api/fwctl/index.rst
@@ -10,5 +10,6 @@ to securely construct and execute RPCs inside device firmware.
:maxdepth: 1
fwctl
+ bnxt_fwctl
fwctl-cxl
pds_fwctl
diff --git a/Documentation/userspace-api/landlock.rst b/Documentation/userspace-api/landlock.rst
index 13134bccdd39..fd8b78c31f2f 100644
--- a/Documentation/userspace-api/landlock.rst
+++ b/Documentation/userspace-api/landlock.rst
@@ -8,7 +8,7 @@ Landlock: unprivileged access control
=====================================
:Author: Mickaël Salaün
-:Date: January 2026
+:Date: March 2026
The goal of Landlock is to enable restriction of ambient rights (e.g. global
filesystem or network access) for a set of processes. Because Landlock
@@ -77,7 +77,8 @@ to be explicit about the denied-by-default access rights.
LANDLOCK_ACCESS_FS_MAKE_SYM |
LANDLOCK_ACCESS_FS_REFER |
LANDLOCK_ACCESS_FS_TRUNCATE |
- LANDLOCK_ACCESS_FS_IOCTL_DEV,
+ LANDLOCK_ACCESS_FS_IOCTL_DEV |
+ LANDLOCK_ACCESS_FS_RESOLVE_UNIX,
.handled_access_net =
LANDLOCK_ACCESS_NET_BIND_TCP |
LANDLOCK_ACCESS_NET_CONNECT_TCP,
@@ -127,6 +128,10 @@ version, and only use the available subset of access rights:
/* Removes LANDLOCK_SCOPE_* for ABI < 6 */
ruleset_attr.scoped &= ~(LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET |
LANDLOCK_SCOPE_SIGNAL);
+ __attribute__((fallthrough));
+ case 6 ... 8:
+ /* Removes LANDLOCK_ACCESS_FS_RESOLVE_UNIX for ABI < 9 */
+ ruleset_attr.handled_access_fs &= ~LANDLOCK_ACCESS_FS_RESOLVE_UNIX;
}
This enables the creation of an inclusive ruleset that will contain our rules.
@@ -197,12 +202,27 @@ similar backwards compatibility check is needed for the restrict flags
.. code-block:: c
- __u32 restrict_flags = LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON;
- if (abi < 7) {
- /* Clear logging flags unsupported before ABI 7. */
+ __u32 restrict_flags =
+ LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON |
+ LANDLOCK_RESTRICT_SELF_TSYNC;
+ switch (abi) {
+ case 1 ... 6:
+ /* Removes logging flags for ABI < 7 */
restrict_flags &= ~(LANDLOCK_RESTRICT_SELF_LOG_SAME_EXEC_OFF |
LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON |
LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF);
+ __attribute__((fallthrough));
+ case 7:
+ /*
+ * Removes multithreaded enforcement flag for ABI < 8
+ *
+ * WARNING: Without this flag, calling landlock_restrict_self(2) is
+ * only equivalent if the calling process is single-threaded. Below
+ * ABI v8 (and as of ABI v8, when not using this flag), a Landlock
+ * policy would only be enforced for the calling thread and its
+ * children (and not for all threads, including parents and siblings).
+ */
+ restrict_flags &= ~LANDLOCK_RESTRICT_SELF_TSYNC;
}
The next step is to restrict the current thread from gaining more privileges
@@ -363,8 +383,8 @@ Truncating files
The operations covered by ``LANDLOCK_ACCESS_FS_WRITE_FILE`` and
``LANDLOCK_ACCESS_FS_TRUNCATE`` both change the contents of a file and sometimes
-overlap in non-intuitive ways. It is recommended to always specify both of
-these together.
+overlap in non-intuitive ways. It is strongly recommended to always specify
+both of these together (either granting both, or granting none).
A particularly surprising example is :manpage:`creat(2)`. The name suggests
that this system call requires the rights to create and write files. However,
@@ -376,6 +396,10 @@ It should also be noted that truncating files does not require the
system call, this can also be done through :manpage:`open(2)` with the flags
``O_RDONLY | O_TRUNC``.
+At the same time, on some filesystems, :manpage:`fallocate(2)` offers a way to
+shorten file contents with ``FALLOC_FL_COLLAPSE_RANGE`` when the file is opened
+for writing, sidestepping the ``LANDLOCK_ACCESS_FS_TRUNCATE`` right.
+
The truncate right is associated with the opened file (see below).
Rights associated with file descriptors
@@ -685,6 +709,13 @@ enforce Landlock rulesets across all threads of the calling process
using the ``LANDLOCK_RESTRICT_SELF_TSYNC`` flag passed to
sys_landlock_restrict_self().
+Pathname UNIX sockets (ABI < 9)
+-------------------------------
+
+Starting with the Landlock ABI version 9, it is possible to restrict
+connections to pathname UNIX domain sockets (:manpage:`unix(7)`) using
+the new ``LANDLOCK_ACCESS_FS_RESOLVE_UNIX`` right.
+
.. _kernel_support:
Kernel support
diff --git a/Documentation/userspace-api/media/dvb/legacy_dvb_audio.rst b/Documentation/userspace-api/media/dvb/legacy_dvb_audio.rst
index 81b762ef17c4..99ffda355204 100644
--- a/Documentation/userspace-api/media/dvb/legacy_dvb_audio.rst
+++ b/Documentation/userspace-api/media/dvb/legacy_dvb_audio.rst
@@ -444,7 +444,7 @@ Description
~~~~~~~~~~~
A call to `AUDIO_GET_CAPABILITIES`_ returns an unsigned integer with the
-following bits set according to the hardwares capabilities.
+following bits set according to the hardware's capabilities.
-----
diff --git a/Documentation/userspace-api/media/v4l/subdev-formats.rst b/Documentation/userspace-api/media/v4l/subdev-formats.rst
index 896177c5334f..c9999b929773 100644
--- a/Documentation/userspace-api/media/v4l/subdev-formats.rst
+++ b/Documentation/userspace-api/media/v4l/subdev-formats.rst
@@ -159,14 +159,18 @@ formats in memory (a raw Bayer image won't be magically converted to
JPEG just by storing it to memory), there is no one-to-one
correspondence between them.
-The media bus pixel codes document parallel formats. Should the pixel data be
-transported over a serial bus, the media bus pixel code that describes a
-parallel format that transfers a sample on a single clock cycle is used. For
-instance, both MEDIA_BUS_FMT_BGR888_1X24 and MEDIA_BUS_FMT_BGR888_3X8 are used
-on parallel busses for transferring an 8 bits per sample BGR data, whereas on
-serial busses the data in this format is only referred to using
-MEDIA_BUS_FMT_BGR888_1X24. This is because there is effectively only a single
-way to transport that format on the serial busses.
+While the media bus pixel codes are named based on how pixels are
+transmitted on parallel buses, serial buses do not define separate
+codes. By convention, they use the codes that transfer a sample on a
+single clock cycle, and whose bit orders from LSB to MSB correspond to
+the order in which colour components are transmitted on the serial bus.
+For instance, the MIPI CSI-2 24-bit RGB (RGB888) format uses the
+MEDIA_BUS_FMT_RGB888_1X24 media bus code because CSI-2 transmits the
+blue colour component first, followed by green and red, and
+MEDIA_BUS_FMT_RGB888_1X24 defines the first bit of blue at bit 0.
+While used for 24-bit RGB data on parallel buses, the
+MEDIA_BUS_FMT_RGB888_3X8 or MEDIA_BUS_FMT_BGR888_1X24 codes must not be
+used for CSI-2.
Packed RGB Formats
^^^^^^^^^^^^^^^^^^
diff --git a/Documentation/userspace-api/rseq.rst b/Documentation/userspace-api/rseq.rst
index 3cd27a3c7c7e..8549a6c61531 100644
--- a/Documentation/userspace-api/rseq.rst
+++ b/Documentation/userspace-api/rseq.rst
@@ -24,6 +24,97 @@ Quick access to CPU number, node ID
Allows to implement per CPU data efficiently. Documentation is in code and
selftests. :(
+Optimized RSEQ V2
+-----------------
+
+On architectures which utilize the generic entry code and generic TIF bits
+the kernel supports runtime optimizations for RSEQ, which also enable
+enhanced features like scheduler time slice extensions.
+
+To enable them a task has to register the RSEQ region with at least the
+length advertised by getauxval(AT_RSEQ_FEATURE_SIZE).
+
+If existing binaries register with RSEQ_ORIG_SIZE (32 bytes), the kernel
+keeps the legacy low performance mode enabled to fulfil the expectations
+of existing users regarding the original RSEQ implementation behaviour.
+
+The following table documents the ABI and behavioral guarantees of the
+legacy and the optimized V2 mode.
+
+.. list-table:: RSEQ modes
+ :header-rows: 1
+
+ * - Nr
+ - What
+
+ - Legacy
+ - Optimized V2
+
+ * - 1
+ - The cpu_id_start, cpu_id, node_id and mm_cid fields (User mode read
+ only)
+ .. Legacy
+ - Updated by the kernel unconditionally after each context switch and
+ before signal delivery
+ .. Optimized V2
+ - Updated by the kernel if and only if they change, i.e. if the task
+ is migrated or mm_cid changes
+
+ * - 2
+ - The rseq_cs critical section field
+ .. Legacy
+ - Evaluated and handled unconditionally after each context switch and
+ before signal delivery
+ .. Optimized V2
+ - Evaluated and handled conditionally only when user space was
+ interrupted and was scheduled out or before delivering a signal in
+ the interrupted context.
+
+ * - 3
+ - Read only fields
+ .. Legacy
+ - No strict enforcement except in debug mode
+ .. Optimized V2
+ - Strict enforcement
+
+ * - 4
+ - membarrier(...RSEQ)
+ .. Legacy
+ - All running threads of the process are interrupted and the ID fields
+ are rewritten and eventually active critical sections are aborted
+ before they return to user space. All threads which are scheduled
+ out whether voluntary or not are covered by #1/#2 above.
+ .. Optimized V2
+ - All running threads of the process are interrupted and eventually
+ active critical sections are aborted before these threads return to
+ user space. The ID fields are only updated if changed as a
+ consequence of the interrupt. All threads which are scheduled out
+ whether voluntary or not are covered by #1/#2 above.
+
+ * - 5
+ - Time slice extensions
+ .. Legacy
+ - Not supported
+ .. Optimized V2
+ - Supported
+
+The legacy mode is obviously less performant as it does unconditional
+updates and critical section checks even if not strictly required by the
+ABI contract. That can't be changed anymore as some users depend on that
+observed behavior, which in turn enables them to violate the ABI and
+overwrite the cpu_id_start field for their own purposes. This is obviously
+discouraged as it renders RSEQ incompatible with the intended usage and
+breaks the expectation of other libraries in the same application.
+
+The ABI compliant optimized v2 mode, which respects the read only fields,
+does not require unconditional updates and therefore is way more
+performant. The kernel validates the read only fields for compliance. If
+user space modifies them, the process is killed. Compliant usage allows
+multiple libraries in the same application to benefit from the RSEQ
+functionality without disturbing each other. The ABI compliant optimized v2
+mode also enables extended RSEQ features like time slice extensions.
+
+
Scheduler time slice extensions
-------------------------------
@@ -37,7 +128,8 @@ The prerequisites for this functionality are:
* Enabled at boot time (default is enabled)
- * A rseq userspace pointer has been registered for the thread
+ * A rseq userspace pointer has been registered for the thread in
+ optimized V2 mode
The thread has to enable the functionality via prctl(2)::