aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/admin-guide
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/admin-guide')
-rw-r--r--Documentation/admin-guide/cgroup-v2.rst7
-rw-r--r--Documentation/admin-guide/cifs/usage.rst2
-rw-r--r--Documentation/admin-guide/device-mapper/dm-crypt.rst15
-rw-r--r--Documentation/admin-guide/hw-vuln/srso.rst69
-rw-r--r--Documentation/admin-guide/kernel-parameters.txt4
-rw-r--r--Documentation/admin-guide/perf/arm-ni.rst17
-rw-r--r--Documentation/admin-guide/perf/dwc_pcie_pmu.rst16
-rw-r--r--Documentation/admin-guide/perf/hisi-pcie-pmu.rst4
-rw-r--r--Documentation/admin-guide/perf/index.rst1
-rw-r--r--Documentation/admin-guide/pm/amd-pstate.rst15
10 files changed, 126 insertions, 24 deletions
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 86311c2907cd..95c18bc17083 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1717,9 +1717,10 @@ The following nested keys are defined.
entries fault back in or are written out to disk.
memory.zswap.writeback
- A read-write single value file. The default value is "1". The
- initial value of the root cgroup is 1, and when a new cgroup is
- created, it inherits the current value of its parent.
+ A read-write single value file. The default value is "1".
+ Note that this setting is hierarchical, i.e. the writeback would be
+ implicitly disabled for child cgroups if the upper hierarchy
+ does so.
When this is set to 0, all swapping attempts to swapping devices
are disabled. This included both zswap writebacks, and swapping due
diff --git a/Documentation/admin-guide/cifs/usage.rst b/Documentation/admin-guide/cifs/usage.rst
index fd4b56c0996f..c09674a75a9e 100644
--- a/Documentation/admin-guide/cifs/usage.rst
+++ b/Documentation/admin-guide/cifs/usage.rst
@@ -742,7 +742,7 @@ SecurityFlags Flags which control security negotiation and
may use NTLMSSP 0x00080
must use NTLMSSP 0x80080
seal (packet encryption) 0x00040
- must seal (not implemented yet) 0x40040
+ must seal 0x40040
cifsFYI If set to non-zero value, additional debug information
will be logged to the system error log. This field
diff --git a/Documentation/admin-guide/device-mapper/dm-crypt.rst b/Documentation/admin-guide/device-mapper/dm-crypt.rst
index e625830d335e..552c9155165d 100644
--- a/Documentation/admin-guide/device-mapper/dm-crypt.rst
+++ b/Documentation/admin-guide/device-mapper/dm-crypt.rst
@@ -162,13 +162,14 @@ iv_large_sectors
Module parameters::
-max_read_size
-max_write_size
- Maximum size of read or write requests. When a request larger than this size
- is received, dm-crypt will split the request. The splitting improves
- concurrency (the split requests could be encrypted in parallel by multiple
- cores), but it also causes overhead. The user should tune these parameters to
- fit the actual workload.
+
+ max_read_size
+ max_write_size
+ Maximum size of read or write requests. When a request larger than this size
+ is received, dm-crypt will split the request. The splitting improves
+ concurrency (the split requests could be encrypted in parallel by multiple
+ cores), but it also causes overhead. The user should tune these parameters to
+ fit the actual workload.
Example scripts
diff --git a/Documentation/admin-guide/hw-vuln/srso.rst b/Documentation/admin-guide/hw-vuln/srso.rst
index 4bd3ce3ba171..2ad1c05b8c88 100644
--- a/Documentation/admin-guide/hw-vuln/srso.rst
+++ b/Documentation/admin-guide/hw-vuln/srso.rst
@@ -158,3 +158,72 @@ poisoned BTB entry and using that safe one for all function returns.
In older Zen1 and Zen2, this is accomplished using a reinterpretation
technique similar to Retbleed one: srso_untrain_ret() and
srso_safe_ret().
+
+Checking the safe RET mitigation actually works
+-----------------------------------------------
+
+In case one wants to validate whether the SRSO safe RET mitigation works
+on a kernel, one could use two performance counters
+
+* PMC_0xc8 - Count of RET/RET lw retired
+* PMC_0xc9 - Count of RET/RET lw retired mispredicted
+
+and compare the number of RETs retired properly vs those retired
+mispredicted, in kernel mode. Another way of specifying those events
+is::
+
+ # perf list ex_ret_near_ret
+
+ List of pre-defined events (to be used in -e or -M):
+
+ core:
+ ex_ret_near_ret
+ [Retired Near Returns]
+ ex_ret_near_ret_mispred
+ [Retired Near Returns Mispredicted]
+
+Either the command using the event mnemonics::
+
+ # perf stat -e ex_ret_near_ret:k -e ex_ret_near_ret_mispred:k sleep 10s
+
+or using the raw PMC numbers::
+
+ # perf stat -e cpu/event=0xc8,umask=0/k -e cpu/event=0xc9,umask=0/k sleep 10s
+
+should give the same amount. I.e., every RET retired should be
+mispredicted::
+
+ [root@brent: ~/kernel/linux/tools/perf> ./perf stat -e cpu/event=0xc8,umask=0/k -e cpu/event=0xc9,umask=0/k sleep 10s
+
+ Performance counter stats for 'sleep 10s':
+
+ 137,167 cpu/event=0xc8,umask=0/k
+ 137,173 cpu/event=0xc9,umask=0/k
+
+ 10.004110303 seconds time elapsed
+
+ 0.000000000 seconds user
+ 0.004462000 seconds sys
+
+vs the case when the mitigation is disabled (spec_rstack_overflow=off)
+or not functioning properly, showing usually a lot smaller number of
+mispredicted retired RETs vs the overall count of retired RETs during
+a workload::
+
+ [root@brent: ~/kernel/linux/tools/perf> ./perf stat -e cpu/event=0xc8,umask=0/k -e cpu/event=0xc9,umask=0/k sleep 10s
+
+ Performance counter stats for 'sleep 10s':
+
+ 201,627 cpu/event=0xc8,umask=0/k
+ 4,074 cpu/event=0xc9,umask=0/k
+
+ 10.003267252 seconds time elapsed
+
+ 0.002729000 seconds user
+ 0.000000000 seconds sys
+
+Also, there is a selftest which performs the above, go to
+tools/testing/selftests/x86/ and do::
+
+ make srso
+ ./srso
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 8396e015aab3..be010fec7654 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4808,11 +4808,9 @@
profile= [KNL] Enable kernel profiling via /proc/profile
Format: [<profiletype>,]<number>
- Param: <profiletype>: "schedule", "sleep", or "kvm"
+ Param: <profiletype>: "schedule" or "kvm"
[defaults to kernel profiling]
Param: "schedule" - profile schedule points.
- Param: "sleep" - profile D-state sleeping (millisecs).
- Requires CONFIG_SCHEDSTATS
Param: "kvm" - profile VM exits.
Param: <number> - step/bucket size as a power of 2 for
statistical time based profiling.
diff --git a/Documentation/admin-guide/perf/arm-ni.rst b/Documentation/admin-guide/perf/arm-ni.rst
new file mode 100644
index 000000000000..d26a8f697c36
--- /dev/null
+++ b/Documentation/admin-guide/perf/arm-ni.rst
@@ -0,0 +1,17 @@
+====================================
+Arm Network-on Chip Interconnect PMU
+====================================
+
+NI-700 and friends implement a distinct PMU for each clock domain within the
+interconnect. Correspondingly, the driver exposes multiple PMU devices named
+arm_ni_<x>_cd_<y>, where <x> is an (arbitrary) instance identifier and <y> is
+the clock domain ID within that particular instance. If multiple NI instances
+exist within a system, the PMU devices can be correlated with the underlying
+hardware instance via sysfs parentage.
+
+Each PMU exposes base event aliases for the interface types present in its clock
+domain. These require qualifying with the "eventid" and "nodeid" parameters
+to specify the event code to count and the interface at which to count it
+(per the configured hardware ID as reflected in the xxNI_NODE_INFO register).
+The exception is the "cycles" alias for the PMU cycle counter, which is encoded
+with the PMU node type and needs no further qualification.
diff --git a/Documentation/admin-guide/perf/dwc_pcie_pmu.rst b/Documentation/admin-guide/perf/dwc_pcie_pmu.rst
index d47cd229d710..39b8e1fdd0cd 100644
--- a/Documentation/admin-guide/perf/dwc_pcie_pmu.rst
+++ b/Documentation/admin-guide/perf/dwc_pcie_pmu.rst
@@ -46,16 +46,16 @@ Some of the events only exist for specific configurations.
DesignWare Cores (DWC) PCIe PMU Driver
=======================================
-This driver adds PMU devices for each PCIe Root Port named based on the BDF of
+This driver adds PMU devices for each PCIe Root Port named based on the SBDF of
the Root Port. For example,
- 30:03.0 PCI bridge: Device 1ded:8000 (rev 01)
+ 0001:30:03.0 PCI bridge: Device 1ded:8000 (rev 01)
-the PMU device name for this Root Port is dwc_rootport_3018.
+the PMU device name for this Root Port is dwc_rootport_13018.
The DWC PCIe PMU driver registers a perf PMU driver, which provides
description of available events and configuration options in sysfs, see
-/sys/bus/event_source/devices/dwc_rootport_{bdf}.
+/sys/bus/event_source/devices/dwc_rootport_{sbdf}.
The "format" directory describes format of the config fields of the
perf_event_attr structure. The "events" directory provides configuration
@@ -66,16 +66,16 @@ The "perf list" command shall list the available events from sysfs, e.g.::
$# perf list | grep dwc_rootport
<...>
- dwc_rootport_3018/Rx_PCIe_TLP_Data_Payload/ [Kernel PMU event]
+ dwc_rootport_13018/Rx_PCIe_TLP_Data_Payload/ [Kernel PMU event]
<...>
- dwc_rootport_3018/rx_memory_read,lane=?/ [Kernel PMU event]
+ dwc_rootport_13018/rx_memory_read,lane=?/ [Kernel PMU event]
Time Based Analysis Event Usage
-------------------------------
Example usage of counting PCIe RX TLP data payload (Units of bytes)::
- $# perf stat -a -e dwc_rootport_3018/Rx_PCIe_TLP_Data_Payload/
+ $# perf stat -a -e dwc_rootport_13018/Rx_PCIe_TLP_Data_Payload/
The average RX/TX bandwidth can be calculated using the following formula:
@@ -88,7 +88,7 @@ Lane Event Usage
Each lane has the same event set and to avoid generating a list of hundreds
of events, the user need to specify the lane ID explicitly, e.g.::
- $# perf stat -a -e dwc_rootport_3018/rx_memory_read,lane=4/
+ $# perf stat -a -e dwc_rootport_13018/rx_memory_read,lane=4/
The driver does not support sampling, therefore "perf record" will not
work. Per-task (without "-a") perf sessions are not supported.
diff --git a/Documentation/admin-guide/perf/hisi-pcie-pmu.rst b/Documentation/admin-guide/perf/hisi-pcie-pmu.rst
index 5541ff40e06a..083ca50de896 100644
--- a/Documentation/admin-guide/perf/hisi-pcie-pmu.rst
+++ b/Documentation/admin-guide/perf/hisi-pcie-pmu.rst
@@ -28,7 +28,9 @@ The "identifier" sysfs file allows users to identify the version of the
PMU hardware device.
The "bus" sysfs file allows users to get the bus number of Root Ports
-monitored by PMU.
+monitored by PMU. Furthermore users can get the Root Ports range in
+[bdf_min, bdf_max] from "bdf_min" and "bdf_max" sysfs attributes
+respectively.
Example usage of perf::
diff --git a/Documentation/admin-guide/perf/index.rst b/Documentation/admin-guide/perf/index.rst
index 7eb3dcd6f4da..8502bc174640 100644
--- a/Documentation/admin-guide/perf/index.rst
+++ b/Documentation/admin-guide/perf/index.rst
@@ -16,6 +16,7 @@ Performance monitor support
starfive_starlink_pmu
arm-ccn
arm-cmn
+ arm-ni
xgene-pmu
arm_dsu_pmu
thunderx2-pmu
diff --git a/Documentation/admin-guide/pm/amd-pstate.rst b/Documentation/admin-guide/pm/amd-pstate.rst
index d0324d44f548..210a808b74ec 100644
--- a/Documentation/admin-guide/pm/amd-pstate.rst
+++ b/Documentation/admin-guide/pm/amd-pstate.rst
@@ -251,7 +251,9 @@ performance supported in `AMD CPPC Performance Capability <perf_cap_>`_).
In some ASICs, the highest CPPC performance is not the one in the ``_CPC``
table, so we need to expose it to sysfs. If boost is not active, but
still supported, this maximum frequency will be larger than the one in
-``cpuinfo``.
+``cpuinfo``. On systems that support preferred core, the driver will have
+different values for some cores than others and this will reflect the values
+advertised by the platform at bootup.
This attribute is read-only.
``amd_pstate_lowest_nonlinear_freq``
@@ -262,6 +264,17 @@ lowest non-linear performance in `AMD CPPC Performance Capability
<perf_cap_>`_.)
This attribute is read-only.
+``amd_pstate_hw_prefcore``
+
+Whether the platform supports the preferred core feature and it has been
+enabled. This attribute is read-only.
+
+``amd_pstate_prefcore_ranking``
+
+The performance ranking of the core. This number doesn't have any unit, but
+larger numbers are preferred at the time of reading. This can change at
+runtime based on platform conditions. This attribute is read-only.
+
``energy_performance_available_preferences``
A list of all the supported EPP preferences that could be used for