| Age | Commit message (Collapse) | Author | Files | Lines |
|
Convert pbus_size_io() to use pbus_select_window_for_type().
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-17-ilpo.jarvinen@linux.intel.com
|
|
Prior to a BAR resize, __resource_resize_store() loops through the normal
resources of the PCI device and releases those that match to the flags of
the BAR to be resized. This is necessary to allow resizing also the
upstream bridge window as only childless bridge windows can be resized.
While the flags check (mostly) works (if corner cases are ignored), the
more straightforward way is to check if the resources share the bridge
window. Change __resource_resize_store() to do the check using
pbus_select_window().
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-16-ilpo.jarvinen@linux.intel.com
|
|
BAR resizing calls to pci_reassign_bridge_resources(), which attempts to
release any upstream bridge window to allow them to accommodate the new BAR
size. The release can only be performed if there are no other child
resources for the bridge window. Previously the code continued silently
when other child resources were detected.
Add pci_warn() to inform user that a bridge window could not be released
because of child resources. As a small bridge window is often the reason
why BAR resize fails, this warning will help to pinpoint to the cause.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-15-ilpo.jarvinen@linux.intel.com
|
|
pci_reassign_bridge_resources() walks upwards in the PCI bus hierarchy,
locates the relevant bridge window on each level using flags check, and
attempts to release the bridge window. The flags-based check is fragile due
to various fallbacks in the bridge window selection logic. As such, the
algorithm might not locate the correct bridge window.
Refactor pci_reassign_bridge_resources() to determine the correct bridge
window using pbus_select_window(), which contains logic to handle all
fallback cases correctly. Change function prefix to pbus as it now inputs
struct bus and resource for which to locate the bridge window.
The main purpose is to make bridge window selection logic consistent across
the entire PCI core (one step at a time). While this technically also fixes
the commit 8bb705e3e79d ("PCI: Add pci_resize_resource() for resizing
BARs") making the bridge window walk algorithm more robust, the normal
setup having a 64-bit resizable BAR underneath bridge(s) with 64-bit
prefetchable windows does not need to use any fallbacks. As such, the
practical impact is low (requiring BAR resize use case and a non-typical
bridge device).
The way to detect if unrelated resource failed again is left to use the
type based approximation which should not behave worse than before.
Fixes: 8bb705e3e79d ("PCI: Add pci_resize_resource() for resizing BARs")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-14-ilpo.jarvinen@linux.intel.com
|
|
Various places in the PCI core code independently decide into which bridge
window a child resource should be placed. It is hard to see whether these
decisions always end up in agreement, especially in the corner cases, and
in some places it requires complex logic to pass multiple resource types
and/or bridge windows around.
Add pbus_select_window() and pbus_select_window_for_type() for cases where
the former cannot be used so that eventually the same helper can be used to
select the bridge window everywhere. Using the same function ensures the
selected bridge window remains always the same and it can be easily
recalculated in-situ allowing simplifying the interfaces between internal
functions in upcoming changes.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-13-ilpo.jarvinen@linux.intel.com
|
|
include/linux/pci.h provides PCI_BRIDGE_{IO,MEM,PREF_MEM}_WINDOW defines,
however, they're based on the resource array indexing in the pci_dev
struct. The struct pci_bus also has pointers to those same resources but
they start from zeroth index.
Add PCI_BUS_BRIDGE_{IO,MEM,PREF_MEM}_WINDOW defines to get rid of literal
indexing.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-12-ilpo.jarvinen@linux.intel.com
|
|
When a bridge window is found unused or fails to assign, the flags of the
associated resource are cleared. Clearing flags is problematic as it also
removes the type information of the resource which is needed later.
Thus, always preserve the bridge window type flags and use IORESOURCE_UNSET
and IORESOURCE_DISABLED to indicate the status of the bridge window. Also,
when initializing resources, make sure all valid bridge windows do get
their type flags set.
Change various places that relied on resource flags being cleared to check
for IORESOURCE_UNSET and IORESOURCE_DISABLED to allow bridge window
resource to retain their type flags. Add pdev_resource_assignable() and
pdev_resource_should_fit() helpers to filter out disabled bridge windows
during resource fitting; the latter combines more common checks into the
helper.
When reading the bridge windows from the registers, instead of leaving the
resource flags cleared for bridge windows that are not enabled, always
set up the flags and set IORESOURCE_UNSET | IORESOURCE_DISABLED as needed.
When resource fitting or assignment fails for a bridge window resource, or
the bridge window is not needed, mark the resource with IORESOURCE_UNSET or
IORESOURCE_DISABLED, respectively.
Use dummy zero resource in resource_show() for backwards compatibility as
lspci will otherwise misrepresent disabled bridge windows.
This change fixes an issue which highlights the importance of keeping the
resource type flags intact:
At the end of __assign_resources_sorted(), reset_resource() is called,
previously clearing the flags. Later, pci_prepare_next_assign_round()
attempted to release bridge resources using
pci_bus_release_bridge_resources() that calls into
pci_bridge_release_resources() that assumes type flags are still present.
As type flags were cleared, IORESOURCE_MEM_64 was not set leading to
resources under an incorrect bridge window to be released (idx = 1
instead of idx = 2). While the assignments performed later covered this
problem so that the wrongly released resources got assigned in the end,
it was still causing extra release+assign pairs.
There are other reasons why the resource flags should be retained in
upcoming changes too.
Removing the flag reset for non-bridge window resource is left as future
work, in part because it has a much higher regression potential due to
pci_enable_resources() that will start to work also for those resources
then and due to what endpoint drivers might assume about resources.
Despite the Fixes tag, backporting this (at least any time soon) is highly
discouraged. The issue fixed is borderline cosmetic as the later
assignments normally cover the problem entirely. Also there might be
non-obvious dependencies.
Fixes: 5b28541552ef ("PCI: Restrict 64-bit prefetchable bridge windows to 64-bit resources")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-11-ilpo.jarvinen@linux.intel.com
|
|
A normal PCI bridge has multiple bridge windows and not all of them are
always required by devices underneath the bridge. If a Root Port or bridge
does not have a device underneath, no bridge windows get assigned. Yet,
pci_enable_resources() is set to fail indiscriminantly on any resource
assignment failure if the resource is not known to be optional.
In practice, the code in pci_enable_resources() is currently largely
dormant. The kernel sets resource flags to zero for any unused bridge
window and resets flags to zero in case of an resource assignment failure,
which short-circuits pci_enable_resources() because of this check:
if (!(r->flags & (IORESOURCE_IO | IORESOURCE_MEM)))
continue;
However, an upcoming change to resource flags will alter how bridge window
resource flags behave activating these long dormants checks in
pci_enable_resources().
While complex logic could be built to selectively enable a bridge only
under some conditions, a few versions of such logic were tried during
development of this change and none of them worked satisfactorily. Thus, I
just gave up and decided to enable any bridge regardless of the bridge
windows as there seems to be no clear benefit from not enabling it, but a
major downside as pcieport will not be probed for the bridge if it's not
enabled.
Therefore, change pci_enable_resources() to not check if bridge window
resources remain unassigned. Resource assignment failures are pretty noisy
already so there is no need to log that for bridge windows in
pci_enable_resources().
Ignoring bridge window failures hopefully prevents an obvious source of
regressions when the upcoming change that no longer clears resource flags
for bridge windows is enacted. I've hit this problem even during my own
testing on multiple occasions so I expect it to be a quite common problem.
This can always be revisited later if somebody thinks the enable check for
bridges is not strict enough, but expect a mind-boggling number of
regressions from such a change.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-10-ilpo.jarvinen@linux.intel.com
|
|
A few places in setup-bus.c call release_resource() directly and end up
duplicating functionality from pci_release_resource() such as parent check,
logging, and clearing the resource. Worse yet, the way the resource is
cleared is inconsistent between different sites.
Convert release_resource() calls into pci_release_resource() to remove code
duplication. This will also make the resource start, end, and flags
behavior consistent, i.e., start address is cleared, and only
IORESOURCE_UNSET is asserted for the resource.
While at it, eliminate the unnecessary initialization of idx variable in
pci_bridge_release_resources().
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-9-ilpo.jarvinen@linux.intel.com
|
|
If clipping or claiming the bridge window fails, the bridge window is left
in a state that does not match the kernel's view on what the bridge window
is.
Disable the bridge window by writing the magic disable value into the Base
and Limit Registers if clipping or claiming failed. To detect if claiming
the resource was successful, add res->parent checks into the bridge setup
functions.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-8-ilpo.jarvinen@linux.intel.com
|
|
When the claim of a resource fails for the full range in
pci_claim_bridge_resource(), clipping the resource to a smaller size is
attempted. If clipping is successful, the new bridge window is programmed
and only as the last step the code attempts to claim the resource again.
The order of the last two steps is slightly illogical and inconsistent with
the assignment call chains.
If claiming the bridge window after clipping fails, the bridge window that
was set up is left in place.
Rework the logic such that the bridge window is claimed before calling the
relevant bridge setup function. This make the behavior consistent with
resource fitting call chains that always assign the bridge window before
programming it.
If claiming the bridge window fails, the clipped bridge window is no longer
set up but pci_claim_bridge_resource() returns without writing the bridge
window at all.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-7-ilpo.jarvinen@linux.intel.com
|
|
Reorder the logic checks in find_bus_resource_of_type() to simplify them.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-6-ilpo.jarvinen@linux.intel.com
|
|
Move find_bus_resource_of_type() earlier in setup-bus.c to be able to call
it in upcoming changes.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-5-ilpo.jarvinen@linux.intel.com
|
|
pci-legacy.c under MIPS has a copy of pci_enable_resources() named as
pcibios_enable_resources(). Having own copy of same functionality could
lead to inconsistencies in behavior, especially now as
pci_enable_resources() and the bridge window resource flags behavior are
going to be altered by upcoming changes.
The check for !r->start && r->end is already covered by the more generic
checks done in pci_enable_resources().
Call pci_enable_resources() from MIPS's pcibios_enable_device() and remove
pcibios_enable_resources().
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Link: https://patch.msgid.link/20250829131113.36754-4-ilpo.jarvinen@linux.intel.com
|
|
Under arch/sparc/ there are multiple copies of pcibios_enable_device() but
none of those seem to do anything extra beyond what pci_enable_resources()
is supposed to do. These functions could lead to inconsistencies in
behavior, especially now as pci_enable_resources() and the bridge window
resource flags behavior are going to be altered by upcoming changes.
Remove all pcibios_enable_device() from arch/sparc/ so that PCI core can
simply call into pci_enable_resources() instead using its __weak version
of pcibios_enable_device().
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-3-ilpo.jarvinen@linux.intel.com
|
|
m68k has a resource enable (check) loop in its pcibios_enable_device()
which for some reason differs from pci_enable_resources(). This could lead
to inconsistencies in behavior, especially now as pci_enable_resources()
and the bridge window resource flags behavior are going to be altered by
upcoming changes.
The check for !r->start && r->end is already covered by the more generic
checks done in pci_enable_resources().
The entire pcibios_enable_device() suspiciously looks copy-paste from some
other arch as also indicated by the preceding comment. However, it also
enables PCI_COMMAND_IO | PCI_COMMAND_MEMORY always for bridges. It is not
clear why that is being done as the commit e93a6bbeb5a5 ("m68k: common PCI
support definitions and code") introducing this code states "Nothing
specific to any PCI implementation in any m68k class CPU hardware yet".
Replace the resource enable loop with a call to pci_enable_resources() and
adjust the Command Register afterwards as it's unclear if that is necessary
or not so keep it for now.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-2-ilpo.jarvinen@linux.intel.com
|
|
Since 96336ec70264 ("PCI: Perform reset_resource() and build fail list in
sync") the failed list is always built and returned to let the caller
decide what to do with the failures. The caller may want to retry resource
fitting and assignment and before that can happen, the resources should be
restored to their original state (a reset effectively clears the struct
resource), which requires returning them to the failed list so the original
state remains stored in the associated struct pci_dev_resource.
Resource resizing is different from the ordinary resource fitting and
assignment in that it only considers part of the resources. This means
failures for other resource types are not relevant at all and should be
ignored. As resize doesn't unassign such unrelated resources, those
resources ending up in the failed list implies assignment of that
resource must have failed before resize too. The check in
pci_reassign_bridge_resources() to decide if the whole assignment is
successful, however, is based on list emptiness which will cause false
negatives when the failed list has resources with an unrelated type.
If the failed list is not empty, call pci_required_resource_failed() and
extend it to be able to filter on specific resource types too (if
provided).
Calling pci_required_resource_failed() at this point is slightly
problematic because the resource itself is reset when the failed list
is constructed in __assign_resources_sorted(). As a result,
pci_resource_is_optional() does not have access to the original
resource flags. This could be worked around by restoring and
re-resetting the resource around the call to pci_resource_is_optional(),
however, it shouldn't cause issue as resource resizing is meant for
64-bit prefetchable resources according to Christian König (see the
Link which unfortunately doesn't point directly to Christian's reply
because lore didn't store that email at all).
Fixes: 96336ec70264 ("PCI: Perform reset_resource() and build fail list in sync")
Link: https://lore.kernel.org/all/c5d1b5d8-8669-5572-75a7-0b480f581ac1@linux.intel.com/
Reported-by: D Scott Phillips <scott@os.amperecomputing.com>
Closes: https://lore.kernel.org/all/86plf0lgit.fsf@scott-ph-mail.amperecomputing.com/
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: D Scott Phillips <scott@os.amperecomputing.com>
Reviewed-by: D Scott Phillips <scott@os.amperecomputing.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org # v6.15+
Link: https://patch.msgid.link/20250822123359.16305-4-ilpo.jarvinen@linux.intel.com
|
|
pdev_sort_resources() uses pdev_resources_assignable() helper to decide if
device's resources cannot be assigned, so it ignores class 0
(PCI_CLASS_NOT_DEFINED) devices. pbus_size_mem(), on the other hand, does
not do the same check. This could lead into a situation where a resource
ends up on realloc_head list but is not on the head list, which in turn
prevents emptying the resource from the realloc_head list in
__assign_resources_sorted().
A non-empty realloc_head is unacceptable because it triggers an internal
sanity check as shown in this log with a device that has class 0
(PCI_CLASS_NOT_DEFINED):
pci 0001:01:00.0: [144d:a5a5] type 00 class 0x000000 PCIe Endpoint
pci 0001:01:00.0: BAR 0 [mem 0x00000000-0x000fffff 64bit]
pci 0001:01:00.0: ROM [mem 0x00000000-0x0000ffff pref]
pcieport 0001:00:00.0: bridge window [mem 0x00100000-0x001fffff] to [bus 01-ff] add_size 100000 add_align 100000
pcieport 0001:00:00.0: bridge window [mem 0x40000000-0x401fffff]: assigned
------------[ cut here ]------------
kernel BUG at drivers/pci/setup-bus.c:2532!
Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
...
Call trace:
pci_assign_unassigned_bus_resources+0x110/0x114 (P)
pci_rescan_bus+0x28/0x48
Use pdev_resources_assignable() also within pbus_size_mem() to skip
processing of non-assignable resources which removes the disparity in
between what resources pdev_sort_resources() and pbus_size_mem() consider.
As non-assignable resources are no longer processed, they are not added to
the realloc_head list, thus the sanity check no longer triggers.
This disparity problem is very old but only now became apparent after
2499f5348431 ("PCI: Rework optional resource handling") that made the ROM
resources optional when calculating bridge window sizes which required
adding the resource to the realloc_head list. Previously, bridge windows
were just sized larger than necessary.
Fixes: 2499f5348431 ("PCI: Rework optional resource handling")
Reported-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Closes: https://lore.kernel.org/all/5f103643-5e1c-43c6-b8fe-9617d3b5447c@linaro.org/
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v6.15+
Link: https://patch.msgid.link/20250822123359.16305-3-ilpo.jarvinen@linux.intel.com
|
|
When using relaxed tail alignment for the bridge window, pbus_size_mem()
also tries to minimize min_align, which can under certain scenarios end up
increasing min_align from that found by calculate_mem_align().
Ensure min_align is not increased by the relaxed tail alignment.
Eventually, it would be better to add calculate_relaxed_head_align()
similar to calculate_mem_align() which finds out what alignment can be used
for the head without introducing any gaps into the bridge window to give
flexibility on head address too. But that looks relatively complex so it
requires much more testing than fixing the immediate problem causing a
regression.
Fixes: 67f9085596ee ("PCI: Allow relaxed bridge window tail sizing for optional resources")
Reported-by: Rio Liu <rio@r26.me>
Closes: https://lore.kernel.org/all/o2bL8MtD_40-lf8GlslTw-AZpUPzm8nmfCnJKvS8RQ3NOzOW1uq1dVCEfRpUjJ2i7G2WjfQhk2IWZ7oGp-7G-jXN4qOdtnyOcjRR0PZWK5I=@r26.me/
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Rio Liu <rio@r26.me>
Cc: stable@vger.kernel.org # v6.15+
Link: https://patch.msgid.link/20250822123359.16305-2-ilpo.jarvinen@linux.intel.com
|
|
|
|
Probe and display L3 Cache topology
Add ability to average an added counter
(useful for pre-integrated "counters", such as Watts)
Break the limit of 64 built-in counters.
Assorted bug fixes and minor feature tweaks
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
/sys/devices/system/cpu/intel_uncore_frequency/package_X_die_Y/
may be readable by all, but
/sys/devices/system/cpu/intel_uncore_frequency/package_X_die_Y/current_freq_khz
may be readable only by root.
Non-root turbostat users see complaints in this scenario.
Fail probe of the interface if we can't read current_freq_khz.
Reported-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Original-patch-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
use a macro for PER_THREAD_PARAMS to make adding one later more clear.
no functional change
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Together with the RAPL MSRs, there are more MSRs gone on DMR, including
PLR (Perf Limit Reasons), and IRTL (Package cstate Interrupt Response
Time Limit) MSRs. The configurable TDP info should also be retrieved
from TPMI based Intel Speed Select Technology feature.
Remove the access of these MSRs for DMR. Improve the DMR platform
feature table to make it more readable at the same time.
Fixes: 83075bd59de2 ("tools/power turbostat: Add initial support for DMR")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
External atributes with format "raw" are not printed in summary lines
for nodes/packages (or with option -S). The new format "average"
behaves like "raw" but also adds the summary data
Signed-off-by: Michael Hebenstreit <michael.hebenstreit@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
pkg_base[pkg_id] is a simple array of structure pointers,
let the compiler treat it that way.
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
We have out-grown the ability to use a 64-bit memory location
to inventory every possible built-in counter.
Leverage the the CPU_SET(3) macros to break this barrier.
Also, break the Joules & Watts counters into two,
since we can no longer 'or' them together...
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Explain the meaning of the Totl%C0, Any%C0, GFX%C0, CPUGFX% columns.
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Similar to delta_cpu(), delta_platform() is called in turbostat main
loop. This ensures accurate SysWatt readings in periodic monitoring mode
$ sudo turbostat -S -q --show power -i 1
CoreTmp PkgTmp PkgWatt CorWatt GFXWatt RAMWatt PKG_% RAM_% SysWatt
60 61 6.21 1.13 0.16 0.00 0.00 0.00 13.07
58 61 6.00 1.07 0.18 0.00 0.00 0.00 12.75
58 61 5.74 1.05 0.17 0.00 0.00 0.00 12.22
58 60 6.27 1.11 0.24 0.00 0.00 0.00 13.55
However, delta_platform() is missing for forked program and causes bogus
SysWatt reporting,
$ sudo turbostat -S -q --show power sleep 1
1.004736 sec
CoreTmp PkgTmp PkgWatt CorWatt GFXWatt RAMWatt PKG_% RAM_% SysWatt
57 58 6.05 1.02 0.16 0.00 0.00 0.00 0.03
Add missing delta_platform() for forked program.
Fixes: e5f687b89bc2 ("tools/power turbostat: Add RAPL psys as a built-in counter")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Kernels configured with CONFIG_MULTIUSER=n have no cap_get_proc().
Check for ENOSYS to recognize this case, and continue on to
attempt to access the requested MSRs (such as temperature).
Signed-off-by: Calvin Owens <calvin@wbinvd.org>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
turbostat.c: In function 'parse_int_file':
turbostat.c:5567:19: error: 'PATH_MAX' undeclared (first use in this function)
5567 | char path[PATH_MAX];
| ^~~~~~~~
turbostat.c: In function 'probe_graphics':
turbostat.c:6787:19: error: 'PATH_MAX' undeclared (first use in this function)
6787 | char path[PATH_MAX];
| ^~~~~~~~
Signed-off-by: Calvin Owens <calvin@wbinvd.org>
Reviewed-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
$ sudo turbostat --quiet --show junk
turbostat: Counter 'junk' can not be added.
Previously, invalid arguments to --show and --hide were silently ignored
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
If the allocated size exceeds UINT_MAX, then it's necessary to cast
the mr->nr_pages value to size_t to prevent it from overflowing. In
practice this isn't much of a concern as the required memory size will
have been validated upfront, and accounted to the user. And > 4GB sizes
will be necessary to make the lack of a cast a problem, which greatly
exceeds normal user locked_vm settings that are generally in the kb to
mb range. However, if root is used, then accounting isn't done, and
then it's possible to hit this issue.
Link: https://lore.kernel.org/all/6895b298.050a0220.7f033.0059.GAE@google.com/
Cc: stable@vger.kernel.org
Reported-by: syzbot+23727438116feb13df15@syzkaller.appspotmail.com
Fixes: 087f997870a9 ("io_uring/memmap: implement mmap for regions")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Define a new, optional, callback that allows the driver to
specify how the return data buffer is allocated. If that callback
is set, mailbox/pcc.c is now responsible for reading from and
writing to the PCC shared buffer.
This also allows for proper checks of the Commnand complete flag
between the PCC sender and receiver.
For Type 4 channels, initialize the command complete flag prior
to accepting messages.
Since the mailbox does not know what memory allocation scheme
to use for response messages, the client now has an optional
callback that allows it to allocate the buffer for a response
message.
When an outbound message is written to the buffer, the mailbox
checks for the flag indicating the client wants an tx complete
notification via IRQ. Upon receipt of the interrupt It will
pair it with the outgoing message. The expected use is to
free the kernel memory buffer for the previous outgoing message.
Signed-off-by: Adam Young <admiyo@os.amperecomputing.com>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
|
|
In ksmbd_extract_shortname(), strscpy() is incorrectly called with the
length of the source string (excluding the NUL terminator) rather than
the size of the destination buffer. This results in "__" being copied
to 'extension' rather than "___" (two underscores instead of three).
Use the destination buffer size instead to ensure that the string "___"
(three underscores) is copied correctly.
Cc: stable@vger.kernel.org
Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Repeated connections from clients with the same IP address may exhaust
the max connections and prevent other normal client connections.
This patch limit repeated connections from clients with the same IP.
Reported-by: tianshuo han <hantianshuo233@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
There's no need for separate conn_wait and disconn_wait queues.
This will simplify the move to common code, the server code
already a single wait_queue for this.
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
_smbd_get_connection
It is already called long before we may hit this cleanup code path.
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
This matches the timeout for tcp connections.
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Fixes: f198186aa9bb ("CIFS: SMBD: Establish SMB Direct connection")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
vmd_msi_alloc() allocates struct vmd_irq and stashes it into
irq_data->chip_data associated with the VMD's interrupt domain.
vmd_msi_free() extracts the pointer by calling irq_get_chip_data() and
frees it.
irq_get_chip_data() returns the chip_data associated with the top interrupt
domain. This worked in the past because VMD's interrupt domain was the top
domain.
But d7d8ab87e3e7 ("PCI: vmd: Switch to msi_create_parent_irq_domain()")
changed the interrupt domain hierarchy so VMD's interrupt domain is not the
top domain anymore. irq_get_chip_data() now returns the chip_data at the
MSI devices' interrupt domains. It is therefore broken for vmd_msi_free()
to kfree() this chip_data.
Fix by extracting the chip_data associated with the VMD's interrupt domain.
Fixes: d7d8ab87e3e7 ("PCI: vmd: Switch to msi_create_parent_irq_domain()")
Reported-by: Kenneth Crudup <kenny@panix.com>
Closes: https://lore.kernel.org/linux-pci/dfa40e48-8840-4e61-9fda-25cdb3ad81c1@panix.com/
Reported-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Closes: https://lore.kernel.org/linux-pci/ed53280ed15d1140700b96cca2734bf327ee92539e5eb68e80f5bbbf0f01@linux.gnuweeb.org/
Tested-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Tested-by: Kenneth Crudup <kenny@panix.com>
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>
Acked-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20250807081051.2253962-1-namcao@linutronix.de
|
|
On s390, and, in general, on all platforms where the respective event
supports auxiliary data gathering, the command:
# ./perf record -u 0 -aB --synth=no -- ./perf test -w thloop
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.011 MB perf.data ]
# ./perf report --stats | grep SAMPLE
#
does not generate samples in the perf.data file. On x86 the command:
# sudo perf record -e intel_pt// -u 0 ls
is broken too.
Looking at the sequence of calls in 'perf record' reveals this
behavior:
1. The event 'cycles' is created and enabled:
record__open()
+-> evlist__apply_filters()
+-> perf_bpf_filter__prepare()
+-> bpf_program.attach_perf_event()
+-> bpf_program.attach_perf_event_opts()
+-> __GI___ioctl(..., PERF_EVENT_IOC_ENABLE, ...)
The event 'cycles' is enabled and active now. However the event's
ring-buffer to store the samples generated by hardware is not
allocated yet.
2. The event's fd is mmap()ed to create the ring buffer:
record__open()
+-> record__mmap()
+-> record__mmap_evlist()
+-> evlist__mmap_ex()
+-> perf_evlist__mmap_ops()
+-> mmap_per_cpu()
+-> mmap_per_evsel()
+-> mmap__mmap()
+-> perf_mmap__mmap()
+-> mmap()
This allocates the ring buffer for the event 'cycles'. With mmap()
the kernel creates the ring buffer:
perf_mmap(): kernel function to create the event's ring
| buffer to save the sampled data.
|
+-> ring_buffer_attach(): Allocates memory for ring buffer.
| The PMU has auxiliary data setup function. The
| has_aux(event) condition is true and the PMU's
| stop() is called to stop sampling. It is not
| restarted:
|
| if (has_aux(event))
| perf_event_stop(event, 0);
|
+-> cpumsf_pmu_stop():
Hardware sampling is stopped. No samples are generated and saved
anymore.
3. After the event 'cycles' has been mapped, the event is enabled a
second time in:
__cmd_record()
+-> evlist__enable()
+-> __evlist__enable()
+-> evsel__enable_cpu()
+-> perf_evsel__enable_cpu()
+-> perf_evsel__run_ioctl()
+-> perf_evsel__ioctl()
+-> __GI___ioctl(., PERF_EVENT_IOC_ENABLE, .)
The second
ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
is just a NOP in this case. The first invocation in (1.) sets the
event::state to PERF_EVENT_STATE_ACTIVE. The kernel functions
perf_ioctl()
+-> _perf_ioctl()
+-> _perf_event_enable()
+-> __perf_event_enable()
return immediately because event::state is already set to
PERF_EVENT_STATE_ACTIVE.
This happens on s390, because the event 'cycles' offers the possibility
to save auxilary data. The PMU callbacks setup_aux() and free_aux() are
defined. Without both callback functions, cpumsf_pmu_stop() is not
invoked and sampling continues.
To remedy this, remove the first invocation of
ioctl(..., PERF_EVENT_IOC_ENABLE, ...).
in step (1.) Create the event in step (1.) and enable it in step (3.)
after the ring buffer has been mapped.
Output after:
# ./perf record -aB --synth=no -u 0 -- ./perf test -w thloop 2
[ perf record: Woken up 3 times to write data ]
[ perf record: Captured and wrote 0.876 MB perf.data ]
# ./perf report --stats | grep SAMPLE
SAMPLE events: 16200 (99.5%)
SAMPLE events: 16200
#
The software event succeeded both before and after the patch:
# ./perf record -e cpu-clock -aB --synth=no -u 0 -- \
./perf test -w thloop 2
[ perf record: Woken up 7 times to write data ]
[ perf record: Captured and wrote 2.870 MB perf.data ]
# ./perf report --stats | grep SAMPLE
SAMPLE events: 53506 (99.8%)
SAMPLE events: 53506
#
Fixes: b4c658d4d63d61 ("perf target: Remove uid from target")
Suggested-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Co-developed-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20250806162417.19666-3-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Automatically enabling a perf event after attaching a BPF prog to it is
not always desirable.
Add a new "dont_enable" field to struct bpf_perf_event_opts. While
introducing "enable" instead would be nicer in that it would avoid
a double negation in the implementation, it would make
DECLARE_LIBBPF_OPTS() less efficient.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Suggested-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Co-developed-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20250806162417.19666-2-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
I accidentally added a bug in pptp_xmit() that syzbot caught for us.
Only call ip_rt_put() if a route has been allocated.
BUG: unable to handle page fault for address: ffffffffffffffdb
PGD df3b067 P4D df3b067 PUD df3d067 PMD 0
Oops: Oops: 0002 [#1] SMP KASAN PTI
CPU: 1 UID: 0 PID: 6346 Comm: syz.0.336 Not tainted 6.16.0-next-20250804-syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025
RIP: 0010:arch_atomic_add_return arch/x86/include/asm/atomic.h:85 [inline]
RIP: 0010:raw_atomic_sub_return_release include/linux/atomic/atomic-arch-fallback.h:846 [inline]
RIP: 0010:atomic_sub_return_release include/linux/atomic/atomic-instrumented.h:327 [inline]
RIP: 0010:__rcuref_put include/linux/rcuref.h:109 [inline]
RIP: 0010:rcuref_put+0x172/0x210 include/linux/rcuref.h:173
Call Trace:
<TASK>
dst_release+0x24/0x1b0 net/core/dst.c:167
ip_rt_put include/net/route.h:285 [inline]
pptp_xmit+0x14b/0x1a90 drivers/net/ppp/pptp.c:267
__ppp_channel_push+0xf2/0x1c0 drivers/net/ppp/ppp_generic.c:2166
ppp_channel_push+0x123/0x660 drivers/net/ppp/ppp_generic.c:2198
ppp_write+0x2b0/0x400 drivers/net/ppp/ppp_generic.c:544
vfs_write+0x27b/0xb30 fs/read_write.c:684
ksys_write+0x145/0x250 fs/read_write.c:738
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Fixes: de9c4861fb42 ("pptp: ensure minimal skb length in pptp_xmit()")
Reported-by: syzbot+27d7cfbc93457e472e00@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/689095a5.050a0220.1fc43d.0009.GAE@google.com/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250807142146.2877060-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Because it's only used in sbitmap.c
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20250807032413.1469456-3-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Currently elevators will record internal 'async_depth' to throttle
asynchronous requests, and they both calculate shallow_dpeth based on
sb->shift, with the respect that sb->shift is the available tags in one
word.
However, sb->shift is not the availbale tags in the last word, see
__map_depth:
if (index == sb->map_nr - 1)
return sb->depth - (index << sb->shift);
For consequence, if the last word is used, more tags can be get than
expected, for example, assume nr_requests=256 and there are four words,
in the worst case if user set nr_requests=32, then the first word is
the last word, and still use bits per word, which is 64, to calculate
async_depth is wrong.
One the ohter hand, due to cgroup qos, bfq can allow only one request
to be allocated, and set shallow_dpeth=1 will still allow the number
of words request to be allocated.
Fix this problems by using shallow_depth to the whole sbitmap instead
of per word, also change kyber, mq-deadline and bfq to follow this,
a new helper __map_depth_with_shallow() is introduced to calculate
available bits in each word.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20250807032413.1469456-2-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Commit 528589947c180 ("nvmet: initialize discovery subsys after debugfs
is initialized") changed nvmet_init() to initialize nvme discovery after
"nvmet" debugfs directory is initialized. The change broke nvmet_exit()
because discovery subsystem now depends on debugfs. Debugfs should be
destroyed after discovery subsystem. Fix nvmet_exit() to do that.
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Closes: https://lore.kernel.org/all/CAHj4cs96AfFQpyDKF_MdfJsnOEo=2V7dQgqjFv+k3t7H-=yGhA@mail.gmail.com/
Fixes: 528589947c180 ("nvmet: initialize discovery subsys after debugfs is initialized")
Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Link: https://lore.kernel.org/r/20250807053507.2794335-1-mkhalfella@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The conversion of all GPIO drivers to using the .set_rv() and
.set_multiple_rv() callbacks from struct gpio_chip (which - unlike their
predecessors - return an integer and allow the controller drivers to
indicate failures to users) is now complete and the legacy ones have
been removed. Rename the new callbacks back to their original names in
one sweeping change.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
|
|
With no more users of the legacy GPIO line value setters - .set() and
.set_multiple() - we can now remove them from the kernel.
Link: https://lore.kernel.org/r/20250725074651.14002-1-brgl@bgdev.pl
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
|