path: root/tools/perf/scripts/python/stackcollapse.py
Age | Commit message | Author | Files | Lines
2025-09-16 | PCI: Use pbus_select_window_for_type() during IO window sizing | Ilpo Järvinen | 1 | -2/+1
Convert pbus_size_io() to use pbus_select_window_for_type(). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20250829131113.36754-17-ilpo.jarvinen@linux.intel.com
2025-09-16 | PCI: Use pbus_select_window() during BAR resize | Ilpo Järvinen | 1 | -7/+13
Prior to a BAR resize, __resource_resize_store() loops through the normal resources of the PCI device and releases those that match the flags of the BAR to be resized. This is necessary to also allow resizing the upstream bridge window, as only childless bridge windows can be resized. While the flags check (mostly) works (if corner cases are ignored), the more straightforward way is to check if the resources share the bridge window. Change __resource_resize_store() to do the check using pbus_select_window(). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20250829131113.36754-16-ilpo.jarvinen@linux.intel.com
2025-09-16 | PCI: Warn if bridge window cannot be released when resizing BAR | Ilpo Järvinen | 1 | -0/+6
BAR resizing calls pci_reassign_bridge_resources(), which attempts to release any upstream bridge windows to allow them to accommodate the new BAR size. The release can only be performed if there are no other child resources for the bridge window. Previously the code continued silently when other child resources were detected. Add pci_warn() to inform the user that a bridge window could not be released because of child resources. As a small bridge window is often the reason why BAR resize fails, this warning will help to pinpoint the cause. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20250829131113.36754-15-ilpo.jarvinen@linux.intel.com
2025-09-16 | PCI: Fix finding bridge window in pci_reassign_bridge_resources() | Ilpo Järvinen | 3 | -22/+20
pci_reassign_bridge_resources() walks upwards in the PCI bus hierarchy, locates the relevant bridge window on each level using flags check, and attempts to release the bridge window. The flags-based check is fragile due to various fallbacks in the bridge window selection logic. As such, the algorithm might not locate the correct bridge window. Refactor pci_reassign_bridge_resources() to determine the correct bridge window using pbus_select_window(), which contains logic to handle all fallback cases correctly. Change function prefix to pbus as it now inputs struct bus and resource for which to locate the bridge window. The main purpose is to make bridge window selection logic consistent across the entire PCI core (one step at a time). While this technically also fixes the commit 8bb705e3e79d ("PCI: Add pci_resize_resource() for resizing BARs") making the bridge window walk algorithm more robust, the normal setup having a 64-bit resizable BAR underneath bridge(s) with 64-bit prefetchable windows does not need to use any fallbacks. As such, the practical impact is low (requiring BAR resize use case and a non-typical bridge device). The way to detect if unrelated resource failed again is left to use the type based approximation which should not behave worse than before. Fixes: 8bb705e3e79d ("PCI: Add pci_resize_resource() for resizing BARs") Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20250829131113.36754-14-ilpo.jarvinen@linux.intel.com
2025-09-16 | PCI: Add bridge window selection functions | Ilpo Järvinen | 2 | -0/+103
Various places in the PCI core code independently decide into which bridge window a child resource should be placed. It is hard to see whether these decisions always end up in agreement, especially in the corner cases, and in some places it requires complex logic to pass multiple resource types and/or bridge windows around. Add pbus_select_window() and pbus_select_window_for_type() for cases where the former cannot be used so that eventually the same helper can be used to select the bridge window everywhere. Using the same function ensures the selected bridge window remains always the same and it can be easily recalculated in-situ allowing simplifying the interfaces between internal functions in upcoming changes. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20250829131113.36754-13-ilpo.jarvinen@linux.intel.com
2025-09-16 | PCI: Add defines for bridge window indexing | Ilpo Järvinen | 2 | -3/+11
include/linux/pci.h provides PCI_BRIDGE_{IO,MEM,PREF_MEM}_WINDOW defines, however, they're based on the resource array indexing in the pci_dev struct. The struct pci_bus also has pointers to those same resources but they start from zeroth index. Add PCI_BUS_BRIDGE_{IO,MEM,PREF_MEM}_WINDOW defines to get rid of literal indexing. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20250829131113.36754-12-ilpo.jarvinen@linux.intel.com
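The two index spaces described in this commit can be pictured with a small model. This is only a sketch: every `MODEL_`/`model_` name is hypothetical, and the numeric layout of the device-side array is an assumption made for illustration, not the kernel's actual enum.

```c
#include <assert.h>

/* Hypothetical device-side layout: BARs and a ROM slot come before the
 * bridge windows, so the bridge window indices do not start at zero. */
enum model_dev_index {
    MODEL_DEV_BAR0 = 0,
    MODEL_DEV_ROM = 6,
    MODEL_DEV_BRIDGE_BASE = 7,
    MODEL_DEV_BRIDGE_IO = MODEL_DEV_BRIDGE_BASE + 0,
    MODEL_DEV_BRIDGE_MEM = MODEL_DEV_BRIDGE_BASE + 1,
    MODEL_DEV_BRIDGE_PREF_MEM = MODEL_DEV_BRIDGE_BASE + 2,
    MODEL_DEV_NUM
};

/* Bus-side pointers to the same windows start from the zeroth index. */
enum model_bus_index {
    MODEL_BUS_BRIDGE_IO = 0,
    MODEL_BUS_BRIDGE_MEM = 1,
    MODEL_BUS_BRIDGE_PREF_MEM = 2,
    MODEL_BUS_NUM
};

/* Translate a device-array bridge window index into the bus-array index. */
static int model_dev_to_bus_index(int dev_idx)
{
    assert(dev_idx >= MODEL_DEV_BRIDGE_BASE && dev_idx < MODEL_DEV_NUM);
    return dev_idx - MODEL_DEV_BRIDGE_BASE;
}
```

Named constants for the bus-side array make it harder to mix up the two index spaces than literal 0, 1 and 2 would.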
2025-09-16 | PCI: Preserve bridge window resource type flags | Ilpo Järvinen | 5 | -37/+90
When a bridge window is found unused or fails to assign, the flags of the associated resource are cleared. Clearing flags is problematic as it also removes the type information of the resource which is needed later. Thus, always preserve the bridge window type flags and use IORESOURCE_UNSET and IORESOURCE_DISABLED to indicate the status of the bridge window. Also, when initializing resources, make sure all valid bridge windows do get their type flags set. Change various places that relied on resource flags being cleared to check for IORESOURCE_UNSET and IORESOURCE_DISABLED to allow bridge window resource to retain their type flags. Add pdev_resource_assignable() and pdev_resource_should_fit() helpers to filter out disabled bridge windows during resource fitting; the latter combines more common checks into the helper. When reading the bridge windows from the registers, instead of leaving the resource flags cleared for bridge windows that are not enabled, always set up the flags and set IORESOURCE_UNSET | IORESOURCE_DISABLED as needed. When resource fitting or assignment fails for a bridge window resource, or the bridge window is not needed, mark the resource with IORESOURCE_UNSET or IORESOURCE_DISABLED, respectively. Use dummy zero resource in resource_show() for backwards compatibility as lspci will otherwise misrepresent disabled bridge windows. This change fixes an issue which highlights the importance of keeping the resource type flags intact: At the end of __assign_resources_sorted(), reset_resource() is called, previously clearing the flags. Later, pci_prepare_next_assign_round() attempted to release bridge resources using pci_bus_release_bridge_resources() that calls into pci_bridge_release_resources() that assumes type flags are still present. As type flags were cleared, IORESOURCE_MEM_64 was not set leading to resources under an incorrect bridge window to be released (idx = 1 instead of idx = 2). 
While the assignments performed later covered this problem so that the wrongly released resources got assigned in the end, it was still causing extra release+assign pairs. There are other reasons why the resource flags should be retained in upcoming changes too. Removing the flag reset for non-bridge window resource is left as future work, in part because it has a much higher regression potential due to pci_enable_resources() that will start to work also for those resources then and due to what endpoint drivers might assume about resources. Despite the Fixes tag, backporting this (at least any time soon) is highly discouraged. The issue fixed is borderline cosmetic as the later assignments normally cover the problem entirely. Also there might be non-obvious dependencies. Fixes: 5b28541552ef ("PCI: Restrict 64-bit prefetchable bridge windows to 64-bit resources") Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20250829131113.36754-11-ilpo.jarvinen@linux.intel.com
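The idx = 1 versus idx = 2 mix-up described in this commit can be reproduced in a toy model. The flag names echo the commit message, but the bit values, the `model_` helpers and the simplified window selection are assumptions of this sketch and are not copied from the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag bits (values are assumptions of this sketch). */
#define MODEL_IO        0x0001u
#define MODEL_MEM       0x0002u
#define MODEL_PREFETCH  0x0004u
#define MODEL_MEM_64    0x0008u
#define MODEL_DISABLED  0x0100u
#define MODEL_UNSET     0x0200u

/* Simplified window choice: 0 = IO, 1 = MEM, 2 = 64-bit prefetchable MEM. */
static int model_window_idx(uint32_t flags)
{
    if (flags & MODEL_IO)
        return 0;
    if ((flags & MODEL_PREFETCH) && (flags & MODEL_MEM_64))
        return 2;
    return 1;
}

/* Old behaviour: a reset wipes the flags, including the type bits. */
static uint32_t model_reset_clearing(uint32_t flags)
{
    (void)flags;
    return 0;
}

/* New behaviour: keep the type bits and record the status separately. */
static uint32_t model_reset_preserving(uint32_t flags)
{
    assert(flags != 0);
    return flags | MODEL_UNSET;
}
```

In this model a 64-bit prefetchable window maps to index 2 before the reset, to index 1 after the clearing reset, and still to index 2 after the preserving reset.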
2025-09-16 | PCI: Enable bridge even if bridge window fails to assign | Ilpo Järvinen | 1 | -13/+17
A normal PCI bridge has multiple bridge windows and not all of them are always required by devices underneath the bridge. If a Root Port or bridge does not have a device underneath, no bridge windows get assigned. Yet, pci_enable_resources() is set to fail indiscriminately on any resource assignment failure if the resource is not known to be optional. In practice, the code in pci_enable_resources() is currently largely dormant. The kernel sets resource flags to zero for any unused bridge window and resets flags to zero in case of a resource assignment failure, which short-circuits pci_enable_resources() because of this check: if (!(r->flags & (IORESOURCE_IO | IORESOURCE_MEM))) continue; However, an upcoming change to resource flags will alter how bridge window resource flags behave, activating these long-dormant checks in pci_enable_resources(). While complex logic could be built to selectively enable a bridge only under some conditions, a few versions of such logic were tried during development of this change and none of them worked satisfactorily. Thus, I just gave up and decided to enable any bridge regardless of the bridge windows as there seems to be no clear benefit from not enabling it, but a major downside as pcieport will not be probed for the bridge if it's not enabled. Therefore, change pci_enable_resources() to not check if bridge window resources remain unassigned. Resource assignment failures are pretty noisy already so there is no need to log that for bridge windows in pci_enable_resources(). Ignoring bridge window failures hopefully prevents an obvious source of regressions when the upcoming change that no longer clears resource flags for bridge windows is enacted. I've hit this problem even during my own testing on multiple occasions so I expect it to be a quite common problem.
This can always be revisited later if somebody thinks the enable check for bridges is not strict enough, but expect a mind-boggling number of regressions from such a change. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20250829131113.36754-10-ilpo.jarvinen@linux.intel.com
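The way cleared flags kept the enable check dormant, and what skipping bridge windows changes, can be sketched as follows. The structure, the flag values and `model_count_blockers()` are hypothetical simplifications; only the shape of the type test is taken from the check quoted in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag bits (values are assumptions of this sketch). */
#define MODEL_IO     0x1u
#define MODEL_MEM    0x2u
#define MODEL_UNSET  0x4u

struct model_res {
    uint32_t flags;
    bool is_bridge_window;
};

/* Count the resources that would make enabling the device fail.
 * skip_bridge_windows models the behaviour this commit introduces. */
static size_t model_count_blockers(const struct model_res *res, size_t n,
                                   bool skip_bridge_windows)
{
    size_t blockers = 0;

    assert(res != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* Cleared flags short-circuit the check, as in the quoted test. */
        if (!(res[i].flags & (MODEL_IO | MODEL_MEM)))
            continue;
        if (skip_bridge_windows && res[i].is_bridge_window)
            continue;
        if (res[i].flags & MODEL_UNSET)
            blockers++;
    }
    return blockers;
}
```

A window with zeroed flags never reaches the unassigned test. Once the type bits are preserved, the same window would count as a blocker unless bridge windows are skipped.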
2025-09-16 | PCI: Use pci_release_resource() instead of release_resource() | Ilpo Järvinen | 3 | -36/+23
A few places in setup-bus.c call release_resource() directly and end up duplicating functionality from pci_release_resource() such as parent check, logging, and clearing the resource. Worse yet, the way the resource is cleared is inconsistent between different sites. Convert release_resource() calls into pci_release_resource() to remove code duplication. This will also make the resource start, end, and flags behavior consistent, i.e., start address is cleared, and only IORESOURCE_UNSET is asserted for the resource. While at it, eliminate the unnecessary initialization of idx variable in pci_bridge_release_resources(). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20250829131113.36754-9-ilpo.jarvinen@linux.intel.com
2025-09-16 | PCI: Disable non-claimed bridge window | Ilpo Järvinen | 1 | -13/+12
If clipping or claiming the bridge window fails, the bridge window is left in a state that does not match the kernel's view on what the bridge window is. Disable the bridge window by writing the magic disable value into the Base and Limit Registers if clipping or claiming failed. To detect if claiming the resource was successful, add res->parent checks into the bridge setup functions. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20250829131113.36754-8-ilpo.jarvinen@linux.intel.com
2025-09-16 | PCI: Always claim bridge window before its setup | Ilpo Järvinen | 1 | -4/+8
When the claim of a resource fails for the full range in pci_claim_bridge_resource(), clipping the resource to a smaller size is attempted. If clipping is successful, the new bridge window is programmed and only as the last step the code attempts to claim the resource again. The order of the last two steps is slightly illogical and inconsistent with the assignment call chains. If claiming the bridge window after clipping fails, the bridge window that was set up is left in place. Rework the logic such that the bridge window is claimed before calling the relevant bridge setup function. This makes the behavior consistent with resource fitting call chains that always assign the bridge window before programming it. If claiming the bridge window fails, the clipped bridge window is no longer set up but pci_claim_bridge_resource() returns without writing the bridge window at all. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20250829131113.36754-7-ilpo.jarvinen@linux.intel.com
2025-09-16 | PCI: Refactor find_bus_resource_of_type() logic checks | Ilpo Järvinen | 1 | -3/+7
Reorder the logic checks in find_bus_resource_of_type() to simplify them. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20250829131113.36754-6-ilpo.jarvinen@linux.intel.com
2025-09-16 | PCI: Move find_bus_resource_of_type() earlier | Ilpo Järvinen | 1 | -28/+28
Move find_bus_resource_of_type() earlier in setup-bus.c to be able to call it in upcoming changes. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20250829131113.36754-5-ilpo.jarvinen@linux.intel.com
2025-09-16 | MIPS: PCI: Use pci_enable_resources() | Ilpo Järvinen | 1 | -36/+2
pci-legacy.c under MIPS has a copy of pci_enable_resources() named pcibios_enable_resources(). Having its own copy of the same functionality could lead to inconsistencies in behavior, especially now as pci_enable_resources() and the bridge window resource flags behavior are going to be altered by upcoming changes. The check for !r->start && r->end is already covered by the more generic checks done in pci_enable_resources(). Call pci_enable_resources() from MIPS's pcibios_enable_device() and remove pcibios_enable_resources(). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Link: https://patch.msgid.link/20250829131113.36754-4-ilpo.jarvinen@linux.intel.com
2025-09-16 | sparc/PCI: Remove pcibios_enable_device() as they do nothing extra | Ilpo Järvinen | 3 | -81/+0
Under arch/sparc/ there are multiple copies of pcibios_enable_device() but none of those seem to do anything extra beyond what pci_enable_resources() is supposed to do. These functions could lead to inconsistencies in behavior, especially now as pci_enable_resources() and the bridge window resource flags behavior are going to be altered by upcoming changes. Remove all pcibios_enable_device() from arch/sparc/ so that PCI core can simply call into pci_enable_resources() instead using its __weak version of pcibios_enable_device(). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20250829131113.36754-3-ilpo.jarvinen@linux.intel.com
2025-09-16 | m68k/PCI: Use pci_enable_resources() in pcibios_enable_device() | Ilpo Järvinen | 1 | -28/+11
m68k has a resource enable (check) loop in its pcibios_enable_device() which for some reason differs from pci_enable_resources(). This could lead to inconsistencies in behavior, especially now as pci_enable_resources() and the bridge window resource flags behavior are going to be altered by upcoming changes. The check for !r->start && r->end is already covered by the more generic checks done in pci_enable_resources(). The entire pcibios_enable_device() suspiciously looks copy-paste from some other arch as also indicated by the preceding comment. However, it also enables PCI_COMMAND_IO | PCI_COMMAND_MEMORY always for bridges. It is not clear why that is being done as the commit e93a6bbeb5a5 ("m68k: common PCI support definitions and code") introducing this code states "Nothing specific to any PCI implementation in any m68k class CPU hardware yet". Replace the resource enable loop with a call to pci_enable_resources() and adjust the Command Register afterwards as it's unclear if that is necessary or not so keep it for now. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20250829131113.36754-2-ilpo.jarvinen@linux.intel.com
2025-09-16 | PCI: Fix failure detection during resource resize | Ilpo Järvinen | 1 | -8/+18
Since 96336ec70264 ("PCI: Perform reset_resource() and build fail list in sync") the failed list is always built and returned to let the caller decide what to do with the failures. The caller may want to retry resource fitting and assignment and before that can happen, the resources should be restored to their original state (a reset effectively clears the struct resource), which requires returning them to the failed list so the original state remains stored in the associated struct pci_dev_resource. Resource resizing is different from the ordinary resource fitting and assignment in that it only considers part of the resources. This means failures for other resource types are not relevant at all and should be ignored. As resize doesn't unassign such unrelated resources, those resources ending up in the failed list implies assignment of that resource must have failed before resize too. The check in pci_reassign_bridge_resources() to decide if the whole assignment is successful, however, is based on list emptiness which will cause false negatives when the failed list has resources with an unrelated type. If the failed list is not empty, call pci_required_resource_failed() and extend it to be able to filter on specific resource types too (if provided). Calling pci_required_resource_failed() at this point is slightly problematic because the resource itself is reset when the failed list is constructed in __assign_resources_sorted(). As a result, pci_resource_is_optional() does not have access to the original resource flags. This could be worked around by restoring and re-resetting the resource around the call to pci_resource_is_optional(), however, it shouldn't cause issue as resource resizing is meant for 64-bit prefetchable resources according to Christian König (see the Link which unfortunately doesn't point directly to Christian's reply because lore didn't store that email at all). 
Fixes: 96336ec70264 ("PCI: Perform reset_resource() and build fail list in sync") Link: https://lore.kernel.org/all/c5d1b5d8-8669-5572-75a7-0b480f581ac1@linux.intel.com/ Reported-by: D Scott Phillips <scott@os.amperecomputing.com> Closes: https://lore.kernel.org/all/86plf0lgit.fsf@scott-ph-mail.amperecomputing.com/ Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: D Scott Phillips <scott@os.amperecomputing.com> Reviewed-by: D Scott Phillips <scott@os.amperecomputing.com> Cc: Christian König <christian.koenig@amd.com> Cc: stable@vger.kernel.org # v6.15+ Link: https://patch.msgid.link/20250822123359.16305-4-ilpo.jarvinen@linux.intel.com
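The difference between "the failed list is non-empty" and "a required resource of the relevant type failed" can be shown with a toy list. The structure, the flag values and `model_required_failed()` are hypothetical; in particular, this sketch assumes the type and optional status of each failed entry are readily available, which the commit message notes is not quite the case in the real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative type bits (values are assumptions of this sketch). */
#define MODEL_IO   0x1u
#define MODEL_MEM  0x2u

struct model_failed {
    uint32_t type;   /* type bits of the failed resource */
    bool optional;   /* optional resources never count as failures */
};

/* Return true if a required resource matching type_mask failed.
 * A type_mask of 0 means "consider every type". */
static bool model_required_failed(const struct model_failed *list, size_t n,
                                  uint32_t type_mask)
{
    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (list[i].optional)
            continue;
        if (type_mask && !(list[i].type & type_mask))
            continue;
        return true;
    }
    return false;
}
```

With a single failed IO resource on the list, an emptiness check reports failure, while a resize that only cares about memory resources would pass `MODEL_MEM` and see no relevant failure.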
2025-09-16 | PCI: Fix pdev_resources_assignable() disparity | Ilpo Järvinen | 1 | -0/+1
pdev_sort_resources() uses pdev_resources_assignable() helper to decide if device's resources cannot be assigned, so it ignores class 0 (PCI_CLASS_NOT_DEFINED) devices. pbus_size_mem(), on the other hand, does not do the same check. This could lead into a situation where a resource ends up on realloc_head list but is not on the head list, which in turn prevents emptying the resource from the realloc_head list in __assign_resources_sorted(). A non-empty realloc_head is unacceptable because it triggers an internal sanity check as shown in this log with a device that has class 0 (PCI_CLASS_NOT_DEFINED): pci 0001:01:00.0: [144d:a5a5] type 00 class 0x000000 PCIe Endpoint pci 0001:01:00.0: BAR 0 [mem 0x00000000-0x000fffff 64bit] pci 0001:01:00.0: ROM [mem 0x00000000-0x0000ffff pref] pcieport 0001:00:00.0: bridge window [mem 0x00100000-0x001fffff] to [bus 01-ff] add_size 100000 add_align 100000 pcieport 0001:00:00.0: bridge window [mem 0x40000000-0x401fffff]: assigned ------------[ cut here ]------------ kernel BUG at drivers/pci/setup-bus.c:2532! Internal error: Oops - BUG: 00000000f2000800 [#1] SMP ... Call trace: pci_assign_unassigned_bus_resources+0x110/0x114 (P) pci_rescan_bus+0x28/0x48 Use pdev_resources_assignable() also within pbus_size_mem() to skip processing of non-assignable resources which removes the disparity in between what resources pdev_sort_resources() and pbus_size_mem() consider. As non-assignable resources are no longer processed, they are not added to the realloc_head list, thus the sanity check no longer triggers. This disparity problem is very old but only now became apparent after 2499f5348431 ("PCI: Rework optional resource handling") that made the ROM resources optional when calculating bridge window sizes which required adding the resource to the realloc_head list. Previously, bridge windows were just sized larger than necessary. 
Fixes: 2499f5348431 ("PCI: Rework optional resource handling") Reported-by: Tudor Ambarus <tudor.ambarus@linaro.org> Closes: https://lore.kernel.org/all/5f103643-5e1c-43c6-b8fe-9617d3b5447c@linaro.org/ Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org # v6.15+ Link: https://patch.msgid.link/20250822123359.16305-3-ilpo.jarvinen@linux.intel.com
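The disparity between the two passes can be modelled with counters instead of lists. Everything here is a hypothetical simplification: `model_assignable()` only checks for class 0, and the realloc list is reduced to a count of queued and drained entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_dev {
    unsigned int class_code;  /* 0 stands in for PCI_CLASS_NOT_DEFINED */
    bool has_optional_rom;    /* would be queued on the realloc list */
};

static bool model_assignable(const struct model_dev *dev)
{
    return dev->class_code != 0;
}

/* Return how many entries stay queued after both passes. A non-zero
 * result corresponds to the sanity check in the log above firing. */
static size_t model_leftover(const struct model_dev *devs, size_t n,
                             bool filter_in_sizing)
{
    size_t queued = 0, drained = 0;

    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {      /* sizing pass */
        if (filter_in_sizing && !model_assignable(&devs[i]))
            continue;
        if (devs[i].has_optional_rom)
            queued++;
    }
    for (size_t i = 0; i < n; i++) {      /* sort and assign pass */
        if (model_assignable(&devs[i]) && devs[i].has_optional_rom)
            drained++;
    }
    return queued - drained;
}
```

When only the second pass filters out class 0 devices, an entry queued for such a device is never drained. Applying the same filter in both passes keeps the two counts equal.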
2025-09-16 | PCI: Ensure relaxed tail alignment does not increase min_align | Ilpo Järvinen | 1 | -4/+7
When using relaxed tail alignment for the bridge window, pbus_size_mem() also tries to minimize min_align, which can under certain scenarios end up increasing min_align from that found by calculate_mem_align(). Ensure min_align is not increased by the relaxed tail alignment. Eventually, it would be better to add calculate_relaxed_head_align() similar to calculate_mem_align() which finds out what alignment can be used for the head without introducing any gaps into the bridge window to give flexibility on head address too. But that looks relatively complex so it requires much more testing than fixing the immediate problem causing a regression. Fixes: 67f9085596ee ("PCI: Allow relaxed bridge window tail sizing for optional resources") Reported-by: Rio Liu <rio@r26.me> Closes: https://lore.kernel.org/all/o2bL8MtD_40-lf8GlslTw-AZpUPzm8nmfCnJKvS8RQ3NOzOW1uq1dVCEfRpUjJ2i7G2WjfQhk2IWZ7oGp-7G-jXN4qOdtnyOcjRR0PZWK5I=@r26.me/ Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Rio Liu <rio@r26.me> Cc: stable@vger.kernel.org # v6.15+ Link: https://patch.msgid.link/20250822123359.16305-2-ilpo.jarvinen@linux.intel.com
2025-08-10 | Linux 6.17-rc1 (tag: v6.17-rc1) | Linus Torvalds | 1 | -2/+2
2025-08-09 | tools/power turbostat: version 2025.09.09 | Len Brown | 1 | -1/+1
Probe and display L3 Cache topology Add ability to average an added counter (useful for pre-integrated "counters", such as Watts) Break the limit of 64 built-in counters. Assorted bug fixes and minor feature tweaks Signed-off-by: Len Brown <len.brown@intel.com>
2025-08-09 | tools/power turbostat: Handle non-root legacy-uncore sysfs permissions | Len Brown | 1 | -1/+2
/sys/devices/system/cpu/intel_uncore_frequency/package_X_die_Y/ may be readable by all, but /sys/devices/system/cpu/intel_uncore_frequency/package_X_die_Y/current_freq_khz may be readable only by root. Non-root turbostat users see complaints in this scenario. Fail probe of the interface if we can't read current_freq_khz. Reported-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Original-patch-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-08-09 | tools/power turbostat: standardize PER_THREAD_PARAMS | Len Brown | 1 | -20/+22
use a macro for PER_THREAD_PARAMS to make adding one later more clear. no functional change Signed-off-by: Len Brown <len.brown@intel.com>
2025-08-09 | tools/power turbostat: Fix DMR support | Zhang Rui | 1 | -14/+15
Together with the RAPL MSRs, there are more MSRs gone on DMR, including PLR (Perf Limit Reasons), and IRTL (Package cstate Interrupt Response Time Limit) MSRs. The configurable TDP info should also be retrieved from TPMI based Intel Speed Select Technology feature. Remove the access of these MSRs for DMR. Improve the DMR platform feature table to make it more readable at the same time. Fixes: 83075bd59de2 ("tools/power turbostat: Add initial support for DMR") Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-08-09 | tools/power turbostat: add format "average" for external attributes | Michael Hebenstreit | 2 | -11/+22
External attributes with format "raw" are not printed in summary lines for nodes/packages (or with option -S). The new format "average" behaves like "raw" but also adds the summary data. Signed-off-by: Michael Hebenstreit <michael.hebenstreit@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-08-09 | tools/power turbostat: delete GET_PKG() | Len Brown | 1 | -15/+6
pkg_base[pkg_id] is a simple array of structure pointers, let the compiler treat it that way. Signed-off-by: Len Brown <len.brown@intel.com>
2025-08-09 | tools/power turbostat: probe and display L3 cache topology | Len Brown | 1 | -3/+31
Signed-off-by: Len Brown <len.brown@intel.com>
2025-08-09 | tools/power turbostat: Support more than 64 built-in-counters | Len Brown | 1 | -154/+406
We have out-grown the ability to use a 64-bit memory location to inventory every possible built-in counter. Leverage the CPU_SET(3) macros to break this barrier. Also, break the Joules & Watts counters into two, since we can no longer 'or' them together... Signed-off-by: Len Brown <len.brown@intel.com>
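The idea of inventorying more than 64 counters can be sketched with a multi-word bitmap. Note that this is a swapped-in technique: the commit uses the CPU_SET(3) macros, whereas this sketch rolls its own array of 64-bit words for portability. All `model_` names and the limit of 256 counters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_MAX_COUNTERS 256
#define MODEL_WORDS (MODEL_MAX_COUNTERS / 64)

/* A set of counter IDs that is not limited to a single 64-bit word. */
struct model_counter_set {
    uint64_t bits[MODEL_WORDS];
};

static void model_set_zero(struct model_counter_set *s)
{
    memset(s->bits, 0, sizeof(s->bits));
}

static void model_set_add(struct model_counter_set *s, unsigned int id)
{
    assert(id < MODEL_MAX_COUNTERS);
    s->bits[id / 64] |= UINT64_C(1) << (id % 64);
}

static bool model_set_has(const struct model_counter_set *s, unsigned int id)
{
    assert(id < MODEL_MAX_COUNTERS);
    return (s->bits[id / 64] >> (id % 64)) & 1;
}
```

Because a counter is now identified by its position in the set rather than by a bit in one word, two counters can no longer be combined with a bitwise 'or' of their masks, which matches the commit's remark about splitting Joules and Watts.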
2025-08-09 | tools/power turbostat.8: Document Totl%C0, Any%C0, GFX%C0, CPUGFX% columns | Len Brown | 1 | -0/+8
Explain the meaning of the Totl%C0, Any%C0, GFX%C0, CPUGFX% columns. Signed-off-by: Len Brown <len.brown@intel.com>
2025-08-08 | tools/power turbostat: Fix bogus SysWatt for forked program | Zhang Rui | 1 | -0/+1
Similar to delta_cpu(), delta_platform() is called in turbostat main loop. This ensures accurate SysWatt readings in periodic monitoring mode $ sudo turbostat -S -q --show power -i 1 CoreTmp PkgTmp PkgWatt CorWatt GFXWatt RAMWatt PKG_% RAM_% SysWatt 60 61 6.21 1.13 0.16 0.00 0.00 0.00 13.07 58 61 6.00 1.07 0.18 0.00 0.00 0.00 12.75 58 61 5.74 1.05 0.17 0.00 0.00 0.00 12.22 58 60 6.27 1.11 0.24 0.00 0.00 0.00 13.55 However, delta_platform() is missing for forked program and causes bogus SysWatt reporting, $ sudo turbostat -S -q --show power sleep 1 1.004736 sec CoreTmp PkgTmp PkgWatt CorWatt GFXWatt RAMWatt PKG_% RAM_% SysWatt 57 58 6.05 1.02 0.16 0.00 0.00 0.00 0.03 Add missing delta_platform() for forked program. Fixes: e5f687b89bc2 ("tools/power turbostat: Add RAPL psys as a built-in counter") Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-08-08 | tools/power turbostat: Handle cap_get_proc() ENOSYS | Calvin Owens | 1 | -1/+9
Kernels configured with CONFIG_MULTIUSER=n have no cap_get_proc(). Check for ENOSYS to recognize this case, and continue on to attempt to access the requested MSRs (such as temperature). Signed-off-by: Calvin Owens <calvin@wbinvd.org> Signed-off-by: Len Brown <len.brown@intel.com>
2025-08-08 | tools/power turbostat: Fix build with musl | Calvin Owens | 1 | -0/+1
turbostat.c: In function 'parse_int_file': turbostat.c:5567:19: error: 'PATH_MAX' undeclared (first use in this function) 5567 | char path[PATH_MAX]; | ^~~~~~~~ turbostat.c: In function 'probe_graphics': turbostat.c:6787:19: error: 'PATH_MAX' undeclared (first use in this function) 6787 | char path[PATH_MAX]; | ^~~~~~~~ Signed-off-by: Calvin Owens <calvin@wbinvd.org> Reviewed-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-08-08 | tools/power turbostat: verify arguments to params --show and --hide | Len Brown | 1 | -2/+31
$ sudo turbostat --quiet --show junk turbostat: Counter 'junk' can not be added. Previously, invalid arguments to --show and --hide were silently ignored Acked-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-08-08 | io_uring/memmap: cast nr_pages to size_t before shifting | Jens Axboe | 1 | -1/+1
If the allocated size exceeds UINT_MAX, then it's necessary to cast the mr->nr_pages value to size_t to prevent it from overflowing. In practice this isn't much of a concern as the required memory size will have been validated upfront, and accounted to the user. And > 4GB sizes will be necessary to make the lack of a cast a problem, which greatly exceeds normal user locked_vm settings that are generally in the kb to mb range. However, if root is used, then accounting isn't done, and then it's possible to hit this issue. Link: https://lore.kernel.org/all/6895b298.050a0220.7f033.0059.GAE@google.com/ Cc: stable@vger.kernel.org Reported-by: syzbot+23727438116feb13df15@syzkaller.appspotmail.com Fixes: 087f997870a9 ("io_uring/memmap: implement mmap for regions") Signed-off-by: Jens Axboe <axboe@kernel.dk>
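The overflow can be demonstrated in isolation. This sketch uses fixed-width types so the result does not depend on the platform: `uint32_t` stands in for the type of nr_pages and `uint64_t` for size_t on a 64-bit system. The `model_` names and the page shift of 12 are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12

/* Shift performed in 32 bits: the result wraps once it exceeds UINT32_MAX. */
static uint64_t model_size_no_cast(uint32_t nr_pages)
{
    return (uint32_t)(nr_pages << MODEL_PAGE_SHIFT);
}

/* Widen first, then shift: the full byte count is preserved. */
static uint64_t model_size_with_cast(uint32_t nr_pages)
{
    return (uint64_t)nr_pages << MODEL_PAGE_SHIFT;
}

/* Self-check for a region of 2^20 pages, i.e. 4 GiB with 4 KiB pages. */
static void model_check(void)
{
    assert(model_size_no_cast(UINT32_C(1) << 20) == 0);
    assert(model_size_with_cast(UINT32_C(1) << 20) == UINT64_C(1) << 32);
}
```

Both variants agree for small regions. They only diverge once the byte count no longer fits in 32 bits, which is why the problem needs sizes of 4 GB or more to show up.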
2025-08-07 | mailbox/pcc: support mailbox management of the shared buffer | Adam Young | 2 | -4/+127
Define a new, optional, callback that allows the driver to specify how the return data buffer is allocated. If that callback is set, mailbox/pcc.c is now responsible for reading from and writing to the PCC shared buffer. This also allows for proper checks of the Command complete flag between the PCC sender and receiver. For Type 4 channels, initialize the command complete flag prior to accepting messages. Since the mailbox does not know what memory allocation scheme to use for response messages, the client now has an optional callback that allows it to allocate the buffer for a response message. When an outbound message is written to the buffer, the mailbox checks for the flag indicating the client wants a tx complete notification via IRQ. Upon receipt of the interrupt, it will pair it with the outgoing message. The expected use is to free the kernel memory buffer for the previous outgoing message. Signed-off-by: Adam Young <admiyo@os.amperecomputing.com> Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
2025-08-07 | smb: server: Fix extension string in ksmbd_extract_shortname() | Thorsten Blum | 1 | -1/+1
In ksmbd_extract_shortname(), strscpy() is incorrectly called with the length of the source string (excluding the NUL terminator) rather than the size of the destination buffer. This results in "__" being copied to 'extension' rather than "___" (two underscores instead of three). Use the destination buffer size instead to ensure that the string "___" (three underscores) is copied correctly. Cc: stable@vger.kernel.org Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
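The off-by-one can be reproduced with a userspace stand-in for strscpy(). `model_strscpy()` is a hypothetical reimplementation of the documented semantics (copy at most count - 1 bytes, always NUL-terminate, return -E2BIG on truncation), and the 4-byte destination used in the note below is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for strscpy(): copy at most count - 1 bytes and
 * always NUL-terminate. Returns the copied length, or -E2BIG when the
 * source had to be truncated. */
static long model_strscpy(char *dst, const char *src, size_t count)
{
    size_t len;

    assert(dst != NULL && src != NULL);
    if (count == 0)
        return -E2BIG;

    len = strlen(src);
    if (len >= count) {
        /* Not enough room: keep count - 1 bytes plus the terminator. */
        memcpy(dst, src, count - 1);
        dst[count - 1] = '\0';
        return -E2BIG;
    }
    memcpy(dst, src, len + 1);
    return (long)len;
}
```

Passing `strlen("___")`, which is 3, as the size leaves room for only two underscores plus the terminator. Passing the size of a 4-byte destination copies all three.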
2025-08-07 | ksmbd: limit repeated connections from clients with the same IP | Namjae Jeon | 2 | -0/+18
Repeated connections from clients with the same IP address may exhaust the max connections and prevent other normal client connections. This patch limits repeated connections from clients with the same IP. Reported-by: tianshuo han <hantianshuo233@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-07smb: client: only use a single wait_queue to monitor smbdirect connection statusStefan Metzmacher2-11/+9
There's no need for separate conn_wait and disconn_wait queues. This will simplify the move to common code, as the server code already uses a single wait_queue for this. Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-07smb: client: don't call init_waitqueue_head(&info->conn_wait) twice in _smbd_get_connectionStefan Metzmacher1-1/+0
It is already called long before we may hit this cleanup code path. Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-07smb: client: improve logging in smbd_conn_upcall()Stefan Metzmacher1-4/+10
Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-07smb: client: return an error if rdma_connect does not return within 5 secondsStefan Metzmacher1-2/+4
This matches the timeout for tcp connections. Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Fixes: f198186aa9bb ("CIFS: SMBD: Establish SMB Direct connection") Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-07PCI: vmd: Fix wrong kfree() in vmd_msi_free()Nam Cao1-1/+3
vmd_msi_alloc() allocates struct vmd_irq and stashes it into irq_data->chip_data associated with the VMD's interrupt domain. vmd_msi_free() extracts the pointer by calling irq_get_chip_data() and frees it. irq_get_chip_data() returns the chip_data associated with the top interrupt domain. This worked in the past because VMD's interrupt domain was the top domain. But d7d8ab87e3e7 ("PCI: vmd: Switch to msi_create_parent_irq_domain()") changed the interrupt domain hierarchy so VMD's interrupt domain is not the top domain anymore. irq_get_chip_data() now returns the chip_data at the MSI devices' interrupt domains. It is therefore broken for vmd_msi_free() to kfree() this chip_data. Fix by extracting the chip_data associated with the VMD's interrupt domain. Fixes: d7d8ab87e3e7 ("PCI: vmd: Switch to msi_create_parent_irq_domain()") Reported-by: Kenneth Crudup <kenny@panix.com> Closes: https://lore.kernel.org/linux-pci/dfa40e48-8840-4e61-9fda-25cdb3ad81c1@panix.com/ Reported-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Closes: https://lore.kernel.org/linux-pci/ed53280ed15d1140700b96cca2734bf327ee92539e5eb68e80f5bbbf0f01@linux.gnuweeb.org/ Tested-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Tested-by: Kenneth Crudup <kenny@panix.com> Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com> Acked-by: Manivannan Sadhasivam <mani@kernel.org> Link: https://patch.msgid.link/20250807081051.2253962-1-namcao@linutronix.de
2025-08-07perf bpf-filter: Enable events manuallyIlya Leoshkevich1-1/+4
On s390, and, in general, on all platforms where the respective event supports auxiliary data gathering, the command:

   # ./perf record -u 0 -aB --synth=no -- ./perf test -w thloop
   [ perf record: Woken up 1 times to write data ]
   [ perf record: Captured and wrote 0.011 MB perf.data ]
   # ./perf report --stats | grep SAMPLE
   #

does not generate samples in the perf.data file. On x86 the command:

   # sudo perf record -e intel_pt// -u 0 ls

is broken too.

Looking at the sequence of calls in 'perf record' reveals this behavior:

1. The event 'cycles' is created and enabled:

   record__open()
   +-> evlist__apply_filters()
       +-> perf_bpf_filter__prepare()
           +-> bpf_program.attach_perf_event()
               +-> bpf_program.attach_perf_event_opts()
                   +-> __GI___ioctl(..., PERF_EVENT_IOC_ENABLE, ...)

   The event 'cycles' is enabled and active now. However the event's ring-buffer to store the samples generated by hardware is not allocated yet.

2. The event's fd is mmap()ed to create the ring buffer:

   record__open()
   +-> record__mmap()
       +-> record__mmap_evlist()
           +-> evlist__mmap_ex()
               +-> perf_evlist__mmap_ops()
                   +-> mmap_per_cpu()
                       +-> mmap_per_evsel()
                           +-> mmap__mmap()
                               +-> perf_mmap__mmap()
                                   +-> mmap()

   This allocates the ring buffer for the event 'cycles'. With mmap() the kernel creates the ring buffer:

   perf_mmap(): kernel function to create the event's ring
   |            buffer to save the sampled data.
   |
   +-> ring_buffer_attach(): Allocates memory for ring buffer.
       |        The PMU has an auxiliary data setup function. The
       |        has_aux(event) condition is true and the PMU's
       |        stop() is called to stop sampling. It is not
       |        restarted:
       |
       |        if (has_aux(event))
       |                perf_event_stop(event, 0);
       |
       +-> cpumsf_pmu_stop():
                Hardware sampling is stopped. No samples are
                generated and saved anymore.

3. After the event 'cycles' has been mapped, the event is enabled a second time in:

   __cmd_record()
   +-> evlist__enable()
       +-> __evlist__enable()
           +-> evsel__enable_cpu()
               +-> perf_evsel__enable_cpu()
                   +-> perf_evsel__run_ioctl()
                       +-> perf_evsel__ioctl()
                           +-> __GI___ioctl(., PERF_EVENT_IOC_ENABLE, .)

   The second ioctl(fd, PERF_EVENT_IOC_ENABLE, 0); is just a NOP in this case. The first invocation in (1.) sets the event::state to PERF_EVENT_STATE_ACTIVE. The kernel functions

   perf_ioctl()
   +-> _perf_ioctl()
       +-> _perf_event_enable()
           +-> __perf_event_enable()

   return immediately because event::state is already set to PERF_EVENT_STATE_ACTIVE.

This happens on s390, because the event 'cycles' offers the possibility to save auxiliary data. The PMU callbacks setup_aux() and free_aux() are defined. Without both callback functions, cpumsf_pmu_stop() is not invoked and sampling continues.

To remedy this, remove the first invocation of ioctl(..., PERF_EVENT_IOC_ENABLE, ...) in step (1.). Create the event in step (1.) and enable it in step (3.) after the ring buffer has been mapped.

Output after:

   # ./perf record -aB --synth=no -u 0 -- ./perf test -w thloop 2
   [ perf record: Woken up 3 times to write data ]
   [ perf record: Captured and wrote 0.876 MB perf.data ]
   # ./perf report --stats | grep SAMPLE
   SAMPLE events: 16200 (99.5%)
   SAMPLE events: 16200
   #

The software event succeeded both before and after the patch:

   # ./perf record -e cpu-clock -aB --synth=no -u 0 -- \
        ./perf test -w thloop 2
   [ perf record: Woken up 7 times to write data ]
   [ perf record: Captured and wrote 2.870 MB perf.data ]
   # ./perf report --stats | grep SAMPLE
   SAMPLE events: 53506 (99.8%)
   SAMPLE events: 53506
   #

Fixes: b4c658d4d63d61 ("perf target: Remove uid from target")
Suggested-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Co-developed-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20250806162417.19666-3-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
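The ordering problem in this commit message can be illustrated with a toy state machine. This is only a sketch under simplifying assumptions, not kernel or perf code: all toy_* names are hypothetical, "enable" is modelled as a NOP once the state is active, and "mmap" is modelled as stopping sampling whenever the event supports auxiliary data.

```c
#include <stdbool.h>

enum toy_state { TOY_INACTIVE, TOY_ACTIVE };

/* Hypothetical, heavily simplified model of a perf event. */
struct toy_event {
	enum toy_state state;
	bool has_aux;	/* PMU provides setup_aux()/free_aux() */
	bool sampling;	/* hardware sampling currently running */
};

/* Models PERF_EVENT_IOC_ENABLE: returns early if already active. */
static void toy_enable(struct toy_event *ev)
{
	if (ev->state == TOY_ACTIVE)
		return;
	ev->state = TOY_ACTIVE;
	ev->sampling = true;
}

/* Models mmap(): with aux support the PMU is stopped and not restarted. */
static void toy_mmap(struct toy_event *ev)
{
	if (ev->has_aux)
		ev->sampling = false;
}

/* Old order: enable at attach time, then mmap, then enable again. */
static bool toy_record_enable_first(bool has_aux)
{
	struct toy_event ev = { TOY_INACTIVE, has_aux, false };

	toy_enable(&ev);
	toy_mmap(&ev);
	toy_enable(&ev);	/* NOP: state is already active */
	return ev.sampling;
}

/* New order: create without enabling, mmap, then enable once. */
static bool toy_record_enable_after_mmap(bool has_aux)
{
	struct toy_event ev = { TOY_INACTIVE, has_aux, false };

	toy_mmap(&ev);
	toy_enable(&ev);
	return ev.sampling;
}
```

In this model the old order leaves an aux-capable event active but not sampling, while the new order leaves it sampling; events without aux support sample in both orders.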
2025-08-07libbpf: Add the ability to suppress perf event enablementIlya Leoshkevich2-6/+11
Automatically enabling a perf event after attaching a BPF prog to it is not always desirable. Add a new "dont_enable" field to struct bpf_perf_event_opts. While introducing "enable" instead would be nicer in that it would avoid a double negation in the implementation, it would make DECLARE_LIBBPF_OPTS() less efficient. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Suggested-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Co-developed-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20250806162417.19666-2-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
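One plausible reading of the "dont_enable" naming is that a negative flag keeps zero-initialized options backward compatible. The sketch below illustrates that idea only; struct toy_attach_opts and toy_should_enable() are hypothetical and are not the libbpf definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical options struct, loosely modelled on an opts-style API. */
struct toy_attach_opts {
	size_t sz;		/* size of this struct, for extensibility */
	bool dont_enable;	/* zero (false) keeps the old behavior */
};

/*
 * Decide whether the event should be enabled after attaching.
 * Missing or zero-initialized options mean "enable", as before.
 */
static bool toy_should_enable(const struct toy_attach_opts *opts)
{
	return !(opts && opts->dont_enable);
}
```

Callers that pass no options, or options that were only zero-initialized, still get the event enabled; only callers that explicitly set the flag opt out.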
2025-08-07pptp: fix pptp_xmit() error pathEric Dumazet1-3/+4
I accidentally added a bug in pptp_xmit() that syzbot caught for us. Only call ip_rt_put() if a route has been allocated. BUG: unable to handle page fault for address: ffffffffffffffdb PGD df3b067 P4D df3b067 PUD df3d067 PMD 0 Oops: Oops: 0002 [#1] SMP KASAN PTI CPU: 1 UID: 0 PID: 6346 Comm: syz.0.336 Not tainted 6.16.0-next-20250804-syzkaller #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025 RIP: 0010:arch_atomic_add_return arch/x86/include/asm/atomic.h:85 [inline] RIP: 0010:raw_atomic_sub_return_release include/linux/atomic/atomic-arch-fallback.h:846 [inline] RIP: 0010:atomic_sub_return_release include/linux/atomic/atomic-instrumented.h:327 [inline] RIP: 0010:__rcuref_put include/linux/rcuref.h:109 [inline] RIP: 0010:rcuref_put+0x172/0x210 include/linux/rcuref.h:173 Call Trace: <TASK> dst_release+0x24/0x1b0 net/core/dst.c:167 ip_rt_put include/net/route.h:285 [inline] pptp_xmit+0x14b/0x1a90 drivers/net/ppp/pptp.c:267 __ppp_channel_push+0xf2/0x1c0 drivers/net/ppp/ppp_generic.c:2166 ppp_channel_push+0x123/0x660 drivers/net/ppp/ppp_generic.c:2198 ppp_write+0x2b0/0x400 drivers/net/ppp/ppp_generic.c:544 vfs_write+0x27b/0xb30 fs/read_write.c:684 ksys_write+0x145/0x250 fs/read_write.c:738 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: de9c4861fb42 ("pptp: ensure minimal skb length in pptp_xmit()") Reported-by: syzbot+27d7cfbc93457e472e00@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/689095a5.050a0220.1fc43d.0009.GAE@google.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250807142146.2877060-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
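The general shape of such an error path can be sketched as follows. This is an illustrative user-space model, not the pptp_xmit() code: the toy_* names are hypothetical, and the success path simply drops the reference to keep the example small.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reference-counted route. */
struct toy_route {
	int refcnt;
};

static void toy_route_put(struct toy_route *rt)
{
	rt->refcnt--;
}

/* Returns 0 on success, -1 if the packet was dropped. */
static int toy_xmit(size_t pkt_len, size_t min_len,
		    struct toy_route *route, bool fail_late)
{
	struct toy_route *rt;

	if (pkt_len < min_len)
		goto tx_drop;	/* fails before any route is held */

	rt = route;		/* stands in for the route lookup */
	if (!rt)
		goto tx_drop;
	rt->refcnt++;

	if (fail_late)
		goto tx_error;	/* a route is held and must be released */

	toy_route_put(rt);	/* done with the route */
	return 0;

tx_error:
	toy_route_put(rt);
tx_drop:
	return -1;
}
```

Failures that happen before the lookup jump to the label below the release, so the put is only reached when a reference was actually taken.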
2025-08-07lib/sbitmap: make sbitmap_get_shallow() internalYu Kuai2-19/+16
Because it's only used in sbitmap.c Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20250807032413.1469456-3-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-08-07lib/sbitmap: convert shallow_depth from one word to the whole sbitmapYu Kuai6-73/+52
Currently elevators record an internal 'async_depth' to throttle asynchronous requests, and they calculate shallow_depth based on sb->shift, on the assumption that sb->shift gives the number of available tags in one word. However, sb->shift does not give the available tags in the last word, see __map_depth: if (index == sb->map_nr - 1) return sb->depth - (index << sb->shift); As a consequence, if the last word is used, more tags can be obtained than expected. For example, assume nr_requests=256 and there are four words; in the worst case, if the user sets nr_requests=32, then the first word is the last word, and still using the bits per word, which is 64, to calculate async_depth is wrong. On the other hand, due to cgroup qos, bfq can allow only one request to be allocated, and setting shallow_depth=1 will still allow as many requests as there are words to be allocated. Fix these problems by applying shallow_depth to the whole sbitmap instead of per word, and change kyber, mq-deadline and bfq to follow this. A new helper __map_depth_with_shallow() is introduced to calculate the available bits in each word. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20250807032413.1469456-2-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
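The last-word arithmetic quoted from __map_depth can be checked with a small user-space sketch. struct toy_sbitmap and toy_map_depth() are hypothetical simplifications that keep only the three fields the quoted expression needs; the new __map_depth_with_shallow() helper is not reproduced here.

```c
/* Hypothetical, reduced view of an sbitmap. */
struct toy_sbitmap {
	unsigned int depth;	/* total number of bits (tags) */
	unsigned int shift;	/* log2 of the bits per word */
	unsigned int map_nr;	/* number of words */
};

/* Number of usable bits in word 'index', following the quoted logic. */
static unsigned int toy_map_depth(const struct toy_sbitmap *sb,
				  unsigned int index)
{
	/* The last word only holds whatever is left of the total depth. */
	if (index == sb->map_nr - 1)
		return sb->depth - (index << sb->shift);
	return 1U << sb->shift;
}
```

For a depth of 32 with 64 bits per word there is a single word holding 32 usable bits, so a limit derived from 1 << shift (64) overestimates what that word can provide.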
2025-08-07nvmet: exit debugfs after discovery subsystem exitsMohamed Khalfella1-1/+1
Commit 528589947c180 ("nvmet: initialize discovery subsys after debugfs is initialized") changed nvmet_init() to initialize nvme discovery after "nvmet" debugfs directory is initialized. The change broke nvmet_exit() because discovery subsystem now depends on debugfs. Debugfs should be destroyed after discovery subsystem. Fix nvmet_exit() to do that. Reported-by: Yi Zhang <yi.zhang@redhat.com> Closes: https://lore.kernel.org/all/CAHj4cs96AfFQpyDKF_MdfJsnOEo=2V7dQgqjFv+k3t7H-=yGhA@mail.gmail.com/ Fixes: 528589947c180 ("nvmet: initialize discovery subsys after debugfs is initialized") Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Daniel Wagner <dwagner@suse.de> Link: https://lore.kernel.org/r/20250807053507.2794335-1-mkhalfella@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-08-07treewide: rename GPIO set callbacks back to their original namesBartosz Golaszewski282-356/+355
The conversion of all GPIO drivers to using the .set_rv() and .set_multiple_rv() callbacks from struct gpio_chip (which - unlike their predecessors - return an integer and allow the controller drivers to indicate failures to users) is now complete and the legacy ones have been removed. Rename the new callbacks back to their original names in one sweeping change. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
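What an integer-returning setter buys the caller can be sketched in user space. The model below is hypothetical (struct toy_gpio_chip and the toy_* functions are not the gpiolib definitions) and assumes ngpio does not exceed the number of bits in an unsigned long.

```c
#include <errno.h>

/* Hypothetical controller with a setter that can report failure. */
struct toy_gpio_chip {
	unsigned int ngpio;	/* number of lines on this controller */
	unsigned long values;	/* one bit per line */
	int (*set)(struct toy_gpio_chip *gc, unsigned int offset, int value);
};

/* Driver callback: returns 0 on success or a negative errno. */
static int toy_set(struct toy_gpio_chip *gc, unsigned int offset, int value)
{
	if (offset >= gc->ngpio)
		return -EINVAL;

	if (value)
		gc->values |= 1UL << offset;
	else
		gc->values &= ~(1UL << offset);
	return 0;
}

/* Core-side helper: the error from the driver reaches the caller. */
static int toy_set_line(struct toy_gpio_chip *gc, unsigned int offset,
			int value)
{
	return gc->set(gc, offset, value);
}
```

With a void setter the out-of-range write would be dropped silently; with an int return the caller sees -EINVAL.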
2025-08-07gpio: remove legacy GPIO line value setter callbacksBartosz Golaszewski2-28/+6
With no more users of the legacy GPIO line value setters - .set() and .set_multiple() - we can now remove them from the kernel. Link: https://lore.kernel.org/r/20250725074651.14002-1-brgl@bgdev.pl Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>