aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/pci/pci-sysfs.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2025-10-07Merge tag 'pm-6.18-rc1-2' of ↵Linus Torvalds1-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more power management updates from Rafael Wysocki: "These are cpufreq fixes and cleanups on top of the material merged previously, a power management core code fix and updates of the runtime PM framework including unit tests, documentation updates and introduction of auto-cleanup macros for runtime PM "resume and get" and "get without resuming" operations. Specifics: - Make cpufreq drivers setting the default CPU transition latency to CPUFREQ_ETERNAL specify a proper default transition latency value instead which addresses a regression introduced during the 6.6 cycle that broke CPUFREQ_ETERNAL handling (Rafael Wysocki) - Make the cpufreq CPPC driver use a proper transition delay value when CPUFREQ_ETERNAL is returned by cppc_get_transition_latency() to indicate an error condition (Rafael Wysocki) - Make cppc_get_transition_latency() return a negative error code to indicate error conditions instead of using CPUFREQ_ETERNAL for this purpose and drop CPUFREQ_ETERNAL that has no other users (Rafael Wysocki, Gopi Krishna Menon) - Fix device leak in the mediatek cpufreq driver (Johan Hovold) - Set target frequency on all CPUs sharing a policy during frequency updates in the tegra186 cpufreq driver and make it initialize all cores to max frequencies (Aaron Kling) - Rust cpufreq helper cleanup (Thorsten Blum) - Make pm_runtime_put*() family of functions return 1 when the given device is already suspended which is consistent with the documentation (Brian Norris) - Add basic kunit tests for runtime PM API contracts and update return values in kerneldoc comments for the runtime PM API (Brian Norris, Dan Carpenter) - Add auto-cleanup macros for runtime PM "resume and get" and "get without resume" operations, use one of them in the PCI core and drop the existing "free" macro introduced for similar purpose, but somewhat cumbersome to use (Rafael Wysocki) - Make the core power management code avoid waiting on device links marked as SYNC_STATE_ONLY which is consistent with the handling of those device links elsewhere (Pin-yen Lin)" * tag 'pm-6.18-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: docs/zh_CN: Fix malformed table docs/zh_TW: Fix malformed table PM: runtime: Fix error checking for kunit_device_register() PM: runtime: Introduce one more usage counter guard cpufreq: Drop unused symbol CPUFREQ_ETERNAL ACPI: CPPC: Do not use CPUFREQ_ETERNAL as an error value cpufreq: CPPC: Avoid using CPUFREQ_ETERNAL as transition delay cpufreq: Make drivers using CPUFREQ_ETERNAL specify transition latency PM: runtime: Drop DEFINE_FREE() for pm_runtime_put() PCI/sysfs: Use runtime PM guard macro for auto-cleanup PM: runtime: Add auto-cleanup macros for "resume and get" operations cpufreq: tegra186: Initialize all cores to max frequencies cpufreq: tegra186: Set target frequency for all cpus in policy rust: cpufreq: streamline find_supply_names cpufreq: mediatek: fix device leak on probe failure PM: sleep: Do not wait on SYNC_STATE_ONLY device links PM: runtime: Update kerneldoc return codes PM: runtime: Make put{,_sync}() return 1 when already suspended PM: runtime: Add basic kunit tests for API contracts
2025-10-03Merge branch 'pci/misc'Bjorn Helgaas1-0/+21
- Fix whitespace issues (Li Jun) - Fix pci_acpi_preserve_config() memory leak (Nirmoy Das) - Add sysfs 'serial_number' file to expose the Device Serial Number (Matthew Wood) * pci/misc: PCI/sysfs: Expose PCI device serial number PCI/ACPI: Fix pci_acpi_preserve_config() memory leak
2025-10-03Merge branch 'pci/resource'Bjorn Helgaas1-7/+20
- Ensure relaxed tail alignment does not increase min_align when computing bridge window size, to fix a regression (Ilpo Järvinen) - Fix bridge window size computation to fix a regression for devices with undefined PCI class, e.g., Samsung [144d:a5a5] (Ilpo Järvinen) - Fix error handling during resource resize to fix a regression in amdgpu (Ilpo Järvinen) - Align m68k pcibios_enable_device() with other arches (Ilpo Järvinen) - Remove several sparc pcibios_enable_device() implementations that don't do anything beyond what pci_enable_resources() does (Ilpo Järvinen) - Remove mips pcibios_enable_resources() and use pci_enable_resources() instead (Ilpo Järvinen) - Refactor and simplify find_bus_resource_of_type() (Ilpo Järvinen) - Claim bridge windows before setting them up (Ilpo Järvinen) - Disable non-claimed bridge windows so the kernel's view matches the hardware configuration (Ilpo Järvinen) - Use pci_release_resource() instead of release_resource() to reduce code duplication and increase consistency (Ilpo Järvinen) - Enable bridges even if bridge window assignment fails (Ilpo Järvinen) - Preserve bridge window resource type flags when assignment fails because we may need it later (Ilpo Järvinen) - Add bridge window selection functions to make the selection consistent across the several places that do this (Ilpo Järvinen) - Warn if bridge window cannot be released when resizing BAR (Ilpo Järvinen) - Set up bridge resources before enumerating children so we can check whether child resources are inside bridge windows (Ilpo Järvinen) * pci/resource: PCI: Set up bridge resources earlier PCI: Don't print stale information about resource PCI: Alter misleading recursion to pci_bus_release_bridge_resources() PCI: Pass bridge window to pci_bus_release_bridge_resources() PCI: Add pci_setup_one_bridge_window() PCI: Refactor remove_dev_resources() to use pbus_select_window() PCI: Refactor distributing available memory to use loops PCI: Use pbus_select_window_for_type() during mem window sizing PCI: Use pbus_select_window() in space available checker PCI: Rename resource variable from r to res PCI: Use pbus_select_window_for_type() during IO window sizing PCI: Use pbus_select_window() during BAR resize PCI: Warn if bridge window cannot be released when resizing BAR PCI: Fix finding bridge window in pci_reassign_bridge_resources() PCI: Add bridge window selection functions PCI: Add defines for bridge window indexing PCI: Preserve bridge window resource type flags PCI: Enable bridge even if bridge window fails to assign PCI: Use pci_release_resource() instead of release_resource() PCI: Disable non-claimed bridge window PCI: Always claim bridge window before its setup PCI: Refactor find_bus_resource_of_type() logic checks PCI: Move find_bus_resource_of_type() earlier MIPS: PCI: Use pci_enable_resources() sparc/PCI: Remove pcibios_enable_device() as they do nothing extra m68k/PCI: Use pci_enable_resources() in pcibios_enable_device() PCI: Fix failure detection during resource resize PCI: Fix pdev_resources_assignable() disparity PCI: Ensure relaxed tail alignment does not increase min_align
2025-09-29PCI/sysfs: Use runtime PM guard macro for auto-cleanupRafael J. Wysocki1-2/+3
Use the newly introduced pm_runtime_active_try guard to simplify the code and add the proper error handling for PM runtime resume errors. Based on an earlier patch from Takashi Iwai <tiwai@suse.de> [1]. Link: https://patch.msgid.link/20250919163147.4743-3-tiwai@suse.de [1] Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Takashi Iwai <tiwai@suse.de> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
2025-09-24PCI/sysfs: Ensure devices are powered for config readsBrian Norris1-1/+19
The "max_link_width", "current_link_speed", "current_link_width", "secondary_bus_number", and "subordinate_bus_number" sysfs files all access config registers, but they don't check the runtime PM state. If the device is in D3cold or a parent bridge is suspended, we may see -EINVAL, bogus values, or worse, depending on implementation details. Wrap these access in pci_config_pm_runtime_{get,put}() like most of the rest of the similar sysfs attributes. Notably, "max_link_speed" does not access config registers; it returns a cached value since d2bd39c0456b ("PCI: Store all PCIe Supported Link Speeds"). Fixes: 56c1af4606f0 ("PCI: Add sysfs max_link_speed/width, current_link_speed/width, etc") Signed-off-by: Brian Norris <briannorris@google.com> Signed-off-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20250924095711.v2.1.Ibb5b6ca1e2c059e04ec53140cd98a44f2684c668@changeid
2025-09-17PCI/sysfs: Expose PCI device serial numberMatthew Wood1-0/+21
Add a single sysfs read-only interface for reading PCI device serial numbers from userspace in a programmatic way. This device attribute uses the same hexadecimal 1-byte dashed formatting as lspci serial number capability output. If a device doesn't support the serial number capability, the serial_number sysfs attribute will not be visible. Signed-off-by: Matthew Wood <thepacketgeek@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mario Limonciello <superm1@kernel.org> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Link: https://patch.msgid.link/20250917125815.722952-2-thepacketgeek@gmail.com
2025-09-16PCI: Use pbus_select_window() during BAR resizeIlpo Järvinen1-7/+13
Prior to a BAR resize, __resource_resize_store() loops through the normal resources of the PCI device and releases those that match to the flags of the BAR to be resized. This is necessary to allow resizing also the upstream bridge window as only childless bridge windows can be resized. While the flags check (mostly) works (if corner cases are ignored), the more straightforward way is to check if the resources share the bridge window. Change __resource_resize_store() to do the check using pbus_select_window(). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20250829131113.36754-16-ilpo.jarvinen@linux.intel.com
2025-09-16PCI: Preserve bridge window resource type flagsIlpo Järvinen1-0/+7
When a bridge window is found unused or fails to assign, the flags of the associated resource are cleared. Clearing flags is problematic as it also removes the type information of the resource which is needed later. Thus, always preserve the bridge window type flags and use IORESOURCE_UNSET and IORESOURCE_DISABLED to indicate the status of the bridge window. Also, when initializing resources, make sure all valid bridge windows do get their type flags set. Change various places that relied on resource flags being cleared to check for IORESOURCE_UNSET and IORESOURCE_DISABLED to allow bridge window resource to retain their type flags. Add pdev_resource_assignable() and pdev_resource_should_fit() helpers to filter out disabled bridge windows during resource fitting; the latter combines more common checks into the helper. When reading the bridge windows from the registers, instead of leaving the resource flags cleared for bridge windows that are not enabled, always set up the flags and set IORESOURCE_UNSET | IORESOURCE_DISABLED as needed. When resource fitting or assignment fails for a bridge window resource, or the bridge window is not needed, mark the resource with IORESOURCE_UNSET or IORESOURCE_DISABLED, respectively. Use dummy zero resource in resource_show() for backwards compatibility as lspci will otherwise misrepresent disabled bridge windows. This change fixes an issue which highlights the importance of keeping the resource type flags intact: At the end of __assign_resources_sorted(), reset_resource() is called, previously clearing the flags. Later, pci_prepare_next_assign_round() attempted to release bridge resources using pci_bus_release_bridge_resources() that calls into pci_bridge_release_resources() that assumes type flags are still present. As type flags were cleared, IORESOURCE_MEM_64 was not set leading to resources under an incorrect bridge window to be released (idx = 1 instead of idx = 2). While the assignments performed later covered this problem so that the wrongly released resources got assigned in the end, it was still causing extra release+assign pairs. There are other reasons why the resource flags should be retained in upcoming changes too. Removing the flag reset for non-bridge window resource is left as future work, in part because it has a much higher regression potential due to pci_enable_resources() that will start to work also for those resources then and due to what endpoint drivers might assume about resources. Despite the Fixes tag, backporting this (at least any time soon) is highly discouraged. The issue fixed is borderline cosmetic as the later assignments normally cover the problem entirely. Also there might be non-obvious dependencies. Fixes: 5b28541552ef ("PCI: Restrict 64-bit prefetchable bridge windows to 64-bit resources") Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20250829131113.36754-11-ilpo.jarvinen@linux.intel.com
2025-06-17sysfs: treewide: switch back to attribute_group::bin_attrsThomas Weißschuh1-2/+2
The normal bin_attrs field can now handle const pointers. This makes the _new variant unnecessary. Switch all users back. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/r/20250530-sysfs-const-bin_attr-final-v3-4-724bfcf05b99@weissschuh.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-17sysfs: treewide: switch back to bin_attribute::read()/write()Thomas Weißschuh1-4/+4
The bin_attribute argument of bin_attribute::read() is now const. This makes the _new() callbacks unnecessary. Switch all users back. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/r/20250530-sysfs-const-bin_attr-final-v3-3-724bfcf05b99@weissschuh.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-04Merge branch 'pci/pm'Bjorn Helgaas1-0/+3
- Add pm_runtime_put() cleanup helper for use with __free() to automatically drop the device usage count when a pointer goes out of scope (Alex Williamson) - Increment PM usage counter when probing reset methods so we don't try to read config space of a powered-off device (Alex Williamson) - Set all devices to D0 during enumeration to ensure ACPI opregion is connected via _REG (Mario Limonciello) * pci/pm: PCI: Explicitly put devices into D0 when initializing PCI: Increment PM usage counter when probing reset methods PM: runtime: Define pm_runtime_put cleanup helper
2025-05-23PCI/AER: Add sysfs attributes for log ratelimitsJon Pan-Doh1-0/+1
Allow userspace to read/write log ratelimits per device (including enable/disable). Create aer/ sysfs directory to store them and any future AER configs. The new sysfs files are: /sys/bus/pci/devices/*/aer/correctable_ratelimit_burst /sys/bus/pci/devices/*/aer/correctable_ratelimit_interval_ms /sys/bus/pci/devices/*/aer/nonfatal_ratelimit_burst /sys/bus/pci/devices/*/aer/nonfatal_ratelimit_interval_ms The default values are ratelimit_burst=10, ratelimit_interval_ms=5000, so if we try to emit more than 10 messages in a 5 second period, some are suppressed. Update AER sysfs ABI filename to reflect the broader scope of AER sysfs attributes (e.g. stats and ratelimits). Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats -> sysfs-bus-pci-devices-aer Tested using aer-inject[1]. Configured correctable log ratelimit to 5. Sent 6 AER errors. Observed 5 errors logged while AER stats (cat /sys/bus/pci/devices/<dev>/aer_dev_correctable) shows 6. Disabled ratelimiting and sent 6 more AER errors. Observed all 6 errors logged and accounted in AER stats (12 total errors). [1] https://git.kernel.org/pub/scm/linux/kernel/git/gong.chen/aer-inject.git [bhelgaas: note fatal errors are not ratelimited, "aer_report" -> "aer_info", replace ratelimit_log_enable toggle with *_ratelimit_interval_ms] Signed-off-by: Karolina Stolarek <karolina.stolarek@oracle.com> Signed-off-by: Jon Pan-Doh <pandoh@google.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://patch.msgid.link/20250522232339.1525671-21-helgaas@kernel.org
2025-04-23PCI: Increment PM usage counter when probing reset methodsAlex Williamson1-0/+3
We can get different results probing reset methods for a device depending on its power state. For example, reading the PM control register of a device in D3cold will always indicate NoSoftRst+ because we get ~0 data when the config read fails on PCI, preventing us from correctly probing PM reset support. Increment the PM usage counter before any probes and use the cleanup __free facility to automatically drop the usage counter out of scope. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20250422230534.2295291-3-alex.williamson@redhat.com
2025-03-27Merge branch 'pci/resource'Bjorn Helgaas1-2/+6
- Use pci_resource_n() to simplify BAR/window resource lookup (Ilpo Järvinen) - Fix typo that repeatedly distributed resources to a bridge instead of iterating over subordinate bridges, which resulted in too little space to assign some BARs (Kai-Heng Feng) - Relax bridge window tail sizing for optional resources, e.g., IOV BARs, to avoid failures when removing and re-adding devices (Ilpo Järvinen) - Fix a double counting error for I/O resources, as we previously did for memory resources (Ilpo Järvinen) - Use resource_set_{range,size}() helpers in more places (Ilpo Järvinen) - Add pci_resource_is_iov() to identify IOV resources (Ilpo Järvinen) - Add pci_resource_num() to look up the BAR number from the resource pointer (Ilpo Järvinen) - Add restore_dev_resource() to simplify code that resources saved device resources (Ilpo Järvinen) - Allow drivers to enable devices even if we haven't assigned optional IOV resources to them (Ilpo Järvinen) - Improve debug output during resource reallocation (Ilpo Järvinen) - Rework handling of optional resources (IOV BARs, ROMs) to reduce failures if we can't allocate them (Ilpo Järvinen) - Move declarations of pci_rescan_bus_bridge_resize(), pci_reassign_bridge_resources(), and CardBus-related sizes from include/linux/pci.h to drivers/pci/pci.h since they're not used outside the PCI core (Ilpo Järvinen) - Make pci_setup_bridge() static (Ilpo Järvinen) - Fix a NULL dereference in the SR-IOV VF creation error path (Shay Drory) - Fix s390 mmio_read/write syscalls, which didn't cause page faults in some cases, which broke vfio-pci lazy mapping on first access (Niklas Schnelle) - Add pdev->non_mappable_bars to replace CONFIG_VFIO_PCI_MMAP, which was disabled only for s390 (Niklas Schnelle) - Support mmap of PCI resources on s390 except for ISM devices (Niklas Schnelle) * pci/resource: s390/pci: Support mmap() of PCI resources except for ISM devices s390/pci: Introduce pdev->non_mappable_bars and replace VFIO_PCI_MMAP s390/pci: Fix s390_mmio_read/write syscall page fault handling PCI: Fix NULL dereference in SR-IOV VF creation error path PCI: Move cardbus IO size declarations into pci/pci.h PCI: Make pci_setup_bridge() static PCI: Move resource reassignment func declarations into pci/pci.h PCI: Move pci_rescan_bus_bridge_resize() declaration to pci/pci.h PCI: Fix BAR resizing when VF BARs are assigned PCI: Do not claim to release resource falsely PCI: Increase Resizable BAR support from 512 GB to 128 TB PCI: Rework optional resource handling PCI: Perform reset_resource() and build fail list in sync PCI: Use res->parent to check if resource is assigned PCI: Add debug print when releasing resources before retry PCI: Indicate optional resource assignment failures PCI: Always have realloc_head in __assign_resources_sorted() PCI: Extend enable to check for any optional resource PCI: Add restore_dev_resource() PCI: Remove incorrect comment from pci_reassign_resource() PCI: Consolidate assignment loop next round preparation PCI: Rename retval to ret PCI: Use while loop and break instead of gotos PCI: Refactor pdev_sort_resources() & __dev_sort_resources() PCI: Converge return paths in __assign_resources_sorted() PCI: Add dev & res local variables to resource assignment funcs PCI: Add pci_resource_num() helper PCI: Check resource_size() separately PCI: Add pci_resource_is_iov() to identify IOV resources PCI: Use resource_set_{range,size}() helpers PCI: Use SZ_* instead of literals in setup-bus.c PCI: Fix old_size lower bound in calculate_iosize() too PCI: Allow relaxed bridge window tail sizing for optional resources PCI: Simplify size1 assignment logic PCI: Use min_align, not unrelated add_align, for size0 PCI: Remove add_align overwrite unrelated to size0 PCI: Use downstream bridges for distributing resources PCI: Cleanup dev->resource + resno to use pci_resource_n()
2025-03-21PCI/DOE: Expose DOE features via sysfsAlistair Francis1-0/+3
PCIe r6.0 added support for Data Object Exchange (DOE). When DOE is supported, the DOE Discovery Feature must be implemented per PCIe r6.1, sec 6.30.1.1. DOE allows a requester to obtain information about the other DOE features supported by the device. The kernel already queries the DOE features supported and caches the values. Expose the values in sysfs to allow user space to determine which DOE features are supported by the PCIe device. By exposing the information to userspace, tools like lspci can relay the information to users. By listing all of the supported features we can allow userspace to parse the list, which might include vendor specific features as well as yet to be supported features. As the DOE Discovery feature must always be supported we treat it as a special named attribute case. This allows the usual PCI attribute_group handling to correctly create the doe_features directory when registering pci_doe_sysfs_group (otherwise it doesn't and sysfs_add_file_to_group() will seg fault). After this patch is supported you can see something like this when attaching a DOE device: $ ls /sys/devices/pci0000:00/0000:00:02.0//doe* 0001:01 0001:02 doe_discovery Link: https://lore.kernel.org/r/20250306075211.1855177-3-alistair@alistair23.me Signed-off-by: Alistair Francis <alistair@alistair23.me> [bhelgaas: drop pci_doe_sysfs_init() stub return, make DEVICE_ATTR_RO(doe_discovery) static] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-03-21s390/pci: Introduce pdev->non_mappable_bars and replace VFIO_PCI_MMAPNiklas Schnelle1-0/+4
The ability to map PCI resources to user-space is controlled by global defines. For vfio there is VFIO_PCI_MMAP which is only disabled on s390 and controls mapping of PCI resources using vfio-pci with a fallback option via the pread()/pwrite() interface. For the PCI core there is ARCH_GENERIC_PCI_MMAP_RESOURCE which enables a generic implementation for mapping PCI resources plus the newer sysfs interface. Then there is HAVE_PCI_MMAP which can be used with custom definitions of pci_mmap_resource_range() and the historical /proc/bus/pci interface. Both mechanisms are all or nothing. For s390 mapping PCI resources is possible and useful for testing and certain applications such as QEMU's vfio-pci based user-space NVMe driver. For certain devices, however access to PCI resources via mappings to user-space is not possible and these must be excluded from the general PCI resource mapping mechanisms. Introduce pdev->non_mappable_bars to indicate that a PCI device's BARs can not be accessed via mappings to user-space. In the future this enables per-device restrictions of PCI resource mapping. For now, set this flag for all PCI devices on s390 in line with the existing, general disable of PCI resource mapping. As s390 is the only user of the VFI_PCI_MMAP Kconfig options this can already be replaced with a check of this new flag. Also add similar checks in the other code protected by HAVE_PCI_MMAP respectively ARCH_GENERIC_PCI_MMAP in preparation for enabling these for supported devices. Link: https://lore.kernel.org/lkml/20250212132808.08dcf03c.alex.williamson@redhat.com/ Link: https://lore.kernel.org/r/20250226-vfio_pci_mmap-v7-2-c5c0f1d26efd@linux.ibm.com Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-20PCI: Fix BAR resizing when VF BARs are assignedIlpo Järvinen1-2/+2
__resource_resize_store() attempts to release all resources of the device before attempting the resize. The loop, however, only covers standard BARs (< PCI_STD_NUM_BARS). If a device has VF BARs that are assigned, pci_reassign_bridge_resources() finds the bridge window still has some assigned child resources and returns -NOENT which makes pci_resize_resource() to detect an error and abort the resize. Change the release loop to cover all resources up to VF BARs which allows the resize operation to release the bridge windows and attempt to assigned them again with the different size. If SR-IOV is enabled, disallow resize as it requires releasing also IOV resources. Link: https://lore.kernel.org/r/20250320142837.8027-1-ilpo.jarvinen@linux.intel.com Fixes: 91fa127794ac ("PCI: Expose PCIe Resizable BAR support via sysfs") Reported-by: Michał Winiarski <michal.winiarski@intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
2025-01-23Merge branch 'pci/misc'Bjorn Helgaas1-21/+21
- Constify struct bin_attribute for sysfs, VPD, P2PDMA, and the IBM ACPI hotplug driver (Thomas Weißschuh) - Update PCI_EXP_LNKCAP_SLS comment (Lukas Wunner) - Drop superfluous pm_wakeup.h include (Wolfram Sang) - Remove redundant PCI_VSEC_HDR and PCI_VSEC_HDR_LEN_SHIFT (Dongdong Zhang) - Correct documentation of the 'config_acs=' kernel parameter (Akihiko Odaki) * pci/misc: Documentation: Fix pci=config_acs= example PCI: Remove redundant PCI_VSEC_HDR and PCI_VSEC_HDR_LEN_SHIFT PCI: Don't include 'pm_wakeup.h' directly PCI: Update code comment on PCI_EXP_LNKCAP_SLS for PCIe r3.0 PCI/ACPI: Constify 'struct bin_attribute' PCI/P2PDMA: Constify 'struct bin_attribute' PCI/VPD: Constify 'struct bin_attribute' PCI/sysfs: Constify 'struct bin_attribute'
2025-01-15PCI/sysfs: Remove unnecessary zero in initializerIlpo Järvinen1-1/+1
Providing empty initializer for an array is enough to set its elements to zero. Thus, remove the redundant 0 from the initializer. Link: https://lore.kernel.org/r/20241028174046.1736-4-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-01-15PCI/sysfs: Use __free() in reset_method_store()Ilpo Järvinen1-11/+7
Use __free() from cleanup.h to handle freeing options in reset_method_store() as it simplifies the code flow. Link: https://lore.kernel.org/r/20241028174046.1736-3-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-01-15PCI/sysfs: Move reset related sysfs code to correct fileIlpo Järvinen1-0/+112
Most PCI sysfs code and structs are in a dedicated file but a few reset related things remain in pci.c. Move also them to pci-sysfs.c and drop pci_dev_reset_method_attr_is_visible() as it is 100% duplicate of pci_dev_reset_attr_is_visible(). Link: https://lore.kernel.org/r/20241028174046.1736-2-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2024-12-03PCI/sysfs: Constify 'struct bin_attribute'Thomas Weißschuh1-21/+21
The sysfs core now allows instances of 'struct bin_attribute' to be moved into read-only memory. Make use of that to protect them against accidental or malicious modifications. Link: https://lore.kernel.org/r/20241202-sysfs-const-bin_attr-pci-v1-1-c32360f495a7@weissschuh.net Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-11-29Merge tag 'driver-core-6.13-rc1' of ↵Linus Torvalds1-19/+23
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is a small set of driver core changes for 6.13-rc1. Nothing major for this merge cycle, except for the two simple merge conflicts are here just to make life interesting. Included in here are: - sysfs core changes and preparations for more sysfs api cleanups that can come through all driver trees after -rc1 is out - fw_devlink fixes based on many reports and debugging sessions - list_for_each_reverse() removal, no one was using it! - last-minute seq_printf() format string bug found and fixed in many drivers all at once. - minor bugfixes and changes full details in the shortlog" * tag 'driver-core-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (35 commits) Fix a potential abuse of seq_printf() format string in drivers cpu: Remove spurious NULL in attribute_group definition s390/con3215: Remove spurious NULL in attribute_group definition perf: arm-ni: Remove spurious NULL in attribute_group definition driver core: Constify bin_attribute definitions sysfs: attribute_group: allow registration of const bin_attribute firmware_loader: Fix possible resource leak in fw_log_firmware_info() drivers: core: fw_devlink: Fix excess parameter description in docstring driver core: class: Correct WARN() message in APIs class_(for_each|find)_device() cacheinfo: Use of_property_present() for non-boolean properties cdx: Fix cdx_mmap_resource() after constifying attr in ->mmap() drivers: core: fw_devlink: Make the error message a bit more useful phy: tegra: xusb: Set fwnode for xusb port devices drm: display: Set fwnode for aux bus devices driver core: fw_devlink: Stop trying to optimize cycle detection logic driver core: Constify attribute arguments of binary attributes sysfs: bin_attribute: add const read/write callback variants sysfs: implement all BIN_ATTR_* macros in terms of __BIN_ATTR() sysfs: treewide: constify attribute callback of bin_attribute::llseek() sysfs: treewide: constify attribute callback of bin_attribute::mmap() ...
2024-11-13PCI: Add 'reset_subordinate' to reset hierarchy below bridgeKeith Busch1-0/+26
The "bus" and "cxl_bus" reset methods reset a device by asserting Secondary Bus Reset on the bridge leading to the device. These only work if the device is the only device below the bridge. Add a sysfs 'reset_subordinate' attribute on bridges that can assert Secondary Bus Reset regardless of how many devices are below the bridge. This resets all the devices below a bridge in a single command, including the locking and config space save/restore that reset methods normally do. This may be the only way to reset devices that don't support other reset methods (ACPI, FLR, PM reset, etc). Link: https://lore.kernel.org/r/20241025222755.3756162-1-kbusch@meta.com Signed-off-by: Keith Busch <kbusch@kernel.org> [bhelgaas: commit log, add capable(CAP_SYS_ADMIN) check] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Amey Narkhede <ameynarkhede03@gmail.com>
2024-11-05sysfs: treewide: constify attribute callback of bin_attribute::llseek()Thomas Weißschuh1-1/+1
The llseek() callbacks should not modify the struct bin_attribute passed as argument. Enforce this by marking the argument as const. As there are not many callback implementers perform this change throughout the tree at once. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Krzysztof Wilczyński <kw@linux.com> Link: https://lore.kernel.org/r/20241103-sysfs-const-bin_attr-v2-7-71110628844c@weissschuh.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-05sysfs: treewide: constify attribute callback of bin_attribute::mmap()Thomas Weißschuh1-5/+5
The mmap() callbacks should not modify the struct bin_attribute passed as argument. Enforce this by marking the argument as const. As there are not many callback implementers perform this change throughout the tree at once. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Andrew Donnellan <ajd@linux.ibm.com> # ocxl Acked-by: Krzysztof Wilczyński <kw@linux.com> Link: https://lore.kernel.org/r/20241103-sysfs-const-bin_attr-v2-6-71110628844c@weissschuh.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-05sysfs: treewide: constify attribute callback of bin_is_visible()Thomas Weißschuh1-1/+1
The is_bin_visible() callbacks should not modify the struct bin_attribute passed as argument. Enforce this by marking the argument as const. As there are not many callback implementers perform this change throughout the tree at once. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Krzysztof Wilczyński <kw@linux.com> Link: https://lore.kernel.org/r/20241103-sysfs-const-bin_attr-v2-5-71110628844c@weissschuh.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-05PCI/sysfs: Calculate bin_attribute size through bin_size()Thomas Weißschuh1-12/+16
Stop abusing the is_bin_visible() callback to calculate the attribute size. Instead use the new, dedicated bin_size() one. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Krzysztof Wilczyński <kw@linux.com> Link: https://lore.kernel.org/r/20241103-sysfs-const-bin_attr-v2-3-71110628844c@weissschuh.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-09s390/pci: Stop usurping pdev->dev.groupsLukas Wunner1-0/+5
Bjorn suggests using pdev->dev.groups for attribute_groups constructed on PCI device enumeration: "Is it feasible to build an attribute group in pci_doe_init() and add it to dev->groups so device_add() will automatically add them?" https://lore.kernel.org/r/20231019165829.GA1381099@bhelgaas Unfortunately on s390, pcibios_device_add() usurps pdev->dev.groups for arch-specific attribute_groups, preventing its use for anything else. Introduce an ARCH_PCI_DEV_GROUPS macro which arches can define in <asm/pci.h>. The macro is visible in drivers/pci/pci-sysfs.c through the inclusion of <linux/pci.h>, which in turn includes <asm/pci.h>. On s390, define the macro to the three attribute_groups previously assigned to pdev->dev.groups. Thereby pdev->dev.groups is made available for use by the PCI core. As a side effect, arch/s390/pci/pci_sysfs.c no longer needs to be compiled into the kernel if CONFIG_SYSFS=n. Link: https://lore.kernel.org/r/7b970f7923e373d1b23784721208f93418720485.1722870934.git.lukas@wunner.de Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Niklas Schnelle <schnelle@linux.ibm.com>
2024-03-05PCI/sysfs: Demacrofy pci_dev_resource_resize_attr(n) functionsIlpo Järvinen1-64/+74
pci_dev_resource_resize_attr(n) macro is invoked for six resources, creating a large footprint function for each resource. Rework the macro to only create a function that calls a helper function so the compiler can decide if it warrants to inline the function or not. With x86_64 defconfig, this saves roughly 2.5kB: $ scripts/bloat-o-meter drivers/pci/pci-sysfs.o{.old,.new} add/remove: 1/0 grow/shrink: 0/6 up/down: 512/-2934 (-2422) Function old new delta __resource_resize_store - 512 +512 resource5_resize_store 503 14 -489 resource4_resize_store 503 14 -489 resource3_resize_store 503 14 -489 resource2_resize_store 503 14 -489 resource1_resize_store 503 14 -489 resource0_resize_store 500 11 -489 Total: Before=13399, After=10977, chg -18.08% (The compiler seemingly chose to still inline __resource_resize_show() which is fine, those functions are not very complex/large.) Link: https://lore.kernel.org/r/20240222114607.1837-1-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-03-05PCI/sysfs: Compile pci-sysfs.c only if CONFIG_SYSFS=yLukas Wunner1-28/+1
It is possible to enable CONFIG_PCI but disable CONFIG_SYSFS and for space-constrained devices such as routers, such a configuration may actually make sense. However pci-sysfs.c is compiled even if CONFIG_SYSFS is disabled, unnecessarily increasing the kernel's size. To rectify that: * Move pci_mmap_fits() to mmap.c. It is not only needed by pci-sysfs.c, but also proc.c. * Move pci_dev_type to probe.c and make it private. It references pci_dev_attr_groups in pci-sysfs.c. Make that public instead for consistency with pci_dev_groups, pcibus_groups and pci_bus_groups, which are likewise public and referenced by struct definitions in pci-driver.c and probe.c. * Define pci_dev_groups, pci_dev_attr_groups, pcibus_groups and pci_bus_groups to NULL if CONFIG_SYSFS is disabled. Provide empty static inlines for pci_{create,remove}_legacy_files() and pci_{create,remove}_sysfs_dev_files(). Result: vmlinux size is reduced by 122996 bytes in my arm 32-bit test build. Link: https://lore.kernel.org/r/85ca95ae8e4d57ccf082c5c069b8b21eb141846e.1698668982.git.lukas@wunner.de Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
2023-11-03Merge tag 'driver-core-6.7-rc1' of ↵Linus Torvalds1-1/+25
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the set of driver core updates for 6.7-rc1. Nothing major in here at all, just a small number of changes including: - minor cleanups and updates from Andy Shevchenko - __counted_by addition - firmware_loader update for aborting loads cleaner - other minor changes, details in the shortlog - documentation update All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (21 commits) firmware_loader: Abort all upcoming firmware load request once reboot triggered firmware_loader: Refactor kill_pending_fw_fallback_reqs() Documentation: security-bugs.rst: linux-distros relaxed their rules driver core: Release all resources during unbind before updating device links driver core: class: remove boilerplate code driver core: platform: Annotate struct irq_affinity_devres with __counted_by resource: Constify resource crosscheck APIs resource: Unify next_resource() and next_resource_skip_children() resource: Reuse for_each_resource() macro PCI: Implement custom llseek for sysfs resource entries kernfs: sysfs: support custom llseek method for sysfs entries debugfs: Fix __rcu type comparison warning device property: Replace custom implementation of COUNT_ARGS() drivers: base: test: Make property entry API test modular driver core: Add missing parameter description to __fwnode_link_add() device property: Clarify usage scope of some struct fwnode_handle members devres: rename the first parameter of devm_add_action(_or_reset) driver core: platform: Unify the firmware node type check driver core: platform: Use temporary variable in platform_device_add() driver core: platform: Refactor error path in a couple places ...
2023-10-28Merge branch 'pci/field-get'Bjorn Helgaas1-3/+2
- Use FIELD_GET()/FIELD_PREP() when possible throughout drivers/pci/ (Ilpo Järvinen, Bjorn Helgaas) - Rework DPC control programming for clarity (Ilpo Järvinen) * pci/field-get: PCI/portdrv: Use FIELD_GET() PCI/VC: Use FIELD_GET() PCI/PTM: Use FIELD_GET() PCI/PME: Use FIELD_GET() PCI/ATS: Use FIELD_GET() PCI/ATS: Show PASID Capability register width in bitmasks PCI: Use FIELD_GET() in Sapphire RX 5600 XT Pulse quirk PCI: Use FIELD_GET() PCI/MSI: Use FIELD_GET/PREP() PCI/DPC: Use defines with DPC reason fields PCI/DPC: Use defined fields with DPC_CTL register PCI/DPC: Use FIELD_GET() PCI: hotplug: Use FIELD_GET/PREP() PCI: dwc: Use FIELD_GET/PREP() PCI: cadence: Use FIELD_GET() PCI: Use FIELD_GET() to extract Link Width PCI: mvebu: Use FIELD_PREP() with Link Width PCI: tegra194: Use FIELD_GET()/FIELD_PREP() with Link Width fields # Conflicts: # drivers/pci/controller/dwc/pcie-tegra194.c
2023-10-28Merge branch 'pci/vga'Bjorn Helgaas1-4/+3
- Add pci_is_vga() helper, which checks for both PCI_CLASS_DISPLAY_VGA and PCI_CLASS_NOT_DEFINED_VGA (which catches ancient devices built before Class Codes were defined) (Sui Jingfeng) - Use the new pci_is_vga() to identify devices for the VGA arbiter, the sysfs "boot_vga" attribute, and the virtio and qxl drivers (SUi Jingfeng) * pci/vga: drm/qxl: Use pci_is_vga() to identify VGA devices drm/virtio: Use pci_is_vga() to identify VGA devices PCI/sysfs: Enable 'boot_vga' attribute via pci_is_vga() PCI/VGA: Select VGA devices earlier PCI/VGA: Use pci_is_vga() to identify VGA devices PCI: Add pci_is_vga() helper
2023-10-17PCI: Use FIELD_GET() to extract Link WidthIlpo Järvinen1-3/+2
Use FIELD_GET() to extract PCIe Negotiated and Maximum Link Width fields instead of custom masking and shifting. Link: https://lore.kernel.org/r/20230919125648.1920-7-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: drop duplicate include of <linux/bitfield.h>] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2023-10-06PCI/sysfs: Enable 'boot_vga' attribute via pci_is_vga()Sui Jingfeng1-4/+3
Enable the 'boot_vga' sysfs attribute via pci_is_vga(). This exposes 'boot_vga' for old PCI_CLASS_NOT_DEFINED_VGA (0x0001) devices as well as for the PCI_CLASS_DISPLAY_VGA (0x0300) devices where it was previously exposed. Link: https://lore.kernel.org/r/20230830111532.444535-4-sui.jingfeng@linux.dev Signed-off-by: Sui Jingfeng <suijingfeng@loongson.cn> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: "Maciej W. Rozycki" <macro@orcam.me.uk>
2023-10-05PCI: Implement custom llseek for sysfs resource entriesValentine Sinitsyn1-1/+25
Since commit 636b21b50152 ("PCI: Revoke mappings like devmem"), mmappable sysfs entries have started to receive their f_mapping from the iomem pseudo filesystem, so that CONFIG_IO_STRICT_DEVMEM is honored in sysfs (and procfs) as well as in /dev/[k]mem. This resulted in a userspace-visible regression: 1. Open a sysfs PCI resource file (eg. /sys/bus/pci/devices/*/resource0) 2. Use lseek(fd, 0, SEEK_END) to determine its size Expected result: a PCI region size is returned. Actual result: 0 is returned. The reason is that PCI resource files residing in sysfs use generic_file_llseek(), which relies on f_mapping->host inode to get the file size. As f_mapping is now redefined, f_mapping->host points to an anonymous zero-sized iomem_inode which has nothing to do with sysfs file in question. Implement a custom llseek method for sysfs PCI resources, which is almost the same as proc_bus_pci_lseek() used for procfs entries. This makes sysfs and procfs entries consistent with regards to seeking, but also introduces userspace-visible changes to seeking PCI resources in sysfs: - SEEK_DATA and SEEK_HOLE are no longer supported; - Seeking past the end of the file is prohibited while previously offsets up to MAX_NON_LFS were accepted (reading from these offsets was always invalid). Signed-off-by: Valentine Sinitsyn <valesini@yandex-team.ru> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20230925084013.309399-2-valesini@yandex-team.ru Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-09-29PCI/sysfs: Protect driver's D3cold preference from user spaceLukas Wunner1-4/+1
struct pci_dev contains two flags which govern whether the device may suspend to D3cold: * no_d3cold provides an opt-out for drivers (e.g. if a device is known to not wake from D3cold) * d3cold_allowed provides an opt-out for user space (default is true, user space may set to false) Since commit 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend"), the user space setting overwrites the driver setting. Essentially user space is trusted to know better than the driver whether D3cold is working. That feels unsafe and wrong. Assume that the change was introduced inadvertently and do not overwrite no_d3cold when d3cold_allowed is modified. Instead, consider d3cold_allowed in addition to no_d3cold when choosing a suspend state for the device. That way, user space may opt out of D3cold if the driver hasn't, but it may no longer force an opt in if the driver has opted out. Fixes: 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend") Link: https://lore.kernel.org/r/b8a7f4af2b73f6b506ad8ddee59d747cbf834606.1695025365.git.lukas@wunner.de Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Cc: stable@vger.kernel.org # v4.8+
2023-07-18PCI/sysfs: Make I/O resource depend on HAS_IOPORTNiklas Schnelle1-0/+4
If legacy I/O spaces are not supported simply return an error when trying to access them via pci_resource_io(). This allows inb() and friends to become undefined when they are known at compile time to be non-functional in a later patch. Co-developed-by: Arnd Bergmann <arnd@kernel.org> Link: https://lore.kernel.org/r/20230703135255.2202721-3-schnelle@linux.ibm.com Signed-off-by: Arnd Bergmann <arnd@kernel.org> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2023-03-23driver core: bus: mark the struct bus_type for sysfs callbacks as constantGreg Kroah-Hartman1-1/+1
struct bus_type should never be modified in a sysfs callback as there is nothing in the structure to modify, and frankly, the structure is almost never used in a sysfs callback, so mark it as constant to allow struct bus_type to be moved to read-only memory. Cc: "David S. Miller" <davem@davemloft.net> Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Bounine <alex.bou9@gmail.com> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Ben Widawsky <bwidawsk@kernel.org> Cc: Dexuan Cui <decui@microsoft.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Harald Freudenberger <freude@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Hu Haowen <src.res@email.cn> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Laurentiu Tudor <laurentiu.tudor@nxp.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Stuart Yoder <stuyoder@gmail.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Yanteng Si <siyanteng@loongson.cn> Acked-by: Ilya Dryomov <idryomov@gmail.com> # rbd Acked-by: Ira Weiny <ira.weiny@intel.com> # cxl Reviewed-by: Alex Shi <alexs@kernel.org> Acked-by: Iwona Winiarska <iwona.winiarska@intel.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci Acked-by: Wei Liu <wei.liu@kernel.org> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> # scsi Link: https://lore.kernel.org/r/20230313182918.1312597-23-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-12-14Merge tag 'pci-v6.2-changes' of ↵Linus Torvalds1-4/+9
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "Enumeration: - Squash portdrv_{core,pci}.c into portdrv.c to ease maintenance and make more things static. - Make portdrv bind to Switch Ports that have AER. Previously, if these Ports lacked MSI/MSI-X, portdrv failed to bind, which meant the Ports couldn't be suspended to low-power states. AER on these Ports doesn't use interrupts, and the AER driver doesn't need to claim them. - Assign PCI domain IDs using ida_alloc(), which makes host bridge add/remove work better. Resource management: - To work better with recent BIOSes that use EfiMemoryMappedIO for PCI host bridge apertures, remove those regions from the E820 map (E820 entries normally prevent us from allocating BARs). In v5.19, we added some quirks to disable E820 checking, but that's not very maintainable. EfiMemoryMappedIO means the OS needs to map the region for use by EFI runtime services; it shouldn't prevent OS from using it. PCIe native device hotplug: - Build pciehp by default if USB4 is enabled, since Thunderbolt/USB4 PCIe tunneling depends on native PCIe hotplug. - Enable Command Completed Interrupt only if supported to avoid user confusion from lspci output that says this is enabled but not supported. - Prevent pciehp from binding to Switch Upstream Ports; this happened because of interaction with acpiphp and caused devices below the Upstream Port to disappear. Power management: - Convert AGP drivers to generic power management. We hope to remove legacy power management from the PCI core eventually. Virtualization: - Fix pci_device_is_present(), which previously always returned "false" for VFs, causing virtio hangs when unbinding the driver. Miscellaneous: - Convert drivers to gpiod API to prepare for dropping some legacy code. - Fix DOE fencepost error for the maximum data object length. Baikal-T1 PCIe controller driver: - Add driver and DT bindings. Broadcom STB PCIe controller driver: - Enable Multi-MSI. - Delay 100ms after PERST# deassert to allow power and clocks to stabilize. - Configure Read Completion Boundary to 64 bytes. Freescale i.MX6 PCIe controller driver: - Initialize PHY before deasserting core reset to fix a regression in v6.0 on boards where the PHY provides the reference. - Fix imx6sx and imx8mq clock names in DT schema. Intel VMD host bridge driver: - Fix Secondary Bus Reset on VMD bridges, which allows reset of NVMe SSDs in VT-d pass-through scenarios. - Disable MSI remapping, which gets re-enabled by firmware during suspend/resume. MediaTek PCIe Gen3 controller driver: - Add MT7986 and MT8195 support. Qualcomm PCIe controller driver: - Add SC8280XP/SA8540P basic interconnect support. Rockchip DesignWare PCIe controller driver: - Base DT schema on common Synopsys schema. Synopsys DesignWare PCIe core: - Collect DT items shared between Root Port and Endpoint (PERST GPIO, PHY info, clocks, resets, link speed, number of lanes, number of iATU windows, interrupt info, etc) to snps,dw-pcie-common.yaml. - Add dma-ranges support for Root Ports and Endpoints. - Consolidate DT resource retrieval for "dbi", "dbi2", "atu", etc. to reduce code duplication. - Add generic names for clocks and resets to encourage more consistent naming across drivers using DesignWare IP. - Stop advertising PTM Responder role for Endpoints, which aren't allowed to be responders. TI J721E PCIe driver: - Add j721s2 host mode ID to DT schema. - Add interrupt properties to DT schema. Toshiba Visconti PCIe controller driver: - Fix interrupts array max constraints in DT schema" * tag 'pci-v6.2-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (95 commits) x86/PCI: Use pr_info() when possible x86/PCI: Fix log message typo x86/PCI: Tidy E820 removal messages PCI: Skip allocate_resource() if too little space available efi/x86: Remove EfiMemoryMappedIO from E820 map PCI/portdrv: Allow AER service only for Root Ports & RCECs PCI: xilinx-nwl: Fix coding style violations PCI: mvebu: Switch to using gpiod API PCI: pciehp: Enable Command Completed Interrupt only if supported PCI: aardvark: Switch to using devm_gpiod_get_optional() dt-bindings: PCI: mediatek-gen3: add support for mt7986 dt-bindings: PCI: mediatek-gen3: add SoC based clock config dt-bindings: PCI: qcom: Allow 'dma-coherent' property PCI: mt7621: Add sentinel to quirks table PCI: vmd: Fix secondary bus reset for Intel bridges PCI: endpoint: pci-epf-vntb: Fix sparse ntb->reg build warning PCI: endpoint: pci-epf-vntb: Fix sparse build warning for epf_db PCI: endpoint: pci-epf-vntb: Replace hardcoded 4 with sizeof(u32) PCI: endpoint: pci-epf-vntb: Remove unused epf_db_phy struct member PCI: endpoint: pci-epf-vntb: Fix call pci_epc_mem_free_addr() in error path ...
2022-11-14PCI: Allow drivers to request exclusive config regionsIra Weiny1-0/+7
PCI config space access from user space has traditionally been unrestricted with writes being an understood risk for device operation. Unfortunately, device breakage or odd behavior from config writes lacks indicators that can leave driver writers confused when evaluating failures. This is especially true with the new PCIe Data Object Exchange (DOE) mailbox protocol where backdoor shenanigans from user space through things such as vendor defined protocols may affect device operation without complete breakage. A prior proposal restricted read and writes completely.[1] Greg and Bjorn pointed out that proposal is flawed for a couple of reasons. First, lspci should always be allowed and should not interfere with any device operation. Second, setpci is a valuable tool that is sometimes necessary and it should not be completely restricted.[2] Finally methods exist for full lock of device access if required. Even though access should not be restricted it would be nice for driver writers to be able to flag critical parts of the config space such that interference from user space can be detected. Introduce pci_request_config_region_exclusive() to mark exclusive config regions. Such regions trigger a warning and kernel taint if accessed via user space. Create pci_warn_once() to restrict the user from spamming the log. [1] https://lore.kernel.org/all/161663543465.1867664.5674061943008380442.stgit@dwillia2-desk3.amr.corp.intel.com/ [2] https://lore.kernel.org/all/YF8NGeGv9vYcMfTV@kroah.com/ Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20220926215711.2893286-2-ira.weiny@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-11-09PCI/sysfs: Fix double free in error pathSascha Hauer1-4/+9
When pci_create_attr() fails, pci_remove_resource_files() is called which will iterate over the res_attr[_wc] arrays and frees every non NULL entry. To avoid a double free here set the array entry only after it's clear we successfully initialized it. Fixes: b562ec8f74e4 ("PCI: Don't leak memory if sysfs_create_bin_file() fails") Link: https://lore.kernel.org/r/20221007070735.GX986@pengutronix.de/ Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org
2022-10-05PCI: Expose PCIe Resizable BAR support via sysfsAlex Williamson1-0/+108
Add a simple sysfs interface to Resizable BAR support, largely for the purposes of assigning such devices to a VM through VFIO. Resizable BARs present a difficult feature to expose to a VM through emulation, as resizing a BAR is done on the host. It can fail, and often does, but we have no means via emulation of a PCIe REBAR capability to handle the error cases. A vfio-pci specific ioctl interface is also cumbersome as there are often multiple devices within the same bridge aperture and handling them is a challenge. In the interface proposed here, expanding a BAR potentially requires such devices to be soft-removed during the resize operation and rescanned after, in order for all the necessary resources to be released. A pci-sysfs interface is also more universal than a vfio specific interface. Please see the ABI documentation update for usage. Link: https://lore.kernel.org/r/166336088796.3597940.14973499936692558556.stgit@omen Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Christian König <christian.koenig@amd.com> Cc: Krzysztof Wilczyński <kw@linux.com>
2022-04-22PCI: Use driver_set_override() instead of open-codingKrzysztof Kozlowski1-24/+4
Use a helper to set driver_override to the reduce amount of duplicated code. Make the driver_override field const char, because it is not modified by the core and it matches other subsystems. Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20220419113435.246203-6-krzysztof.kozlowski@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-22PCI: Remove unused assignmentsBjorn Helgaas1-6/+1
Remove variables and assignments that are never used. Found by Krzysztof using cppcheck, e.g., $ cppcheck --enable=all --force uselessAssignmentPtrArg drivers/pci/proc.c:102 Assignment of function parameter has no effect outside the function. Did you forget dereferencing it? unreadVariable drivers/pci/setup-bus.c:1528 Variable 'old_flags' is assigned a value that is never used. Reported-by: Krzysztof Wilczyński <kw@linux.com> Link: https://lore.kernel.org/r/20220313192933.434746-2-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2021-12-09PCI/sysfs: Use pci_irq_vector()Thomas Gleixner1-5/+2
instead of fiddling with MSI descriptors. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20211206210224.265589103@linutronix.de
2021-11-06Merge tag 'pci-v5.16-changes' of ↵Linus Torvalds1-15/+36
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull pci updates from Bjorn Helgaas: "Enumeration: - Conserve IRQs by setting up portdrv IRQs only when there are users (Jan Kiszka) - Rework and simplify _OSC negotiation for control of PCIe features (Joerg Roedel) - Remove struct pci_dev.driver pointer since it's redundant with the struct device.driver pointer (Uwe Kleine-König) Resource management: - Coalesce contiguous host bridge apertures from _CRS to accommodate BARs that cover more than one aperture (Kai-Heng Feng) Sysfs: - Check CAP_SYS_ADMIN before parsing user input (Krzysztof Wilczyński) - Return -EINVAL consistently from "store" functions (Krzysztof Wilczyński) - Use sysfs_emit() in endpoint "show" functions to avoid buffer overruns (Kunihiko Hayashi) PCIe native device hotplug: - Ignore Link Down/Up caused by resets during error recovery so endpoint drivers can remain bound to the device (Lukas Wunner) Virtualization: - Avoid bus resets on Atheros QCA6174, where they hang the device (Ingmar Klein) - Work around Pericom PI7C9X2G switch packet drop erratum by using store and forward mode instead of cut-through (Nathan Rossi) - Avoid trying to enable AtomicOps on VFs; the PF setting applies to all VFs (Selvin Xavier) MSI: - Document that /sys/bus/pci/devices/.../irq contains the legacy INTx interrupt or the IRQ of the first MSI (not MSI-X) vector (Barry Song) VPD: - Add pci_read_vpd_any() and pci_write_vpd_any() to access anywhere in the possible VPD space; use these to simplify the cxgb3 driver (Heiner Kallweit) Peer-to-peer DMA: - Add (not subtract) the bus offset when calculating DMA address (Wang Lu) ASPM: - Re-enable LTR at Downstream Ports so they don't report Unsupported Requests when reset or hot-added devices send LTR messages (Mingchuang Qiao) Apple PCIe controller driver: - Add driver for Apple M1 PCIe controller (Alyssa Rosenzweig, Marc Zyngier) Cadence PCIe controller driver: - Return success when probe succeeds instead of falling into error path (Li Chen) HiSilicon Kirin PCIe controller driver: - Reorganize PHY logic and add support for external PHY drivers (Mauro Carvalho Chehab) - Support PERST# GPIOs for HiKey970 external PEX 8606 bridge (Mauro Carvalho Chehab) - Add Kirin 970 support (Mauro Carvalho Chehab) - Make driver removable (Mauro Carvalho Chehab) Intel VMD host bridge driver: - If IOMMU supports interrupt remapping, leave VMD MSI-X remapping enabled (Adrian Huang) - Number each controller so we can tell them apart in /proc/interrupts (Chunguang Xu) - Avoid building on UML because VMD depends on x86 bare metal APIs (Johannes Berg) Marvell Aardvark PCIe controller driver: - Define macros for PCI_EXP_DEVCTL_PAYLOAD_* (Pali Rohár) - Set Max Payload Size to 512 bytes per Marvell spec (Pali Rohár) - Downgrade PIO Response Status messages to debug level (Marek Behún) - Preserve CRS SV (Config Request Retry Software Visibility) bit in emulated Root Control register (Pali Rohár) - Fix issue in configuring reference clock (Pali Rohár) - Don't clear status bits for masked interrupts (Pali Rohár) - Don't mask unused interrupts (Pali Rohár) - Avoid code repetition in advk_pcie_rd_conf() (Marek Behún) - Retry config accesses on CRS response (Pali Rohár) - Simplify emulated Root Capabilities initialization (Pali Rohár) - Fix several link training issues (Pali Rohár) - Fix link-up checking via LTSSM (Pali Rohár) - Fix reporting of Data Link Layer Link Active (Pali Rohár) - Fix emulation of W1C bits (Marek Behún) - Fix MSI domain .alloc() method to return zero on success (Marek Behún) - Read entire 16-bit MSI vector in MSI handler, not just low 8 bits (Marek Behún) - Clear Root Port I/O Space, Memory Space, and Bus Master Enable bits at startup; PCI core will set those as necessary (Pali Rohár) - When operating as a Root Port, set class code to "PCI Bridge" instead of the default "Mass Storage Controller" (Pali Rohár) - Add emulation for PCI_BRIDGE_CTL_BUS_RESET since aardvark doesn't implement this per spec (Pali Rohár) - Add emulation of option ROM BAR since aardvark doesn't implement this per spec (Pali Rohár) MediaTek MT7621 PCIe controller driver: - Add MediaTek MT7621 PCIe host controller driver and DT binding (Sergio Paracuellos) Qualcomm PCIe controller driver: - Add SC8180x compatible string (Bjorn Andersson) - Add endpoint controller driver and DT binding (Manivannan Sadhasivam) - Restructure to use of_device_get_match_data() (Prasad Malisetty) - Add SC7280-specific pcie_1_pipe_clk_src handling (Prasad Malisetty) Renesas R-Car PCIe controller driver: - Remove unnecessary includes (Geert Uytterhoeven) Rockchip DesignWare PCIe controller driver: - Add DT binding (Simon Xue) Socionext UniPhier Pro5 controller driver: - Serialize INTx masking/unmasking (Kunihiko Hayashi) Synopsys DesignWare PCIe controller driver: - Run dwc .host_init() method before registering MSI interrupt handler so we can deal with pending interrupts left by bootloader (Bjorn Andersson) - Clean up Kconfig dependencies (Andy Shevchenko) - Export symbols to allow more modular drivers (Luca Ceresoli) TI DRA7xx PCIe controller driver: - Allow host and endpoint drivers to be modules (Luca Ceresoli) - Enable external clock if present (Luca Ceresoli) TI J721E PCIe driver: - Disable PHY when probe fails after initializing it (Christophe JAILLET) MicroSemi Switchtec management driver: - Return error to application when command execution fails because an out-of-band reset has cleared the device BARs, Memory Space Enable, etc (Kelvin Cao) - Fix MRPC error status handling issue (Kelvin Cao) - Mask out other bits when reading of management VEP instance ID (Kelvin Cao) - Return EOPNOTSUPP instead of ENOTSUPP from sysfs show functions (Kelvin Cao) - Add check of event support (Logan Gunthorpe) Miscellaneous: - Remove unused pci_pool wrappers, which have been replaced by dma_pool (Cai Huoqing) - Use 'unsigned int' instead of bare 'unsigned' (Krzysztof Wilczyński) - Use kstrtobool() directly, sans strtobool() wrapper (Krzysztof Wilczyński) - Fix some sscanf(), sprintf() format mismatches (Krzysztof Wilczyński) - Update PCI subsystem information in MAINTAINERS (Krzysztof Wilczyński) - Correct some misspellings (Krzysztof Wilczyński)" * tag 'pci-v5.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (137 commits) PCI: Add ACS quirk for Pericom PI7C9X2G switches PCI: apple: Configure RID to SID mapper on device addition iommu/dart: Exclude MSI doorbell from PCIe device IOVA range PCI: apple: Implement MSI support PCI: apple: Add INTx and per-port interrupt support PCI: kirin: Allow removing the driver PCI: kirin: De-init the dwc driver PCI: kirin: Disable clkreq during poweroff sequence PCI: kirin: Move the power-off code to a common routine PCI: kirin: Add power_off support for Kirin 960 PHY PCI: kirin: Allow building it as a module PCI: kirin: Add MODULE_* macros PCI: kirin: Add Kirin 970 compatible PCI: kirin: Support PERST# GPIOs for HiKey970 external PEX 8606 bridge PCI: apple: Set up reference clocks when probing PCI: apple: Add initial hardware bring-up PCI: of: Allow matching of an interrupt-map local to a PCI device of/irq: Allow matching of an interrupt-map local to an interrupt controller irqdomain: Make of_phandle_args_to_fwspec() generally available PCI: Do not enable AtomicOps on VFs ...
2021-11-05Merge branch 'pci/sysfs'Bjorn Helgaas1-14/+13
- Check for CAP_SYS_ADMIN before validating sysfs user input, not after (Krzysztof Wilczyński) - Always return -EINVAL from sysfs "store" functions for invalid user input instead of -EINVAL sometimes and -ERANGE others (Krzysztof Wilczyński) - Use kstrtobool() directly instead of the strtobool() wrapper (Krzysztof Wilczyński) * pci/sysfs: PCI: Use kstrtobool() directly, sans strtobool() wrapper PCI/sysfs: Return -EINVAL consistently from "store" functions PCI/sysfs: Check CAP_SYS_ADMIN before parsing user input # Conflicts: # drivers/pci/iov.c
2021-10-18PCI/sysfs: Explicitly show first MSI IRQ for 'irq'Barry Song1-1/+23
The sysfs "irq" file contains the legacy INTx IRQ. Or, if the device has MSI enabled, it contains the first MSI IRQ instead. Previously this file showed the pci_dev.irq value directly. But we'd prefer to use pci_dev.irq only for the INTx IRQ and decouple that from any MSI or MSI-X IRQs. If the device has MSI enabled, explicitly look up and show the first MSI IRQ in the sysfs "irq" file. Otherwise, show the INTx IRQ. This removes the requirement that msi_capability_init() set pci_dev.irq to the first MSI IRQ when enabling MSI and pci_msi_shutdown() restore the INTx IRQ when disabling MSI. [bhelgaas: commit log] Link: https://lore.kernel.org/r/20210825102636.52757-3-21cnbao@gmail.com Signed-off-by: Barry Song <song.bao.hua@hisilicon.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>