| Age | Commit message (Collapse) | Author | Files | Lines |
|
Cross-merge networking fixes after downstream PR (net-6.17-rc8).
Conflicts:
tools/testing/selftests/drivers/net/bonding/Makefile
87951b566446 selftests: bonding: add test for passive LACP mode
c2377f1763e9 selftests: bonding: add test for LACP actor port priority
Adjacent changes:
drivers/net/ethernet/cadence/macb.h
fca3dc859b20 net: macb: remove illusion about TBQPH/RBQPH being per-queue
89934dbf169e net: macb: Add TAPRIO traffic scheduling support
drivers/net/ethernet/cadence/macb_main.c
fca3dc859b20 net: macb: remove illusion about TBQPH/RBQPH being per-queue
89934dbf169e net: macb: Add TAPRIO traffic scheduling support
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The Allwinner A523 SoC family has a second Ethernet controller, called
the GMAC200 in the BSP and T527 datasheet, and referred to as GMAC1 for
numbering. This controller, according to BSP sources, is fully
compatible with a slightly newer version of the Synopsys DWMAC core.
The glue layer around the controller is the same as found around older
DWMAC cores on Allwinner SoCs. The only slight difference is that since
this is the second controller on the SoC, the register for the clock
delay controls is at a different offset. Last, the integration includes
a dedicated clock gate for the memory bus and the whole thing is put in
a separately controllable power domain.
Add a new driver for this hardware supporting the integration layer.
Reviewed-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20250925191600.3306595-3-wens@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Fix compiler error caused by the round_rate() to determine_rate()
migration.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202509280327.jsapR0Ww-lkp@intel.com/
Signed-off-by: Brian Masney <bmasney@redhat.com>
Fixes: e9f039c08cdc ("clk: microchip: core: convert from round_rate() to determine_rate()")
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux
Pull rust updates from Miguel Ojeda:
"Toolchain and infrastructure:
- Derive 'Zeroable' for all structs and unions generated by 'bindgen'
where possible and corresponding cleanups. To do so, add the
'pin-init' crate as a dependency to 'bindings' and 'uapi'.
It also includes its first use in the 'cpufreq' module, with more
to come in the next cycle.
- Add warning to the 'rustdoc' target to detect broken 'srctree/'
links and fix existing cases.
- Remove support for unused (since v6.16) host '#[test]'s,
simplifying the 'rusttest' target. Tests should generally run
within KUnit.
'kernel' crate:
- Add 'ptr' module with a new 'Alignment' type, which is always a
power of two and is used to validate that a given value is a valid
alignment and to perform masking and alignment operations:
// Checked at build time.
assert_eq!(Alignment::new::<16>().as_usize(), 16);
// Checked at runtime.
assert_eq!(Alignment::new_checked(15), None);
assert_eq!(Alignment::of::<u8>().log2(), 0);
assert_eq!(0x25u8.align_down(Alignment::new::<0x10>()), 0x20);
assert_eq!(0x5u8.align_up(Alignment::new::<0x10>()), Some(0x10));
assert_eq!(u8::MAX.align_up(Alignment::new::<0x10>()), None);
It also includes its first use in Nova.
- Add 'core::mem::{align,size}_of{,_val}' to the prelude, matching
Rust 1.80.0.
- Keep going with the steps on our migration to the standard library
'core::ffi::CStr' type (use 'kernel::{fmt, prelude::fmt!}' and use
upstream method names).
- 'error' module: improve 'Error::from_errno' and 'to_result'
documentation, including examples/tests.
- 'sync' module: extend 'aref' submodule documentation now that it
exists, and more updates to complete the ongoing move of 'ARef' and
'AlwaysRefCounted' to 'sync::aref'.
- 'list' module: add an example/test for 'ListLinksSelfPtr' usage.
- 'alloc' module:
- Implement 'Box::pin_slice()', which constructs a pinned slice of
elements.
- Provide information about the minimum alignment guarantees of
'Kmalloc', 'Vmalloc' and 'KVmalloc'.
- Take minimum alignment guarantees of allocators for
'ForeignOwnable' into account.
- Remove the 'allocator_test' (including 'Cmalloc').
- Add doctest for 'Vec::as_slice()'.
- Constify various methods.
- 'time' module:
- Add methods on 'HrTimer' that can only be called with exclusive
access to an unarmed timer, or from timer callback context.
- Add arithmetic operations to 'Instant' and 'Delta'.
- Add a few convenience and access methods to 'HrTimer' and
'Instant'.
'macros' crate:
- Reduce collections in 'quote!' macro.
And a few other cleanups and improvements"
* tag 'rust-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: (58 commits)
gpu: nova-core: use Alignment for alignment-related operations
rust: add `Alignment` type
rust: macros: reduce collections in `quote!` macro
rust: acpi: use `core::ffi::CStr` method names
rust: of: use `core::ffi::CStr` method names
rust: net: use `core::ffi::CStr` method names
rust: miscdevice: use `core::ffi::CStr` method names
rust: kunit: use `core::ffi::CStr` method names
rust: firmware: use `core::ffi::CStr` method names
rust: drm: use `core::ffi::CStr` method names
rust: cpufreq: use `core::ffi::CStr` method names
rust: configfs: use `core::ffi::CStr` method names
rust: auxiliary: use `core::ffi::CStr` method names
drm/panic: use `core::ffi::CStr` method names
rust: device: use `kernel::{fmt,prelude::fmt!}`
rust: sync: use `kernel::{fmt,prelude::fmt!}`
rust: seq_file: use `kernel::{fmt,prelude::fmt!}`
rust: kunit: use `kernel::{fmt,prelude::fmt!}`
rust: file: use `kernel::{fmt,prelude::fmt!}`
rust: device: use `kernel::{fmt,prelude::fmt!}`
...
|
|
The bitmap allocated with bitmap_zalloc() in otx2_probe() was not
released in otx2_remove(). Unbinding and rebinding the driver therefore
triggers a kmemleak warning:
unreferenced object (size 8):
backtrace:
bitmap_zalloc
otx2_probe
Call bitmap_free() in the remove path to fix the leak.
Fixes: efabce290151 ("octeontx2-pf: AF_XDP zero copy receive support")
Signed-off-by: Bo Sun <bo@mboxify.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The bitmap allocated with bitmap_zalloc() in otx2vf_probe() was not
released in otx2vf_remove(). Unbinding and rebinding the driver therefore
triggers a kmemleak warning:
unreferenced object (size 8):
backtrace:
bitmap_zalloc
otx2vf_probe
Call bitmap_free() in the remove path to fix the leak.
Fixes: efabce290151 ("octeontx2-pf: AF_XDP zero copy receive support")
Signed-off-by: Bo Sun <bo@mboxify.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The ->set/create/modify_rxfh() callbacks now pass a valid extack instead
of NULL through netlink [1]. In case of an error, reflect it through
extack instead of a dmesg print.
[1]
commit c0ae03588bbb ("ethtool: rss: initial RSS_SET (indirection table handling)")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1759094723-843774-8-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Group RSS-related parameters into a dedicated mlx5e_rss_params
struct. Pass this struct instead of individual arguments when
initializing RSS.
No functional changes.
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1759094723-843774-7-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Introduce a dedicated structure to group RSS initialization parameters
that are only used during RSS creation, and drop the "init" prefix
from pkt_merge_param.
No functional changes.
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1759094723-843774-6-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The mdev parameter is not used in mlx5e_rss_params_indir_init, so drop
it from the function and update all callers accordingly.
No functional changes.
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1759094723-843774-5-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Enhance error messages in MLX5 QoS scheduling depth validation by
including the actual values that caused the validation to fail.
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1759094723-843774-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When a PF enters switchdev mode, its netdevice becomes the uplink
representor but remains in its current network namespace. All other
representors (VFs, SFs) are created in the netns of the devlink
instance.
If the PF's netns has been moved and differs from the devlink's netns,
enabling switchdev mode would create a state where the OVS control
plane (ovs-vsctl) cannot manage the switch because the PF uplink
representor and the other representors are split across different
namespaces.
To prevent this inconsistent configuration, block the request to enter
switchdev mode if the PF netdevice's netns does not match the netns of
its devlink instance.
As part of this change, the PF's netns is first marked as immutable.
This prevents race conditions where the netns could be changed after
the check is performed but before the mode transition is complete, and
it aligns the PF's behavior with that of the final uplink representor.
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1759094723-843774-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The existing solution of complex matchers splits the match parameters
across two, and exactly two, matchers. For some rather extreme cases
(e.g. IPv6-in-IPv6 tunnels), even two matchers are not enough.
Generalize complex matchers to up to 4 submatchers, and allow easy
extension to more if needed. This resulted in rewriting a large part
of the high-level complex matchers logic, but the original concepts
were rock solid and still hold.
Key characteristics of the new implementation:
* Rework complex matchers to include multiple submatchers. All
submatchers but the first are isolated, in keeping with the existing
paradigm of handing off to specialized matchers that are not otherwise
reachable by regular rules.
* Similarly, rework complex rules to allow splitting them into more than
two simple rules. Rules continue to be refcounted to allow for
multiple complex rules matching on identical parts of the match
params.
* Rely on the match tag, as opposed to the entire match_param, to hash
subrules. This results in lower memory usage.
* Prefer to split the original user-supplied match parameters rather
than the internal field descriptors. This avoids the awkward
transition back and forth between the two formats.
* Allow splitting multi-dword fields across matchers. The only
restrictions that the new implementation impose are: a) any fragment
of an IP address must be accompanied by a match on the IP version; and
b) a single lower dword of an IPv6 address cannot be present in a
submatcher as it would be interpreted as an IPv4 address.
* Employ a greedy algorithm to split the match params, as opposed to
complete search. The results are not optimal, but the algorithm is now
linear compared to exponential. Consequently, we see complex matcher
creation time drops two orders of magnitude in our tests.
Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1759094723-843774-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
With CONFIG_HYPERV and CONFIG_HYPERV_VMBUS separated, change CONFIG_HYPERV
to bool from tristate. CONFIG_HYPERV now becomes the core Hyper-V
hypervisor support, such as hypercalls, clocks/timers, Confidential
Computing setup, PCI passthru, etc. that doesn't involve VMBus or VMBus
devices.
Signed-off-by: Mukesh Rathor <mrathor@linux.microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
At present VMBus driver is hinged off of CONFIG_HYPERV which entails
lot of builtin code and encompasses too much. It's not always clear
what depends on builtin hv code and what depends on VMBus. Setting
CONFIG_HYPERV as a module and fudging the Makefile to switch to builtin
adds even more confusion. VMBus is an independent module and should have
its own config option. Also, there are scenarios like baremetal dom0/root
where support is built in with CONFIG_HYPERV but without VMBus. Lastly,
there are more features coming down that use CONFIG_HYPERV and add more
dependencies on it.
So, create a fine grained HYPERV_VMBUS option and update Kconfigs for
dependency on VMBus.
Signed-off-by: Mukesh Rathor <mrathor@linux.microsoft.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> # drivers/pci
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull clocksource updates from Thomas Gleixner:
- Further preparations for modular clocksource/event drivers
- The usual device tree updates to support new chip variants and the
related changes to thise drivers
- Avoid a 64-bit division in the TEGRA186 driver, which caused a build
fail on 32-bit machines.
- Small fixes, improvements and cleanups all over the place
* tag 'timers-clocksource-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
dt-bindings: timer: exynos4210-mct: Add compatible for ARTPEC-9 SoC
clocksource/drivers/sh_cmt: Split start/stop of clock source and events
clocksource/drivers/clps711x: Fix resource leaks in error paths
clocksource/drivers/arm_global_timer: Add auto-detection for initial prescaler values
clocksource/drivers/ingenic-sysost: Convert from round_rate() to determine_rate()
clocksource/drivers/timer-tegra186: Don't print superfluous errors
clocksource/drivers/timer-rtl-otto: Simplify documentation
clocksource/drivers/timer-rtl-otto: Do not interfere with interrupts
clocksource/drivers/timer-rtl-otto: Drop set_counter function
clocksource/drivers/timer-rtl-otto: Work around dying timers
clocksource/drivers/timer-ti-dm : Capture functionality for OMAP DM timer
clocksource/drivers/arm_arch_timer_mmio: Add MMIO clocksource
clocksource/drivers/arm_arch_timer_mmio: Switch over to standalone driver
clocksource/drivers/arm_arch_timer: Add standalone MMIO driver
ACPI: GTDT: Generate platform devices for MMIO timers
clocksource/drivers/nxp-pit: Add NXP Automotive s32g2 / s32g3 support
dt: bindings: fsl,vf610-pit: Add compatible for s32g2 and s32g3
clocksource/drivers/vf-pit: Rename the VF PIT to NXP PIT
clocksource/drivers/vf-pit: Unify the function name for irq ack
clocksource/drivers/vf-pit: Consolidate calls to pit_*_disable/enable
...
|
|
Fix two minor typos in vmbus_drv.c:
- Correct "reponsible" -> "responsible" in a comment.
- Add missing newline in pr_err() message ("channeln" -> "channel\n").
These are cosmetic changes only and do not affect functionality.
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
The sysfs attributes out_read_index and out_write_index in
vmbus_drv.c currently use %d to print outbound.current_read_index
and outbound.current_write_index.
These fields are u32 values, so printing them with %d (signed) is
not logically correct. Update the format specifier to %u to
correctly match their type.
No functional change, only fixes the sysfs output format.
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
The target_cpu_store() function parses the target CPU from the sysfs
buffer using sscanf(). The format string currently uses "%uu", which
is redundant. The compiler ignores the extra "u", so there is no
incorrect parsing at runtime.
Update the format string to use "%u" for clarity and consistency.
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer core updates from Thomas Gleixner:
- Address the inconsistent shutdown sequence of per CPU clockevents on
CPU hotplug, which only removed it from the core but failed to invoke
the actual device driver shutdown callback. This kept the timer
active, which prevented power savings and caused pointless noise in
virtualization.
- Encapsulate the open coded access to the hrtimer clock base, which is
a private implementation detail, so that the implementation can be
changed without breaking a lot of usage sites.
- Enhance the debug output of the clocksource watchdog to provide
better information for analysis.
- The usual set of cleanups and enhancements all over the place
* tag 'timers-core-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
time: Fix spelling mistakes in comments
clocksource: Print durations for sync check unconditionally
LoongArch: Remove clockevents shutdown call on offlining
tick: Do not set device to detached state in tick_shutdown()
hrtimer: Reorder branches in hrtimer_clockid_to_base()
hrtimer: Remove hrtimer_clock_base:: Get_time
hrtimer: Use hrtimer_cb_get_time() helper
media: pwm-ir-tx: Avoid direct access to hrtimer clockbase
ALSA: hrtimer: Avoid direct access to hrtimer clockbase
lib: test_objpool: Avoid direct access to hrtimer clockbase
sched/core: Avoid direct access to hrtimer clockbase
timers/itimer: Avoid direct access to hrtimer clockbase
posix-timers: Avoid direct access to hrtimer clockbase
jiffies: Remove obsolete SHIFTED_HZ comment
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq chip driver updates from Thomas Gleixner:
- Use the startup/shutdown callbacks for the PCI/MSI per device
interrupt domains.
This allows us to initialize the RISCV PLIC interrupt hierarchy
correctly and provides a mechanism to decouple the masking and
unmasking during run-time from the expensive PCI mask and unmask when
the underlying MSI provider implementation allows the interrupt to be
masked.
- Initialize the RISCV PLIC MSI interrupt hierarchy correctly so that
the affinity assignment works correctly by switching it over to the
startup/shutdown scheme
- Allow MSI providers to opt out from masking a PCI/MSI interrupt at
the PCI device during operation when the provider can mask the
interrupt at the underlying interrupt chip. This reduces the overhead
in scenarios where disable_irq()/enable_irq() is utilized frequently
by a driver.
The PCI/MSI device level [un]masking is only required on startup and
shutdown in this case.
- Remove the conditional mask/unmask logic in the PCI/MSI layer as this
is now handled unconditionally.
- Replace the hardcoded interrupt routing in the Loongson EIOINTC
interrupt driver to respect the firmware settings and spread them out
to different CPU interrupt inputs so that the demultiplexing handler
only needs to read only a single 64-bit status register instead of
four, which significantly reduces the overhead in VMs as the status
register access causes a VM exit.
- Add support for the new AST2700 SCU interrupt controllers
- Use the legacy interrupt domain setup for the Loongson PCH-LPC
interrupt controller, which resembles the x86 legacy PIC setup and
has the same hardcoded legacy requirements.
- The usual set of cleanups, fixes and improvements all over the place
* tag 'irq-drivers-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
irqchip/loongson-pch-lpc: Use legacy domain for PCH-LPC IRQ controller
PCI/MSI: Remove the conditional parent [un]mask logic
irqchip/msi-lib: Honor the MSI_FLAG_PCI_MSI_MASK_PARENT flag
irqchip/aspeed-scu-ic: Add support for AST2700 SCU interrupt controllers
dt-bindings: interrupt-controller: aspeed: Add AST2700 SCU IC compatibles
dt-bindings: mfd: aspeed: Add AST2700 SCU compatibles
irqchip/aspeed-scu-ic: Refactor driver to support variant-based initialization
irqchip/gic-v5: Fix error handling in gicv5_its_irq_domain_alloc()
irqchip/gic-v5: Fix loop in gicv5_its_create_itt_two_level() cleanup path
irqchip/gic-v5: Delete a stray tab
irqchip/sg2042-msi: Set irq type according to DT configuration
riscv: sophgo: dts: sg2044: Change msi irq type to IRQ_TYPE_EDGE_RISING
riscv: sophgo: dts: sg2042: Change msi irq type to IRQ_TYPE_EDGE_RISING
irqchip/gic-v2m: Handle Multiple MSI base IRQ Alignment
irqchip/renesas-rzg2l: Remove dev_err_probe() if error is -ENOMEM
irqchip: Use int type to store negative error codes
irqchip/gic-v5: Remove the redundant ITS cache invalidation
PCI/MSI: Check MSI_FLAG_PCI_MSI_MASK_PARENT in cond_[startup|shutdown]_parent()
irqchip/loongson-eiointc: Add multiple interrupt pin routing support
irqchip/loongson-eiointc: Route interrupt parsed from bios table
...
|
|
Move away from the legacy MSI domain setup, switch to use
msi_create_parent_irq_domain().
While doing the conversion, I noticed that hv_irq_compose_msi_msg() is
doing more than it is supposed to (composing message content). The
interrupt allocation bits should be moved into hv_msi_domain_alloc().
However, I have no hardware to test this change, therefore I leave a TODO
note.
Signed-off-by: Nam Cao <namcao@linutronix.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
Use the kernel's common "entry virt" APIs to handle pending work prior to
(re)entering guest mode, now that the virt APIs don't have a superfluous
dependency on KVM.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
Check for NEED_RESCHED_LAZY, not just NEED_RESCHED, prior to transferring
control to a guest. Failure to check for lazy resched can unnecessarily
delay rescheduling until the next tick when using a lazy preemption model.
Note, ideally both the checking and processing of TIF bits would be handled
in common code, to avoid having to keep three separate paths synchronized,
but defer such cleanups to the future to keep the fix as standalone as
possible.
Cc: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Cc: Mukesh R <mrathor@linux.microsoft.com>
Fixes: 621191d709b1 ("Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
Write combining is an optimization feature in CPUs that is frequently
used by modern devices to generate 32 or 64 byte TLPs at the PCIe level.
These large TLPs allow certain optimizations in the driver to HW
communication that improve performance. As WC is unpredictable and
optional the HW designs all tolerate cases where combining doesn't
happen and simply experience a performance degradation.
Unfortunately many virtualization environments on all architectures have
done things that completely disable WC inside the VM with no generic way
to detect this. For example WC was fully blocked in ARM64 KVM until
commit 8c47ce3e1d2c ("KVM: arm64: Set io memory s2 pte as normalnc for
vfio pci device").
Trying to use WC when it is known not to work has a measurable
performance cost (~5%). Long ago mlx5 developed an boot time algorithm
to test if WC is available or not by using unique mlx5 HW features to
measure how many large TLPs the device is receiving. The SW generates a
large number of combining opportunities and if any succeed then WC is
declared working.
In mlx5 the WC optimization feature is never used by the kernel except
for the boot time test. The WC is only used by userspace in rdma-core.
Sadly modern ARM CPUs, especially NVIDIA Grace, have a combining
implementation that is very unreliable compared to pretty much
everything prior. This is being fixed architecturally in new CPUs with a
new ST64B instruction, but current shipping devices suffer this problem.
Unreliable means the SW can present thousands of combining opportunities
and the HW will not combine for any of them, which creates a performance
degradation, and critically fails the mlx5 boot test. However, the CPU
is very sensitive to the instruction sequence used, with the better
options being sufficiently good that the performance loss from the
unreliable CPU is not measurable.
Broadly there are several options, from worst to best:
1) A C loop doing a u64 memcpy.
This was used prior to commit ef302283ddfc
("IB/mlx5: Use __iowrite64_copy() for write combining stores")
and failed almost all the time on Grace CPUs.
2) ARM64 assembly with consecutive 8 byte stores. This was implemented
as an arch-generic __iowriteXX_copy() family of functions suitable
for performance use in drivers for WC. commit ead79118dae6
("arm64/io: Provide a WC friendly __iowriteXX_copy()") provided the
ARM implementation.
3) ARM64 assembly with consecutive 16 byte stores. This was rejected
from kernel use over fears of virtualization failures. Common ARM
VMMs will crash if STP is used against emulated memory.
4) A single NEON store instruction. Userspace has used this option for a
very long time, it performs well.
5) For future silicon the new ST64B instruction is guaranteed to
generate a 64 byte TLP 100% of the time
The past upgrade from #1 to #2 was thought to be sufficient to solve
this problem. However, more testing on more systems shows that #3 is
still problematic at a low frequency and the kernel test fails.
Thus, make the mlx5 use the same instructions as userspace during the
boot time WC self test. This way the WC test matches the userspace and
will properly detect the ability of HW to support the WC workload that
userspace will generate. While #4 still has imperfect combining
performance, it is substantially better than #2, and does actually give
a performance win to applications. Self-test failures with #2 are like
3/10 boots, on some systems, #4 has never seen a boot failure.
There is no real general use case for a NEON based WC flow in the
kernel. This is not suitable for any performance path work as getting
into/out of a NEON context is fairly expensive compared to the gain of
WC. Future CPUs are going to fix this issue by using an new ARM
instruction and __iowriteXX_copy() will be updated to use that
automatically, probably using the ALTERNATES mechanism.
Since this problem is constrained to mlx5's unique situation of needing
a non-performance code path to duplicate what mlx5 userspace is doing as
a matter of self-testing, implement it as a one line inline assembly in
the driver directly.
Lastly, this was concluded from the discussion with ARM maintainers
which confirms that this is the best approach for the solution:
https://lore.kernel.org/r/aHqN_hpJl84T1Usi@arm.com
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1759093688-841357-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The pm_domain cleanup can not be devres managed as it uses struct
simplefb_par which is allocated within struct fb_info by
framebuffer_alloc(). This allocation is explicitly freed by
unregister_framebuffer() in simplefb_remove().
Devres managed cleanup runs after the device remove call and thus can no
longer access struct simplefb_par.
Call simplefb_detach_genpds() explicitly from simplefb_destroy() like
the cleanup functions for clocks and regulators.
Fixes an use after free on M2 Mac mini during
aperture_remove_conflicting_devices() using the downstream asahi kernel
with Debian's kernel config. For unknown reasons this started to
consistently dereference an invalid pointer in v6.16.3 based kernels.
[ 6.736134] BUG: KASAN: slab-use-after-free in simplefb_detach_genpds+0x58/0x220
[ 6.743545] Read of size 4 at addr ffff8000304743f0 by task (udev-worker)/227
[ 6.750697]
[ 6.752182] CPU: 6 UID: 0 PID: 227 Comm: (udev-worker) Tainted: G S 6.16.3-asahi+ #16 PREEMPTLAZY
[ 6.752186] Tainted: [S]=CPU_OUT_OF_SPEC
[ 6.752187] Hardware name: Apple Mac mini (M2, 2023) (DT)
[ 6.752189] Call trace:
[ 6.752190] show_stack+0x34/0x98 (C)
[ 6.752194] dump_stack_lvl+0x60/0x80
[ 6.752197] print_report+0x17c/0x4d8
[ 6.752201] kasan_report+0xb4/0x100
[ 6.752206] __asan_report_load4_noabort+0x20/0x30
[ 6.752209] simplefb_detach_genpds+0x58/0x220
[ 6.752213] devm_action_release+0x50/0x98
[ 6.752216] release_nodes+0xd0/0x2c8
[ 6.752219] devres_release_all+0xfc/0x178
[ 6.752221] device_unbind_cleanup+0x28/0x168
[ 6.752224] device_release_driver_internal+0x34c/0x470
[ 6.752228] device_release_driver+0x20/0x38
[ 6.752231] bus_remove_device+0x1b0/0x380
[ 6.752234] device_del+0x314/0x820
[ 6.752238] platform_device_del+0x3c/0x1e8
[ 6.752242] platform_device_unregister+0x20/0x50
[ 6.752246] aperture_detach_platform_device+0x1c/0x30
[ 6.752250] aperture_detach_devices+0x16c/0x290
[ 6.752253] aperture_remove_conflicting_devices+0x34/0x50
...
[ 6.752343]
[ 6.967409] Allocated by task 62:
[ 6.970724] kasan_save_stack+0x3c/0x70
[ 6.974560] kasan_save_track+0x20/0x40
[ 6.978397] kasan_save_alloc_info+0x40/0x58
[ 6.982670] __kasan_kmalloc+0xd4/0xd8
[ 6.986420] __kmalloc_noprof+0x194/0x540
[ 6.990432] framebuffer_alloc+0xc8/0x130
[ 6.994444] simplefb_probe+0x258/0x2378
...
[ 7.054356]
[ 7.055838] Freed by task 227:
[ 7.058891] kasan_save_stack+0x3c/0x70
[ 7.062727] kasan_save_track+0x20/0x40
[ 7.066565] kasan_save_free_info+0x4c/0x80
[ 7.070751] __kasan_slab_free+0x6c/0xa0
[ 7.074675] kfree+0x10c/0x380
[ 7.077727] framebuffer_release+0x5c/0x90
[ 7.081826] simplefb_destroy+0x1b4/0x2c0
[ 7.085837] put_fb_info+0x98/0x100
[ 7.089326] unregister_framebuffer+0x178/0x320
[ 7.093861] simplefb_remove+0x3c/0x60
[ 7.097611] platform_remove+0x60/0x98
[ 7.101361] device_remove+0xb8/0x160
[ 7.105024] device_release_driver_internal+0x2fc/0x470
[ 7.110256] device_release_driver+0x20/0x38
[ 7.114529] bus_remove_device+0x1b0/0x380
[ 7.118628] device_del+0x314/0x820
[ 7.122116] platform_device_del+0x3c/0x1e8
[ 7.126302] platform_device_unregister+0x20/0x50
[ 7.131012] aperture_detach_platform_device+0x1c/0x30
[ 7.136157] aperture_detach_devices+0x16c/0x290
[ 7.140779] aperture_remove_conflicting_devices+0x34/0x50
...
Reported-by: Daniel Huhardeaux <tech@tootai.net>
Cc: stable@vger.kernel.org
Fixes: 92a511a568e44 ("fbdev/simplefb: Add support for generic power-domains")
Signed-off-by: Janne Grunau <j@jannau.net>
Reviewed-by: Hans de Goede <hansg@kernel.org>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
There are systems which want to wait for as long as it takes for the
stopped video memory to answer. Mapping it out helps to avoid that
while the system is running but standby still hangs somehow. So just
leave the memory on in standby same as it was before my change.
Signed-off-by: Zsolt Kajtar <soci@c64.rulez.org>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
Change the 'ret' variable in of_platform_mb862xx_probe() from unsigned long
to int, as it needs to store either negative error codes or zero.
Storing the negative error codes in unsigned type, doesn't cause an issue
at runtime but can be confusing. Additionally, assigning negative error
codes to unsigned type may trigger a GCC warning when the -Wsign-conversion
flag is enabled.
No effect on runtime.
Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
Use string_choices.h helpers instead of hard-coded strings.
Signed-off-by: Chelsy Ratnawat <chelsyratnawat2001@gmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
It could be triggered on 32 bit big endian machines at 32 bpp in the
pattern realignment. In this case just return early as the result is
an identity.
Signed-off-by: Zsolt Kajtar <soci@c64.rulez.org>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
With the right setup S3 cards can display 1 and 2 BPP packed pixel
modes, even in high resolutions. So this patch makes them available.
The 4 BPP packed pixel mode had one pixel column of garbage on the
left side due to how the shift register works, this is fixed now.
There was a limitation that only 8 pixel wide fonts could be used at 4
BPP. Since the CFB routines were updated to handle reverse pixel
ordering correctly that limitation doesn't exists and was removed now.
In 4 BPP interleaved planes mode font widths of multiply of 8 are
accepted now, not just 8 pixels.
The horizontal screen position will not move as much between modes as it
used to. That was caused by the various amount of pipeline delay which
is compensated now as much as possible.
While adjusting the code direct port access of PEL registers was
corrected. Should work now on systems where these are memory mapped.
I've noticed that when in 1 BPP mode the console is used with Unicode
fonts erasing might be done with non-blanks. That's a bug in the VT code
and so not part of this patch.
Signed-off-by: Zsolt Kajtar <soci@c64.rulez.org>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
This patch implements power saving for S3 cards by powering down the
RAMDAC and stopping MCLK and DCLK while the card is supposed to be
suspended.
The RAMDAC is also disabled while the screen is blanked and the DCLK
in stopped while in DPMS power off.
The practical difference it makes is that on a machine with such a
card the display will be placed in DPMS power off while standby is
activated (due to stopped DCLK). Same like when using other cards with
implemented power saving functionality.
Without it on my setup the connected display powers up and stays that
way showing VT63 while in standby. Sort of annoying as before standby
it's specifically placed into DPMS off in Xorg for a while.
The used functionality should exists for sure on Trio32 to Aurora64V
(according to the documentation) so I think it's generally applicable.
I'm using this on S3 Trio 3D and S3 Virge DX.
Signed-off-by: Zsolt Kajtar <soci@c64.rulez.org>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
Use vmalloc_array() instead of vmalloc() to simplify the function
xenfb_probe().
Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SEV and apic updates from Borislav Petkov:
- Add functionality to provide runtime firmware updates for the non-x86
parts of an AMD platform like the security processor (ASP) firmware,
modules etc, for example. The intent being that these updates are
interim, live fixups before a proper BIOS update can be attempted
- Add guest support for AMD's Secure AVIC feature which gives encrypted
guests the needed protection against a malicious hypervisor
generating unexpected interrupts and injecting them into such guest,
thus interfering with its operation in an unexpected and negative
manner.
The advantage of this scheme is that the guest determines which
interrupts and when to accept them vs leaving that to the benevolence
(or not) of the hypervisor
- Strictly separate the startup code from the rest of the kernel where
former is executed from the initial 1:1 mapping of memory.
The problem was that the toolchain-generated version of the code was
being executed from a different mapping of memory than what was
"assumed" during code generation, needing an ever-growing pile of
fixups for absolute memory references which are invalid in the early,
1:1 memory mapping during boot.
The major advantage of this is that there's no need to check the 1:1
mapping portion of the code for absolute relocations anymore and get
rid of the RIP_REL_REF() macro sprinkling all over the place.
For more info, see Ard's very detailed writeup on this [1]
- The usual cleanups and fixes
Link: https://lore.kernel.org/r/CAMj1kXEzKEuePEiHB%2BHxvfQbFz0sTiHdn4B%2B%2BzVBJ2mhkPkQ4Q@mail.gmail.com [1]
* tag 'x86_apic_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (49 commits)
x86/boot: Drop erroneous __init annotation from early_set_pages_state()
crypto: ccp - Add AMD Seamless Firmware Servicing (SFS) driver
crypto: ccp - Add new HV-Fixed page allocation/free API
x86/sev: Add new dump_rmp parameter to snp_leak_pages() API
x86/startup/sev: Document the CPUID flow in the boot #VC handler
objtool: Ignore __pi___cfi_ prefixed symbols
x86/sev: Zap snp_abort()
x86/apic/savic: Do not use snp_abort()
x86/boot: Get rid of the .head.text section
x86/boot: Move startup code out of __head section
efistub/x86: Remap inittext read-execute when needed
x86/boot: Create a confined code area for startup code
x86/kbuild: Incorporate boot/startup/ via Kbuild makefile
x86/boot: Revert "Reject absolute references in .head.text"
x86/boot: Check startup code for absence of absolute relocations
objtool: Add action to check for absence of absolute relocations
x86/sev: Export startup routines for later use
x86/sev: Move __sev_[get|put]_ghcb() into separate noinstr object
x86/sev: Provide PIC aliases for SEV related data objects
x86/boot: Provide PIC aliases for 5-level paging related constants
...
|
|
Fix this copy and paste bug. Return "err" instead of
PTR_ERR(haptics->regmap).
Fixes: 52e06d564ce6 ("Input: aw86927 - add driver for Awinic AW86927")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/aNvMPTnOovdBitdP@stanley.mountain
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpuid updates from Borislav Petkov:
- Make UMIP instruction detection more robust
- Correct and cleanup AMD CPU topology detection; document the relevant
CPUID leaves topology parsing precedence on AMD
- Add support for running the kernel as guest on FreeBSD's Bhyve
hypervisor
- Cleanups and improvements
* tag 'x86_cpu_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/umip: Fix decoding of register forms of 0F 01 (SGDT and SIDT aliases)
x86/umip: Check that the instruction opcode is at least two bytes
Documentation/x86/topology: Detail CPUID leaves used for topology enumeration
x86/cpu/topology: Define AMD64_CPUID_EXT_FEAT MSR
x86/cpu/topology: Check for X86_FEATURE_XTOPOLOGY instead of passing has_xtopology
x86/cpu/cacheinfo: Simplify cacheinfo_amd_init_llc_id() using _cpuid4_info
x86/cpu: Rename and move CPU model entry for Diamond Rapids
x86/cpu: Detect FreeBSD Bhyve hypervisor
|
|
Call sysfs_update_group() after reading the device descriptor to ensure
the HID sysfs attributes are visible when the feature is supported.
Fixes: ae7795a8c258 ("scsi: ufs: core: Add HID support")
Signed-off-by: Daniel Lee <chullee@google.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC updates from Borislav Petkov:
- Add support for new AMD family 0x1a models to amd64_edac
- Add an EDAC driver for the AMD VersalNET memory controller which
reports hw errors from different IP blocks in the fabric using an
IPC-type transport
- Drop the silly static number of memory controllers in the Intel EDAC
drivers (skx, i10nm) in favor of a flexible array so that former
doesn't need to be increased with every new generation which adds
more memory controllers; along with a proper refactoring
- Add support for two Alder Lake-S SOCs to ie31200_edac
- Add an EDAC driver for ADM Cortex A72 cores, and specifically for
reporting L1 and L2 cache errors
- Last but not least, the usual fixes, cleanups and improvements all
over the subsystem
* tag 'edac_updates_for_v6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: (23 commits)
EDAC/versalnet: Return the correct error in mc_probe()
EDAC/mc_sysfs: Increase legacy channel support to 16
EDAC/amd64: Add support for AMD family 1Ah-based newer models
EDAC: Add a driver for the AMD Versal NET DDR controller
dt-bindings: memory-controllers: Add support for Versal NET EDAC
RAS: Export log_non_standard_event() to drivers
cdx: Export Symbols for MCDI RPC and Initialization
cdx: Split mcdi.h and reorganize headers
EDAC/skx_common: Use topology_physical_package_id() instead of open coding
EDAC: Fix wrong executable file modes for C source files
EDAC/altera: Use dev_fwnode()
EDAC/skx_common: Remove unused *NUM*_IMC macros
EDAC/i10nm: Reallocate skx_dev list if preconfigured cnt != runtime cnt
EDAC/skx_common: Remove redundant upper bound check for res->imc
EDAC/skx_common: Make skx_dev->imc[] a flexible array
EDAC/skx_common: Swap memory controller index mapping
EDAC/skx_common: Move mc_mapping to be a field inside struct skx_imc
EDAC/{skx_common,skx}: Use configuration data, not global macros
EDAC/i10nm: Skip DIMM enumeration on a disabled memory controller
EDAC/ie31200: Add two more Intel Alder Lake-S SoCs for EDAC support
...
|
|
Use kvm_is_gpa_in_memslot() to check the validity of the notification
indicator byte address instead of open coding equivalent logic in the VFIO
AP driver.
Opportunistically use a dedicated wrapper that exists and is exported
expressly for the VFIO AP module. kvm_is_gpa_in_memslot() is generally
unsuitable for use outside of KVM; other drivers typically shouldn't rely
on KVM's memslots, and using the API requires kvm->srcu (or slots_lock) to
be held for the entire duration of the usage, e.g. to avoid TOCTOU bugs.
handle_pqap() is a bit of a special case, as it's explicitly invoked from
KVM with kvm->srcu already held, and the VFIO AP driver is in many ways an
extension of KVM that happens to live in a separate module.
Providing a dedicated API for the VFIO AP driver will allow restricting
the vast majority of generic KVM's exports to KVM submodules (e.g. to x86's
kvm-{amd,intel}.ko vendor mdoules).
No functional change intended.
Acked-by: Anthony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Link: https://lore.kernel.org/r/20250919003303.1355064-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
HEAD
KVM SEV-SNP CipherText Hiding support for 6.18
Add support for SEV-SNP's CipherText Hiding, an opt-in feature that prevents
unauthorized CPU accesses from reading the ciphertext of SNP guest private
memory, e.g. to attempt an offline attack. Instead of ciphertext, the CPU
will always read back all FFs when CipherText Hiding is enabled.
Add new module parameter to the KVM module to enable CipherText Hiding and
control the number of ASIDs that can be used for VMs with CipherText Hiding,
which is in effect the number of SNP VMs. When CipherText Hiding is enabled,
the shared SEV-ES/SEV-SNP ASID space is split into separate ranges for SEV-ES
and SEV-SNP guests, i.e. ASIDs that can be used for CipherText Hiding cannot
be used to run SEV-ES guests.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD
LoongArch KVM changes for v6.18
1. Add PTW feature detection on new hardware.
2. Add sign extension with kernel MMIO/IOCSR emulation.
3. Improve in-kernel IPI emulation.
4. Improve in-kernel PCH-PIC emulation.
5. Move kvm_iocsr tracepoint out of generic code.
|
|
KVM/riscv changes for 6.18
- Added SBI FWFT extension for Guest/VM with misaligned
delegation and pointer masking PMLEN features
- Added ONE_REG interface for SBI FWFT extension
- Added Zicbop and bfloat16 extensions for Guest/VM
- Enabled more common KVM selftests for RISC-V such as
access_tracking_perf_test, dirty_log_perf_test,
memslot_modification_stress_test, memslot_perf_test,
mmu_stress_test, and rseq_test
- Added SBI v3.0 PMU enhancements in KVM and perf driver
|
|
Don't mix dma fence lock with the active_job lock. Use fence_lock to
protect the dma fence used by drm scheduler when signalling a job
completion and queue_lock to protect concurrent access to active bin job
in OOM and stats collection for a given file priv. The issue was
uncovered when PREEMPT_RT on with a system freeze when opening multiple
Chromium tabs on Raspberry Pi 5.
Link: https://github.com/raspberrypi/linux/issues/7035
Fixes: fa6a20c87470 ("drm/v3d: Address race-condition between per-fd GPU stats and fd release")
Signed-off-by: Melissa Wen <mwen@igalia.com>
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Maíra Canal <mcanal@igalia.com>
Signed-off-by: Melissa Wen <melissa.srw@gmail.com>
Link: https://lore.kernel.org/r/20250916172022.2779837-1-mwen@igalia.com
|
|
- quicki2c: support ACPI config for advanced features: max input size
and interrupt delay (Xinpeng Sun)
- some more str_true_false() conversions
|
|
- make use of str_true_or_false() (Liu Song)
|
|
- Make use of devm API for steelseries (Jeongjun Park)
|
|
- Add support for audio jack handling on DualSense (Cristian Ciocaltea)
|
|
- hid-pidff improvements and fixes (Tomasz Pakuła)
|
|
- Fix Z13 folio touchpad not binding to hid-multitouch (Antheas
Kapenekakis)
|
|
- Implement haptic touchpad support (Angela Czubak and Jonathan Denose)
|