aboutsummaryrefslogtreecommitdiffstats
path: root/arch/arm64 (follow)
AgeCommit message (Collapse)AuthorFilesLines
2025-10-31Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfLinus Torvalds1-2/+3
Pull bpf fixes from Alexei Starovoitov: - Mark migrate_disable/enable() as always_inline to avoid issues with partial inlining (Yonghong Song) - Fix powerpc stack register definition in libbpf bpf_tracing.h (Andrii Nakryiko) - Reject negative head_room in __bpf_skb_change_head (Daniel Borkmann) - Conditionally include dynptr copy kfuncs (Malin Jonsson) - Sync pending IRQ work before freeing BPF ring buffer (Noorain Eqbal) - Do not audit capability check in x86 do_jit() (Ondrej Mosnacek) - Fix arm64 JIT of BPF_ST insn when it writes into arena memory (Puranjay Mohan) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf/arm64: Fix BPF_ST into arena memory bpf: Make migrate_disable always inline to avoid partial inlining bpf: Reject negative head_room in __bpf_skb_change_head bpf: Conditionally include dynptr copy kfuncs libbpf: Fix powerpc's stack register definition in bpf_tracing.h bpf: Do not audit capability check in do_jit() bpf: Sync pending IRQ work before freeing ring buffer
2025-10-31bpf/arm64: Fix BPF_ST into arena memoryPuranjay Mohan1-2/+3
The arm64 JIT supports BPF_ST with BPF_PROBE_MEM32 (arena) by using the tmp2 register to hold the dst + arena_vm_base value and using tmp2 as the new dst register. But this is broken because in case is_lsi_offset() returns false the tmp2 will be clobbered by emit_a64_mov_i(1, tmp2, off, ctx); and hence the emitted store instruction will be of the form: strb w10, [x11, x11] Fix this by using the third temporary register to hold the dst + arena_vm_base. Fixes: 339af577ec05 ("bpf: Add arm64 JIT support for PROBE_MEM32 pseudo instructions.") Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Link: https://lore.kernel.org/r/20251030121715.55214-1-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-24Merge tag 'soc-fixes-6.18-2' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC fixes from Arnd Bergmann: "The main change this time is an update to the MAINTAINERS file, listing Krzysztof Kozlowski, Alexandre Belloni, and Linus Walleij as additional maintainers for the SoC tree, in order to go back to a group maintainership. Drew Fustini joins as an additional reviewer for the SoC tree. Thanks to all of you for volunteering to help out. On the actual bugfixes, we have a few correctness changes for firmware drivers (qtee, arm-ffa, scmi) and two devicetree fixes for Raspberry Pi" * tag 'soc-fixes-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: soc: officially expand maintainership team firmware: arm_scmi: Fix premature SCMI_XFER_FLAG_IS_RAW clearing in raw mode firmware: arm_scmi: Skip RAW initialization on failure include: trace: Fix inflight count helper on failed initialization firmware: arm_scmi: Account for failed debug initialization ARM: dts: broadcom: rpi: Switch to V3D firmware clock arm64: dts: broadcom: bcm2712: Define VGIC interrupt firmware: arm_ffa: Add support for IMPDEF value in the memory access descriptor tee: QCOMTEE should depend on ARCH_QCOM tee: qcom: return -EFAULT instead of -EINVAL if copy_from_user() fails tee: qcom: prevent potential off by one read
2025-10-23Merge tag 'arm64-fixes' of ↵Linus Torvalds2-4/+10
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: - Do not make a clean PTE dirty in pte_mkwrite() The Arm architecture, for backwards compatibility reasons (ARMv8.0 before in-hardware dirty bit management - DBM), uses the PTE_RDONLY bit to mean !dirty while the PTE_WRITE bit means DBM enabled. The arm64 pte_mkwrite() simply clears the PTE_RDONLY bit and this inadvertently makes the PTE pte_hw_dirty(). Most places making a PTE writable also invoke pte_mkdirty() but do_swap_page() does not and we end up with dirty, freshly swapped in, writeable pages. - Do not warn if the destination page is already MTE-tagged in copy_highpage() In the majority of the cases, a destination page copied into is freshly allocated without the PG_mte_tagged flag set. However, the folio migration may be restarted if __folio_migrate_mapping() failed, triggering the benign WARN_ON_ONCE(). * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: mte: Do not warn if the page is already tagged in copy_highpage() arm64, mm: avoid always making PTE dirty in pte_mkwrite()
2025-10-23arm64: mte: Do not warn if the page is already tagged in copy_highpage()Catalin Marinas1-3/+8
The arm64 copy_highpage() assumes that the destination page is newly allocated and not MTE-tagged (PG_mte_tagged unset) and warns accordingly. However, following commit 060913999d7a ("mm: migrate: support poisoned recover from migrate folio"), folio_mc_copy() is called before __folio_migrate_mapping(). If the latter fails (-EAGAIN), the copy will be done again to the same destination page. Since copy_highpage() already set the PG_mte_tagged flag, this second copy will warn. Replace the WARN_ON_ONCE(page already tagged) in the arm64 copy_highpage() with a comment. Reported-by: syzbot+d1974fc28545a3e6218b@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/68dda1ae.a00a0220.102ee.0065.GAE@google.com Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Will Deacon <will@kernel.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: stable@vger.kernel.org # 6.12.x Reviewed-by: Yang Shi <yang@os.amperecomputing.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-10-21arm64, mm: avoid always making PTE dirty in pte_mkwrite()Huang Ying1-1/+2
Current pte_mkwrite_novma() makes PTE dirty unconditionally. This may mark some pages that are never written dirty wrongly. For example, do_swap_page() may map the exclusive pages with writable and clean PTEs if the VMA is writable and the page fault is for read access. However, current pte_mkwrite_novma() implementation always dirties the PTE. This may cause unnecessary disk writing if the pages are never written before being reclaimed. So, change pte_mkwrite_novma() to clear the PTE_RDONLY bit only if the PTE_DIRTY bit is set to make it possible to make the PTE writable and clean. The current behavior was introduced in commit 73e86cb03cf2 ("arm64: Move PTE_RDONLY bit handling out of set_pte_at()"). Before that, pte_mkwrite() only sets the PTE_WRITE bit, while set_pte_at() only clears the PTE_RDONLY bit if both the PTE_WRITE and the PTE_DIRTY bits are set. To test the performance impact of the patch, on an arm64 server machine, run 16 redis-server processes on socket 1 and 16 memtier_benchmark processes on socket 0 with mostly get transactions (that is, redis-server will mostly read memory only). The memory footprint of redis-server is larger than the available memory, so swap out/in will be triggered. Test results show that the patch can avoid most swapping out because the pages are mostly clean. And the benchmark throughput improves ~23.9% in the test. Fixes: 73e86cb03cf2 ("arm64: Move PTE_RDONLY bit handling out of set_pte_at()") Signed-off-by: Huang Ying <ying.huang@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Gavin Shan <gshan@redhat.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-10-18Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds15-336/+353
Pull kvm fixes from Paolo Bonzini: "ARM: - Fix the handling of ZCR_EL2 in NV VMs - Pick the correct translation regime when doing a PTW on the back of a SEA - Prevent userspace from injecting an event into a vcpu that isn't initialised yet - Move timer save/restore to the sysreg handling code, fixing EL2 timer access in the process - Add FGT-based trapping of MDSCR_EL1 to reduce the overhead of debug - Fix trapping configuration when the host isn't GICv3 - Improve the detection of HCR_EL2.E2H being RES1 - Drop a spurious 'break' statement in the S1 PTW - Don't try to access SPE when owned by EL3 Documentation updates: - Document the failure modes of event injection - Document that a GICv3 guest can be created on a GICv5 host with FEAT_GCIE_LEGACY Selftest improvements: - Add a selftest for the effective value of HCR_EL2.AMO - Address build warning in the timer selftest when building with clang - Teach irqfd selftests about non-x86 architectures - Add missing sysregs to the set_id_regs selftest - Fix vcpu allocation in the vgic_lpi_stress selftest - Correctly enable interrupts in the vgic_lpi_stress selftest x86: - Expand the KVM_PRE_FAULT_MEMORY selftest to add a regression test for the bug fixed by commit 3ccbf6f47098 ("KVM: x86/mmu: Return -EAGAIN if userspace deletes/moves memslot during prefault") - Don't try to get PMU capabilities from perf when running a CPU with hybrid CPUs/PMUs, as perf will rightly WARN. guest_memfd: - Rework KVM_CAP_GUEST_MEMFD_MMAP (newly introduced in 6.18) into a more generic KVM_CAP_GUEST_MEMFD_FLAGS - Add a guest_memfd INIT_SHARED flag and require userspace to explicitly set said flag to initialize memory as SHARED, irrespective of MMAP. The behavior merged in 6.18 is that enabling mmap() implicitly initializes memory as SHARED, which would result in an ABI collision for x86 CoCo VMs as their memory is currently always initialized PRIVATE. - Allow mmap() on guest_memfd for x86 CoCo VMs, i.e. on VMs with private memory, to enable testing such setups, i.e. to hopefully flush out any other lurking ABI issues before 6.18 is officially released. - Add testcases to the guest_memfd selftest to cover guest_memfd without MMAP, and host userspace accesses to mmap()'d private memory" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (46 commits) arm64: Revamp HCR_EL2.E2H RES1 detection KVM: arm64: nv: Use FGT write trap of MDSCR_EL1 when available KVM: arm64: Compute per-vCPU FGTs at vcpu_load() KVM: arm64: selftests: Fix misleading comment about virtual timer encoding KVM: arm64: selftests: Add an E2H=0-specific configuration to get_reg_list KVM: arm64: selftests: Make dependencies on VHE-specific registers explicit KVM: arm64: Kill leftovers of ad-hoc timer userspace access KVM: arm64: Fix WFxT handling of nested virt KVM: arm64: Move CNT*CT_EL0 userspace accessors to generic infrastructure KVM: arm64: Move CNT*_CVAL_EL0 userspace accessors to generic infrastructure KVM: arm64: Move CNT*_CTL_EL0 userspace accessors to generic infrastructure KVM: arm64: Add timer UAPI workaround to sysreg infrastructure KVM: arm64: Make timer_set_offset() generally accessible KVM: arm64: Replace timer context vcpu pointer with timer_id KVM: arm64: Introduce timer_context_to_vcpu() helper KVM: arm64: Hide CNTHV_*_EL2 from userspace for nVHE guests Documentation: KVM: Update GICv3 docs for GICv5 hosts KVM: arm64: gic-v3: Only set ICH_HCR traps for v2-on-v3 or v3 guests KVM: arm64: selftests: Actually enable IRQs in vgic_lpi_stress KVM: arm64: selftests: Allocate vcpus with correct size ...
2025-10-17arm64: debug: always unmask interrupts in el0_softstp()Ada Couprie Diaz1-3/+5
We intend that EL0 exception handlers unmask all DAIF exceptions before calling exit_to_user_mode(). When completing single-step of a suspended breakpoint, we do not call local_daif_restore(DAIF_PROCCTX) before calling exit_to_user_mode(), leaving all DAIF exceptions masked. When pseudo-NMIs are not in use this is benign. When pseudo-NMIs are in use, this is unsound. At this point interrupts are masked by both DAIF.IF and PMR_EL1, and subsequent irq flag manipulation may not work correctly. For example, a subsequent local_irq_enable() within exit_to_user_mode_loop() will only unmask interrupts via PMR_EL1 (leaving those masked via DAIF.IF), and anything depending on interrupts being unmasked (e.g. delivery of signals) will not work correctly. This was detected by CONFIG_ARM64_DEBUG_PRIORITY_MASKING. Move the call to `try_step_suspended_breakpoints()` outside of the check so that interrupts can be unmasked even if we don't call the step handler. Fixes: 0ac7584c08ce ("arm64: debug: split single stepping exception entry") Cc: <stable@vger.kernel.org> # 6.17 Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> [catalin.marinas@arm.com: added Mark's rewritten commit log and some whitespace] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-10-17arm64/sysreg: Fix GIC CDEOI instruction encodingLorenzo Pieralisi1-1/+10
The GIC CDEOI system instruction requires the Rt field to be set to 0b11111 otherwise the instruction behaviour becomes CONSTRAINED UNPREDICTABLE. Currenly, its usage is encoded as a system register write, with a constant 0 value: write_sysreg_s(0, GICV5_OP_GIC_CDEOI) While compiling with GCC, the 0 constant value, through these asm constraints and modifiers ('x' modifier and 'Z' constraint combo): asm volatile(__msr_s(r, "%x0") : : "rZ" (__val)); forces the compiler to issue the XZR register for the MSR operation (ie that corresponds to Rt == 0b11111) issuing the right instruction encoding. Unfortunately LLVM does not yet understand that modifier/constraint combo so it ends up issuing a different register from XZR for the MSR source, which in turns means that it encodes the GIC CDEOI instruction wrongly and the instruction behaviour becomes CONSTRAINED UNPREDICTABLE that we must prevent. Add a conditional to write_sysreg_s() macro that detects whether it is passed a constant 0 value and issues an MSR write with XZR as source register - explicitly doing what the asm modifier/constraint is meant to achieve through constraints/modifiers, fixing the LLVM compilation issue. Fixes: 7ec80fb3f025 ("irqchip/gic-v5: Add GICv5 PPI support") Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Acked-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Cc: Sascha Bischoff <sascha.bischoff@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-10-14arm64: Revamp HCR_EL2.E2H RES1 detectionMarc Zyngier1-6/+32
We currently have two ways to identify CPUs that only implement FEAT_VHE and not FEAT_E2H0: - either they advertise it via ID_AA64MMFR4_EL1.E2H0, - or the HCR_EL2.E2H bit is RAO/WI However, there is a third category of "cpus" that fall between these two cases: on CPUs that do not implement FEAT_FGT, it is IMPDEF whether an access to ID_AA64MMFR4_EL1 can trap to EL2 when the register value is zero. A consequence of this is that on systems such as Neoverse V2, a NV guest cannot reliably detect that it is in a VHE-only configuration (E2H is writable, and ID_AA64MMFR0_EL1 is 0), despite the hypervisor's best effort to repaint the id register. Replace the RAO/WI test by a sequence that makes use of the VHE register remnapping between EL1 and EL2 to detect this situation, and work out whether we get the VHE behaviour even after having set HCR_EL2.E2H to 0. This solves the NV problem, and provides a more reliable acid test for CPUs that do not completely follow the letter of the architecture while providing a RES1 behaviour for HCR_EL2.E2H. Suggested-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Tested-by: Jan Kotas <jank@cadence.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/15A85F2B-1A0C-4FA7-9FE4-EEC2203CC09E@global.cadence.com
2025-10-13arm64: dts: broadcom: bcm2712: Define VGIC interruptPeter Robinson1-0/+2
Define the interrupt in the GICv2 for vGIC so KVM can be used, it was missed from the original upstream DTB for some reason. Signed-off-by: Peter Robinson <pbrobinson@gmail.com> Cc: Andrea della Porta <andrea.porta@suse.com> Cc: Phil Elwell <phil@raspberrypi.com> Fixes: faa3381267d0 ("arm64: dts: broadcom: Add minimal support for Raspberry Pi 5") Link: https://lore.kernel.org/r/20250924085612.1039247-1-pbrobinson@gmail.com Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
2025-10-13KVM: arm64: nv: Use FGT write trap of MDSCR_EL1 when availableOliver Upton2-4/+15
Marc reports that the performance of running an L3 guest has regressed by 60% as a result of setting MDCR_EL2.TDA to hide bad architecture. That's of course terrible for the single user of recursive NV ;-) While there's nothing to be done on non-FGT systems, take advantage of the precise write trap of MDSCR_EL1 and leave the rest of the debug registers untrapped. Reported-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-13KVM: arm64: Compute per-vCPU FGTs at vcpu_load()Oliver Upton5-131/+151
To date KVM has used the fine-grained traps for the sake of UNDEF enforcement (so-called FGUs), meaning the constituent parts could be computed on a per-VM basis and folded into the effective value when programmed. Prepare for traps changing based on the vCPU context by computing the whole mess of them at vcpu_load(). Aggressively inline all the helpers to preserve the build-time checks that were there before. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-13KVM: arm64: Kill leftovers of ad-hoc timer userspace accessMarc Zyngier2-123/+0
Now that the whole timer infrastructure is handled as system register accesses, get rid of the now unused ad-hoc infrastructure. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-13KVM: arm64: Fix WFxT handling of nested virtMarc Zyngier1-1/+6
The spec for WFxT indicates that the parameter to the WFxT instruction is relative to the reading of CNTVCT_EL0. This means that the implementation needs to take the execution context into account, as CNTVOFF_EL2 does not always affect readings of CNTVCT_EL0 (such as when HCR_EL2.E2H is 1 and that we're in host context). This also rids us of the last instance of KVM_REG_ARM_TIMER_CNT outside of the userspace interaction code. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-13KVM: arm64: Move CNT*CT_EL0 userspace accessors to generic infrastructureMarc Zyngier2-10/+31
Moving the counter registers is a bit more involved than for the control and comparator (there is no shadow data for the counter), but still pretty manageable. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-13KVM: arm64: Move CNT*_CVAL_EL0 userspace accessors to generic infrastructureMarc Zyngier2-8/+4
As for the control registers, move the comparator registers to the common infrastructure. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-13KVM: arm64: Move CNT*_CTL_EL0 userspace accessors to generic infrastructureMarc Zyngier2-9/+31
Remove the handling of CNT*_CTL_EL0 from guest.c, and move it to sys_regs.c, using a new TIMER_REG() definition to encapsulate it. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-13KVM: arm64: Add timer UAPI workaround to sysreg infrastructureMarc Zyngier2-3/+36
Amongst the numerous bugs that plague the KVM/arm64 UAPI, one of the most annoying thing is that the userspace view of the virtual timer has its CVAL and CNT encodings swapped. In order to reduce the amount of code that has to know about this, start by adding handling for this bug in the sys_reg code. Nothing is making use of it yet, as the code responsible for userspace interaction is catching the accesses early. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-13KVM: arm64: Make timer_set_offset() generally accessibleMarc Zyngier1-10/+0
Move the timer_set_offset() helper to arm_arch_timer.h, so that it is next to timer_get_offset(), and accessible by the rest of KVM. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-13KVM: arm64: Replace timer context vcpu pointer with timer_idMarc Zyngier1-2/+2
Having to follow a pointer to a vcpu is pretty dumb, when the timers are are a fixed offset in the vcpu structure itself. Trade the vcpu pointer for a timer_id, which can then be used to compute the vcpu address as needed. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-13KVM: arm64: Introduce timer_context_to_vcpu() helperMarc Zyngier1-12/+13
We currently have a vcpu pointer nested into each timer context. As we are about to remove this pointer, introduce a helper (aptly named timer_context_to_vcpu()) that returns this pointer, at least until we repaint the data structure. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-13KVM: arm64: Hide CNTHV_*_EL2 from userspace for nVHE guestsMarc Zyngier1-13/+13
Although we correctly UNDEF any CNTHV_*_EL2 access from the guest when E2H==0, we still expose these registers to userspace, which is a bad idea. Drop the ad-hoc UNDEF injection and switch to a .visibility() callback which will also hide the register from userspace. Fixes: 0e45981028550 ("KVM: arm64: timer: Don't adjust the EL2 virtual timer offset") Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-13KVM: arm64: gic-v3: Only set ICH_HCR traps for v2-on-v3 or v3 guestsSascha Bischoff1-1/+4
The ICH_HCR_EL2 traps are used when running on GICv3 hardware, or when running a GICv3-based guest using FEAT_GCIE_LEGACY on GICv5 hardware. When running a GICv2 guest on GICv3 hardware the traps are used to ensure that the guest never sees any part of GICv3 (only GICv2 is visible to the guest), and when running a GICv3 guest they are used to trap in specific scenarios. They are not applicable for a GICv2-native guest, and won't be applicable for a(n upcoming) GICv5 guest. The traps themselves are configured in the vGIC CPU IF state, which is stored as a union. Updating the wrong aperture of the union risks corrupting state, and therefore needs to be avoided at all costs. Bail early if we're not running a compatible guest (GICv2 on GICv3 hardware, GICv3 native, GICv3 on GICv5 hardware). Trap everything unconditionally if we're running a GICv2 guest on GICv3 hardware. Otherwise, conditionally set up GICv3-native trapping. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-13KVM: arm64: Guard PMSCR_EL1 initialization with SPE presence checkMukesh Ojha1-5/+10
Commit efad60e46057 ("KVM: arm64: Initialize PMSCR_EL1 when in VHE") does not perform sufficient check before initializing PMSCR_EL1 to 0 when running in VHE mode. On some platforms, this causes the system to hang during boot, as EL3 has not delegated access to the Profiling Buffer to the Non-secure world, nor does it reinject an UNDEF on sysreg trap. To avoid this issue, restrict the PMSCR_EL1 initialization to CPUs that support Statistical Profiling Extension (FEAT_SPE) and have the Profiling Buffer accessible in Non-secure EL1. This is determined via a new helper `cpu_has_spe()` which checks both PMSVer and PMBIDR_EL1.P. This ensures the initialization only affects CPUs where SPE is implemented and usable, preventing boot failures on platforms where SPE is not properly configured. Fixes: efad60e46057 ("KVM: arm64: Initialize PMSCR_EL1 when in VHE") Signed-off-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-13KVM: arm64: Remove unreachable break after returnOsama Abdelkader1-1/+0
Remove an unnecessary 'break' statement that follows a 'return' in arch/arm64/kvm/at.c. The break is unreachable. Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-13KVM: arm64: Prevent access to vCPU events before initOliver Upton1-0/+6
Another day, another syzkaller bug. KVM erroneously allows userspace to pend vCPU events for a vCPU that hasn't been initialized yet, leading to KVM interpreting a bunch of uninitialized garbage for routing / injecting the exception. In one case the injection code and the hyp disagree on whether the vCPU has a 32bit EL1 and put the vCPU into an illegal mode for AArch64, tripping the BUG() in exception_target_el() during the next injection: kernel BUG at arch/arm64/kvm/inject_fault.c:40! Internal error: Oops - BUG: 00000000f2000800 [#1] SMP CPU: 3 UID: 0 PID: 318 Comm: repro Not tainted 6.17.0-rc4-00104-g10fd0285305d #6 PREEMPT Hardware name: linux,dummy-virt (DT) pstate: 21402009 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) pc : exception_target_el+0x88/0x8c lr : pend_serror_exception+0x18/0x13c sp : ffff800082f03a10 x29: ffff800082f03a10 x28: ffff0000cb132280 x27: 0000000000000000 x26: 0000000000000000 x25: ffff0000c2a99c20 x24: 0000000000000000 x23: 0000000000008000 x22: 0000000000000002 x21: 0000000000000004 x20: 0000000000008000 x19: ffff0000c2a99c20 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 00000000200000c0 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000 x8 : ffff800082f03af8 x7 : 0000000000000000 x6 : 0000000000000000 x5 : ffff800080f621f0 x4 : 0000000000000000 x3 : 0000000000000000 x2 : 000000000040009b x1 : 0000000000000003 x0 : ffff0000c2a99c20 Call trace: exception_target_el+0x88/0x8c (P) kvm_inject_serror_esr+0x40/0x3b4 __kvm_arm_vcpu_set_events+0xf0/0x100 kvm_arch_vcpu_ioctl+0x180/0x9d4 kvm_vcpu_ioctl+0x60c/0x9f4 __arm64_sys_ioctl+0xac/0x104 invoke_syscall+0x48/0x110 el0_svc_common.constprop.0+0x40/0xe0 do_el0_svc+0x1c/0x28 el0_svc+0x34/0xf0 el0t_64_sync_handler+0xa0/0xe4 el0t_64_sync+0x198/0x19c Code: f946bc01 b4fffe61 9101e020 17fffff2 (d4210000) Reject the ioctls outright as no sane VMM would call these before KVM_ARM_VCPU_INIT anyway. Even if it did the exception would've been thrown away by the eventual reset of the vCPU's state. Cc: stable@vger.kernel.org # 6.17 Fixes: b7b27facc7b5 ("arm/arm64: KVM: Add KVM_GET/SET_VCPU_EVENTS") Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-13KVM: arm64: Use the in-context stage-1 in __kvm_find_s1_desc_level()Oliver Upton1-1/+5
Running the external_aborts selftest at EL2 leads to an ugly splat due to the stage-1 MMU being disabled for the walked context, owing to the fact that __kvm_find_s1_desc_level() is hardcoded to the EL1&0 regime. Select the appropriate translation regime for the stage-1 walk based on the current vCPU context. Fixes: b8e625167a32 ("KVM: arm64: Add S1 IPA to page table level walker") Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-13KVM: arm64: nv: Don't advance PC when pending an SVE exceptionMarc Zyngier1-1/+1
Jan reports that running a nested guest on Neoverse-V2 leads to a WARN in the host due to simultaneously pending an exception and PC increment after an access to ZCR_EL2. Returning true from a sysreg accessor is an indication that the sysreg instruction has been retired. Of course this isn't the case when we've pended a synchronous SVE exception for the guest. Fix the return value and let the exception propagate to the guest as usual. Reported-by: Jan Kotas <jank@cadence.com> Closes: https://lore.kernel.org/kvmarm/865xd61tt5.wl-maz@kernel.org/ Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-13KVM: arm64: nv: Don't treat ZCR_EL2 as a 'mapped' registerOliver Upton1-4/+2
Unlike the other mapped EL2 sysregs ZCR_EL2 isn't guaranteed to be resident when a vCPU is loaded as it actually follows the SVE context. As such, the contents of ZCR_EL1 may belong to another guest if the vCPU has been preempted before reaching sysreg emulation. Unconditionally use the in-memory value of ZCR_EL2 and switch to the memory-only accessors. The in-memory value is guaranteed to be valid as fpsimd_lazy_switch_to_{guest,host}() will restore/save the register appropriately. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-07Merge tag 'arm64-fixes' of ↵Linus Torvalds6-113/+115
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: - Preserve old 'tt_core' UAPI for Hisilicon L3C PMU driver - Ensure linear alias of kprobes instruction page is not writable - Fix kernel stack unwinding from BPF - Fix build warnings from the Fujitsu uncore PMU documentation - Fix hang with deferred 'struct page' initialisation and MTE - Consolidate KPTI page-table re-writing code * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: mte: Do not flag the zero page as PG_mte_tagged docs: perf: Fujitsu: Fix htmldocs build warnings and errors arm64: mm: Move KPTI helpers to mmu.c tracing: Fix the bug where bpf_get_stackid returns -EFAULT on the ARM64 arm64: kprobes: call set_memory_rox() for kprobe page drivers/perf: hisi: Add tt_core_deprecated for compatibility
2025-10-07Merge tag 'hyperv-next-signed-20251006' of ↵Linus Torvalds2-3/+2
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Unify guest entry code for KVM and MSHV (Sean Christopherson) - Switch Hyper-V MSI domain to use msi_create_parent_irq_domain() (Nam Cao) - Add CONFIG_HYPERV_VMBUS and limit the semantics of CONFIG_HYPERV (Mukesh Rathor) - Add kexec/kdump support on Azure CVMs (Vitaly Kuznetsov) - Deprecate hyperv_fb in favor of Hyper-V DRM driver (Prasanna Kumar T S M) - Miscellaneous enhancements, fixes and cleanups (Abhishek Tiwari, Alok Tiwari, Nuno Das Neves, Wei Liu, Roman Kisel, Michael Kelley) * tag 'hyperv-next-signed-20251006' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: hyperv: Remove the spurious null directive line MAINTAINERS: Mark hyperv_fb driver Obsolete fbdev/hyperv_fb: deprecate this in favor of Hyper-V DRM driver Drivers: hv: Make CONFIG_HYPERV bool Drivers: hv: Add CONFIG_HYPERV_VMBUS option Drivers: hv: vmbus: Fix typos in vmbus_drv.c Drivers: hv: vmbus: Fix sysfs output format for ring buffer index Drivers: hv: vmbus: Clean up sscanf format specifier in target_cpu_store() x86/hyperv: Switch to msi_create_parent_irq_domain() mshv: Use common "entry virt" APIs to do work in root before running guest entry: Rename "kvm" entry code assets to "virt" to genericize APIs entry/kvm: KVM: Move KVM details related to signal/-EINTR into KVM proper mshv: Handle NEED_RESCHED_LAZY before transferring to guest x86/hyperv: Add kexec/kdump support on Azure CVMs Drivers: hv: Simplify data structures for VMBus channel close message Drivers: hv: util: Cosmetic changes for hv_utils_transport.c mshv: Add support for a new parent partition configuration clocksource: hyper-v: Skip unnecessary checks for the root partition hyperv: Add missing field to hv_output_map_device_interrupt
2025-10-04Merge tag 'platform-drivers-x86-v6.18-1' of ↵Linus Torvalds1-0/+24
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver updates from Ilpo Järvinen: - amd/pmf: - Add support for adjusting PMF PPT and PPT APU thresholds - Extend custom BIOS inputs for more policies - Update ta_pmf_action structure to the latest PMF TA - arm64: - thinkpad-t14s-ec: Add EC driver for ThinkPad T14s Gen6 Snapdragon - int3472: - Increase handshake GPIO delay - intel/pmc: - SSRAM support for Lunar Lake and Panther Lake - Support reading substate requirements data from S0ix blockers (for platforms starting from Panther Lake) - Wildcat Lake support - intel-uncore-freq: - Solve duplicate sysfs entry warnings - Present unique domain ID per package - portwell-ec: - Support suspend/resume - Add hwmon support for voltage and temperature - redmi-wmi: - Add WMI driver for Redmibook keyboard - think-lmi: - Certificate support for ThinkCenter - x86-android-tables + others: - Convert away from legacy GPIO APIs - x86-android-tables: - Add support for Acer A1-840 tablet - Fix modules list for Lenovo devices - Stop using EPROBE_DEFER - Miscellaneous cleanups / refactoring / improvements * tag 'platform-drivers-x86-v6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (63 commits) platform/x86: pcengines-apuv2: Use static device properties platform/x86: meraki-mx100: Use static device properties platform/x86: barco-p50-gpio: use software nodes for gpio-leds/keys platform/x86: x86-android-tablets: Stop using EPROBE_DEFER platform/x86: x86-android-tablets: Fix modules lists for Lenovo devices platform/x86: x86-android-tablets: Simplify lenovo_yoga_tab2_830_1050_exit() platform/x86: x86-android-tablets: Add support for Acer A1-840 tablet platform/x86: x86-android-tablets: Move Acer info to its own file platform/x86: x86-android-tablets: Update my email address platform/x86: x86-android-tablets: Simplify node-group [un]registration platform/x86: x86-android-tablets: use swnode_group instead of manual registering platform/x86: x86-android-tablets: replace bat_swnode with swnode_group platform/x86: x86-android-tablets: convert gpio_keys devices to GPIO references platform/x86: x86-android-tablets: remove support for GPIO lookup tables platform/x86: x86-android-tablets: convert Yoga Tab2 fast charger to GPIO references platform/x86: x86-android-tablets: convert HID-I2C devices to GPIO references platform/x86: x86-android-tablets: convert wm1502 devices to GPIO references platform/x86: x86-android-tablets: convert int3496 devices to GPIO references platform/x86: x86-android-tablets: convert EDT devices to GPIO references platform/x86: x86-android-tablets: convert Novatek devices to GPIO references ...
2025-10-04Merge tag 'v6.18-p1' of ↵Linus Torvalds2-20/+2
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "Drivers: - Add ciphertext hiding support to ccp - Add hashjoin, gather and UDMA data move features to hisilicon - Add lz4 and lz77_only to hisilicon - Add xilinx hwrng driver - Add ti driver with ecb/cbc aes support - Add ring buffer idle and command queue telemetry for GEN6 in qat Others: - Use rcu_dereference_all to stop false alarms in rhashtable - Fix CPU number wraparound in padata" * tag 'v6.18-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (78 commits) dt-bindings: rng: hisi-rng: convert to DT schema crypto: doc - Add explicit title heading to API docs hwrng: ks-sa - fix division by zero in ks_sa_rng_init KEYS: X.509: Fix Basic Constraints CA flag parsing crypto: anubis - simplify return statement in anubis_mod_init crypto: hisilicon/qm - set NULL to qm->debug.qm_diff_regs crypto: hisilicon/qm - clear all VF configurations in the hardware crypto: hisilicon - enable error reporting again crypto: hisilicon/qm - mask axi error before memory init crypto: hisilicon/qm - invalidate queues in use crypto: qat - Return pointer directly in adf_ctl_alloc_resources crypto: aspeed - Fix dma_unmap_sg() direction rhashtable: Use rcu_dereference_all and rcu_dereference_all_check crypto: comp - Use same definition of context alloc and free ops crypto: omap - convert from tasklet to BH workqueue crypto: qat - Replace kzalloc() + copy_from_user() with memdup_user() crypto: caam - double the entropy delay interval for retry padata: WQ_PERCPU added to alloc_workqueue users padata: replace use of system_unbound_wq with system_dfl_wq crypto: cryptd - WQ_PERCPU added to alloc_workqueue users ...
2025-10-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds37-544/+1349
Pull kvm updates from Paolo Bonzini: "This excludes the bulk of the x86 changes, which I will send separately. They have two not complex but relatively unusual conflicts so I will wait for other dust to settle. guest_memfd: - Add support for host userspace mapping of guest_memfd-backed memory for VM types that do NOT use support KVM_MEMORY_ATTRIBUTE_PRIVATE (which isn't precisely the same thing as CoCo VMs, since x86's SEV-MEM and SEV-ES have no way to detect private vs. shared). This lays the groundwork for removal of guest memory from the kernel direct map, as well as for limited mmap() for guest_memfd-backed memory. For more information see: - commit a6ad54137af9 ("Merge branch 'guest-memfd-mmap' into HEAD") - guest_memfd in Firecracker: https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding - direct map removal: https://lore.kernel.org/all/20250221160728.1584559-1-roypat@amazon.co.uk/ - mmap support: https://lore.kernel.org/all/20250328153133.3504118-1-tabba@google.com/ ARM: - Add support for FF-A 1.2 as the secure memory conduit for pKVM, allowing more registers to be used as part of the message payload. - Change the way pKVM allocates its VM handles, making sure that the privileged hypervisor is never tricked into using uninitialised data. - Speed up MMIO range registration by avoiding unnecessary RCU synchronisation, which results in VMs starting much quicker. - Add the dump of the instruction stream when panic-ing in the EL2 payload, just like the rest of the kernel has always done. This will hopefully help debugging non-VHE setups. - Add 52bit PA support to the stage-1 page-table walker, and make use of it to populate the fault level reported to the guest on failing to translate a stage-1 walk. - Add NV support to the GICv3-on-GICv5 emulation code, ensuring feature parity for guests, irrespective of the host platform. - Fix some really ugly architecture problems when dealing with debug in a nested VM. This has some bad performance impacts, but is at least correct. - Add enough infrastructure to be able to disable EL2 features and give effective values to the EL2 control registers. This then allows a bunch of features to be turned off, which helps cross-host migration. - Large rework of the selftest infrastructure to allow most tests to transparently run at EL2. This is the first step towards enabling NV testing. - Various fixes and improvements all over the map, including one BE fix, just in time for the removal of the feature. LoongArch: - Detect page table walk feature on new hardware - Add sign extension with kernel MMIO/IOCSR emulation - Improve in-kernel IPI emulation - Improve in-kernel PCH-PIC emulation - Move kvm_iocsr tracepoint out of generic code RISC-V: - Added SBI FWFT extension for Guest/VM with misaligned delegation and pointer masking PMLEN features - Added ONE_REG interface for SBI FWFT extension - Added Zicbop and bfloat16 extensions for Guest/VM - Enabled more common KVM selftests for RISC-V - Added SBI v3.0 PMU enhancements in KVM and perf driver s390: - Improve interrupt cpu for wakeup, in particular the heuristic to decide which vCPU to deliver a floating interrupt to. - Clear the PTE when discarding a swapped page because of CMMA; this bug was introduced in 6.16 when refactoring gmap code. x86 selftests: - Add #DE coverage in the fastops test (the only exception that's guest- triggerable in fastop-emulated instructions). - Fix PMU selftests errors encountered on Granite Rapids (GNR), Sierra Forest (SRF) and Clearwater Forest (CWF). - Minor cleanups and improvements x86 (guest side): - For the legacy PCI hole (memory between TOLUD and 4GiB) to UC when overriding guest MTRR for TDX/SNP to fix an issue where ACPI auto-mapping could map devices as WB and prevent the device drivers from mapping their devices with UC/UC-. - Make kvm_async_pf_task_wake() a local static helper and remove its export. - Use native qspinlocks when running in a VM with dedicated vCPU=>pCPU bindings even when PV_UNHALT is unsupported. Generic: - Remove a redundant __GFP_NOWARN from kvm_setup_async_pf() as __GFP_NOWARN is now included in GFP_NOWAIT. * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (178 commits) KVM: s390: Fix to clear PTE when discarding a swapped page KVM: arm64: selftests: Cover ID_AA64ISAR3_EL1 in set_id_regs KVM: arm64: selftests: Remove a duplicate register listing in set_id_regs KVM: arm64: selftests: Cope with arch silliness in EL2 selftest KVM: arm64: selftests: Add basic test for running in VHE EL2 KVM: arm64: selftests: Enable EL2 by default KVM: arm64: selftests: Initialize HCR_EL2 KVM: arm64: selftests: Use the vCPU attr for setting nr of PMU counters KVM: arm64: selftests: Use hyp timer IRQs when test runs at EL2 KVM: arm64: selftests: Select SMCCC conduit based on current EL KVM: arm64: selftests: Provide helper for getting default vCPU target KVM: arm64: selftests: Alias EL1 registers to EL2 counterparts KVM: arm64: selftests: Create a VGICv3 for 'default' VMs KVM: arm64: selftests: Add unsanitised helpers for VGICv3 creation KVM: arm64: selftests: Add helper to check for VGICv3 support KVM: arm64: selftests: Initialize VGICv3 only once KVM: arm64: selftests: Provide kvm_arch_vm_post_create() in library code KVM: selftests: Add ex_str() to print human friendly name of exception vectors selftests/kvm: remove stale TODO in xapic_state_test KVM: selftests: Handle Intel Atom errata that leads to PMU event overcount ...
2025-10-03arm64: mte: Do not flag the zero page as PG_mte_taggedCatalin Marinas2-4/+8
Commit 68d54ceeec0e ("arm64: mte: Allow PTRACE_PEEKMTETAGS access to the zero page") attempted to fix ptrace() reading of tags from the zero page by marking it as PG_mte_tagged during cpu_enable_mte(). The same commit also changed the ptrace() tag access permission check to the VM_MTE vma flag while turning the page flag test into a WARN_ON_ONCE(). Attempting to set the PG_mte_tagged flag early with CONFIG_DEFERRED_STRUCT_PAGE_INIT enabled may either hang (after commit d77e59a8fccd "arm64: mte: Lock a page for MTE tag initialisation") or have the flags cleared later during page_alloc_init_late(). In addition, pages_identical() -> memcmp_pages() will reject any comparison with the zero page as it is marked as tagged. Partially revert the above commit to avoid setting PG_mte_tagged on the zero page. Update the __access_remote_tags() warning on untagged pages to ignore the zero page since it is known to have the tags initialised. Note that all user mapping of the zero page are marked as pte_special(). The arm64 set_pte_at() will not call mte_sync_tags() on such pages, so PG_mte_tagged will remain cleared. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Fixes: 68d54ceeec0e ("arm64: mte: Allow PTRACE_PEEKMTETAGS access to the zero page") Reported-by: Gergely Kovacs <Gergely.Kovacs2@arm.com> Cc: stable@vger.kernel.org # 5.10.x Cc: Will Deacon <will@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Lance Yang <lance.yang@linux.dev> Acked-by: Lance Yang <lance.yang@linux.dev> Reviewed-by: David Hildenbrand <david@redhat.com> Tested-by: Lance Yang <lance.yang@linux.dev> Signed-off-by: Will Deacon <will@kernel.org>
2025-10-02Merge tag 'mm-stable-2025-10-01-19-00' of ↵Linus Torvalds10-21/+43
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "mm, swap: improve cluster scan strategy" from Kairui Song improves performance and reduces the failure rate of swap cluster allocation - "support large align and nid in Rust allocators" from Vitaly Wool permits Rust allocators to set NUMA node and large alignment when perforning slub and vmalloc reallocs - "mm/damon/vaddr: support stat-purpose DAMOS" from Yueyang Pan extend DAMOS_STAT's handling of the DAMON operations sets for virtual address spaces for ops-level DAMOS filters - "execute PROCMAP_QUERY ioctl under per-vma lock" from Suren Baghdasaryan reduces mmap_lock contention during reads of /proc/pid/maps - "mm/mincore: minor clean up for swap cache checking" from Kairui Song performs some cleanup in the swap code - "mm: vm_normal_page*() improvements" from David Hildenbrand provides code cleanup in the pagemap code - "add persistent huge zero folio support" from Pankaj Raghav provides a block layer speedup by optionalls making the huge_zero_pagepersistent, instead of releasing it when its refcount falls to zero - "kho: fixes and cleanups" from Mike Rapoport adds a few touchups to the recently added Kexec Handover feature - "mm: make mm->flags a bitmap and 64-bit on all arches" from Lorenzo Stoakes turns mm_struct.flags into a bitmap. To end the constant struggle with space shortage on 32-bit conflicting with 64-bit's needs - "mm/swapfile.c and swap.h cleanup" from Chris Li cleans up some swap code - "selftests/mm: Fix false positives and skip unsupported tests" from Donet Tom fixes a few things in our selftests code - "prctl: extend PR_SET_THP_DISABLE to only provide THPs when advised" from David Hildenbrand "allows individual processes to opt-out of THP=always into THP=madvise, without affecting other workloads on the system". It's a long story - the [1/N] changelog spells out the considerations - "Add and use memdesc_flags_t" from Matthew Wilcox gets us started on the memdesc project. Please see https://kernelnewbies.org/MatthewWilcox/Memdescs and https://blogs.oracle.com/linux/post/introducing-memdesc - "Tiny optimization for large read operations" from Chi Zhiling improves the efficiency of the pagecache read path - "Better split_huge_page_test result check" from Zi Yan improves our folio splitting selftest code - "test that rmap behaves as expected" from Wei Yang adds some rmap selftests - "remove write_cache_pages()" from Christoph Hellwig removes that function and converts its two remaining callers - "selftests/mm: uffd-stress fixes" from Dev Jain fixes some UFFD selftests issues - "introduce kernel file mapped folios" from Boris Burkov introduces the concept of "kernel file pages". Using these permits btrfs to account its metadata pages to the root cgroup, rather than to the cgroups of random inappropriate tasks - "mm/pageblock: improve readability of some pageblock handling" from Wei Yang provides some readability improvements to the page allocator code - "mm/damon: support ARM32 with LPAE" from SeongJae Park teaches DAMON to understand arm32 highmem - "tools: testing: Use existing atomic.h for vma/maple tests" from Brendan Jackman performs some code cleanups and deduplication under tools/testing/ - "maple_tree: Fix testing for 32bit compiles" from Liam Howlett fixes a couple of 32-bit issues in tools/testing/radix-tree.c - "kasan: unify kasan_enabled() and remove arch-specific implementations" from Sabyrzhan Tasbolatov moves KASAN arch-specific initialization code into a common arch-neutral implementation - "mm: remove zpool" from Johannes Weiner removes zspool - an indirection layer which now only redirects to a single thing (zsmalloc) - "mm: task_stack: Stack handling cleanups" from Pasha Tatashin makes a couple of cleanups in the fork code - "mm: remove nth_page()" from David Hildenbrand makes rather a lot of adjustments at various nth_page() callsites, eventually permitting the removal of that undesirable helper function - "introduce kasan.write_only option in hw-tags" from Yeoreum Yun creates a KASAN read-only mode for ARM, using that architecture's memory tagging feature. It is felt that a read-only mode KASAN is suitable for use in production systems rather than debug-only - "mm: hugetlb: cleanup hugetlb folio allocation" from Kefeng Wang does some tidying in the hugetlb folio allocation code - "mm: establish const-correctness for pointer parameters" from Max Kellermann makes quite a number of the MM API functions more accurate about the constness of their arguments. This was getting in the way of subsystems (in this case CEPH) when they attempt to improving their own const/non-const accuracy - "Cleanup free_pages() misuse" from Vishal Moola fixes a number of code sites which were confused over when to use free_pages() vs __free_pages() - "Add Rust abstraction for Maple Trees" from Alice Ryhl makes the mapletree code accessible to Rust. Required by nouveau and by its forthcoming successor: the new Rust Nova driver - "selftests/mm: split_huge_page_test: split_pte_mapped_thp improvements" from David Hildenbrand adds a fix and some cleanups to the thp selftesting code - "mm, swap: introduce swap table as swap cache (phase I)" from Chris Li and Kairui Song is the first step along the path to implementing "swap tables" - a new approach to swap allocation and state tracking which is expected to yield speed and space improvements. This patchset itself yields a 5-20% performance benefit in some situations - "Some ptdesc cleanups" from Matthew Wilcox utilizes the new memdesc layer to clean up the ptdesc code a little - "Fix va_high_addr_switch.sh test failure" from Chunyu Hu fixes some issues in our 5-level pagetable selftesting code - "Minor fixes for memory allocation profiling" from Suren Baghdasaryan addresses a couple of minor issues in relatively new memory allocation profiling feature - "Small cleanups" from Matthew Wilcox has a few cleanups in preparation for more memdesc work - "mm/damon: add addr_unit for DAMON_LRU_SORT and DAMON_RECLAIM" from Quanmin Yan makes some changes to DAMON in furtherance of supporting arm highmem - "selftests/mm: Add -Wunreachable-code and fix warnings" from Muhammad Anjum adds that compiler check to selftests code and fixes the fallout, by removing dead code - "Improvements to Victim Process Thawing and OOM Reaper Traversal Order" from zhongjinji makes a number of improvements in the OOM killer: mainly thawing a more appropriate group of victim threads so they can release resources - "mm/damon: misc fixups and improvements for 6.18" from SeongJae Park is a bunch of small and unrelated fixups for DAMON - "mm/damon: define and use DAMON initialization check function" from SeongJae Park implement reliability and maintainability improvements to a recently-added bug fix - "mm/damon/stat: expose auto-tuned intervals and non-idle ages" from SeongJae Park provides additional transparency to userspace clients of the DAMON_STAT information - "Expand scope of khugepaged anonymous collapse" from Dev Jain removes some constraints on khubepaged's collapsing of anon VMAs. It also increases the success rate of MADV_COLLAPSE against an anon vma - "mm: do not assume file == vma->vm_file in compat_vma_mmap_prepare()" from Lorenzo Stoakes moves us further towards removal of file_operations.mmap(). This patchset concentrates upon clearing up the treatment of stacked filesystems - "mm: Improve mlock tracking for large folios" from Kiryl Shutsemau provides some fixes and improvements to mlock's tracking of large folios. /proc/meminfo's "Mlocked" field became more accurate - "mm/ksm: Fix incorrect accounting of KSM counters during fork" from Donet Tom fixes several user-visible KSM stats inaccuracies across forks and adds selftest code to verify these counters - "mm_slot: fix the usage of mm_slot_entry" from Wei Yang addresses some potential but presently benign issues in KSM's mm_slot handling * tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (372 commits) mm: swap: check for stable address space before operating on the VMA mm: convert folio_page() back to a macro mm/khugepaged: use start_addr/addr for improved readability hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list alloc_tag: fix boot failure due to NULL pointer dereference mm: silence data-race in update_hiwater_rss mm/memory-failure: don't select MEMORY_ISOLATION mm/khugepaged: remove definition of struct khugepaged_mm_slot mm/ksm: get mm_slot by mm_slot_entry() when slot is !NULL hugetlb: increase number of reserving hugepages via cmdline selftests/mm: add fork inheritance test for ksm_merging_pages counter mm/ksm: fix incorrect KSM counter handling in mm_struct during fork drivers/base/node: fix double free in register_one_node() mm: remove PMD alignment constraint in execmem_vmalloc() mm/memory_hotplug: fix typo 'esecially' -> 'especially' mm/rmap: improve mlock tracking for large folios mm/filemap: map entire large folio faultaround mm/fault: try to map the entire file folio in finish_fault() mm/rmap: mlock large folios in try_to_unmap_one() mm/rmap: fix a mlock race condition in folio_referenced_one() ...
2025-10-01Merge tag 'kbuild-6.18-1' of ↵Linus Torvalds1-4/+2
git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux Pull Kbuild updates from Nathan Chancellor: - Extend modules.builtin.modinfo to include module aliases from MODULE_DEVICE_TABLE for builtin modules so that userspace tools (such as kmod) can verify that a particular module alias will be handled by a builtin module - Bump the minimum version of LLVM for building the kernel to 15.0.0 - Upgrade several userspace API checks in headers_check.pl to errors - Unify and consolidate CONFIG_WERROR / W=e handling - Turn assembler and linker warnings into errors with CONFIG_WERROR / W=e - Respect CONFIG_WERROR / W=e when building userspace programs (userprogs) - Enable -Werror unconditionally when building host programs (hostprogs) - Support copy_file_range() and data segment alignment in gen_init_cpio to improve performance on filesystems that support reflinks such as btrfs and XFS - Miscellaneous small changes to scripts and configuration files * tag 'kbuild-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux: (47 commits) modpost: Initialize builtin_modname to stop SIGSEGVs Documentation: kbuild: note CONFIG_DEBUG_EFI in reproducible builds kbuild: vmlinux.unstripped should always depend on .vmlinux.export.o modpost: Create modalias for builtin modules modpost: Add modname to mod_device_table alias scsi: Always define blogic_pci_tbl structure kbuild: extract modules.builtin.modinfo from vmlinux.unstripped kbuild: keep .modinfo section in vmlinux.unstripped kbuild: always create intermediate vmlinux.unstripped s390: vmlinux.lds.S: Reorder sections KMSAN: Remove tautological checks objtool: Drop noinstr hack for KCSAN_WEAK_MEMORY lib/Kconfig.debug: Drop CLANG_VERSION check from DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT riscv: Remove ld.lld version checks from many TOOLCHAIN_HAS configs riscv: Unconditionally use linker relaxation riscv: Remove version check for LTO_CLANG selects powerpc: Drop unnecessary initializations in __copy_inst_from_kernel_nofault() mips: Unconditionally select ARCH_HAS_CURRENT_STACK_POINTER arm64: Remove tautological LLVM Kconfig conditions ARM: Clean up definition of ARM_HAS_GROUP_RELOCS ...
2025-10-01Merge tag 'soc-drivers-6.18' of ↵Linus Torvalds1-14/+37
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC driver updates from Arnd Bergmann: "Lots of platform specific updates for Qualcomm SoCs, including a new TEE subsystem driver for the Qualcomm QTEE firmware interface. Added support for the Apple A11 SoC in drivers that are shared with the M1/M2 series, among more updates for those. Smaller platform specific driver updates for Renesas, ASpeed, Broadcom, Nvidia, Mediatek, Amlogic, TI, Allwinner, and Freescale SoCs. Driver updates in the cache controller, memory controller and reset controller subsystems. SCMI firmware updates to add more features and improve robustness. This includes support for having multiple SCMI providers in a single system. TEE subsystem support for protected DMA-bufs, allowing hardware to access memory areas that managed by the kernel but remain inaccessible from the CPU in EL1/EL0" * tag 'soc-drivers-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (139 commits) soc/fsl/qbman: Use for_each_online_cpu() instead of for_each_cpu() soc: fsl: qe: Drop legacy-of-mm-gpiochip.h header from GPIO driver soc: fsl: qe: Change GPIO driver to a proper platform driver tee: fix register_shm_helper() pmdomain: apple: Add "apple,t8103-pmgr-pwrstate" dt-bindings: spmi: Add Apple A11 and T2 compatible serial: qcom-geni: Load UART qup Firmware from linux side spi: geni-qcom: Load spi qup Firmware from linux side i2c: qcom-geni: Load i2c qup Firmware from linux side soc: qcom: geni-se: Add support to load QUP SE Firmware via Linux subsystem soc: qcom: geni-se: Cleanup register defines and update copyright dt-bindings: qcom: se-common: Add QUP Peripheral-specific properties for I2C, SPI, and SERIAL bus Documentation: tee: Add Qualcomm TEE driver tee: qcom: enable TEE_IOC_SHM_ALLOC ioctl tee: qcom: add primordial object tee: add Qualcomm TEE driver tee: increase TEE_MAX_ARG_SIZE to 4096 tee: add TEE_IOCTL_PARAM_ATTR_TYPE_OBJREF tee: add TEE_IOCTL_PARAM_ATTR_TYPE_UBUF tee: add close_context to TEE driver operation ...
2025-10-01Merge tag 'soc-defconfig-6.18' of ↵Linus Torvalds1-0/+12
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC defconfig updates from Arnd Bergmann: "Only a small set up updates, enabling a few drivers for Artpec, THead, Renesas and Broadcom chips, and cleaning out some Qualcomm options that were removed previously" * tag 'soc-defconfig-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: ARM: configs: u8500: Set NFC_SHDLC as built-in riscv: defconfig: Enable MMP_PDMA support for SpacemiT K1 SoC riscv: defconfig: run savedefconfig to reorder it ARM: defconfig: Remove obsolete CONFIG_USB_EHCI_MSM arm64: defconfig: Enable Marvell WiFi-Ex USB driver arm64: defconfig: Enable BCM2712 on-chip pin controller driver arm64: defconfig: Enable Axis ARTPEC SoC ARM: s3c6400_defconfig: Drop MTD_NAND_S3C2410 ARM: defconfig: pxa: Remove duplicate CONFIG_USB_GPIO_VBUS entry ARM: defconfig: cleanup orphaned CONFIGs arm64: defconfig: enable i.MX91 pinctrl arm64: defconfig: Enable X1P42100 GPUCC driver arm64: defconfig: Enable QCS615 clock controllers arm64: defconfig: Enable the RZ/V2H(P) RSPI driver arm64: defconfig: Enable Renesas RZ/T2H serial SCI
2025-10-01Merge tag 'soc-dt-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/socLinus Torvalds549-11702/+51401
Pull SoC dt updates from Arnd Bergmann: "There are five sets of new SoCs that get added in existing families, all of them being either upgrades or cut-down versions of the older chips: - Apple M2 Pro, M2 Max and M2 Ultra, used in the 2022/2023 generation of high-end workstations and laptops from Apple. Linux has been working on these for a while but stil requires patches. - Axis Artpec8 is an Armv8 chip based on Samsung Exynos design, unlike the earlier Armv7 Artpec6 from the same company that was part of a separate family of chips. - NXP i.MX91 is a cut-down version of i.MX93, using only a single Cortex-A55 core. - Qualcomm Lemans Auto is a variant of the Lemans SoC that was originally merged under the sa8775p name, the differences being mostly the firmware configuration of the platform. - Four new Renesas SoCs RZ/T2H (r9a09g077m44), RZ/N2H (r9a09g087m44), RZ/T2H (r9a09g077), and RZ/N2H (r9a09g087) are all industrial bedded SoCs based on Cortex-A55 cores In total, there are 65 new machines, including: - Industrial embedded system and single-board computers based on NXP, Allwinner, TI, Rockchips, Marvell, Xilinx Spacemit, Starfive chips. - Reference boards for the newly added Renesas, Qualcomm, NXP and Axis ARMv8 chips as well as Microchip's MPFS RISC-V SoC - Laptops and Workstations using Apple M2 and Qualcomm Snapdragon X1 chips. - Several Samsung phones using Qualcomm Snapdragon chips - Set-top boxes based on Allwinner H313 - Five BMC boards using 32-bit ASpeed SoCs - Three network routers using IXP4xx (ARMv5!) and Broadcom bcm4708 (ARMv7) SoCs Two machines get phased out because they were available only in small quantities but never made it into products: one STi407 based reference board, and a Snapdragon 845 based Chromebook. Aside from the newly added machines, a lot of work went into improving hardware support on the existing machines and cleaning up contents for validation" * tag 'soc-dt-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (931 commits) arm64: dts: apm-shadowcat: Drop "apm,xgene2-pcie" compatible arm64: dts: apm-shadowcat: Move slimpro nodes out of "simple-bus" node ARM: dts: microchip: sam9x7: Add qspi controller arm64: dts: qcom: Add MST pixel streams for displayport arm64: dts: qcom: sm6350: correct DP compatibility strings arm64: dts: qcom: monaco-evk: Enable Adreno 623 GPU arm64: dts: qcom: qcs8300-ride: Enable Adreno 623 GPU arm64: dts: qcom: qcs8300: Add gpu and gmu nodes arm64: dts: allwinner: h313: Add Amediatech X96Q dt-bindings: arm: sunxi: Add Amediatech X96Q arm64: dts: apple: t8015: Add SPMI node arm64: dts: apple: t8012: Add SPMI node arm64: dts: apple: Add J180d (Mac Pro, M2 Ultra, 2023) device tree arm64: dts: rockchip: Add devicetree for the ROC-RK3588-RT dt-bindings: arm: rockchip: Add Firefly ROC-RK3588-RT arm64: dts: rockchip: update pinctrl names for Radxa E52C arm64: dts: rockchip: remove vcc_3v3_pmu regulator for Radxa E52C arm64: dts: apple: Add J474s, J475c and J475d device trees arm64: dts: apple: Add J414 and J416 Macbook Pro device trees arm64: dts: apple: Add initial t6020/t6021/t6022 DTs ...
2025-10-01Merge tag 'pm-6.18-rc1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "The majority of these are cpufreq changes, which has been a recurring pattern for a few recent cycles. Those changes include new hardware support (AN7583 SoC support in the airoha cpufreq driver, ipq5424 support in the qcom-nvmem cpufreq driver, MT8196 support in the mediatek cpufreq driver, AM62D2 support in the ti cpufreq driver), DT bindings and Rust code updates, cleanups of the core and governors, and multiple driver fixes and cleanups. Beyond that, there are hibernation fixes (some remaining 6.16 cycle fallout and an issue related to hybrid suspend in the amdgpu driver), cleanups of the PM core code, runtime PM documentation update, cpuidle and power capping cleanups, and tooling updates. Specifics: - Rearrange variable declarations involving __free() in the cpufreq core and intel_pstate driver to follow common coding style (Rafael Wysocki) - Fix object lifecycle issue in update_qos_request(), rearrange freq QoS updates using __free(), and adjust frequency percentage computations in the intel_pstate driver (Rafael Wysocki) - Update intel_pstate to allow it to enable HWP without EPP if the new DEC (Dynamic Efficiency Control) HW feature is enabled (Rafael Wysocki) - Use on_each_cpu_mask() in drv_write() in the ACPI cpufreq driver to simplify the code (Rafael Wysocki) - Use likely() optimization in intel_pstate_sample() (Yaxiong Tian) - Remove dead EPB-related code from intel_pstate (Srinivas Pandruvada) - Use scope-based cleanup for cpufreq policy references in multiple cpufreq drivers (Zihuan Zhang) - Avoid calling get_governor() for the first policy in the cpufreq core to simplify the initial policy path (Zihuan Zhang) - Clean up the cpufreq core in multiple places (Zihuan Zhang) - Use int type to store negative error codes in the cpufreq core and update the speedstep-lib to use int for error codes (Qianfeng Rong) - Update the efficient idle check for Intel extended Families in the ondemand cpufreq governor (Sohil Mehta) - Replace sscanf() with kstrtouint() in the conservative cpufreq governor (Kaushlendra Kumar) - Rename CpumaskVar::as[_mut]_ref to from_raw[_mut] in the cpumask Rust code and mark CpumaskVar as transparent (Alice Ryhl, Baptiste Lepers) - Update ARef and AlwaysRefCounted imports from sync::aref in the OPP Rust code (Shankari Anand) - Add support for AN7583 SoC to the airoha cpufreq driver (Christian Marangi) - Enable cpufreq for ipq5424 in the qcom-nvmem cpufreq driver (Md Sadre Alam) - Add support for MT8196 to the mediatek-hw cpufreq driver, refactor that driver and add mediatek,mt8196-cpufreq-hw DT binding (Nicolas Frattaroli) - Avoid redundant conditions in the mediatek cpufreq driver (Liao Yuanhong) - Add support for AM62D2 to the ti cpufreq driver and blocklist ti,am62d2 SoC in dt-platdev (Paresh Bhagat) - Support more speed grades on AM62Px SoC in the ti cpufreq driver, allow all silicon revisions to support OPPs in it, and fix supported hardware for 1GHz OPP (Judith Mendez) - Add QCS615 compatible to DT bindings for cpufreq-qcom-hw (Taniya Das) - Minor assorted updates of the scmi, longhaul, CPPC, and armada-37xx cpufreq drivers (Akhilesh Patil, BowenYu, Dennis Beier, and Florian Fainelli) - Remove outdated cpufreq-dt.txt (Frank Li) - Fix python gnuplot package names in the amd_pstate_tracer utility (Kuan-Wei Chiu) - Saravana Kannan will maintain the virtual-cpufreq driver (Saravana Kannan) - Prevent CPU capacity updates after registering a perf domain from failing on a first CPU that is not present (Christian Loehle) - Add support for the cases in which frequency alone is not sufficient to uniquely identify an OPP (Krishna Chaitanya Chundru) - Use to_result() for OPP error handling in Rust (Onur Özkan) - Add support for LPDDR5 on Rockhip RK3588 SoC to rockchip-dfi devfreq driver (Nicolas Frattaroli) - Fix an issue where DDR cycle counts on RK3588/RK3528 with LPDDR4(X) are reported as half by adding a cycle multiplier to the DFI driver in rockchip-dfi devfreq-event driver (Nicolas Frattaroli) - Fix missing error pointer dereference check of regulator instance in the mtk-cci devfreq driver probe and remove a redundant condition from an if () statement in that driver (Dan Carpenter, Liao Yuanhong) - Fail cpuidle device registration if there is one already to avoid sysfs-related issues (Rafael Wysocki) - Use sysfs_emit()/sysfs_emit_at() instead of sprintf()/scnprintf() in cpuidle (Vivek Yadav) - Fix device and OF node leaks at probe in the qcom-spm cpuidle driver and drop unnecessary initialisations from it (Johan Hovold) - Remove unnecessary address-of operators from the intel_idle cpuidle driver (Kaushlendra Kumar) - Rearrange main loop in menu_select() to make the code in that funtion easier to follow (Rafael Wysocki) - Convert values in microseconds to ktime using us_to_ktime() where applicable in the intel_idle power capping driver (Xichao Zhao) - Annotate loops walking device links in the power management core code as _srcu and add macros for walking device links to reduce the likelihood of coding mistakes related to them (Rafael Wysocki) - Document time units for *_time functions in the runtime PM API (Brian Norris) - Clear power.must_resume in noirq suspend error path to avoid resuming a dependant device under a suspended parent or supplier (Rafael Wysocki) - Fix GFP mask handling during hybrid suspend and make the amdgpu driver handle hybrid suspend correctly (Mario Limonciello, Rafael Wysocki) - Fix GFP mask handling after aborted hibernation in platform mode and combine exit paths in power_down() to avoid code duplication (Rafael Wysocki) - Use vmalloc_array() and vcalloc() in the hibernation core to avoid open-coded size computations (Qianfeng Rong) - Fix typo in hibernation core code comment (Li Jun) - Call pm_wakeup_clear() in the same place where other functions that do bookkeeping prior to suspend_prepare() are called (Samuel Wu) - Fix and clean up the x86_energy_perf_policy utility and update its documentation (Len Brown, Kaushlendra Kumar) - Fix incorrect sorting of PMT telemetry in turbostat (Kaushlendra Kumar) - Fix incorrect size in cpuidle_state_disable() and the error return value of cpupower_write_sysfs() in cpupower (Kaushlendra Kumar)" * tag 'pm-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (86 commits) PM: hibernate: Combine return paths in power_down() PM: hibernate: Restrict GFP mask in power_down() PM: hibernate: Fix pm_hibernation_mode_is_suspend() build breakage PM: runtime: Documentation: ABI: Document time units for *_time tools/power x86_energy_perf_policy.8: Emphasize preference for SW interfaces tools/power x86_energy_perf_policy: Add make snapshot target tools/power x86_energy_perf_policy: Prefer driver HWP limits tools/power x86_energy_perf_policy: EPB access is only via sysfs tools/power x86_energy_perf_policy: Prepare for MSR/sysfs refactoring tools/power x86_energy_perf_policy: Enhance HWP enable tools/power x86_energy_perf_policy: Enhance HWP enabled check tools/power x86_energy_perf_policy: Fix incorrect fopen mode usage tools/power turbostat: Fix incorrect sorting of PMT telemetry drm/amd: Fix hybrid sleep PM: hibernate: Add pm_hibernation_mode_is_suspend() PM: hibernate: Fix hybrid-sleep tools/cpupower: Fix incorrect size in cpuidle_state_disable() tools/power/x86/amd_pstate_tracer: Fix python gnuplot package names cpufreq: Replace pointer subtraction with iteration macro cpuidle: Fail cpuidle device registration if there is one already ...
2025-09-30Merge tag 'bpf-next-6.18' of ↵Linus Torvalds3-35/+134
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Pull bpf updates from Alexei Starovoitov: - Support pulling non-linear xdp data with bpf_xdp_pull_data() kfunc (Amery Hung) Applied as a stable branch in bpf-next and net-next trees. - Support reading skb metadata via bpf_dynptr (Jakub Sitnicki) Also a stable branch in bpf-next and net-next trees. - Enforce expected_attach_type for tailcall compatibility (Daniel Borkmann) - Replace path-sensitive with path-insensitive live stack analysis in the verifier (Eduard Zingerman) This is a significant change in the verification logic. More details, motivation, long term plans are in the cover letter/merge commit. - Support signed BPF programs (KP Singh) This is another major feature that took years to materialize. Algorithm details are in the cover letter/marge commit - Add support for may_goto instruction to s390 JIT (Ilya Leoshkevich) - Add support for may_goto instruction to arm64 JIT (Puranjay Mohan) - Fix USDT SIB argument handling in libbpf (Jiawei Zhao) - Allow uprobe-bpf program to change context registers (Jiri Olsa) - Support signed loads from BPF arena (Kumar Kartikeya Dwivedi and Puranjay Mohan) - Allow access to union arguments in tracing programs (Leon Hwang) - Optimize rcu_read_lock() + migrate_disable() combination where it's used in BPF subsystem (Menglong Dong) - Introduce bpf_task_work_schedule*() kfuncs to schedule deferred execution of BPF callback in the context of a specific task using the kernel’s task_work infrastructure (Mykyta Yatsenko) - Enforce RCU protection for KF_RCU_PROTECTED kfuncs (Kumar Kartikeya Dwivedi) - Add stress test for rqspinlock in NMI (Kumar Kartikeya Dwivedi) - Improve the precision of tnum multiplier verifier operation (Nandakumar Edamana) - Use tnums to improve is_branch_taken() logic (Paul Chaignon) - Add support for atomic operations in arena in riscv JIT (Pu Lehui) - Report arena faults to BPF error stream (Puranjay Mohan) - Search for tracefs at /sys/kernel/tracing first in bpftool (Quentin Monnet) - Add bpf_strcasecmp() kfunc (Rong Tao) - Support lookup_and_delete_elem command in BPF_MAP_STACK_TRACE (Tao Chen) * tag 'bpf-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (197 commits) libbpf: Replace AF_ALG with open coded SHA-256 selftests/bpf: Add stress test for rqspinlock in NMI selftests/bpf: Add test case for different expected_attach_type bpf: Enforce expected_attach_type for tailcall compatibility bpftool: Remove duplicate string.h header bpf: Remove duplicate crypto/sha2.h header libbpf: Fix error when st-prefix_ops and ops from differ btf selftests/bpf: Test changing packet data from kfunc selftests/bpf: Add stacktrace map lookup_and_delete_elem test case selftests/bpf: Refactor stacktrace_map case with skeleton bpf: Add lookup_and_delete_elem for BPF_MAP_STACK_TRACE selftests/bpf: Fix flaky bpf_cookie selftest selftests/bpf: Test changing packet data from global functions with a kfunc bpf: Emit struct bpf_xdp_sock type in vmlinux BTF selftests/bpf: Task_work selftest cleanup fixes MAINTAINERS: Delete inactive maintainers from AF_XDP bpf: Mark kfuncs as __noclone selftests/bpf: Add kprobe multi write ctx attach test selftests/bpf: Add kprobe write ctx attach test selftests/bpf: Add uprobe context ip register change test ...
2025-09-30Merge tag 'timers-vdso-2025-09-29' of ↵Linus Torvalds5-11/+14
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull VDSO updates from Thomas Gleixner: - Further consolidation of the VDSO infrastructure and the common data store - Simplification of the related Kconfig logic - Improve the VDSO selftest suite * tag 'timers-vdso-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: selftests: vDSO: Drop vdso_test_clock_getres selftests: vDSO: vdso_test_abi: Add tests for clock_gettime64() selftests: vDSO: vdso_test_abi: Test CPUTIME clocks selftests: vDSO: vdso_test_abi: Use explicit indices for name array selftests: vDSO: vdso_test_abi: Drop clock availability tests selftests: vDSO: vdso_test_abi: Use ksft_finished() selftests: vDSO: vdso_test_abi: Correctly skip whole test with missing vDSO selftests: vDSO: Fix -Wunitialized in powerpc VDSO_CALL() wrapper vdso: Add struct __kernel_old_timeval forward declaration to gettime.h vdso: Gate VDSO_GETRANDOM behind HAVE_GENERIC_VDSO vdso: Drop Kconfig GENERIC_VDSO_TIME_NS vdso: Drop Kconfig GENERIC_VDSO_DATA_STORE vdso: Drop kconfig GENERIC_COMPAT_VDSO vdso: Drop kconfig GENERIC_VDSO_32 riscv: vdso: Untangle Kconfig logic time: Build generic update_vsyscall() only with generic time vDSO vdso/gettimeofday: Remove !CONFIG_TIME_NS stubs vdso: Move ENABLE_COMPAT_VDSO from core to arm64 ARM: VDSO: Remove cntvct_ok global variable vdso/datastore: Gate time data behind CONFIG_GENERIC_GETTIMEOFDAY
2025-09-30entry: Rename "kvm" entry code assets to "virt" to genericize APIsSean Christopherson1-1/+1
Rename the "kvm" entry code files and Kconfigs to use generic "virt" nomenclature so that the code can be reused by other hypervisors (or rather, their root/dom0 partition drivers), without incorrectly suggesting the code somehow relies on and/or involves KVM. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-09-30entry/kvm: KVM: Move KVM details related to signal/-EINTR into KVM properSean Christopherson1-2/+1
Move KVM's morphing of pending signals into userspace exits into KVM proper, and drop the @vcpu param from xfer_to_guest_mode_handle_work(). How KVM responds to -EINTR is a detail that really belongs in KVM itself, and invoking kvm_handle_signal_exit() from kernel code creates an inverted module dependency. E.g. attempting to move kvm_handle_signal_exit() into kvm_main.c would generate an linker error when building kvm.ko as a module. Dropping KVM details will also converting the KVM "entry" code into a more generic virtualization framework so that it can be used when running as a Hyper-V root partition. Lastly, eliminating usage of "struct kvm_vcpu" outside of KVM is also nice to have for KVM x86 developers, as keeping the details of kvm_vcpu purely within KVM allows changing the layout of the structure without having to boot into a new kernel, e.g. allows rebuilding and reloading kvm.ko with a modified kvm_vcpu structure as part of debug/development. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-09-30Merge tag 'sched-core-2025-09-26' of ↵Linus Torvalds2-23/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "Core scheduler changes: - Make migrate_{en,dis}able() inline, to improve performance (Menglong Dong) - Move STDL_INIT() functions out-of-line (Peter Zijlstra) - Unify the SCHED_{SMT,CLUSTER,MC} Kconfig (Peter Zijlstra) Fair scheduling: - Defer throttling to when tasks exit to user-space, to reduce the chance & impact of throttle-preemption with held locks and other resources (Aaron Lu, Valentin Schneider) - Get rid of sched_domains_curr_level hack for tl->cpumask(), as the warning was getting triggered on certain topologies (Peter Zijlstra) Misc cleanups & fixes: - Header cleanups (Menglong Dong) - Fix race in push_dl_task() (Harshit Agarwal)" * tag 'sched-core-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Fix some typos in include/linux/preempt.h sched: Make migrate_{en,dis}able() inline rcu: Replace preempt.h with sched.h in include/linux/rcupdate.h arch: Add the macro COMPILE_OFFSETS to all the asm-offsets.c sched/fair: Do not balance task to a throttled cfs_rq sched/fair: Do not special case tasks in throttled hierarchy sched/fair: update_cfs_group() for throttled cfs_rqs sched/fair: Propagate load for throttled cfs_rq sched/fair: Get rid of throttled_lb_pair() sched/fair: Task based throttle time accounting sched/fair: Switch to task based throttle model sched/fair: Implement throttle task work and related helpers sched/fair: Add related data structure for task based throttle sched: Unify the SCHED_{SMT,CLUSTER,MC} Kconfig sched: Move STDL_INIT() functions out-of-line sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask() sched/deadline: Fix race in push_dl_task()
2025-09-30Merge tag 'loongarch-kvm-6.18' of ↵Paolo Bonzini1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD LoongArch KVM changes for v6.18 1. Add PTW feature detection on new hardware. 2. Add sign extension with kernel MMIO/IOCSR emulation. 3. Improve in-kernel IPI emulation. 4. Improve in-kernel PCH-PIC emulation. 5. Move kvm_iocsr tracepoint out of generic code.
2025-09-30Merge tag 'kvm-riscv-6.18-1' of https://github.com/kvm-riscv/linux into HEADPaolo Bonzini29-54/+146
KVM/riscv changes for 6.18 - Added SBI FWFT extension for Guest/VM with misaligned delegation and pointer masking PMLEN features - Added ONE_REG interface for SBI FWFT extension - Added Zicbop and bfloat16 extensions for Guest/VM - Enabled more common KVM selftests for RISC-V such as access_tracking_perf_test, dirty_log_perf_test, memslot_modification_stress_test, memslot_perf_test, mmu_stress_test, and rseq_test - Added SBI v3.0 PMU enhancements in KVM and perf driver
2025-09-30Merge tag 'kvmarm-6.18' of ↵Paolo Bonzini36-490/+1165
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 6.18 - Add support for FF-A 1.2 as the secure memory conduit for pKVM, allowing more registers to be used as part of the message payload. - Change the way pKVM allocates its VM handles, making sure that the privileged hypervisor is never tricked into using uninitialised data. - Speed up MMIO range registration by avoiding unnecessary RCU synchronisation, which results in VMs starting much quicker. - Add the dump of the instruction stream when panic-ing in the EL2 payload, just like the rest of the kernel has always done. This will hopefully help debugging non-VHE setups. - Add 52bit PA support to the stage-1 page-table walker, and make use of it to populate the fault level reported to the guest on failing to translate a stage-1 walk. - Add NV support to the GICv3-on-GICv5 emulation code, ensuring feature parity for guests, irrespective of the host platform. - Fix some really ugly architecture problems when dealing with debug in a nested VM. This has some bad performance impacts, but is at least correct. - Add enough infrastructure to be able to disable EL2 features and give effective values to the EL2 control registers. This then allows a bunch of features to be turned off, which helps cross-host migration. - Large rework of the selftest infrastructure to allow most tests to transparently run at EL2. This is the first step towards enabling NV testing. - Various fixes and improvements all over the map, including one BE fix, just in time for the removal of the feature.