Merge tag 'dma-mapping-7.1-2026-04-16' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux
Pull dma-mapping updates from Marek Szyprowski:
- added support for batched cache sync, which improves performance of
dma_map/unmap_sg() operations on the ARM64 architecture (Barry Song)
- introduced DMA_ATTR_CC_SHARED attribute for explicitly shared memory
used in confidential computing (Jiri Pirko; see the sketch after this list)
- refactored spaghetti-like code in drivers/of/of_reserved_mem.c and
its clients (Marek Szyprowski, shared branch with device-tree updates
to avoid merge conflicts)
- prepared Contiguous Memory Allocator related code for making dma-buf
drivers modularized (Maxime Ripard)
- added support for benchmarking dma_map_sg() calls to tools/dma
utility (Qinxin Xia)
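A minimal usage sketch for the new attribute; DMA_ATTR_CC_SHARED is the
only name taken from this pull, the surrounding code is an assumption
built on the long-standing dma_alloc_attrs() interface:

    #include <linux/dma-mapping.h>

    /* Ask for memory that the host is explicitly allowed to see, e.g.
     * for a shared ring in a confidential-computing guest. */
    void *cpu_addr = dma_alloc_attrs(dev, size, &dma_handle, GFP_KERNEL,
                                     DMA_ATTR_CC_SHARED);
    if (!cpu_addr)
        return -ENOMEM;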
* tag 'dma-mapping-7.1-2026-04-16' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: (24 commits)
dma-buf: heaps: system: document system_cc_shared heap
dma-buf: heaps: system: add system_cc_shared heap for explicitly shared memory
dma-mapping: introduce DMA_ATTR_CC_SHARED for shared memory
mm: cma: Export cma_alloc(), cma_release() and cma_get_name()
dma: contiguous: Export dev_get_cma_area()
dma: contiguous: Make dma_contiguous_default_area static
dma: contiguous: Make dev_get_cma_area() a proper function
dma: contiguous: Turn heap registration logic around
of: reserved_mem: rework fdt_init_reserved_mem_node()
of: reserved_mem: clarify fdt_scan_reserved_mem*() functions
of: reserved_mem: rearrange code a bit
of: reserved_mem: replace CMA quirks by generic methods
of: reserved_mem: switch to ops based OF_DECLARE()
of: reserved_mem: use -ENODEV instead of -ENOENT
of: reserved_mem: remove fdt node from the structure
dma-mapping: fix false kernel-doc comment marker
dma-mapping: Support batch mode for dma_direct_{map,unmap}_sg
dma-mapping: Separate DMA sync issuing and completion waiting
arm64: Provide dcache_inval_poc_nosync helper
arm64: Provide dcache_clean_poc_nosync helper
...
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"Arm:
- Add support for tracing in the standalone EL2 hypervisor code,
which should help both debugging and performance analysis. This
uses the new infrastructure for 'remote' trace buffers that can be
exposed by non-kernel entities such as firmware, and which came
through the tracing tree
- Add support for GICv5 Per Processor Interrupts (PPIs), as the
starting point for supporting the new GIC architecture in KVM
- Finally add support for pKVM protected guests, where pages are
unmapped from the host as they are faulted into the guest and can
be shared back from the guest using pKVM hypercalls. Protected
guests are created using a new machine type identifier. As the
elusive guestmem has not yet delivered on its promises, anonymous
memory is also supported
This is only a first step towards full isolation from the host; for
example, the CPU register state and DMA accesses are not yet
isolated. Because it does not yet fully deliver what it
promises, it is hidden behind CONFIG_ARM_PKVM_GUEST +
'kvm-arm.mode=protected', and also triggers TAINT_USER when a VM is
created. Caveat emptor
- Rework the dreaded user_mem_abort() function to make it more
maintainable, reducing the amount of state being exposed to the
various helpers and rendering a substantial amount of state
immutable
- Expand the Stage-2 page table dumper to support NV shadow page
tables on a per-VM basis
- Tidy up the pKVM PSCI proxy code to be slightly less hard to
follow
- Fix both SPE and TRBE in non-VHE configurations so that they do not
generate spurious, out of context table walks that ultimately lead
to very bad HW lockups
- A small set of patches fixing the Stage-2 MMU freeing in error
cases
- Tighten up the accepted SMC immediate value to be only #0 for host
SMCCC calls
- The usual cleanups and other selftest churn
LoongArch:
- Use CSR_CRMD_PLV for kvm_arch_vcpu_in_kernel()
- Add DMSINTC irqchip in kernel support
RISC-V:
- Fix steal time shared memory alignment checks
- Fix vector context allocation leak
- Fix array out-of-bounds in pmu_ctr_read() and pmu_fw_ctr_read_hi()
- Fix double-free of sdata in kvm_pmu_clear_snapshot_area()
- Fix integer overflow in kvm_pmu_validate_counter_mask()
- Fix shift-out-of-bounds in make_xfence_request()
- Fix lost write protection on huge pages during dirty logging
- Split huge pages during fault handling for dirty logging
- Skip CSR restore if VCPU is reloaded on the same core
- Implement kvm_arch_has_default_irqchip() for KVM selftests
- Factor out ISA checks into separate sources
- Add hideleg to struct kvm_vcpu_config
- Factor out VCPU config into separate sources
- Support configuration of per-VM HGATP mode from KVM user space
s390:
- Support for ESA (31-bit) guests inside nested hypervisors
- Remove restriction on memslot alignment, which is not needed
anymore with the new gmap code
- Fix LPSW/E to update the bear (which of course is the breaking
event address register)
x86:
- Shut up various UBSAN warnings on reading module parameter before
they were initialized
- Don't zero-allocate page tables that are used for splitting
hugepages in the TDP MMU, as KVM is guaranteed to set all SPTEs in
the page table and thus write all bytes
- As an optimization, bail early when trying to unsync 4KiB mappings
if the target gfn can just be mapped with a 2MiB hugepage
x86 generic:
- Copy single-chunk MMIO write values into struct kvm_vcpu (more
precisely struct kvm_mmio_fragment) to fix use-after-free stack
bugs where KVM would dereference stack pointer after an exit to
userspace
- Clean up and comment the emulated MMIO code to try to make it
easier to maintain (not necessarily "easy", but "easier")
- Move VMXON+VMXOFF and EFER.SVME toggling out of KVM (not *all* of
VMX and SVM enabling) as it is needed for trusted I/O
- Advertise support for AVX512 Bit Matrix Multiply (BMM) instructions
- Immediately fail the build if a required #define is missing in one
of KVM's headers that is included multiple times
- Reject SET_GUEST_DEBUG with -EBUSY if there's an already injected
exception, mostly to prevent syzkaller from abusing the uAPI to
trigger WARNs, but also because it can help prevent userspace from
unintentionally crashing the VM
- Exempt SMM from CPUID faulting on Intel, as per the spec
- Misc hardening and cleanup changes
x86 (AMD):
- Fix and optimize IRQ window inhibit handling for AVIC; make it
per-vCPU so that KVM doesn't prematurely re-enable AVIC if multiple
vCPUs have to-be-injected IRQs
- Clean up and optimize the OSVW handling, avoiding a bug in which
KVM would overwrite state when enabling virtualization on multiple
CPUs in parallel. This should not be a problem because OSVW should
usually be the same for all CPUs
- Drop a WARN in KVM_MEMORY_ENCRYPT_REG_REGION where KVM complains
about a "too large" size based purely on user input
- Clean up and harden the pinning code for KVM_MEMORY_ENCRYPT_REG_REGION
- Disallow synchronizing a VMSA of an already-launched/encrypted
vCPU, as doing so for an SNP guest will crash the host due to an
RMP violation page fault
- Overhaul KVM's APIs for detecting SEV+ guests so that VM-scoped
queries are required to hold kvm->lock, and enforce it by lockdep.
Fix various bugs where sev_guest() was not ensured to be stable for
the whole duration of a function or ioctl
- Convert a pile of kvm->lock SEV code to guard() (see the sketch
after this list)
- Play nicer with userspace that does not enable
KVM_CAP_EXCEPTION_PAYLOAD, for which KVM needs to set CR2 and DR6
as a response to ioctls such as KVM_GET_VCPU_EVENTS (even if the
payload would end up in EXITINFO2 rather than CR2, for example).
Only set CR2 and DR6 when consumption of the payload is imminent,
but on the other hand force delivery of the payload in all paths
where userspace retrieves CR2 or DR6
- Use vcpu->arch.cr2 when updating vmcb12's CR2 on nested #VMEXIT
instead of vmcb02->save.cr2. The value is out of sync after a
save/restore or after a #PF is injected into L2
- Fix a class of nSVM bugs where some fields written by the CPU are
not synchronized from vmcb02 to cached vmcb12 after VMRUN, and so
are not up-to-date when saved by KVM_GET_NESTED_STATE
- Fix a class of bugs where the ordering between KVM_SET_NESTED_STATE
and KVM_SET_{S}REGS could cause vmcb02 to be incorrectly
initialized after save+restore
- Add a variety of missing nSVM consistency checks
- Fix several bugs where KVM failed to correctly update VMCB fields
on nested #VMEXIT
- Fix several bugs where KVM failed to correctly synthesize #UD or
#GP for SVM-related instructions
- Add support for save+restore of virtualized LBRs (on SVM)
- Refactor various helpers and macros to improve clarity and
(hopefully) make the code easier to maintain
- Aggressively sanitize fields when copying from vmcb12, to guard
against unintentionally allowing L1 to utilize yet-to-be-defined
features
- Fix several bugs where KVM botched rAX legality checks when
emulating SVM instructions. There are remaining issues in that KVM
doesn't handle size prefix overrides for 64-bit guests
- Fail emulation of VMRUN/VMLOAD/VMSAVE if mapping vmcb12 fails
instead of somewhat arbitrarily synthesizing #GP (i.e. don't double
down on AMD's architectural but sketchy behavior of generating #GP
for "unsupported" addresses)
- Cache all used vmcb12 fields to further harden against TOCTOU bugs
x86 (Intel):
- Drop obsolete branch hint prefixes from the VMX instruction macros
- Use ASM_INPUT_RM() in __vmcs_writel() to coerce clang into using a
register input when appropriate
- Code cleanups
guest_memfd:
- Don't mark guest_memfd folios as accessed, as guest_memfd doesn't
support reclaim, the memory is unevictable, and there is no storage
to write back to
LoongArch selftests:
- Add KVM PMU test cases
s390 selftests:
- Enable more memory selftests
x86 selftests:
- Add support for Hygon CPUs in KVM selftests
- Fix a bug in the MSR test where it would get false failures on
AMD/Hygon CPUs with exactly one of RDPID or RDTSCP
- Add an MADV_COLLAPSE testcase for guest_memfd as a regression test
for a bug where the kernel would attempt to collapse guest_memfd
folios against KVM's will"
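A sketch of the guard() pattern the SEV conversion moves to; the
pattern comes from include/linux/cleanup.h, but the function below is
illustrative rather than a specific commit from this pull:

    #include <linux/cleanup.h>

    static int sev_do_something(struct kvm *kvm)
    {
        guard(mutex)(&kvm->lock);   /* dropped automatically on return */

        if (!sev_guest(kvm))        /* now stable under kvm->lock */
            return -EINVAL;         /* no explicit mutex_unlock() needed */
        return 0;
    }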
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (373 commits)
KVM: x86: use inlines instead of macros for is_sev_*guest
x86/virt: Treat SVM as unsupported when running as an SEV+ guest
KVM: SEV: Goto an existing error label if charging misc_cg for an ASID fails
KVM: SVM: Move lock-protected allocation of SEV ASID into a separate helper
KVM: SEV: use mutex guard in snp_handle_guest_req()
KVM: SEV: use mutex guard in sev_mem_enc_unregister_region()
KVM: SEV: use mutex guard in sev_mem_enc_ioctl()
KVM: SEV: use mutex guard in snp_launch_update()
KVM: SEV: Assert that kvm->lock is held when querying SEV+ support
KVM: SEV: Document that checking for SEV+ guests when reclaiming memory is "safe"
KVM: SEV: Hide "struct kvm_sev_info" behind CONFIG_KVM_AMD_SEV=y
KVM: SEV: WARN on unhandled VM type when initializing VM
KVM: LoongArch: selftests: Add PMU overflow interrupt test
KVM: LoongArch: selftests: Add basic PMU event counting test
KVM: LoongArch: selftests: Add cpucfg read/write helpers
LoongArch: KVM: Add DMSINTC inject msi to vCPU
LoongArch: KVM: Add DMSINTC device support
LoongArch: KVM: Make vcpu_is_preempted() as a macro rather than function
LoongArch: KVM: Move host CSR_GSTAT save and restore in context switch
LoongArch: KVM: Move host CSR_EENTRY save and restore in context switch
...
Merge tag 'mm-nonmm-stable-2026-04-15-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- "pid: make sub-init creation retryable" (Oleg Nesterov)
Make creation of init in a new namespace more robust by clearing away
some historical cruft which is no longer needed. Also some
documentation fixups
- "selftests/fchmodat2: Error handling and general" (Mark Brown)
A fix and a cleanup for the fchmodat2() syscall selftest
- "lib: polynomial: Move to math/ and clean up" (Andy Shevchenko)
- "hung_task: Provide runtime reset interface for hung task detector"
(Aaron Tomlin)
Give administrators the ability to zero out
/proc/sys/kernel/hung_task_detect_count
- "tools/getdelays: use the static UAPI headers from
tools/include/uapi" (Thomas Weißschuh)
Teach getdelays to use the in-kernel UAPI headers rather than the
system-provided ones
- "watchdog/hardlockup: Improvements to hardlockup" (Mayank Rungta)
Several cleanups and fixups to the hardlockup detector code and its
documentation
- "lib/bch: fix undefined behavior from signed left-shifts" (Josh Law)
A couple of small/theoretical fixes in the bch code
- "ocfs2/dlm: fix two bugs in dlm_match_regions()" (Junrui Luo)
- "cleanup the RAID5 XOR library" (Christoph Hellwig)
A quite far-reaching cleanup to this code. I can't do better than to
quote Christoph:
"The XOR library used for the RAID5 parity is a bit of a mess right
now. The main file sits in crypto/ despite not being cryptography
and not using the crypto API, with the generic implementations
sitting in include/asm-generic and the arch implementations
sitting in an asm/ header in theory. The latter doesn't work for
many cases, so architectures often build the code directly into
the core kernel, or create another module for the architecture
code.
Change this to a single module in lib/ that also contains the
architecture optimizations, similar to the library work Eric
Biggers has done for the CRC and crypto libraries lately. After
that it changes to better calling conventions that allow for
smarter architecture implementations (although none is contained
here yet), and uses static_call to avoid indirect function call
overhead" (see the static_call sketch after this list)
- "lib/list_sort: Clean up list_sort() scheduling workarounds"
(Kuan-Wei Chiu)
Clean up this library code by removing a hacky thing which was added
for UBIFS, which UBIFS doesn't actually need
- "Fix bugs in extract_iter_to_sg()" (Christian Ehrhardt)
Fix a few bugs in the scatterlist code, add in-kernel tests for the
now-fixed bugs and fix a leak in the test itself
- "kdump: Enable LUKS-encrypted dump target support in ARM64 and
PowerPC" (Coiby Xu)
Enable support of the LUKS-encrypted device dump target on arm64 and
powerpc
- "ocfs2: consolidate extent list validation into block read callbacks"
(Joseph Qi)
Cleanup, simplify, and make more robust ocfs2's validation of extent
list fields (Kernel test robot loves mounting corrupted fs images!)
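A minimal sketch of the static_call dispatch pattern the XOR rework
relies on (static_call is an existing kernel facility; the xor-specific
names below are illustrative, not taken from the series):

    #include <linux/static_call.h>

    static void xor_2_generic(unsigned long *d, const unsigned long *s,
                              size_t words)
    {
        while (words--)
            *d++ ^= *s++;
    }

    /* One patchable call site; no function-pointer dereference. */
    DEFINE_STATIC_CALL(xor_2, xor_2_generic);

    void do_parity(unsigned long *d, const unsigned long *s, size_t w)
    {
        static_call(xor_2)(d, s, w);    /* patched direct call */
    }

    /* An arch init hook could retarget it once, e.g.:
     * static_call_update(xor_2, xor_2_neon); */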
* tag 'mm-nonmm-stable-2026-04-15-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (127 commits)
ocfs2: validate group add input before caching
ocfs2: validate bg_bits during freefrag scan
ocfs2: fix listxattr handling when the buffer is full
doc: watchdog: fix typos etc
update Sean's email address
ocfs2: use get_random_u32() where appropriate
ocfs2: split transactions in dio completion to avoid credit exhaustion
ocfs2: remove redundant l_next_free_rec check in __ocfs2_find_path()
ocfs2: validate extent block list fields during block read
ocfs2: remove empty extent list check in ocfs2_dx_dir_lookup_rec()
ocfs2: validate dx_root extent list fields during block read
ocfs2: fix use-after-free in ocfs2_fault() when VM_FAULT_RETRY
ocfs2: handle invalid dinode in ocfs2_group_extend
.get_maintainer.ignore: add Askar
ocfs2: validate bg_list extent bounds in discontig groups
checkpatch: exclude forward declarations of const structs
tools/accounting: handle truncated taskstats netlink messages
taskstats: set version in TGID exit notifications
ocfs2/heartbeat: fix slot mapping rollback leaks on error paths
arm64,ppc64le/kdump: pass dm-crypt keys to kdump kernel
...
Merge tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- "maple_tree: Replace big node with maple copy" (Liam Howlett)
Mainly preparatory work for ongoing development, but it does reduce
stack usage and is an improvement.
- "mm, swap: swap table phase III: remove swap_map" (Kairui Song)
Offers memory savings by removing the static swap_map. It also yields
some CPU savings and implements several cleanups.
- "mm: memfd_luo: preserve file seals" (Pratyush Yadav)
File seal preservation to LUO's memfd code
- "mm: zswap: add per-memcg stat for incompressible pages" (Jiayuan
Chen)
Additional userspace stats reporting for zswap
- "arch, mm: consolidate empty_zero_page" (Mike Rapoport)
Some cleanups for our handling of ZERO_PAGE() and zero_pfn
- "mm/kmemleak: Improve scan_should_stop() implementation" (Zhongqiu
Han)
A robustness improvement and some cleanups in the kmemleak code
- "Improve khugepaged scan logic" (Vernon Yang)
Improve khugepaged scan logic and reduce CPU consumption by
prioritizing scanning tasks that access memory frequently
- "Make KHO Stateless" (Jason Miu)
Simplify Kexec Handover by transitioning KHO from an xarray-based
metadata tracking system with serialization to a radix tree data
structure that can be passed directly to the next kernel
- "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" (Thomas
Ballasi and Steven Rostedt)
Enhance vmscan's tracepointing
- "mm: arch/shstk: Common shadow stack mapping helper and
VM_NOHUGEPAGE" (Catalin Marinas)
Cleanup for the shadow stack code: remove per-arch code in favour of
a generic implementation
- "Fix KASAN support for KHO restored vmalloc regions" (Pasha Tatashin)
Fix a WARN() which can be emitted when KHO restores a vmalloc area
- "mm: Remove stray references to pagevec" (Tal Zussman)
Several cleanups, mainly updating references to "struct pagevec",
which became folio_batch three years ago
- "mm: Eliminate fake head pages from vmemmap optimization" (Kiryl
Shutsemau)
Simplify the HugeTLB vmemmap optimization (HVO) by changing how tail
pages encode their relationship to the head page
- "mm/damon/core: improve DAMOS quota efficiency for core layer
filters" (SeongJae Park)
Improve two problematic behaviors of DAMOS that make it less
efficient when core layer filters are used
- "mm/damon: strictly respect min_nr_regions" (SeongJae Park)
Improve DAMON usability by extending the treatment of the
min_nr_regions user-settable parameter
- "mm/page_alloc: pcp locking cleanup" (Vlastimil Babka)
The proper fix for a previously hotfixed SMP=n issue. Code
simplifications and cleanups ensued
- "mm: cleanups around unmapping / zapping" (David Hildenbrand)
A bunch of cleanups around unmapping and zapping. Mostly
simplifications, code movements, documentation and renaming of
zapping functions
- "support batched checking of the young flag for MGLRU" (Baolin Wang)
Batched checking of the young flag for MGLRU. It's partly cleanups; one
benchmark shows large performance benefits for arm64
- "memcg: obj stock and slab stat caching cleanups" (Johannes Weiner)
memcg cleanup and robustness improvements
- "Allow order zero pages in page reporting" (Yuvraj Sakshith)
Enhance free page reporting - it presently and undesirably excludes
order-0 pages when reporting free memory.
- "mm: vma flag tweaks" (Lorenzo Stoakes)
Cleanup work following from the recent conversion of the VMA flags to
a bitmap
- "mm/damon: add optional debugging-purpose sanity checks" (SeongJae
Park)
Add some more developer-facing debug checks into DAMON core
- "mm/damon: test and document power-of-2 min_region_sz requirement"
(SeongJae Park)
An additional DAMON kunit test and some adjustments to the
addr_unit parameter handling
- "mm/damon/core: make passed_sample_intervals comparisons
overflow-safe" (SeongJae Park)
Fix a hard-to-hit time overflow issue in DAMON core
- "mm/damon: improve/fixup/update ratio calculation, test and
documentation" (SeongJae Park)
A batch of misc/minor improvements and fixups for DAMON
- "mm: move vma_(kernel|mmu)_pagesize() out of hugetlb.c" (David
Hildenbrand)
Fix a possible issue with dax-device when CONFIG_HUGETLB=n. Some code
movement was required.
- "zram: recompression cleanups and tweaks" (Sergey Senozhatsky)
A somewhat random mix of fixups, recompression cleanups and
improvements in the zram code
- "mm/damon: support multiple goal-based quota tuning algorithms"
(SeongJae Park)
Extend DAMOS quotas goal auto-tuning to support multiple tuning
algorithms that users can select
- "mm: thp: reduce unnecessary start_stop_khugepaged()" (Breno Leitao)
Fix the khugepaged sysfs handling so we no longer spam the logs with
reams of junk when starting/stopping khugepaged
- "mm: improve map count checks" (Lorenzo Stoakes)
Provide some cleanups and slight fixes in the mremap, mmap and vma
code
- "mm/damon: support addr_unit on default monitoring targets for
modules" (SeongJae Park)
Extend the use of DAMON core's addr_unit tunable
- "mm: khugepaged cleanups and mTHP prerequisites" (Nico Pache)
Cleanups to khugepaged, and a base for Nico's planned khugepaged
mTHP support
- "mm: memory hot(un)plug and SPARSEMEM cleanups" (David Hildenbrand)
Code movement and cleanups in the memhotplug and sparsemem code
- "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup
CONFIG_MIGRATION" (David Hildenbrand)
Rationalize some memhotplug Kconfig support
- "change young flag check functions to return bool" (Baolin Wang)
Cleanups to change all young flag check functions to return bool
- "mm/damon/sysfs: fix memory leak and NULL dereference issues" (Josh
Law and SeongJae Park)
Fix a few potential DAMON bugs
- "mm/vma: convert vm_flags_t to vma_flags_t in vma code" (Lorenzo
Stoakes)
Convert a lot of the existing use of the legacy vm_flags_t data type
to the new vma_flags_t type which replaces it. Mainly in the vma
code.
- "mm: expand mmap_prepare functionality and usage" (Lorenzo Stoakes)
Expand the mmap_prepare functionality, which is intended to replace
the deprecated f_op->mmap hook which has been the source of bugs and
security issues for some time. Cleanups, documentation, extension of
mmap_prepare into filesystem drivers (a driver-side sketch follows
this list)
- "mm/huge_memory: refactor zap_huge_pmd()" (Lorenzo Stoakes)
Simplify and clean up zap_huge_pmd(). Additional cleanups around
vm_normal_folio_pmd() and the softleaf functionality are performed.
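A sketch of the driver-side mmap_prepare shape referenced above; the
hook and struct vm_area_desc exist in the kernel, while the mydrv_*
names and MYDRV_MAX_MAP_SIZE are hypothetical, and the field names
should be treated as assumptions:

    static int mydrv_mmap_prepare(struct vm_area_desc *desc)
    {
        /* Validate and configure the mapping before the VMA exists,
         * so failure cannot leave a half-set-up VMA behind. */
        if (desc->end - desc->start > MYDRV_MAX_MAP_SIZE)
            return -EINVAL;

        desc->vm_ops = &mydrv_vm_ops;
        return 0;
    }

    static const struct file_operations mydrv_fops = {
        .owner        = THIS_MODULE,
        .mmap_prepare = mydrv_mmap_prepare,
    };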
* tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
mm: fix deferred split queue races during migration
mm/khugepaged: fix issue with tracking lock
mm/huge_memory: add and use has_deposited_pgtable()
mm/huge_memory: add and use normal_or_softleaf_folio_pmd()
mm: add softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio()
mm/huge_memory: separate out the folio part of zap_huge_pmd()
mm/huge_memory: use mm instead of tlb->mm
mm/huge_memory: remove unnecessary sanity checks
mm/huge_memory: deduplicate zap deposited table call
mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE()
mm/huge_memory: add a common exit path to zap_huge_pmd()
mm/huge_memory: handle buggy PMD entry in zap_huge_pmd()
mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc
mm/huge: avoid big else branch in zap_huge_pmd()
mm/huge_memory: simplify vma_is_specal_huge()
mm: on remap assert that input range within the proposed VMA
mm: add mmap_action_map_kernel_pages[_full]()
uio: replace deprecated mmap hook with mmap_prepare in uio_info
drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepare
mm: allow handling of stacked mmap_prepare hooks in more drivers
...
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
"The biggest changes are MPAM enablement in drivers/resctrl and new PMU
support under drivers/perf.
On the core side, FEAT_LSUI allows futex atomic operations to run with
EL0 permissions, avoiding PAN toggling.
The rest is mostly TLB invalidation refactoring, further generic entry
work, sysreg updates and a few fixes.
Core features:
- Add support for FEAT_LSUI, allowing futex atomic operations without
toggling Privileged Access Never (PAN)
- Further refactor the arm64 exception handling code towards the
generic entry infrastructure
- Optimise __READ_ONCE() with CONFIG_LTO=y and allow alias analysis
through it (see the sketch after this list)
Memory management:
- Refactor the arm64 TLB invalidation API and implementation for
better control over barrier placement and level-hinted invalidation
- Enable batched TLB flushes during memory hot-unplug
- Fix rodata=full block mapping support for realm guests (when
BBML2_NOABORT is available)
Perf and PMU:
- Add support for a whole bunch of system PMUs featured in NVIDIA's
Tegra410 SoC (cspmu extensions for the fabric and PCIe, new drivers
for CPU/C2C memory latency PMUs)
- Clean up iomem resource handling in the Arm CMN driver
- Fix signedness handling of AA64DFR0.{PMUVer,PerfMon}
MPAM (Memory Partitioning And Monitoring):
- Add architecture context-switch and hiding of the feature from KVM
- Add interface to allow MPAM to be exposed to user-space using
resctrl
- Add errata workaround for some existing platforms
- Add documentation for using MPAM and what shape of platforms can
use resctrl
Miscellaneous:
- Check DAIF (and PMR, where relevant) at task-switch time
- Skip TFSR_EL1 checks and barriers in synchronous MTE tag check mode
(only relevant to asynchronous or asymmetric tag check modes)
- Remove a duplicate allocation in the kexec code
- Remove redundant save/restore of SCS SP on entry to/from EL0
- Generate the KERNEL_HWCAP_ definitions from the arm64 hwcap
descriptions
- Add kselftest coverage for cmpbr_sigill()
- Update sysreg definitions"
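For context, the generic definition that the arm64 CONFIG_LTO=y
variant specialises (the asm-generic version, quoted as a reference
point; the arm64 override itself uses inline assembly and is not
reproduced here):

    /* include/asm-generic/rwonce.h */
    #define __READ_ONCE(x) \
        (*(const volatile __unqual_scalar_typeof(x) *)&(x))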
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (109 commits)
arm64: rsi: use linear-map alias for realm config buffer
arm64: Kconfig: fix duplicate word in CMDLINE help text
arm64: mte: Skip TFSR_EL1 checks and barriers in synchronous tag check mode
arm64/sysreg: Update ID_AA64SMFR0_EL1 description to DDI0601 2025-12
arm64/sysreg: Update ID_AA64ZFR0_EL1 description to DDI0601 2025-12
arm64/sysreg: Update ID_AA64FPFR0_EL1 description to DDI0601 2025-12
arm64/sysreg: Update ID_AA64ISAR2_EL1 description to DDI0601 2025-12
arm64/sysreg: Update ID_AA64ISAR0_EL1 description to DDI0601 2025-12
arm64/hwcap: Generate the KERNEL_HWCAP_ definitions for the hwcaps
arm64: kexec: Remove duplicate allocation for trans_pgd
ACPI: AGDI: fix missing newline in error message
arm64: Check DAIF (and PMR) at task-switch time
arm64: entry: Use split preemption logic
arm64: entry: Use irqentry_{enter_from,exit_to}_kernel_mode()
arm64: entry: Consistently prefix arm64-specific wrappers
arm64: entry: Don't preempt with SError or Debug masked
entry: Split preemption from irqentry_exit_to_kernel_mode()
entry: Split kernel mode logic from irqentry_{enter,exit}()
entry: Move irqentry_enter() prototype later
entry: Remove local_irq_{enable,disable}_exit_to_user()
...
Merge tag 'locking-core-2026-04-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
"Mutexes:
- Add killable flavor to guard definitions (Davidlohr Bueso; a
sketch follows this list)
- Remove the list_head from struct mutex (Matthew Wilcox)
- Rename mutex_init_lockep() (Davidlohr Bueso)
rwsems:
- Remove the list_head from struct rw_semaphore and
replace it with a single pointer (Matthew Wilcox)
- Fix logic error in rwsem_del_waiter() (Andrei Vagin)
Semaphores:
- Remove the list_head from struct semaphore (Matthew Wilcox)
Jump labels:
- Use ATOMIC_INIT() for initialization of .enabled (Thomas Weißschuh)
- Remove workaround for old compilers in initializations
(Thomas Weißschuh)
Lock context analysis changes and improvements:
- Add context analysis for rwsems (Peter Zijlstra)
- Fix rwlock and spinlock lock context annotations (Bart Van Assche)
- Fix rwlock support in <linux/spinlock_up.h> (Bart Van Assche)
- Add lock context annotations in the spinlock implementation
(Bart Van Assche)
- signal: Fix the lock_task_sighand() annotation (Bart Van Assche)
- ww-mutex: Fix the ww_acquire_ctx function annotations
(Bart Van Assche)
- Add lock context support in do_raw_{read,write}_trylock()
(Bart Van Assche)
- arm64, compiler-context-analysis: Permit alias analysis through
__READ_ONCE() with CONFIG_LTO=y (Marco Elver)
- Add __cond_releases() (Peter Zijlstra)
- Add context analysis for mutexes (Peter Zijlstra)
- Add context analysis for rtmutexes (Peter Zijlstra)
- Convert futexes to compiler context analysis (Peter Zijlstra)
Rust integration updates:
- Add atomic fetch_sub() implementation (Andreas Hindborg)
- Refactor various rust_helper_ methods for expansion (Boqun Feng)
- Add Atomic<*{mut,const} T> support (Boqun Feng)
- Add atomic operation helpers over raw pointers (Boqun Feng)
- Add performance-optimal Flag type for atomic booleans, to avoid
slow byte-sized RMWs on architectures that don't support them.
(FUJITA Tomonori)
- Misc cleanups and fixes (Andreas Hindborg, Boqun Feng, FUJITA
Tomonori)
LTO support updates:
- arm64: Optimize __READ_ONCE() with CONFIG_LTO=y (Marco Elver)
- compiler: Simplify generic RELOC_HIDE() (Marco Elver)
Miscellaneous fixes and cleanups by Peter Zijlstra, Randy Dunlap,
Thomas Weißschuh, Davidlohr Bueso and Mikhail Gavrilov"
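A sketch of what a killable guard flavor could look like, modelled on
the existing _try/_intr conditional guards in include/linux/cleanup.h;
the definition actually added by the series is not quoted in this
pull, so treat the body below as an assumption:

    /* definition, in the style of DEFINE_GUARD_COND(mutex, _try, ...) */
    DEFINE_GUARD_COND(mutex, _killable, mutex_lock_killable(_T) == 0)

    /* usage: bail out if a fatal signal interrupts the lock attempt */
    scoped_cond_guard(mutex_killable, return -EINTR, &dev->lock) {
        do_work(dev);
    }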
* tag 'locking-core-2026-04-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
compiler: Simplify generic RELOC_HIDE()
locking: Add lock context annotations in the spinlock implementation
locking: Add lock context support in do_raw_{read,write}_trylock()
locking: Fix rwlock support in <linux/spinlock_up.h>
lockdep: Raise default stack trace limits when KASAN is enabled
cleanup: Optimize guards
jump_label: remove workaround for old compilers in initializations
jump_label: use ATOMIC_INIT() for initialization of .enabled
futex: Convert to compiler context analysis
locking/rwsem: Fix logic error in rwsem_del_waiter()
locking/rwsem: Add context analysis
locking/rtmutex: Add context analysis
locking/mutex: Add context analysis
compiler-context-analysys: Add __cond_releases()
locking/mutex: Remove the list_head from struct mutex
locking/semaphore: Remove the list_head from struct semaphore
locking/rwsem: Remove the list_head from struct rw_semaphore
rust: atomic: Update a safety comment in impl of `fetch_add()`
rust: sync: atomic: Update documentation for `fetch_add()`
rust: sync: atomic: Add fetch_sub()
...
Merge tag 'timers-vdso-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull vdso updates from Thomas Gleixner:
- Make the handling of compat functions consistent and more robust
- Rework the underlying data store so that it is dynamically allocated,
which allows the conversion of the last holdout SPARC64 to the
generic VDSO implementation
- Rework the SPARC64 VDSO to utilize the generic implementation
- Mop up the leftovers of the non-generic VDSO support in the core
code
- Expand the VDSO selftests and make them more robust
- Allow time namespaces to be enabled independently of the generic VDSO
support, which was not possible before due to SPARC64 not using it
- Various cleanups and improvements in the related code
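One of the timens cleanups in the list below adds a __free() wrapper
for put_time_ns(); a minimal sketch of that pattern using the kernel's
cleanup.h helpers (the exact definition in the series may differ):

    DEFINE_FREE(put_time_ns, struct time_namespace *,
                if (_T) put_time_ns(_T))

    /* the reference is dropped automatically when ns goes out of scope */
    struct time_namespace *ns __free(put_time_ns) =
            get_time_ns(tsk->nsproxy->time_ns);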
* tag 'timers-vdso-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
timens: Use task_lock guard in timens_get*()
timens: Use mutex guard in proc_timens_set_offset()
timens: Simplify some calls to put_time_ns()
timens: Add a __free() wrapper for put_time_ns()
timens: Remove dependency on the vDSO
vdso/timens: Move functions to new file
selftests: vDSO: vdso_test_correctness: Add a test for time()
selftests: vDSO: vdso_test_correctness: Use facilities from parse_vdso.c
selftests: vDSO: vdso_test_correctness: Handle different tv_usec types
selftests: vDSO: vdso_test_correctness: Drop SYS_getcpu fallbacks
selftests: vDSO: vdso_test_gettimeofday: Remove nolibc checks
Revert "selftests: vDSO: parse_vdso: Use UAPI headers instead of libc headers"
random: vDSO: Remove ifdeffery
random: vDSO: Trim vDSO includes
vdso/datapage: Trim down unnecessary includes
vdso/datapage: Remove inclusion of gettimeofday.h
vdso/helpers: Explicitly include vdso/processor.h
vdso/gettimeofday: Add explicit includes
random: vDSO: Add explicit includes
MIPS: vdso: Explicitly include asm/vdso/vdso.h
...
Merge tag 'acpi-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI support updates from Rafael Wysocki:
"These include an update of the CMOS RTC driver and the related ACPI
and x86 code that, among other things, switches it over to using the
platform device interface for device binding on x86 instead of the PNP
device driver interface (which allows the code in question to be
simplified quite a bit), a major update of the ACPI Time and Alarm
Device (TAD) driver adding an RTC class device interface to it, and
updates of core ACPI drivers that remove some unnecessary and not
really useful code from them.
Apart from that, two drivers are converted to using the platform
driver interface for device binding instead of the ACPI driver one,
which is slated for removal, support for the Performance Limited
register is added to the ACPI CPPC library and there are some
janitorial updates of it and the related cpufreq CPPC driver, the ACPI
processor driver is fixed and cleaned up, and an NVIDIA vendor CPER
record handler is added to the APEI GHES code.
Also, the interface for obtaining a CPU UID from ACPI is consolidated
across architectures and used for fixing a problem with the PCI TPH
Steering Tag on ARM64, there are two updates related to ACPICA, a
minor ACPI OS Services Layer (OSL) update, and a few assorted updates
related to ACPI tables parsing.
Specifics:
- Update maintainers information regarding ACPICA (Rafael Wysocki)
- Replace strncpy() with strscpy_pad() in acpi_ut_safe_strncpy()
(Kees Cook)
- Trigger an ordered system power off after encountering a fatal
error operator in AML (Armin Wolf)
- Enable ACPI FPDT parsing on LoongArch (Xi Ruoyao)
- Remove the temporary stop-gap acpi_pptt_cache_v1_full structure
from the ACPI PPTT parser (Ben Horgan)
- Add support for exposing ACPI FPDT subtables FBPT and S3PT (Nate
DeSimone)
- Address multiple assorted issues and clean up the code in the ACPI
processor idle driver (Huisong Li)
- Replace strlcat() in the ACPI processor idle driver with a better
alternative (Andy Shevchenko)
- Rearrange and clean up acpi_processor_errata_piix4() (Rafael
Wysocki)
- Move reference performance to capabilities and fix an uninitialized
variable in the ACPI CPPC library (Pengjie Zhang)
- Add support for the Performance Limited Register to the ACPI CPPC
library (Sumit Gupta)
- Add cppc_get_perf() API to read performance controls, extend
cppc_set_epp_perf() for FFH/SystemMemory, and make the ACPI CPPC
library warn on missing mandatory DESIRED_PERF register (Sumit
Gupta)
- Modify the cpufreq CPPC driver to update MIN_PERF/MAX_PERF in
target callbacks to allow it to control performance bounds via
standard scaling_min_freq and scaling_max_freq sysfs attributes and
add sysfs documentation for the Performance Limited Register to it
(Sumit Gupta)
- Add ACPI support to the platform device interface in the CMOS RTC
driver, make the ACPI core device enumeration code create a
platform device for the CMOS RTC, and drop CMOS RTC PNP device
support (Rafael Wysocki)
- Consolidate the x86-specific CMOS RTC handling with the ACPI TAD
driver and clean up the CMOS RTC ACPI address space handler (Rafael
Wysocki)
- Enable ACPI alarm in the CMOS RTC driver if advertised in ACPI FADT
and allow that driver to work without a dedicated IRQ if the ACPI
alarm is used (Rafael Wysocki)
- Clean up the ACPI TAD driver in various ways and add an RTC class
device interface, including both the RTC setting/reading and alarm
timer support, to it (Rafael Wysocki)
- Clean up the ACPI AC and ACPI PAD (processor aggregator device)
drivers (Rafael Wysocki)
- Rework checking for duplicate video bus devices and consolidate
pnp.bus_id workarounds handling in the ACPI video bus driver
(Rafael Wysocki)
- Update the ACPI core device drivers to stop setting
acpi_device_name() unnecessarily (Rafael Wysocki)
- Rearrange code using acpi_device_class() in the ACPI core device
drivers and update them to stop setting acpi_device_class()
unnecessarily (Rafael Wysocki)
- Define ACPI_AC_CLASS in one place (Rafael Wysocki)
- Convert the ni903x_wdt watchdog driver and the xen ACPI PAD driver
to bind to platform devices instead of ACPI devices (Rafael
Wysocki)
- Add devm_ghes_register_vendor_record_notifier(), use it in the PCI
hisi driver, and add an NVIDIA vendor CPER record handler (Kai-Heng
Feng)
- Consolidate the interface for obtaining a CPU UID from ACPI across
architectures and use it to address incorrect PCI TPH Steering Tag
on ARM64 resulting from the invalid assumption that the ACPI
Processor UID would always be the same as the corresponding logical
CPU ID in Linux (Chengwen Feng; a sketch follows this list)"
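A sketch of the class of bug being fixed; acpi_get_cpu_uid() is named
in this pull, but its exact prototype is an assumption here, modelled
on the arm64 get_acpi_id_for_cpu() helper it consolidates:

    u32 uid;

    /* wrong: assumes the ACPI Processor UID equals the Linux CPU id */
    uid = cpu;

    /* right: look the UID up explicitly before handing it to firmware
     * interfaces such as the TPH Cache Locality _DSM */
    uid = acpi_get_cpu_uid(cpu);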
* tag 'acpi-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (73 commits)
ACPICA: Update maintainers information
watchdog: ni903x_wdt: Convert to a platform driver
ACPI: PAD: xen: Convert to a platform driver
ACPI: processor: idle: Reset cpuidle on C-state list changes
cpuidle: Extract and export no-lock variants of cpuidle_unregister_device()
PCI/TPH: Pass ACPI Processor UID to Cache Locality _DSM
ACPI: PPTT: Use acpi_get_cpu_uid() and remove get_acpi_id_for_cpu()
perf: arm_cspmu: Switch to acpi_get_cpu_uid() from get_acpi_id_for_cpu()
ACPI: Centralize acpi_get_cpu_uid() declaration in include/linux/acpi.h
x86/acpi: Add acpi_get_cpu_uid() for unified ACPI CPU UID retrieval
RISC-V: ACPI: Add acpi_get_cpu_uid() for unified ACPI CPU UID retrieval
LoongArch: Add acpi_get_cpu_uid() for unified ACPI CPU UID retrieval
arm64: acpi: Add acpi_get_cpu_uid() for unified ACPI CPU UID retrieval
ACPI: APEI: GHES: Add NVIDIA vendor CPER record handler
PCI: hisi: Use devm_ghes_register_vendor_record_notifier()
ACPI: APEI: GHES: Add devm_ghes_register_vendor_record_notifier()
ACPI: tables: Enable FPDT on LoongArch
ACPI: processor: idle: Fix NULL pointer dereference in hotplug path
ACPI: processor: idle: Reset power_setup_done flag on initialization failure
ACPI: TAD: Add alarm support to the RTC class device interface
...
Merge tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull crypto library updates from Eric Biggers:
- Migrate more hash algorithms from the traditional crypto subsystem to
lib/crypto/
Like the algorithms migrated earlier (e.g. SHA-*), this simplifies
the implementations, improves performance, enables further
simplifications in calling code, and solves various other issues:
- AES CBC-based MACs (AES-CMAC, AES-XCBC-MAC, and AES-CBC-MAC)
- Support these algorithms in lib/crypto/ using the AES library
and the existing arm64 assembly code
- Reimplement the traditional crypto API's "cmac(aes)",
"xcbc(aes)", and "cbcmac(aes)" on top of the library
- Convert mac80211 to use the AES-CMAC library (a usage sketch
follows this list). Note: several other subsystems can use it
too and will be converted later
- Drop the broken, nonstandard, and likely unused support for
"xcbc(aes)" with key lengths other than 128 bits
- Enable optimizations by default
- GHASH
- Migrate the standalone GHASH code into lib/crypto/
- Integrate the GHASH code more closely with the very similar
POLYVAL code, and improve the generic GHASH implementation to
resist cache-timing attacks and use much less memory
- Reimplement the AES-GCM library and the "gcm" crypto_aead
template on top of the GHASH library. Remove "ghash" from the
crypto_shash API, as it's no longer needed
- Enable optimizations by default
- SM3
- Migrate the kernel's existing SM3 code into lib/crypto/, and
reimplement the traditional crypto API's "sm3" on top of it
- I don't recommend using SM3, but this cleanup is worthwhile
to organize the code the same way as other algorithms
- Testing improvements:
- Add a KUnit test suite for each of the new library APIs
- Migrate the existing ChaCha20Poly1305 test to KUnit
- Make the KUnit all_tests.config enable all crypto library tests
- Move the test kconfig options to the Runtime Testing menu
- Other updates to arch-optimized crypto code:
- Optimize SHA-256 for Zhaoxin CPUs using the Padlock Hash Engine
- Remove some MD5 implementations that are no longer worth keeping
- Drop big endian and voluntary preemption support from the arm64
code, as those configurations are no longer supported on arm64
- Make jitterentropy and samples/tsm-mr use the crypto library APIs
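The pull does not quote the new lib/crypto AES-CMAC interface, so the
following is only a sketch assuming it follows the one-shot style of
the existing library hashes; the aescmac() name and signature are
guesses, while aes_expandkey() is the existing AES library helper:

    #include <crypto/aes.h>

    struct crypto_aes_ctx ctx;
    u8 mac[AES_BLOCK_SIZE];

    /* only 128-bit keys, matching the key-length cleanup above */
    if (aes_expandkey(&ctx, key, AES_KEYSIZE_128))
        return -EINVAL;
    aescmac(&ctx, msg, msg_len, mac);   /* hypothetical helper */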
* tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (66 commits)
lib/crypto: arm64: Assume a little-endian kernel
arm64: fpsimd: Remove obsolete cond_yield macro
lib/crypto: arm64/sha3: Remove obsolete chunking logic
lib/crypto: arm64/sha512: Remove obsolete chunking logic
lib/crypto: arm64/sha256: Remove obsolete chunking logic
lib/crypto: arm64/sha1: Remove obsolete chunking logic
lib/crypto: arm64/poly1305: Remove obsolete chunking logic
lib/crypto: arm64/gf128hash: Remove obsolete chunking logic
lib/crypto: arm64/chacha: Remove obsolete chunking logic
lib/crypto: arm64/aes: Remove obsolete chunking logic
lib/crypto: Include <crypto/utils.h> instead of <crypto/algapi.h>
lib/crypto: aesgcm: Don't disable IRQs during AES block encryption
lib/crypto: aescfb: Don't disable IRQs during AES block encryption
lib/crypto: tests: Migrate ChaCha20Poly1305 self-test to KUnit
lib/crypto: sparc: Drop optimized MD5 code
lib/crypto: mips: Drop optimized MD5 code
lib: Move crypto library tests to Runtime Testing menu
crypto: sm3 - Remove 'struct sm3_state'
crypto: sm3 - Remove the original "sm3_block_generic()"
crypto: sm3 - Remove sm3_base.h
...
Merge branch 'nocache-cleanup'
This series cleans up some of the special user copy functions naming and
semantics. In particular, get rid of the (very traditional) double
underscore names and behavior: the whole "optimize away the range check"
model has been largely excised from the other user accessors because
it's so subtle and can be unsafe, but also because it's just not a
relevant optimization any more.
To do that, a couple of drivers that misused the "user" copies as kernel
copies in order to get non-temporal stores had to be fixed up, but that
kind of code should never have been allowed anyway.
The x86-only "nocache" version was also renamed to more accurately
reflect what it actually does.
This was all done because I looked at this code due to a report by Jann
Horn, and I just couldn't stand the inconsistent naming, the horrible
semantics, and the random misuse of these functions. This code should
probably be cleaned up further, but it's at least slightly closer to
normal semantics.
I had a more intrusive series that went even further in trying to
normalize the semantics, but that ended up hitting so many other
inconsistencies between different architectures in this area (eg
'size_t' vs 'unsigned long' vs 'int' as size arguments, and various
iovec check differences that Vasily Gorbik pointed out) that I ended up
with this more limited version that fixed the worst of the issues.
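For reference, the hazard with the double-underscore forms (a minimal
sketch using the long-standing uaccess API, not code from this series):

    /* copy_from_user() checks the range itself and returns the number
     * of bytes NOT copied: */
    if (copy_from_user(kbuf, ubuf, len))
        return -EFAULT;

    /* the legacy __copy_from_user() skipped the access_ok() range
     * check, relying on the caller having done it - subtle and easy
     * to misuse: */
    if (!access_ok(ubuf, len))
        return -EFAULT;
    ret = __copy_from_user(kbuf, ubuf, len);  /* no range check inside */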
Reported-by: Jann Horn <jannh@google.com>
Tested-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/all/CAHk-=wgg1QVWNWG-UCFo1hx0zqrPnB3qhPzUTrWNft+MtXQXig@mail.gmail.com/
* nocache-cleanup:
x86-64/arm64/powerpc: clean up and rename __copy_from_user_flushcache
x86: rename and clean up __copy_from_user_inatomic_nocache()
x86-64: rename misleadingly named '__copy_user_nocache()' function
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 7.1
* New features:
- Add support for tracing in the standalone EL2 hypervisor code,
which should help both debugging and performance analysis.
This comes with a full infrastructure for 'remote' trace buffers
that can be exposed by non-kernel entities such as firmware.
- Add support for GICv5 Per Processor Interrupts (PPIs), as the
starting point for supporting the new GIC architecture in KVM.
- Finally add support for pKVM protected guests, with anonymous
memory being used as a backing store. About time!
* Improvements and bug fixes:
- Rework the dreaded user_mem_abort() function to make it more
maintainable, reducing the amount of state being exposed to
the various helpers and rendering a substantial amount of
state immutable.
- Expand the Stage-2 page table dumper to support NV shadow
page tables on a per-VM basis.
- Tidy up the pKVM PSCI proxy code to be slightly less hard
to follow.
- Fix both SPE and TRBE in non-VHE configurations so that they
do not generate spurious, out of context table walks that
ultimately lead to very bad HW lockups.
- A small set of patches fixing the Stage-2 MMU freeing in error
cases.
- Tighten up the accepted SMC immediate value to be only #0 for host
SMCCC calls.
- The usual cleanups and other selftest churn.
Merge branches 'for-next/misc', 'for-next/tlbflush', 'for-next/ttbr-macros-cleanup', 'for-next/kselftest', 'for-next/feat_lsui', 'for-next/mpam', 'for-next/hotplug-batched-tlbi', 'for-next/bbml2-fixes', 'for-next/sysreg', 'for-next/generic-entry' and 'for-next/acpi', remote-tracking branches 'arm64/for-next/perf' and 'arm64/for-next/read-once' into for-next/core
* arm64/for-next/perf:
: Perf updates
perf/arm-cmn: Fix resource_size_t printk specifier in arm_cmn_init_dtc()
perf/arm-cmn: Fix incorrect error check for devm_ioremap()
perf: add NVIDIA Tegra410 C2C PMU
perf: add NVIDIA Tegra410 CPU Memory Latency PMU
perf/arm_cspmu: nvidia: Add Tegra410 PCIE-TGT PMU
perf/arm_cspmu: nvidia: Add Tegra410 PCIE PMU
perf/arm_cspmu: Add arm_cspmu_acpi_dev_get
perf/arm_cspmu: nvidia: Add Tegra410 UCF PMU
perf/arm_cspmu: nvidia: Rename doc to Tegra241
perf/arm-cmn: Stop claiming entire iomem region
arm64: cpufeature: Use pmuv3_implemented() function
arm64: cpufeature: Make PMUVer and PerfMon unsigned
KVM: arm64: Read PMUVer as unsigned
* arm64/for-next/read-once:
: Fixes for __READ_ONCE() with CONFIG_LTO=y
arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y
arm64: Optimize __READ_ONCE() with CONFIG_LTO=y
* for-next/misc:
: Miscellaneous cleanups/fixes
arm64: rsi: use linear-map alias for realm config buffer
arm64: Kconfig: fix duplicate word in CMDLINE help text
arm64: mte: Skip TFSR_EL1 checks and barriers in synchronous tag check mode
arm64/hwcap: Generate the KERNEL_HWCAP_ definitions for the hwcaps
arm64: kexec: Remove duplicate allocation for trans_pgd
arm64: mm: Use generic enum pgtable_level
arm64: scs: Remove redundant save/restore of SCS SP on entry to/from EL0
arm64: remove ARCH_INLINE_*
* for-next/tlbflush:
: Refactor the arm64 TLB invalidation API and implementation
arm64: mm: __ptep_set_access_flags must hint correct TTL
arm64: mm: Provide level hint for flush_tlb_page()
arm64: mm: Wrap flush_tlb_page() around __do_flush_tlb_range()
arm64: mm: More flags for __flush_tlb_range()
arm64: mm: Refactor __flush_tlb_range() to take flags
arm64: mm: Refactor flush_tlb_page() to use __tlbi_level_asid()
arm64: mm: Simplify __flush_tlb_range_limit_excess()
arm64: mm: Simplify __TLBI_RANGE_NUM() macro
arm64: mm: Re-implement the __flush_tlb_range_op macro in C
arm64: mm: Inline __TLBI_VADDR_RANGE() into __tlbi_range()
arm64: mm: Push __TLBI_VADDR() into __tlbi_level()
arm64: mm: Implicitly invalidate user ASID based on TLBI operation
arm64: mm: Introduce a C wrapper for by-range TLB invalidation
arm64: mm: Re-implement the __tlbi_level macro as a C function
* for-next/ttbr-macros-cleanup:
: Cleanups of the TTBR1_* macros
arm64/mm: Directly use TTBRx_EL1_CnP
arm64/mm: Directly use TTBRx_EL1_ASID_MASK
arm64/mm: Describe TTBR1_BADDR_4852_OFFSET
* for-next/kselftest:
: arm64 kselftest updates
selftests/arm64: Implement cmpbr_sigill() to hwcap test
* for-next/feat_lsui:
: Futex support using FEAT_LSUI instructions to avoid toggling PAN
arm64: armv8_deprecated: Disable swp emulation when FEAT_LSUI present
arm64: Kconfig: Add support for LSUI
KVM: arm64: Use CAST instruction for swapping guest descriptor
arm64: futex: Support futex with FEAT_LSUI
arm64: futex: Refactor futex atomic operation
KVM: arm64: kselftest: set_id_regs: Add test for FEAT_LSUI
KVM: arm64: Expose FEAT_LSUI to guests
arm64: cpufeature: Add FEAT_LSUI
* for-next/mpam: (40 commits)
: Expose MPAM to user-space via resctrl:
: - Add architecture context-switch and hiding of the feature from KVM.
: - Add interface to allow MPAM to be exposed to user-space using resctrl.
: - Add errata workaround for some existing platforms.
: - Add documentation for using MPAM and what shape of platforms can use resctrl
arm64: mpam: Add initial MPAM documentation
arm_mpam: Quirk CMN-650's CSU NRDY behaviour
arm_mpam: Add workaround for T241-MPAM-6
arm_mpam: Add workaround for T241-MPAM-4
arm_mpam: Add workaround for T241-MPAM-1
arm_mpam: Add quirk framework
arm_mpam: resctrl: Call resctrl_init() on platforms that can support resctrl
arm64: mpam: Select ARCH_HAS_CPU_RESCTRL
arm_mpam: resctrl: Add empty definitions for assorted resctrl functions
arm_mpam: resctrl: Update the rmid reallocation limit
arm_mpam: resctrl: Add resctrl_arch_rmid_read()
arm_mpam: resctrl: Allow resctrl to allocate monitors
arm_mpam: resctrl: Add support for csu counters
arm_mpam: resctrl: Add monitor initialisation and domain boilerplate
arm_mpam: resctrl: Add kunit test for control format conversions
arm_mpam: resctrl: Add support for 'MB' resource
arm_mpam: resctrl: Wait for cacheinfo to be ready
arm_mpam: resctrl: Add rmid index helpers
arm_mpam: resctrl: Convert to/from MPAMs fixed-point formats
arm_mpam: resctrl: Hide CDP emulation behind CONFIG_EXPERT
...
* for-next/hotplug-batched-tlbi:
: arm64/mm: Enable batched TLB flush in unmap_hotplug_range()
arm64/mm: Reject memory removal that splits a kernel leaf mapping
arm64/mm: Enable batched TLB flush in unmap_hotplug_range()
* for-next/bbml2-fixes:
: Fixes for realm guest and BBML2_NOABORT
arm64: mm: Remove pmd_sect() and pud_sect()
arm64: mm: Handle invalid large leaf mappings correctly
arm64: mm: Fix rodata=full block mapping support for realm guests
* for-next/sysreg:
: arm64 sysreg updates
arm64/sysreg: Update ID_AA64SMFR0_EL1 description to DDI0601 2025-12
arm64/sysreg: Update ID_AA64ZFR0_EL1 description to DDI0601 2025-12
arm64/sysreg: Update ID_AA64FPFR0_EL1 description to DDI0601 2025-12
arm64/sysreg: Update ID_AA64ISAR2_EL1 description to DDI0601 2025-12
arm64/sysreg: Update ID_AA64ISAR0_EL1 description to DDI0601 2025-12
arm64/sysreg: Update SMIDR_EL1 to DDI0601 2025-06
* for-next/generic-entry:
: More arm64 refactoring towards using the generic entry code
arm64: Check DAIF (and PMR) at task-switch time
arm64: entry: Use split preemption logic
arm64: entry: Use irqentry_{enter_from,exit_to}_kernel_mode()
arm64: entry: Consistently prefix arm64-specific wrappers
arm64: entry: Don't preempt with SError or Debug masked
entry: Split preemption from irqentry_exit_to_kernel_mode()
entry: Split kernel mode logic from irqentry_{enter,exit}()
entry: Move irqentry_enter() prototype later
entry: Remove local_irq_{enable,disable}_exit_to_user()
entry: Fix stale comment for irqentry_enter()
* for-next/acpi:
: arm64 ACPI updates
ACPI: AGDI: fix missing newline in error message
With KASAN_HW_TAGS (MTE) in synchronous mode, tag check faults are
reported as immediate Data Abort exceptions. The TFSR_EL1.TF1 bit is
never set since faults never go through the asynchronous path.
Therefore, reading TFSR_EL1 and executing data and instruction barriers
on kernel entry, exit, context switch and suspend is unnecessary
overhead.
As with the check_mte_async_tcf and clear_mte_async_tcf paths for
TFSRE0_EL1, extend the same optimisation to kernel entry/exit, context
switch and suspend.
All mte kselftests pass. The kunit tests before and after the patch
show the same results.
A selection of test_vmalloc benchmarks running on an arm64 machine.
v6.19 is the baseline. (>0 is faster, <0 is slower, (R)/(I) =
statistically significant Regression/Improvement). Based on significance
and ignoring the noise, the benchmarks improved.
* 77 result classes were considered, with 9 wins, 0 losses and 68 ties
Results of fastpath [1] on v6.19 vs this patch:
+----------------------------+----------------------------------------------------------+------------+
| Benchmark | Result Class | barriers |
+============================+==========================================================+============+
| micromm/fork | fork: p:1, d:10 (seconds) | (I) 2.75% |
| | fork: p:512, d:10 (seconds) | 0.96% |
+----------------------------+----------------------------------------------------------+------------+
| micromm/munmap | munmap: p:1, d:10 (seconds) | -1.78% |
| | munmap: p:512, d:10 (seconds) | 5.02% |
+----------------------------+----------------------------------------------------------+------------+
| micromm/vmalloc | fix_align_alloc_test: p:1, h:0, l:500000 (usec) | -0.56% |
| | fix_size_alloc_test: p:1, h:0, l:500000 (usec) | 0.70% |
| | fix_size_alloc_test: p:4, h:0, l:500000 (usec) | 1.18% |
| | fix_size_alloc_test: p:16, h:0, l:500000 (usec) | -5.01% |
| | fix_size_alloc_test: p:16, h:1, l:500000 (usec) | 13.81% |
| | fix_size_alloc_test: p:64, h:0, l:100000 (usec) | 6.51% |
| | fix_size_alloc_test: p:64, h:1, l:100000 (usec) | 32.87% |
| | fix_size_alloc_test: p:256, h:0, l:100000 (usec) | 4.17% |
| | fix_size_alloc_test: p:256, h:1, l:100000 (usec) | 8.40% |
| | fix_size_alloc_test: p:512, h:0, l:100000 (usec) | -0.48% |
| | fix_size_alloc_test: p:512, h:1, l:100000 (usec) | -0.74% |
| | full_fit_alloc_test: p:1, h:0, l:500000 (usec) | 0.53% |
| | kvfree_rcu_1_arg_vmalloc_test: p:1, h:0, l:500000 (usec) | -2.81% |
| | kvfree_rcu_2_arg_vmalloc_test: p:1, h:0, l:500000 (usec) | -2.06% |
| | long_busy_list_alloc_test: p:1, h:0, l:500000 (usec) | -0.56% |
| | pcpu_alloc_test: p:1, h:0, l:500000 (usec) | -0.41% |
| | random_size_align_alloc_test: p:1, h:0, l:500000 (usec) | 0.89% |
| | random_size_alloc_test: p:1, h:0, l:500000 (usec) | 1.71% |
| | vm_map_ram_test: p:1, h:0, l:500000 (usec) | 0.83% |
+----------------------------+----------------------------------------------------------+------------+
| schbench/thread-contention | -m 16 -t 1 -r 10 -s 1000, avg_rps (req/sec) | 0.05% |
| | -m 16 -t 1 -r 10 -s 1000, req_latency_p99 (usec) | 0.60% |
| | -m 16 -t 1 -r 10 -s 1000, wakeup_latency_p99 (usec) | 0.00% |
| | -m 16 -t 4 -r 10 -s 1000, avg_rps (req/sec) | -0.34% |
| | -m 16 -t 4 -r 10 -s 1000, req_latency_p99 (usec) | -0.58% |
| | -m 16 -t 4 -r 10 -s 1000, wakeup_latency_p99 (usec) | 9.09% |
| | -m 16 -t 16 -r 10 -s 1000, avg_rps (req/sec) | -0.74% |
| | -m 16 -t 16 -r 10 -s 1000, req_latency_p99 (usec) | -1.40% |
| | -m 16 -t 16 -r 10 -s 1000, wakeup_latency_p99 (usec) | 0.00% |
| | -m 16 -t 64 -r 10 -s 1000, avg_rps (req/sec) | -0.78% |
| | -m 16 -t 64 -r 10 -s 1000, req_latency_p99 (usec) | -0.11% |
| | -m 16 -t 64 -r 10 -s 1000, wakeup_latency_p99 (usec) | 0.11% |
| | -m 16 -t 256 -r 10 -s 1000, avg_rps (req/sec) | 2.64% |
| | -m 16 -t 256 -r 10 -s 1000, req_latency_p99 (usec) | 3.15% |
| | -m 16 -t 256 -r 10 -s 1000, wakeup_latency_p99 (usec) | 17.54% |
| | -m 32 -t 1 -r 10 -s 1000, avg_rps (req/sec) | -1.22% |
| | -m 32 -t 1 -r 10 -s 1000, req_latency_p99 (usec) | 0.85% |
| | -m 32 -t 1 -r 10 -s 1000, wakeup_latency_p99 (usec) | 0.00% |
| | -m 32 -t 4 -r 10 -s 1000, avg_rps (req/sec) | -0.34% |
| | -m 32 -t 4 -r 10 -s 1000, req_latency_p99 (usec) | 1.05% |
| | -m 32 -t 4 -r 10 -s 1000, wakeup_latency_p99 (usec) | 0.00% |
| | -m 32 -t 16 -r 10 -s 1000, avg_rps (req/sec) | -0.41% |
| | -m 32 -t 16 -r 10 -s 1000, req_latency_p99 (usec) | 0.58% |
| | -m 32 -t 16 -r 10 -s 1000, wakeup_latency_p99 (usec) | 2.13% |
| | -m 32 -t 64 -r 10 -s 1000, avg_rps (req/sec) | 0.67% |
| | -m 32 -t 64 -r 10 -s 1000, req_latency_p99 (usec) | 2.07% |
| | -m 32 -t 64 -r 10 -s 1000, wakeup_latency_p99 (usec) | -1.28% |
| | -m 32 -t 256 -r 10 -s 1000, avg_rps (req/sec) | 1.01% |
| | -m 32 -t 256 -r 10 -s 1000, req_latency_p99 (usec) | 0.69% |
| | -m 32 -t 256 -r 10 -s 1000, wakeup_latency_p99 (usec) | 13.12% |
| | -m 64 -t 1 -r 10 -s 1000, avg_rps (req/sec) | -0.25% |
| | -m 64 -t 1 -r 10 -s 1000, req_latency_p99 (usec) | -0.48% |
| | -m 64 -t 1 -r 10 -s 1000, wakeup_latency_p99 (usec) | 10.53% |
| | -m 64 -t 4 -r 10 -s 1000, avg_rps (req/sec) | -0.06% |
| | -m 64 -t 4 -r 10 -s 1000, req_latency_p99 (usec) | 0.00% |
| | -m 64 -t 4 -r 10 -s 1000, wakeup_latency_p99 (usec) | 0.00% |
| | -m 64 -t 16 -r 10 -s 1000, avg_rps (req/sec) | -0.36% |
| | -m 64 -t 16 -r 10 -s 1000, req_latency_p99 (usec) | 0.52% |
| | -m 64 -t 16 -r 10 -s 1000, wakeup_latency_p99 (usec) | 0.11% |
| | -m 64 -t 64 -r 10 -s 1000, avg_rps (req/sec) | 0.52% |
| | -m 64 -t 64 -r 10 -s 1000, req_latency_p99 (usec) | 3.53% |
| | -m 64 -t 64 -r 10 -s 1000, wakeup_latency_p99 (usec) | -0.10% |
| | -m 64 -t 256 -r 10 -s 1000, avg_rps (req/sec) | 2.53% |
| | -m 64 -t 256 -r 10 -s 1000, req_latency_p99 (usec) | 1.82% |
| | -m 64 -t 256 -r 10 -s 1000, wakeup_latency_p99 (usec) | -5.80% |
+----------------------------+----------------------------------------------------------+------------+
| syscall/getpid | mean (ns) | (I) 15.98% |
| | p99 (ns) | (I) 11.11% |
| | p99.9 (ns) | (I) 16.13% |
+----------------------------+----------------------------------------------------------+------------+
| syscall/getppid | mean (ns) | (I) 14.82% |
| | p99 (ns) | (I) 17.86% |
| | p99.9 (ns) | (I) 9.09% |
+----------------------------+----------------------------------------------------------+------------+
| syscall/invalid | mean (ns) | (I) 17.78% |
| | p99 (ns) | (I) 11.11% |
| | p99.9 (ns) | 13.33% |
+----------------------------+----------------------------------------------------------+------------+
[1] https://gitlab.arm.com/tooling/fastpath
Signed-off-by: Muhammad Usama Anjum <usama.anjum@arm.com>
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Currently, for each hwcap we define both a HWCAPn_NAME definition, which is
exposed to userspace, and a KERNEL_HWCAP_NAME definition used internally by
the kernel. This is tedious and repetitive; instead, use a script to
generate the KERNEL_HWCAP_ definitions from the UAPI definitions.
No functional changes intended.
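As a rough illustration (names and values simplified here, not copied from
the real headers), the script maintains a mapping along these lines:

  /* UAPI header, exposed to userspace: */
  #define HWCAP_FP      (1 << 0)
  #define HWCAP_ASIMD   (1 << 1)

  /* Kernel-internal definitions, previously duplicated by hand and now
   * generated from the UAPI names: */
  enum {
      KERNEL_HWCAP_FP,
      KERNEL_HWCAP_ASIMD,
  };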
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
The generic irqentry code now provides
irqentry_exit_to_kernel_mode_preempt() and
irqentry_exit_to_kernel_mode_after_preempt(), which can be used
where architectures have different state requirements for involuntary
preemption and exception return, as is the case on arm64.
Use the new functions on arm64, aligning our exit to kernel mode logic
with the style of our exit to user mode logic. This removes the need for
the recently-added bodge in arch_irqentry_exit_need_resched(), and
allows preemption to occur when returning from any exception taken from
kernel mode, which is nicer for RT.
In an ideal world, we'd remove arch_irqentry_exit_need_resched(), and
fold the conditionality directly into the architecture-specific entry
code. That way all the logic necessary to avoid preempting from a
pseudo-NMI could be constrained specifically to the EL1 IRQ/FIQ paths,
avoiding redundant work for other exceptions, and making the flow a bit
clearer. At present it looks like that would require a larger
refactoring (e.g. for the PREEMPT_DYNAMIC logic), and so I've left that
as-is for now.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
On arm64, involuntary kernel preemption has been subtly broken since the
move to the generic irqentry code. When preemption occurs, the new task
may run with SError and Debug exceptions masked unexpectedly, leading to
a loss of RAS events, breakpoints, watchpoints, and single-step
exceptions.
Prior to moving to the generic irqentry code, involuntary preemption of
kernel mode would only occur when returning from regular interrupts, in
a state where interrupts were masked and all other arm64-specific
exceptions (SError, Debug, and pseudo-NMI) were unmasked. This is the
only state in which it is valid to switch tasks.
As part of moving to the generic irqentry code, the involuntary
preemption logic was moved such that involuntary preemption could occur
when returning from any (non-NMI) exception. As most exception handlers
mask all arm64-specific exceptions before this point, preemption could
occur in a state where arm64-specific exceptions were masked. This is
not a valid state to switch tasks, and resulted in the loss of
exceptions described above.
As a temporary bodge, avoid the loss of exceptions by avoiding
involuntary preemption when SError and/or Debug exceptions are masked.
Practically speaking this means that involuntary preemption will only
occur when returning from regular interrupts, as was the case before
moving to the generic irqentry code.
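As a sketch of the shape of this workaround (assuming a DAIF-based check;
the exact implementation may differ):

  static inline bool arch_irqentry_exit_need_resched(void)
  {
      /*
       * Only allow involuntary preemption when SError ('A') and Debug
       * ('D') exceptions are unmasked; otherwise the incoming task
       * could run with them unexpectedly masked.
       */
      return !(read_sysreg(daif) & (PSR_D_BIT | PSR_A_BIT));
  }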
Fixes: 99eb057ccd67 ("arm64: entry: Move arm64_preempt_schedule_irq() into __exit_to_kernel_mode()")
Reported-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Reported-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
* kvm-arm64/vgic-fixes-7.1:
: .
: First pass at fixing a number of vgic-v5 bugs that were found
: after the merge of the initial series.
: .
KVM: arm64: Advertise ID_AA64PFR2_EL1.GCIE
KVM: arm64: vgic-v5: Fold PPI state for all exposed PPIs
KVM: arm64: set_id_regs: Allow GICv3 support to be set at runtime
KVM: arm64: Don't advertises GICv3 in ID_PFR1_EL1 if AArch32 isn't supported
KVM: arm64: Correctly plumb ID_AA64PFR2_EL1 into pkvm idreg handling
KVM: arm64: Move GICv5 timer PPI validation into timer_irqs_are_valid()
KVM: arm64: Remove evaluation of timer state in kvm_cpu_has_pending_timer()
KVM: arm64: Kill arch_timer_context::direct field
KVM: arm64: vgic-v5: Correctly set dist->ready once initialised
KVM: arm64: vgic-v5: Make the effective priority mask a strict limit
KVM: arm64: vgic-v5: Cast vgic_apr to u32 to avoid undefined behaviours
KVM: arm64: vgic-v5: Transfer edge pending state to ICH_PPI_PENDRx_EL2
KVM: arm64: vgic-v5: Hold config_lock while finalizing GICv5 PPIs
KVM: arm64: Account for RESx bits in __compute_fgt()
KVM: arm64: Fix writeable mask for ID_AA64PFR2_EL1
arm64: Fix field references for ICH_PPI_DVIR[01]_EL2
KVM: arm64: Don't skip per-vcpu NV initialisation
KVM: arm64: vgic: Don't reset cpuif/redist addresses at finalize time
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/pkvm-protected-guest: (41 commits)
: .
: pKVM support for protected guests, implementing the very long
: awaited support for anonymous memory, as the elusive guestmem
: has failed to deliver on its promises despite a multi-year
: effort. Patches courtesy of Will Deacon. From the initial cover
: letter:
:
: "[...] this patch series implements support for protected guest
: memory with pKVM, where pages are unmapped from the host as they are
: faulted into the guest and can be shared back from the guest using pKVM
: hypercalls. Protected guests are created using a new machine type
: identifier and can be booted to a shell using the kvmtool patches
: available at [2], which finally means that we are able to test the pVM
: logic in pKVM. Since this is an incremental step towards full isolation
: from the host (for example, the CPU register state and DMA accesses are
: not yet isolated), creating a pVM requires a developer Kconfig option to
: be enabled in addition to booting with 'kvm-arm.mode=protected' and
: results in a kernel taint."
: .
KVM: arm64: Don't hold 'vm_table_lock' across guest page reclaim
KVM: arm64: Allow get_pkvm_hyp_vm() to take a reference to a dying VM
KVM: arm64: Prevent teardown finalisation of referenced 'hyp_vm'
drivers/virt: pkvm: Add Kconfig dependency on DMA_RESTRICTED_POOL
KVM: arm64: Rename PKVM_PAGE_STATE_MASK
KVM: arm64: Extend pKVM page ownership selftests to cover guest hvcs
KVM: arm64: Extend pKVM page ownership selftests to cover forced reclaim
KVM: arm64: Register 'selftest_vm' in the VM table
KVM: arm64: Extend pKVM page ownership selftests to cover guest donation
KVM: arm64: Add some initial documentation for pKVM
KVM: arm64: Allow userspace to create protected VMs when pKVM is enabled
KVM: arm64: Implement the MEM_UNSHARE hypercall for protected VMs
KVM: arm64: Implement the MEM_SHARE hypercall for protected VMs
KVM: arm64: Add hvc handler at EL2 for hypercalls from protected VMs
KVM: arm64: Return -EFAULT from VCPU_RUN on access to a poisoned pte
KVM: arm64: Reclaim faulting page from pKVM in spurious fault handler
KVM: arm64: Introduce hypercall to force reclaim of a protected page
KVM: arm64: Annotate guest donations with handle and gfn in host stage-2
KVM: arm64: Change 'pkvm_handle_t' to u16
KVM: arm64: Introduce host_stage2_set_owner_metadata_locked()
...
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/spe-trbe-nvhe:
: .
: Fix SPE and TRBE nVHE world switch which can otherwise result in
: pretty bad behaviours, as they have the nasty habit of performing
: out of context speculative page table walks.
:
: Patches courtesy of Will Deacon.
: .
KVM: arm64: Don't pass host_debug_state to BRBE world-switch routines
KVM: arm64: Disable SPE Profiling Buffer when running in guest context
KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/pkvm-psci:
: .
: Cleanup of the pKVM PSCI relay CPU entry code, making it slightly
: easier to follow, should someone have to wade into these waters
: ever again.
: .
KVM: arm64: Remove extra ISBs when using msr_hcr_el2
KVM: arm64: pkvm: Use direct function pointers for cpu_{on,resume}
KVM: arm64: pkvm: Turn __kvm_hyp_init_cpu into an inner label
KVM: arm64: pkvm: Simplify BTI handling on CPU boot
KVM: arm64: pkvm: Move error handling to the end of kvm_hyp_cpu_entry
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/nv-s2-debugfs:
: .
: Expand the stage-2 ptdump infrastructure to be able to display
: the content of the shadow s2 tables generated by nested virt.
:
: Patches courtesy of Wei-Lin Chang.
: .
KVM: arm64: ptdump: Initialize parser_state before pgtable walk
KVM: arm64: nv: Expose shadow page tables in debugfs
KVM: arm64: ptdump: Make KVM ptdump code s2 mmu aware
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/vgic-v5-ppi: (40 commits)
: .
: Add initial GICv5 support for KVM guests, only adding PPI support
: for the time being. Patches courtesy of Sascha Bischoff.
:
: From the cover letter:
:
: "This is v7 of the patch series to add the virtual GICv5 [1] device
: (vgic_v5). Only PPIs are supported by this initial series, and the
: vgic_v5 implementation is restricted to the CPU interface,
: only. Further patch series are to follow in due course, and will add
: support for SPIs, LPIs, the GICv5 IRS, and the GICv5 ITS."
: .
KVM: arm64: selftests: Add no-vgic-v5 selftest
KVM: arm64: selftests: Introduce a minimal GICv5 PPI selftest
KVM: arm64: gic-v5: Communicate userspace-driveable PPIs via a UAPI
Documentation: KVM: Introduce documentation for VGICv5
KVM: arm64: gic-v5: Probe for GICv5 device
KVM: arm64: gic-v5: Set ICH_VCTLR_EL2.En on boot
KVM: arm64: gic-v5: Introduce kvm_arm_vgic_v5_ops and register them
KVM: arm64: gic-v5: Hide FEAT_GCIE from NV GICv5 guests
KVM: arm64: gic: Hide GICv5 for protected guests
KVM: arm64: gic-v5: Mandate architected PPI for PMU emulation on GICv5
KVM: arm64: gic-v5: Enlighten arch timer for GICv5
irqchip/gic-v5: Introduce minimal irq_set_type() for PPIs
KVM: arm64: gic-v5: Initialise ID and priority bits when resetting vcpu
KVM: arm64: gic-v5: Create and initialise vgic_v5
KVM: arm64: gic-v5: Support GICv5 interrupts with KVM_IRQ_LINE
KVM: arm64: gic-v5: Implement direct injection of PPIs
KVM: arm64: Introduce set_direct_injection irq_op
KVM: arm64: gic-v5: Trap and mask guest ICC_PPI_ENABLERx_EL1 writes
KVM: arm64: gic-v5: Check for pending PPIs
KVM: arm64: gic-v5: Clear TWI if single task running
...
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/hyp-tracing: (40 commits)
: .
: EL2 tracing support, adding both 'remote' ring-buffer
: infrastructure and the tracing itself, courtesy of
: Vincent Donnefort. From the cover letter:
:
: "The growing set of features supported by the hypervisor in protected
: mode necessitates debugging and profiling tools. Tracefs is the
: ideal candidate for this task:
:
: * It is simple to use and to script.
:
: * It is supported by various tools, from the trace-cmd CLI to the
: Android web-based perfetto.
:
: * The ring-buffer, where trace events are stored, consists of linked
: pages, making it an ideal structure for sharing between kernel and
: hypervisor.
:
: This series first introduces a new generic way of creating remote events and
: remote buffers. Then it adds support to the pKVM hypervisor."
: .
tracing: selftests: Extend hotplug testing for trace remotes
tracing: Non-consuming read for trace remotes with an offline CPU
tracing: Adjust cmd_check_undefined to show unexpected undefined symbols
tracing: Restore accidentally removed SPDX tag
KVM: arm64: avoid unused-variable warning
tracing: Generate undef symbols allowlist for simple_ring_buffer
KVM: arm64: tracing: add ftrace dependency
tracing: add more symbols to whitelist
tracing: Update undefined symbols allow list for simple_ring_buffer
KVM: arm64: Fix out-of-tree build for nVHE/pKVM tracing
tracing: selftests: Add hypervisor trace remote tests
KVM: arm64: Add selftest event support to nVHE/pKVM hyp
KVM: arm64: Add hyp_enter/hyp_exit events to nVHE/pKVM hyp
KVM: arm64: Add event support to the nVHE/pKVM hyp and trace remote
KVM: arm64: Add trace reset to the nVHE/pKVM hyp
KVM: arm64: Sync boot clock with the nVHE/pKVM hyp
KVM: arm64: Add trace remote for the nVHE/pKVM hyp
KVM: arm64: Add tracing capability for the nVHE/pKVM hyp
KVM: arm64: Support unaligned fixmap in the pKVM hyp
KVM: arm64: Initialise hyp_nr_cpus for nVHE hyp
...
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Update acpi/pptt.c to use acpi_get_cpu_uid() and remove the unused
get_acpi_id_for_cpu() from arm64/loongarch/riscv, completing PPTT's
migration to the unified ACPI CPU UID interface.
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20260401081640.26875-8-fengchengwen@huawei.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Centralize acpi_get_cpu_uid() in include/linux/acpi.h (global scope) and
remove arch-specific declarations from arm64/loongarch/riscv/x86
asm/acpi.h. This unifies the interface across architectures and
simplifies maintenance by eliminating duplicate prototypes.
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20260401081640.26875-6-fengchengwen@huawei.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
As a step towards unifying the interface for retrieving ACPI CPU UID
across architectures, introduce a new function acpi_get_cpu_uid() for
arm64. While at it, add input validation to make the code more robust.
Reimplement get_cpu_for_acpi_id() based on acpi_get_cpu_uid() for
consistency, and move its implementation next to the new function for
code coherence.
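As a sketch of how such a reimplementation could look (the
acpi_get_cpu_uid() signature here is an assumption based on the
description above):

  int get_cpu_for_acpi_id(u32 uid)
  {
      u32 cpu_uid;
      int cpu;

      for_each_possible_cpu(cpu)
          if (!acpi_get_cpu_uid(cpu, &cpu_uid) && cpu_uid == uid)
              return cpu;

      return -EINVAL;
  }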
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://patch.msgid.link/20260401081640.26875-2-fengchengwen@huawei.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
In order to be able to do this, we need to change VM_DATA_DEFAULT_FLAGS
and friends and update the architecture-specific definitions also.
We then have to update some KSM logic to handle VMA flags, and introduce
VMA_STACK_FLAGS to define the vma_flags_t equivalent of VM_STACK_FLAGS.
We also introduce two helper functions for use during the time we are
converting legacy flags to vma_flags_t values - vma_flags_to_legacy() and
legacy_to_vma_flags().
This enables us to break the conversion up into separate, iterative
changes.
We use these explicitly here to keep VM_STACK_FLAGS around for certain
users which need to maintain the legacy vm_flags_t values for the time
being.
We are no longer able to rely on the simple VM_xxx being set to zero if
the feature is not enabled, so in the case of VM_DROPPABLE we introduce
VMA_DROPPABLE as the vma_flags_t equivalent, which is set to
EMPTY_VMA_FLAGS if the droppable flag is not available.
While we're here, we make the description of do_brk_flags() into a kdoc
comment, as it almost was already.
We use vma_flags_to_legacy() to avoid having to update the
vm_get_page_prot() logic at this time.
Note that in create_init_stack_vma() we have to replace the BUILD_BUG_ON()
with a VM_WARN_ON_ONCE(), as the tested values are no longer available at
build time.
We also update mprotect_fixup() to use VMA flags where possible, though we
have to live with a little duplication between vm_flags_t and vma_flags_t
values for the time being until further conversions are made.
While we're here, update VM_SPECIAL to be defined in terms of
VMA_SPECIAL_FLAGS now we have vma_flags_to_legacy().
Finally, we update the VMA tests to reflect these changes.
Link: https://lkml.kernel.org/r/d02e3e45d9a33d7904b149f5604904089fd640ae.1774034900.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: Paul Moore <paul@paul-moore.com> [SELinux]
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Ondrej Mosnacek <omosnace@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Callers use pmdp_test_and_clear_young() to clear the young flag and check
whether it was set for this PMD entry. Change the return type to bool to
make the intention clearer.
Link: https://lkml.kernel.org/r/f1d31307a13365d3d0fed5809727dcc2dd59631b.1774075004.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The ptep_clear_flush_young() and clear_flush_young_ptes() are used to
clear the young flag and flush the TLB, returning whether the young flag
was set. Change the return type to bool to make the intention clearer.
Link: https://lkml.kernel.org/r/24af5144b96103631594501f77d4525f2475c1be.1774075004.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "change young flag check functions to return bool", v2.
This is a cleanup patchset to change all young flag check functions to
return bool, as discussed with David in the previous thread[1]. Since
callers only care about whether the young flag was set, returning bool
makes the intention clearer. No functional changes intended.
This patch (of 6):
Callers use ptep_test_and_clear_young() to clear the young flag and check
whether it was set. Change the return type to bool to make the intention
clearer.
Link: https://lkml.kernel.org/r/cover.1774075004.git.baolin.wang@linux.alibaba.com
Link: https://lkml.kernel.org/r/57e70efa9703d43959aa645246ea3cbdba14fa17.1774075004.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Implement the Arm64 architecture-specific test_and_clear_young_ptes() to
enable batched checking of young flags, improving performance during large
folio reclamation when MGLRU is enabled.
While we're at it, simplify ptep_test_and_clear_young() by calling
test_and_clear_young_ptes(). Since callers guarantee that PTEs are
present before calling these functions, we can use pte_cont() to check the
CONT_PTE flag instead of pte_valid_cont().
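For reference, the semantics such a batched helper provides amount to the
generic loop below (a sketch of the semantics only; the arm64 version
described above additionally exploits cont-pte mappings rather than
iterating one pte at a time):

  static inline bool test_and_clear_young_ptes(struct vm_area_struct *vma,
                                               unsigned long addr,
                                               pte_t *ptep, unsigned int nr)
  {
      bool young = false;

      for (; nr--; ptep++, addr += PAGE_SIZE)
          young |= ptep_test_and_clear_young(vma, addr, ptep);

      return young;
  }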
Performance testing:
Enable MGLRU, then allocate 10G clean file-backed folios by mmap() in a
memory cgroup, and try to reclaim 8G file-backed folios via the
memory.reclaim interface. I can observe 60%+ performance improvement on
my Arm64 32-core server (and about 15% improvement on my X86 machine).
W/o patchset:
real 0m0.470s
user 0m0.000s
sys 0m0.470s
W/ patchset:
real 0m0.180s
user 0m0.001s
sys 0m0.179s
Link: https://lkml.kernel.org/r/7f891d42a720cc2e57862f3b79e4f774404f313c.1772778858.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Rik van Riel <riel@surriel.com>
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Xu <weixugc@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yuanchu Xie <yuanchu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
For most architectures every invocation of ZERO_PAGE() does
virt_to_page(empty_zero_page). But empty_zero_page is in BSS and it is
enough to get its struct page once at initialization time and then use it
whenever a zero page should be accessed.
Add yet another __zero_page variable that will be initialized as
virt_to_page(empty_zero_page) for most architectures in a weak
arch_setup_zero_pages() function.
For architectures that use colored zero pages (MIPS and s390) rename their
setup_zero_pages() to arch_setup_zero_pages() and make it global rather
than static.
For architectures that cannot use virt_to_page() for BSS (arm64 and
sparc64), add an override of arch_setup_zero_pages().
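In other words, the generic path reduces to roughly the following sketch:

  struct page *__zero_page;

  void __init __weak arch_setup_zero_pages(void)
  {
      /* empty_zero_page is in BSS, so virt_to_page() is valid here on
       * most architectures; arm64 and sparc64 override this function. */
      __zero_page = virt_to_page(empty_zero_page);
  }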
Link: https://lkml.kernel.org/r/20260211103141.3215197-5-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Magnus Lindholm <linmag7@gmail.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Reduce 22 declarations of empty_zero_page to 3 and 23 declarations of
ZERO_PAGE() to 4.
Every architecture defines empty_zero_page one way or another, but for
most of them it is always a page-aligned page in BSS, and most definitions
of ZERO_PAGE() do virt_to_page(empty_zero_page).
Move Linus vetted x86 definition of empty_zero_page and ZERO_PAGE() to the
core MM and drop these definitions in architectures that do not implement
colored zero page (MIPS and s390).
ZERO_PAGE() remains a macro because turning it to a wrapper for a static
inline causes severe pain in header dependencies.
For the most part the change is mechanical, with these being noteworthy:
* alpha: aliased empty_zero_page with ZERO_PGE that was also used for boot
parameters. Switching to a generic empty_zero_page removes the aliasing
and keeps ZERO_PGE for boot parameters only
* arm64: uses __pa_symbol() in ZERO_PAGE(), so its definition of
ZERO_PAGE() is kept intact.
* m68k/parisc/um: allocated empty_zero_page from memblock,
although they do not support zero page coloring and having it in BSS
will work fine.
* sparc64: can have empty_zero_page in BSS rather than allocating it, but
it can't use virt_to_page() for BSS. Keep its definition of ZERO_PAGE()
but, instead of allocating it, make mem_map_zero point to
empty_zero_page.
* sh: used empty_zero_page for boot parameters in very early boot.
Rename the parameters page to boot_params_page and let sh use the generic
empty_zero_page.
* hexagon: had an amusing comment about empty_zero_page
/* A handy thing to have if one has the RAM. Declared in head.S */
that unfortunately had to go :)
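For context, the two shapes involved look roughly like this (the generic
form is a sketch of what this commit describes; the arm64 form reflects
its existing __pa_symbol() usage):

  /* generic (sketch of the x86-derived definition): */
  #define ZERO_PAGE(vaddr)  virt_to_page(empty_zero_page)

  /* arm64 (kept intact): a BSS symbol needs __pa_symbol() here rather
   * than a plain virt_to_page(): */
  #define ZERO_PAGE(vaddr)  phys_to_page(__pa_symbol(empty_zero_page))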
Link: https://lkml.kernel.org/r/20260211103141.3215197-4-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Helge Deller <deller@gmx.de> [parisc]
Tested-by: Helge Deller <deller@gmx.de> [parisc]
Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Magnus Lindholm <linmag7@gmail.com> [alpha]
Acked-by: Dinh Nguyen <dinguyen@kernel.org> [nios2]
Acked-by: Andreas Larsson <andreas@gaisler.com> [sparc]
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The tsk parameter in arch_set_user_pkey_access() is never used in the
function implementations across all architectures (arm64, powerpc, x86),
so remove it.
Link: https://lkml.kernel.org/r/20260219063506.545148-1-sgsu.park@samsung.com
Signed-off-by: Seongsu Park <sgsu.park@samsung.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fix from Will Deacon:
- Implement a basic static call trampoline to fix CFI failures with the
generic implementation
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Use static call trampolines when kCFI is enabled
|
|
Move the asm/xor.h headers to lib/raid/xor/$(SRCARCH)/xor_arch.h and
include/linux/raid/xor_impl.h to lib/raid/xor/xor_impl.h so that the
xor.ko module implementation is self-contained in lib/raid/.
As this removes the asm-generic mechanism, a new kconfig symbol is added
to indicate that an architecture-specific implementation exists and that
xor_arch.h should be included.
Link: https://lkml.kernel.org/r/20260327061704.3707577-22-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Tested-by: Eric Biggers <ebiggers@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Mason <clm@fb.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: David Sterba <dsterba@suse.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Li Nan <linan122@huawei.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Magnus Lindholm <linmag7@gmail.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Song Liu <song@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Remove the inner xor_block_templates, and instead have two separate
actual templates that call into the neon-enabled compilation unit.
Link: https://lkml.kernel.org/r/20260327061704.3707577-21-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Tested-by: Eric Biggers <ebiggers@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Mason <clm@fb.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: David Sterba <dsterba@suse.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Li Nan <linan122@huawei.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Magnus Lindholm <linmag7@gmail.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Song Liu <song@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Move the optimized XOR code into lib/raid and include it in the main
xor.ko instead of building a separate module for it.
Note that this drops the CONFIG_KERNEL_MODE_NEON dependency, as that is
always set for arm64.
Link: https://lkml.kernel.org/r/20260327061704.3707577-14-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Tested-by: Eric Biggers <ebiggers@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Mason <clm@fb.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: David Sterba <dsterba@suse.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Li Nan <linan122@huawei.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Magnus Lindholm <linmag7@gmail.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Song Liu <song@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Drop the pretty confusing historic XOR_TRY_TEMPLATES and
XOR_SELECT_TEMPLATE, and instead let the architectures provide an
arch_xor_init() that calls either xor_register() to register candidates
or xor_force() to force a specific implementation.
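As a sketch of the new scheme (the template names and the feature check
are hypothetical; only arch_xor_init(), xor_register() and xor_force()
come from the description above):

  void __init arch_xor_init(void)
  {
      if (cpu_have_named_feature(ASIMD))
          xor_register(&xor_block_neon);    /* candidate to benchmark */
      else
          xor_force(&xor_block_generic);    /* no benchmarking needed */
  }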
Link: https://lkml.kernel.org/r/20260327061704.3707577-10-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Tested-by: Eric Biggers <ebiggers@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Mason <clm@fb.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: David Sterba <dsterba@suse.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Li Nan <linan122@huawei.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Magnus Lindholm <linmag7@gmail.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Song Liu <song@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Commit 2c54b423cf85 ("arm64/xor: use EOR3 instructions when available")
changes the definition to __ro_after_init instead of const, but failed to
update the external declaration in xor.h. This was not found because
xor-neon.c doesn't include <asm/xor.h>, and can't easily do so due to
the current architecture of the XOR code.
Link: https://lkml.kernel.org/r/20260327061704.3707577-4-hch@lst.de
Fixes: 2c54b423cf85 ("arm64/xor: use EOR3 instructions when available")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Tested-by: Eric Biggers <ebiggers@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Mason <clm@fb.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: David Sterba <dsterba@suse.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Li Nan <linan122@huawei.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Magnus Lindholm <linmag7@gmail.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Song Liu <song@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The semantics of pXd_leaf() are very similar to pXd_sect(). The only
difference is that pXd_sect() only considers it a section if PTE_VALID
is set, whereas pXd_leaf() permits both "valid" and "present-invalid"
types.
Using pXd_sect() has caused issues now that large leaf entries can be
present-invalid since commit a166563e7ec37 ("arm64: mm: support large
block mapping when rodata=full"), so let's just remove the API and
standardize on pXd_leaf().
There are a few callsites of the form pXd_leaf(READ_ONCE(*pXdp)). This
was previously fine for the pXd_sect() macro because it only evaluated
its argument once. But pXd_leaf() evaluates its argument multiple times.
So let's avoid unintended side effects by reimplementing pXd_leaf() as
an inline function.
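To illustrate the hazard (the bodies below are simplified, not the real
definitions):

  /* Before: as a macro, the argument may be expanded -- and thus
   * READ_ONCE() executed -- more than once per call:
   *
   *   #define pmd_leaf(pmd)  (pmd_present(pmd) && !pmd_table(pmd))
   */

  /* After: an inline function evaluates its argument exactly once: */
  static inline bool pmd_leaf(pmd_t pmd)
  {
      return pmd_present(pmd) && !pmd_table(pmd);
  }

  static bool pmd_is_leaf(pmd_t *pmdp)
  {
      /* the entry is loaded once; pmd_leaf() cannot re-read it */
      return pmd_leaf(READ_ONCE(*pmdp));
  }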
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
It has been possible for a long time to mark ptes in the linear map as
invalid. This is done for secretmem, kfence, realm dma memory un/share,
and others, by simply clearing the PTE_VALID bit. But until commit
a166563e7ec37 ("arm64: mm: support large block mapping when
rodata=full") large leaf mappings were never made invalid in this way.
It turns out various parts of the code base are not equipped to handle
invalid large leaf mappings (in the way they are currently encoded) and
I've observed a kernel panic while booting a realm guest on a
BBML2_NOABORT system as a result:
[ 15.432706] software IO TLB: Memory encryption is active and system is using DMA bounce buffers
[ 15.476896] Unable to handle kernel paging request at virtual address ffff000019600000
[ 15.513762] Mem abort info:
[ 15.527245] ESR = 0x0000000096000046
[ 15.548553] EC = 0x25: DABT (current EL), IL = 32 bits
[ 15.572146] SET = 0, FnV = 0
[ 15.592141] EA = 0, S1PTW = 0
[ 15.612694] FSC = 0x06: level 2 translation fault
[ 15.640644] Data abort info:
[ 15.661983] ISV = 0, ISS = 0x00000046, ISS2 = 0x00000000
[ 15.694875] CM = 0, WnR = 1, TnD = 0, TagAccess = 0
[ 15.723740] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 15.755776] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000081f3f000
[ 15.800410] [ffff000019600000] pgd=0000000000000000, p4d=180000009ffff403, pud=180000009fffe403, pmd=00e8000199600704
[ 15.855046] Internal error: Oops: 0000000096000046 [#1] SMP
[ 15.886394] Modules linked in:
[ 15.900029] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 7.0.0-rc4-dirty #4 PREEMPT
[ 15.935258] Hardware name: linux,dummy-virt (DT)
[ 15.955612] pstate: 21400005 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 15.986009] pc : __pi_memcpy_generic+0x128/0x22c
[ 16.006163] lr : swiotlb_bounce+0xf4/0x158
[ 16.024145] sp : ffff80008000b8f0
[ 16.038896] x29: ffff80008000b8f0 x28: 0000000000000000 x27: 0000000000000000
[ 16.069953] x26: ffffb3976d261ba8 x25: 0000000000000000 x24: ffff000019600000
[ 16.100876] x23: 0000000000000001 x22: ffff0000043430d0 x21: 0000000000007ff0
[ 16.131946] x20: 0000000084570010 x19: 0000000000000000 x18: ffff00001ffe3fcc
[ 16.163073] x17: 0000000000000000 x16: 00000000003fffff x15: 646e612065766974
[ 16.194131] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[ 16.225059] x11: 0000000000000000 x10: 0000000000000010 x9 : 0000000000000018
[ 16.256113] x8 : 0000000000000018 x7 : 0000000000000000 x6 : 0000000000000000
[ 16.287203] x5 : ffff000019607ff0 x4 : ffff000004578000 x3 : ffff000019600000
[ 16.318145] x2 : 0000000000007ff0 x1 : ffff000004570010 x0 : ffff000019600000
[ 16.349071] Call trace:
[ 16.360143] __pi_memcpy_generic+0x128/0x22c (P)
[ 16.380310] swiotlb_tbl_map_single+0x154/0x2b4
[ 16.400282] swiotlb_map+0x5c/0x228
[ 16.415984] dma_map_phys+0x244/0x2b8
[ 16.432199] dma_map_page_attrs+0x44/0x58
[ 16.449782] virtqueue_map_page_attrs+0x38/0x44
[ 16.469596] virtqueue_map_single_attrs+0xc0/0x130
[ 16.490509] virtnet_rq_alloc.isra.0+0xa4/0x1fc
[ 16.510355] try_fill_recv+0x2a4/0x584
[ 16.526989] virtnet_open+0xd4/0x238
[ 16.542775] __dev_open+0x110/0x24c
[ 16.558280] __dev_change_flags+0x194/0x20c
[ 16.576879] netif_change_flags+0x24/0x6c
[ 16.594489] dev_change_flags+0x48/0x7c
[ 16.611462] ip_auto_config+0x258/0x1114
[ 16.628727] do_one_initcall+0x80/0x1c8
[ 16.645590] kernel_init_freeable+0x208/0x2f0
[ 16.664917] kernel_init+0x24/0x1e0
[ 16.680295] ret_from_fork+0x10/0x20
[ 16.696369] Code: 927cec03 cb0e0021 8b0e0042 a9411c26 (a900340c)
[ 16.723106] ---[ end trace 0000000000000000 ]---
[ 16.752866] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 16.792556] Kernel Offset: 0x3396ea200000 from 0xffff800080000000
[ 16.818966] PHYS_OFFSET: 0xfff1000080000000
[ 16.837237] CPU features: 0x0000000,00060005,13e38581,957e772f
[ 16.862904] Memory Limit: none
[ 16.876526] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
This panic occurs because the swiotlb memory was previously shared to
the host (__set_memory_enc_dec()), which involves transitioning the
(large) leaf mappings to invalid, sharing to the host, then marking the
mappings valid again. But pageattr_p[mu]d_entry() would only update the
entry if it is a section mapping, since otherwise it concluded it must
be a table entry so shouldn't be modified. But p[mu]d_sect() only
returns true if the entry is valid. So the result was that the large
leaf entry was made invalid in the first pass then ignored in the second
pass. It remains invalid until the above code tries to access it and
blows up.
The simple fix would be to update pageattr_pmd_entry() to use
!pmd_table() instead of pmd_sect(). That would solve this problem.
But the ptdump code also suffers from a similar issue. It checks
pmd_leaf() and doesn't call into the arch-specific note_page() machinery
if it returns false. As a result of this, ptdump wasn't even able to
show the invalid large leaf mappings; they looked valid, which made this
super fun to debug. The ptdump code is core-mm and pmd_table() is
arm64-specific, so we can't use the same trick to solve that.
But we already support the concept of "present-invalid" for user space
entries. And even better, pmd_leaf() will return true for a leaf mapping
that is marked present-invalid. So let's just use that encoding for
present-invalid kernel mappings too. Then we can use pmd_leaf() where we
previously used pmd_sect() and everything is magically fixed.
Additionally, from inspection kernel_page_present() was broken in a
similar way, so I'm also updating that to use pmd_leaf().
The transitional page tables component was similarly broken; it creates a
copy of the kernel page tables, making RO leaf mappings RW in the
process. It also makes invalid (but-not-none) pte mappings valid. But it
was not doing this for large leaf mappings, which could have resulted in
crashes at kexec or hibernate time. This code is fixed to flip
"present-invalid" mappings back to "present-valid" at all levels.
Finally, I have hardened split_pmd()/split_pud() so that, if passed a
"present-invalid" leaf, they maintain that property in the split leaves,
since I wasn't able to convince myself that they would only ever be
called for "present-valid" leaves.
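The encoding reused here is essentially the one already used for user
mappings, along the lines of this sketch:

  static inline pmd_t pmd_mkinvalid(pmd_t pmd)
  {
      /* Keep the entry "present" in software terms while making the
       * hardware ignore it: pmd_present()/pmd_leaf() remain true. */
      pmd = set_pmd_bit(pmd, __pgprot(PTE_PRESENT_INVALID));
      pmd = clear_pmd_bit(pmd, __pgprot(PTE_VALID));
      return pmd;
  }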
Fixes: a166563e7ec3 ("arm64: mm: support large block mapping when rodata=full")
Cc: stable@vger.kernel.org
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Commit a166563e7ec37 ("arm64: mm: support large block mapping when
rodata=full") enabled the linear map to be mapped by block/cont while
still allowing granular permission changes on BBML2_NOABORT systems by
lazily splitting the live mappings. This mechanism was intended to be
usable by realm guests since they need to dynamically share dma buffers
with the host by "decrypting" them - which for Arm CCA, means marking
them as shared in the page tables.
However, it turns out that the mechanism was failing for realm guests
because realms need to share their dma buffers (via
__set_memory_enc_dec()) much earlier during boot than
split_kernel_leaf_mapping() was able to handle. The report linked below
showed that GIC's ITS was one such user. But during the investigation I
found other callsites that could not meet the
split_kernel_leaf_mapping() constraints.
The problem is that we block map the linear map based on the boot CPU
supporting BBML2_NOABORT, then check that all the other CPUs support it
too when finalizing the caps. If they don't, then we stop_machine() and
split to ptes. For safety, split_kernel_leaf_mapping() previously
wouldn't permit splitting until after the caps were finalized. That
ensured that if any secondary cpus were running that didn't support
BBML2_NOABORT, we wouldn't risk breaking them.
I've fixed this problem by reducing the black-out window where we refuse
to split; there are now 2 windows. The first is from T0 until the page
allocator is initialized: splitting allocates memory from the page
allocator, so the allocator must be up. The second covers the period from
starting to online the secondary cpus until the system caps are finalized
(this is a very small window).
All of the problematic callers are calling __set_memory_enc_dec() before
the secondary cpus come online, so this solves the problem. However, one
of these callers, swiotlb_update_mem_attributes(), was trying to split
before the page allocator was initialized. So I have moved this call
from arch_mm_preinit() to mem_init(), which solves the ordering issue.
I've added warnings and return an error if any attempt is made to split
in the black-out windows.
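Conceptually, the guard reduces to a check like the one below (the
predicate names are made up for illustration; only
system_capabilities_finalized() is an existing helper):

  static bool can_split_kernel_leaf_mapping(void)
  {
      /* Window 1: splitting allocates pgtable memory, so the page
       * allocator must be up. */
      if (!page_allocator_initialized())        /* hypothetical */
          return false;

      /* Window 2: secondaries may be online before every CPU has been
       * checked for BBML2_NOABORT. */
      if (secondaries_have_started() &&         /* hypothetical */
          !system_capabilities_finalized())
          return false;

      return true;
  }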
Note there are other issues which prevent booting all the way to user
space, which will be fixed in subsequent patches.
Reported-by: Jinjiang Tu <tujinjiang@huawei.com>
Closes: https://lore.kernel.org/all/0b2a4ae5-fc51-4d77-b177-b2e9db74f11d@huawei.com/
Fixes: a166563e7ec3 ("arm64: mm: support large block mapping when rodata=full")
Cc: stable@vger.kernel.org
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
All invocations of the cond_yield macro have been removed, so remove the
macro definition as well.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260401000548.133151-10-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
enum pgtable_type was introduced for arm64 by commit c64f46ee1377
("arm64: mm: use enum to identify pgtable level instead of
*_SHIFT"). In the meantime, the generic enum pgtable_level got
introduced by commit b22cc9a9c7ff ("mm/rmap: convert "enum
rmap_level" to "enum pgtable_level"").
Let's switch to the generic enum pgtable_level. The only difference
is that it also includes the PGD level; __pgd_pgtable_alloc() isn't
expected to create PGD tables, so we add a VM_WARN_ON() for that
case.
Suggested-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
GICv5 supports up to 128 PPIs, which would introduce a large amount of
overhead if all of them were actively tracked. Rather than keeping
track of all 128 potential PPIs, we instead only consider the set of
architected PPIs (the first 64). Moreover, we further reduce that set
by only exposing a subset of the PPIs to a guest. In practice, this
means that only 4 PPIs are typically exposed to a guest - the SW_PPI,
PMUIRQ, and the timers.
When folding the PPI state, changed bits in the active or pending state
were used to choose which state to sync back. However, this breaks badly
for edge-triggered interrupts when exiting the guest before it has
consumed the edge: no change in pending state is detected, and the edge
is lost forever.
Given the reduced set of PPIs exposed to the guest, and the issues
around tracking the edges, drop the tracking of changed state, and
instead iterate over the limited subset of PPIs exposed to the guest
directly.
This change drops the second copy of the PPI pending state used for
detecting edges in the pending state, and reworks
vgic_v5_fold_ppi_state() to iterate over the VM's PPI mask instead.
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Link: https://patch.msgid.link/20260401162152.932243-1-sascha.bischoff@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Implement arm64 support for the 'unoptimized' static call variety, which
routes all calls through a trampoline that performs a tail call to the
chosen function, and wire it up for use when kCFI is enabled. This works
around an issue with kCFI and generic static calls, where the prototypes
of default handlers such as __static_call_nop() and __static_call_ret0()
don't match the expected prototype of the call site, resulting in kCFI
false positives [0].
Since static call targets may be located in modules loaded out of direct
branching range, this needs an ADRP/LDR pair to load the branch target
into R16 and a branch-to-register (BR) instruction to perform an
indirect call.
Unlike on x86, there is no pressing need on arm64 to avoid indirect
calls at all cost, but hiding it from the compiler as is done here does
have some benefits:
- the literal is located in .rodata, which gives us the same robustness
advantage that code patching does;
- no D-cache pollution from fetching hash values from .text sections.
From an execution speed PoV, this is unlikely to make any difference at
all.
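The trampoline described above has roughly this shape (a sketch only; the
section placement and labels are illustrative, not the actual
implementation):

  asm("  .pushsection  .rodata                              \n"
      "0: .quad  0              // patched with the target  \n"
      "  .popsection                                        \n"
      "  .text                                              \n"
      "trampoline:                                          \n"
      "  adrp  x16, 0b          // page of the literal      \n"
      "  ldr   x16, [x16, :lo12:0b]  // load the target     \n"
      "  br    x16              // tail-call via x16        \n");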
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will McVicker <willmcvicker@google.com>
Reported-by: Carlos Llamas <cmllamas@google.com>
Closes: https://lore.kernel.org/all/20260311225822.1565895-1-cmllamas@google.com/ [0]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
This finishes the work on these odd functions that were only implemented
by a handful of architectures.
The 'flushcache' function was only used from the iterator code, so let's
make it do the same thing that the nontemporal version does: remove the
two underscores and add the user address checking.
Yes, yes, the user address checking is also done at iovec import time,
but we have long since walked away from the old double-underscore thing
where we try to avoid address checking overhead at access time, and
these functions shouldn't be so special and old-fashioned.
The arm64 version already did the address check, in fact, so there it's
just a matter of renaming it. For powerpc and x86-64 we now do the
proper user access boilerplate.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Introduce a new VM type for KVM/arm64 to allow userspace to request the
creation of a "protected VM" when the host has booted with pKVM enabled.
For now, this feature results in a taint on first use as many aspects of
a protected VM are not yet protected!
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-32-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
If a protected vCPU faults on an IPA which appears to be mapped, query
the hypervisor to determine whether or not the faulting pte has been
poisoned by a forceful reclaim. If the pte has been poisoned, return
-EFAULT back to userspace rather than retrying the instruction forever.
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-28-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|