aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86/kernel (follow)
AgeCommit message (Collapse)AuthorFilesLines
2025-11-08Merge tag 'x86-urgent-2025-11-08' of ↵Linus Torvalds3-99/+54
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: - Fix AMD PCI root device caching regression that triggers on certain firmware variants - Fix the zen5_rdseed_microcode[] array to be NULL-terminated - Add more AMD models to microcode signature checking * tag 'x86-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode/AMD: Add more known models to entry sign checking x86/CPU/AMD: Add missing terminator for zen5_rdseed_microcode x86/amd_node: Fix AMD root device caching
2025-11-07x86/microcode/AMD: Add more known models to entry sign checkingMario Limonciello (AMD)1-0/+2
Two Zen5 systems are missing from need_sha_check(). Add them. Fixes: 50cef76d5cb0 ("x86/microcode/AMD: Load only SHA256-checksummed patches") Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org> Link: https://patch.msgid.link/20251106182904.4143757-1-superm1@kernel.org
2025-11-05x86: uaccess: don't use runtime-const rewriting in modulesLinus Torvalds1-1/+5
The runtime-const infrastructure was never designed to handle the modular case, because the constant fixup is only done at boot time for core kernel code. But by the time I used it for the x86-64 user space limit handling in commit 86e6b1547b3d ("x86: fix user address masking non-canonical speculation issue"), I had completely repressed that fact. And it all happens to work because the only code that currently actually gets inlined by modules is for the access_ok() limit check, where the default constant value works even when not fixed up. Because at least I had intentionally made it be something that is in the non-canonical address space region. But it's technically very wrong, and it does mean that at least in theory, the use of 'access_ok()' + '__get_user()' can trigger the same speculation issue with non-canonical addresses that the original commit was all about. The pattern is unusual enough that this probably doesn't matter in practice, but very wrong is still very wrong. Also, let's fix it before the nice optimized scoped user accessor helpers that Thomas Gleixner is working on cause this pseudo-constant to then be more widely used. This all came up due to an unrelated discussion with Mateusz Guzik about using the runtime const infrastructure for names_cachep accesses too. There the modular case was much more obviously broken, and Mateusz noted it in his 'v2' of the patch series. That then made me notice how broken 'access_ok()' had been in modules all along. Mea culpa, mea maxima culpa. Fix it by simply not using the runtime-const code in modules, and just using the USER_PTR_MAX variable value instead. This is not performance-critical like the core user accessor functions (get_user() and friends) are. Also make sure this doesn't get forgotten the next time somebody wants to do runtime constant optimizations by having the x86 runtime-const.h header file error out if included by modules. Fixes: 86e6b1547b3d ("x86: fix user address masking non-canonical speculation issue") Acked-by: Borislav Petkov <bp@alien8.de> Acked-by: Sean Christopherson <seanjc@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Triggered-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/all/20251030105242.801528-1-mjguzik@gmail.com/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-11-04x86/CPU/AMD: Add missing terminator for zen5_rdseed_microcodeMario Limonciello1-0/+1
Running x86_match_min_microcode_rev() on a Zen5 CPU trips up KASAN for an out of bounds access. Fixes: 607b9fb2ce248 ("x86/CPU/AMD: Add RDSEED fix for Zen5") Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20251104161007.269885-1-mario.limonciello@amd.com
2025-11-03x86/amd_node: Fix AMD root device cachingYazen Ghannam1-99/+51
Recent AMD node rework removed the "search and count" method of caching AMD root devices. This depended on the value from a Data Fabric register that was expected to hold the PCI bus of one of the root devices attached to that fabric. However, this expectation is incorrect. The register, when read from PCI config space, returns the bitwise-OR of the buses of all attached root devices. This behavior is benign on AMD reference design boards, since the bus numbers are aligned. This results in a bitwise-OR value matching one of the buses. For example, 0x00 | 0x40 | 0xA0 | 0xE0 = 0xE0. This behavior breaks on boards where the bus numbers are not exactly aligned. For example, 0x00 | 0x07 | 0xE0 | 0x15 = 0x1F. The examples above are for AMD node 0. The first root device on other nodes will not be 0x00. The first root device for other nodes will depend on the total number of root devices, the system topology, and the specific PCI bus number assignment. For example, a system with 2 AMD nodes could have this: Node 0 : 0x00 0x07 0x0e 0x15 Node 1 : 0x1c 0x23 0x2a 0x31 The bus numbering style in the reference boards is not a requirement. The numbering found in other boards is not incorrect. Therefore, the root device caching method needs to be adjusted. Go back to the "search and count" method used before the recent rework. Search for root devices using PCI class code rather than fixed PCI IDs. This keeps the goal of the rework (remove dependency on PCI IDs) while being able to support various board designs. Merge helper functions to reduce code duplication. [ bp: Reflow comment. ] Fixes: 40a5f6ffdfc8 ("x86/amd_nb: Simplify root device search") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/all/20251028-fix-amd-root-v2-1-843e38f8be2c@amd.com
2025-10-30x86/CPU/AMD: Extend Zen6 model rangeBorislav Petkov (AMD)1-1/+1
Add some more Zen6 models. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://patch.msgid.link/20251029123056.19987-1-bp@kernel.org
2025-10-28x86/fpu: Ensure XFD state on signal deliveryChang S. Bae1-0/+3
Sean reported [1] the following splat when running KVM tests: WARNING: CPU: 232 PID: 15391 at xfd_validate_state+0x65/0x70 Call Trace: <TASK> fpu__clear_user_states+0x9c/0x100 arch_do_signal_or_restart+0x142/0x210 exit_to_user_mode_loop+0x55/0x100 do_syscall_64+0x205/0x2c0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 Chao further identified [2] a reproducible scenario involving signal delivery: a non-AMX task is preempted by an AMX-enabled task which modifies the XFD MSR. When the non-AMX task resumes and reloads XSTATE with init values, a warning is triggered due to a mismatch between fpstate::xfd and the CPU's current XFD state. fpu__clear_user_states() does not currently re-synchronize the XFD state after such preemption. Invoke xfd_update_state() which detects and corrects the mismatch if there is a dynamic feature. This also benefits the sigreturn path, as fpu__restore_sig() may call fpu__clear_user_states() when the sigframe is inaccessible. [ dhansen: minor changelog munging ] Closes: https://lore.kernel.org/lkml/aDCo_SczQOUaB2rS@google.com [1] Fixes: 672365477ae8a ("x86/fpu: Update XFD state where required") Reported-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Tested-by: Chao Gao <chao.gao@intel.com> Link: https://lore.kernel.org/all/aDWbctO%2FRfTGiCg3@intel.com [2] Cc:stable@vger.kernel.org Link: https://patch.msgid.link/20250610001700.4097-1-chang.seok.bae%40intel.com
2025-10-28x86/CPU/AMD: Add RDSEED fix for Zen5Gregory Price1-0/+10
There's an issue with RDSEED's 16-bit and 32-bit register output variants on Zen5 which return a random value of 0 "at a rate inconsistent with randomness while incorrectly signaling success (CF=1)". Search the web for AMD-SB-7055 for more detail. Add a fix glue which checks microcode revisions. [ bp: Add microcode revisions checking, rewrite. ] Cc: stable@vger.kernel.org Signed-off-by: Gregory Price <gourry@gourry.net> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20251018024010.4112396-1-gourry@gourry.net
2025-10-27x86/microcode/AMD: Limit Entrysign signature checking to known generationsBorislav Petkov (AMD)1-1/+19
Limit Entrysign sha256 signature checking to CPUs in the range Zen1-Zen5. X86_BUG cannot be used here because the loading on the BSP happens way too early, before the cpufeatures machinery has been set up. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://patch.msgid.link/all/20251023124629.5385-1-bp@kernel.org
2025-10-24x86/bugs: Remove dead code which might prevent from buildingAndy Shevchenko1-7/+0
Clang, in particular, is not happy about dead code: arch/x86/kernel/cpu/bugs.c:1830:20: error: unused function 'match_option' [-Werror,-Wunused-function] 1830 | static inline bool match_option(const char *arg, int arglen, const char *opt) | ^~~~~~~~~~~~ 1 error generated. Remove a leftover from the previous cleanup. Fixes: 02ac6cc8c5a1 ("x86/bugs: Simplify SSB cmdline parsing") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/20251024125959.1526277-1-andriy.shevchenko%40linux.intel.com
2025-10-21x86/bugs: Qualify RETBLEED_INTEL_MSGDavid Kaplan1-1/+3
When retbleed mitigation is disabled, the kernel already prints an info message that the system is vulnerable. Recent code restructuring also inadvertently led to RETBLEED_INTEL_MSG being printed as an error, which is unnecessary as retbleed mitigation was already explicitly disabled (by config option, cmdline, etc.). Qualify this print statement so the warning is not printed unless an actual retbleed mitigation was selected and is being disabled due to incompatibility with spectre_v2. Fixes: e3b78a7ad5ea ("x86/bugs: Restructure retbleed mitigation") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220624 Signed-off-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://patch.msgid.link/20251003171936.155391-1-david.kaplan@amd.com
2025-10-21x86/microcode: Fix Entrysign revision check for Zen1/NaplesAndrew Cooper1-1/+1
... to match AMD's statement here: https://www.amd.com/en/resources/product-security/bulletin/amd-sb-7033.html Fixes: 50cef76d5cb0 ("x86/microcode/AMD: Load only SHA256-checksummed patches") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org> Link: https://patch.msgid.link/20251020144124.2930784-1-andrew.cooper3@citrix.com
2025-10-20x86,fs/resctrl: Fix NULL pointer dereference with events force-disabled in ↵Babu Moger1-1/+10
mbm_event mode The following NULL pointer dereference is encountered on mount of resctrl fs after booting a system that supports assignable counters with the "rdt=!mbmtotal,!mbmlocal" kernel parameters: BUG: kernel NULL pointer dereference, address: 0000000000000008 RIP: 0010:mbm_cntr_get Call Trace: rdtgroup_assign_cntr_event rdtgroup_assign_cntrs rdt_get_tree Specifying the kernel parameter "rdt=!mbmtotal,!mbmlocal" effectively disables the legacy X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL features and the MBM events they represent. This results in the per-domain MBM event related data structures to not be allocated during early initialization. resctrl fs initialization follows by implicitly enabling both MBM total and local events on a system that supports assignable counters (mbm_event mode), but this enabling occurs after the per-domain data structures have been created. After booting, resctrl fs assumes that an enabled event can access all its state. This results in NULL pointer dereference when resctrl attempts to access the un-allocated structures of an enabled event. Remove the late MBM event enabling from resctrl fs. This leaves a problem where the X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL features may be disabled while assignable counter (mbm_event) mode is enabled without any events to support. Switching between the "default" and "mbm_event" mode without any events is not practical. Create a dependency between the X86_FEATURE_{CQM_MBM_TOTAL,CQM_MBM_LOCAL} and X86_FEATURE_ABMC (assignable counter) hardware features. An x86 system that supports assignable counters now requires support of X86_FEATURE_CQM_MBM_TOTAL or X86_FEATURE_CQM_MBM_LOCAL. This ensures all needed MBM related data structures are created before use and that it is only possible to switch between "default" and "mbm_event" mode when the same events are available in both modes. This dependency does not exist in the hardware but this usage of these feature settings work for known systems. [ bp: Massage commit message. ] Fixes: 13390861b426e ("x86,fs/resctrl: Detect Assignable Bandwidth Monitoring feature details") Co-developed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://patch.msgid.link/a62e6ac063d0693475615edd213d5be5e55443e6.1760560934.git.babu.moger@amd.com
2025-10-15x86/CPU/AMD: Prevent reset reasons from being retained across rebootRong Zhang1-2/+14
The S5_RESET_STATUS register is parsed on boot and printed to kmsg. However, this could sometimes be misleading and lead to users wasting a lot of time on meaningless debugging for two reasons: * Some bits are never cleared by hardware. It's the software's responsibility to clear them as per the Processor Programming Reference (see [1]). * Some rare hardware-initiated platform resets do not update the register at all. In both cases, a previous reboot could leave its trace in the register, resulting in users seeing unrelated reboot reasons while debugging random reboots afterward. Write the read value back to the register in order to clear all reason bits since they are write-1-to-clear while the others must be preserved. [1]: https://bugzilla.kernel.org/show_bug.cgi?id=206537#attach_303991 [ bp: Massage commit message. ] Fixes: ab8131028710 ("x86/CPU/AMD: Print the reason for the last reset") Signed-off-by: Rong Zhang <i@rong.moe> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Cc: <stable@kernel.org> Link: https://lore.kernel.org/all/20250913144245.23237-1-i@rong.moe/
2025-10-13x86/resctrl: Fix miscount of bandwidth event when reactivating previously ↵Babu Moger1-4/+10
unavailable RMID Users can create as many monitoring groups as the number of RMIDs supported by the hardware. However, on AMD systems, only a limited number of RMIDs are guaranteed to be actively tracked by the hardware. RMIDs that exceed this limit are placed in an "Unavailable" state. When a bandwidth counter is read for such an RMID, the hardware sets MSR_IA32_QM_CTR.Unavailable (bit 62). When such an RMID starts being tracked again the hardware counter is reset to zero. MSR_IA32_QM_CTR.Unavailable remains set on first read after tracking re-starts and is clear on all subsequent reads as long as the RMID is tracked. resctrl miscounts the bandwidth events after an RMID transitions from the "Unavailable" state back to being tracked. This happens because when the hardware starts counting again after resetting the counter to zero, resctrl in turn compares the new count against the counter value stored from the previous time the RMID was tracked. This results in resctrl computing an event value that is either undercounting (when new counter is more than stored counter) or a mistaken overflow (when new counter is less than stored counter). Reset the stored value (arch_mbm_state::prev_msr) of MSR_IA32_QM_CTR to zero whenever the RMID is in the "Unavailable" state to ensure accurate counting after the RMID resets to zero when it starts to be tracked again. Example scenario that results in mistaken overflow ================================================== 1. The resctrl filesystem is mounted, and a task is assigned to a monitoring group. $mount -t resctrl resctrl /sys/fs/resctrl $mkdir /sys/fs/resctrl/mon_groups/test1/ $echo 1234 > /sys/fs/resctrl/mon_groups/test1/tasks $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes 21323 <- Total bytes on domain 0 "Unavailable" <- Total bytes on domain 1 Task is running on domain 0. Counter on domain 1 is "Unavailable". 2. The task runs on domain 0 for a while and then moves to domain 1. The counter starts incrementing on domain 1. $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes 7345357 <- Total bytes on domain 0 4545 <- Total bytes on domain 1 3. At some point, the RMID in domain 0 transitions to the "Unavailable" state because the task is no longer executing in that domain. $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes "Unavailable" <- Total bytes on domain 0 434341 <- Total bytes on domain 1 4. Since the task continues to migrate between domains, it may eventually return to domain 0. $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes 17592178699059 <- Overflow on domain 0 3232332 <- Total bytes on domain 1 In this case, the RMID on domain 0 transitions from "Unavailable" state to active state. The hardware sets MSR_IA32_QM_CTR.Unavailable (bit 62) when the counter is read and begins tracking the RMID counting from 0. Subsequent reads succeed but return a value smaller than the previously saved MSR value (7345357). Consequently, the resctrl's overflow logic is triggered, it compares the previous value (7345357) with the new, smaller value and incorrectly interprets this as a counter overflow, adding a large delta. In reality, this is a false positive: the counter did not overflow but was simply reset when the RMID transitioned from "Unavailable" back to active state. Here is the text from APM [1] available from [2]. "In PQOS Version 2.0 or higher, the MBM hardware will set the U bit on the first QM_CTR read when it begins tracking an RMID that it was not previously tracking. The U bit will be zero for all subsequent reads from that RMID while it is still tracked by the hardware. Therefore, a QM_CTR read with the U bit set when that RMID is in use by a processor can be considered 0 when calculating the difference with a subsequent read." [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming Publication # 24593 Revision 3.41 section 19.3.3 Monitoring L3 Memory Bandwidth (MBM). [ bp: Split commit message into smaller paragraph chunks for better consumption. ] Fixes: 4d05bf71f157d ("x86/resctrl: Introduce AMD QOS feature") Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Reinette Chatre <reinette.chatre@intel.com> Cc: stable@vger.kernel.org # needs adjustments for <= v6.17 Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 # [2]
2025-10-11Merge tag 'x86_core_for_v6.18_rc1' of ↵Linus Torvalds7-129/+203
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull more x86 updates from Borislav Petkov: - Remove a bunch of asm implementing condition flags testing in KVM's emulator in favor of int3_emulate_jcc() which is written in C - Replace KVM fastops with C-based stubs which avoids problems with the fastop infra related to latter not adhering to the C ABI due to their special calling convention and, more importantly, bypassing compiler control-flow integrity checking because they're written in asm - Remove wrongly used static branches and other ugliness accumulated over time in hyperv's hypercall implementation with a proper static function call to the correct hypervisor call variant - Add some fixes and modifications to allow running FRED-enabled kernels in KVM even on non-FRED hardware - Add kCFI improvements like validating indirect calls and prepare for enabling kCFI with GCC. Add cmdline params documentation and other code cleanups - Use the single-byte 0xd6 insn as the official #UD single-byte undefined opcode instruction as agreed upon by both x86 vendors - Other smaller cleanups and touchups all over the place * tag 'x86_core_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) x86,retpoline: Optimize patch_retpoline() x86,ibt: Use UDB instead of 0xEA x86/cfi: Remove __noinitretpoline and __noretpoline x86/cfi: Add "debug" option to "cfi=" bootparam x86/cfi: Standardize on common "CFI:" prefix for CFI reports x86/cfi: Document the "cfi=" bootparam options x86/traps: Clarify KCFI instruction layout compiler_types.h: Move __nocfi out of compiler-specific header objtool: Validate kCFI calls x86/fred: KVM: VMX: Always use FRED for IRQs when CONFIG_X86_FRED=y x86/fred: Play nice with invoking asm_fred_entry_from_kvm() on non-FRED hardware x86/fred: Install system vector handlers even if FRED isn't fully enabled x86/hyperv: Use direct call to hypercall-page x86/hyperv: Clean up hv_do_hypercall() KVM: x86: Remove fastops KVM: x86: Convert em_salc() to C KVM: x86: Introduce EM_ASM_3WCL KVM: x86: Introduce EM_ASM_1SRC2 KVM: x86: Introduce EM_ASM_2CL KVM: x86: Introduce EM_ASM_2W ...
2025-10-11Merge tag 'x86_cleanups_for_v6.18_rc1' of ↵Linus Torvalds3-31/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Borislav Petkov: - Simplify inline asm flag output operands now that the minimum compiler version supports the =@ccCOND syntax - Remove a bunch of AS_* Kconfig symbols which detect assembler support for various instruction mnemonics now that the minimum assembler version supports them all - The usual cleanups all over the place * tag 'x86_cleanups_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/asm: Remove code depending on __GCC_ASM_FLAG_OUTPUTS__ x86/sgx: Use ENCLS mnemonic in <kernel/cpu/sgx/encls.h> x86/mtrr: Remove license boilerplate text with bad FSF address x86/asm: Use RDPKRU and WRPKRU mnemonics in <asm/special_insns.h> x86/idle: Use MONITORX and MWAITX mnemonics in <asm/mwait.h> x86/entry/fred: Push __KERNEL_CS directly x86/kconfig: Remove CONFIG_AS_AVX512 crypto: x86 - Remove CONFIG_AS_VPCLMULQDQ crypto: X86 - Remove CONFIG_AS_VAES crypto: x86 - Remove CONFIG_AS_GFNI x86/kconfig: Drop unused and needless config X86_64_SMP
2025-10-07Merge tag 'acpi-6.18-rc1-2' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more ACPI updates from Rafael Wysocki: "These fix a driver bug, clean up two pieces of code and improve the fwnode API consistency: - Add missing synchronization between interface updates in the ACPI battery driver (Rafael Wysocki) - Remove open coded check for cpu_feature_enabled() from acpi_processor_power_init_bm_check() (Mario Limonciello) - Remove redundant rcu_read_lock/unlock() under spinlock from ghes_notify_hed() in the ACPI APEI support code (pengdonglin) - Make the .get_next_child_node() callback in the ACPI fwnode backend skip ACPI devices that are not present for consistency with the analogous callback in the OF fwnode backend (Sakari Ailus)" * tag 'acpi-6.18-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: property: Return present device nodes only on fwnode interface ACPI: APEI: Remove redundant rcu_read_lock/unlock() under spinlock ACPI: battery: Add synchronization between interface updates x86/acpi/cstate: Remove open coded check for cpu_feature_enabled()
2025-10-07Merge tag 'hyperv-next-signed-20251006' of ↵Linus Torvalds1-1/+10
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Unify guest entry code for KVM and MSHV (Sean Christopherson) - Switch Hyper-V MSI domain to use msi_create_parent_irq_domain() (Nam Cao) - Add CONFIG_HYPERV_VMBUS and limit the semantics of CONFIG_HYPERV (Mukesh Rathor) - Add kexec/kdump support on Azure CVMs (Vitaly Kuznetsov) - Deprecate hyperv_fb in favor of Hyper-V DRM driver (Prasanna Kumar T S M) - Miscellaneous enhancements, fixes and cleanups (Abhishek Tiwari, Alok Tiwari, Nuno Das Neves, Wei Liu, Roman Kisel, Michael Kelley) * tag 'hyperv-next-signed-20251006' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: hyperv: Remove the spurious null directive line MAINTAINERS: Mark hyperv_fb driver Obsolete fbdev/hyperv_fb: deprecate this in favor of Hyper-V DRM driver Drivers: hv: Make CONFIG_HYPERV bool Drivers: hv: Add CONFIG_HYPERV_VMBUS option Drivers: hv: vmbus: Fix typos in vmbus_drv.c Drivers: hv: vmbus: Fix sysfs output format for ring buffer index Drivers: hv: vmbus: Clean up sscanf format specifier in target_cpu_store() x86/hyperv: Switch to msi_create_parent_irq_domain() mshv: Use common "entry virt" APIs to do work in root before running guest entry: Rename "kvm" entry code assets to "virt" to genericize APIs entry/kvm: KVM: Move KVM details related to signal/-EINTR into KVM proper mshv: Handle NEED_RESCHED_LAZY before transferring to guest x86/hyperv: Add kexec/kdump support on Azure CVMs Drivers: hv: Simplify data structures for VMBus channel close message Drivers: hv: util: Cosmetic changes for hv_utils_transport.c mshv: Add support for a new parent partition configuration clocksource: hyper-v: Skip unnecessary checks for the root partition hyperv: Add missing field to hv_output_map_device_interrupt
2025-10-07Merge branches 'acpi-x86', 'acpi-battery', 'acpi-apei' and 'acpi-property'Rafael J. Wysocki1-1/+1
Merge an x86 cleanup related to ACPI, an ACPI battery driver fix, an ACPI APEI cleanup, and an ACPI device properties handling update for 6.18-rc1: - Remove open coded check for cpu_feature_enabled() from acpi_processor_power_init_bm_check() (Mario Limonciello) - Add missing synchronization between interface updates in the ACPI battery driver (Rafael Wysocki) - Remove redundant rcu_read_lock/unlock() under spinlock from ghes_notify_hed() in the ACPI APEI support code (pengdonglin) - Make the .get_next_child_node() callback in the ACPI fwnode backend skip ACPI devices that are not present for consistency with the analogous callback in the OF fwnode backend (Sakari Ailus) * acpi-x86: x86/acpi/cstate: Remove open coded check for cpu_feature_enabled() * acpi-battery: ACPI: battery: Add synchronization between interface updates * acpi-apei: ACPI: APEI: Remove redundant rcu_read_lock/unlock() under spinlock * acpi-property: ACPI: property: Return present device nodes only on fwnode interface
2025-10-06Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-0/+1
Pull x86 kvm updates from Paolo Bonzini: "Generic: - Rework almost all of KVM's exports to expose symbols only to KVM's x86 vendor modules (kvm-{amd,intel}.ko and PPC's kvm-{pr,hv}.ko x86: - Rework almost all of KVM x86's exports to expose symbols only to KVM's vendor modules, i.e. to kvm-{amd,intel}.ko - Add support for virtualizing Control-flow Enforcement Technology (CET) on Intel (Shadow Stacks and Indirect Branch Tracking) and AMD (Shadow Stacks). It is worth noting that while SHSTK and IBT can be enabled separately in CPUID, it is not really possible to virtualize them separately. Therefore, Intel processors will really allow both SHSTK and IBT under the hood if either is made visible in the guest's CPUID. The alternative would be to intercept XSAVES/XRSTORS, which is not feasible for performance reasons - Fix a variety of fuzzing WARNs all caused by checking L1 intercepts when completing userspace I/O. KVM has already committed to allowing L2 to to perform I/O at that point - Emulate PERF_CNTR_GLOBAL_STATUS_SET for PerfMonV2 guests, as the MSR is supposed to exist for v2 PMUs - Allow Centaur CPU leaves (base 0xC000_0000) for Zhaoxin CPUs - Add support for the immediate forms of RDMSR and WRMSRNS, sans full emulator support (KVM should never need to emulate the MSRs outside of forced emulation and other contrived testing scenarios) - Clean up the MSR APIs in preparation for CET and FRED virtualization, as well as mediated vPMU support - Clean up a pile of PMU code in anticipation of adding support for mediated vPMUs - Reject in-kernel IOAPIC/PIT for TDX VMs, as KVM can't obtain EOI vmexits needed to faithfully emulate an I/O APIC for such guests - Many cleanups and minor fixes - Recover possible NX huge pages within the TDP MMU under read lock to reduce guest jitter when restoring NX huge pages - Return -EAGAIN during prefault if userspace concurrently deletes/moves the relevant memslot, to fix an issue where prefaulting could deadlock with the memslot update x86 (AMD): - Enable AVIC by default for Zen4+ if x2AVIC (and other prereqs) is supported - Require a minimum GHCB version of 2 when starting SEV-SNP guests via KVM_SEV_INIT2 so that invalid GHCB versions result in immediate errors instead of latent guest failures - Add support for SEV-SNP's CipherText Hiding, an opt-in feature that prevents unauthorized CPU accesses from reading the ciphertext of SNP guest private memory, e.g. to attempt an offline attack. This feature splits the shared SEV-ES/SEV-SNP ASID space into separate ranges for SEV-ES and SEV-SNP guests, therefore a new module parameter is needed to control the number of ASIDs that can be used for VMs with CipherText Hiding vs. how many can be used to run SEV-ES guests - Add support for Secure TSC for SEV-SNP guests, which prevents the untrusted host from tampering with the guest's TSC frequency, while still allowing the the VMM to configure the guest's TSC frequency prior to launch - Validate the XCR0 provided by the guest (via the GHCB) to avoid bugs resulting from bogus XCR0 values - Save an SEV guest's policy if and only if LAUNCH_START fully succeeds to avoid leaving behind stale state (thankfully not consumed in KVM) - Explicitly reject non-positive effective lengths during SNP's LAUNCH_UPDATE instead of subtly relying on guest_memfd to deal with them - Reload the pre-VMRUN TSC_AUX on #VMEXIT for SEV-ES guests, not the host's desired TSC_AUX, to fix a bug where KVM was keeping a different vCPU's TSC_AUX in the host MSR until return to userspace KVM (Intel): - Preparation for FRED support - Don't retry in TDX's anti-zero-step mitigation if the target memslot is invalid, i.e. is being deleted or moved, to fix a deadlock scenario similar to the aforementioned prefaulting case - Misc bugfixes and minor cleanups" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (142 commits) KVM: x86: Export KVM-internal symbols for sub-modules only KVM: x86: Drop pointless exports of kvm_arch_xxx() hooks KVM: x86: Move kvm_intr_is_single_vcpu() to lapic.c KVM: Export KVM-internal symbols for sub-modules only KVM: s390/vfio-ap: Use kvm_is_gpa_in_memslot() instead of open coded equivalent KVM: VMX: Make CR4.CET a guest owned bit KVM: selftests: Verify MSRs are (not) in save/restore list when (un)supported KVM: selftests: Add coverage for KVM-defined registers in MSRs test KVM: selftests: Add KVM_{G,S}ET_ONE_REG coverage to MSRs test KVM: selftests: Extend MSRs test to validate vCPUs without supported features KVM: selftests: Add support for MSR_IA32_{S,U}_CET to MSRs test KVM: selftests: Add an MSR test to exercise guest/host and read/write KVM: x86: Define AMD's #HV, #VC, and #SX exception vectors KVM: x86: Define Control Protection Exception (#CP) vector KVM: x86: Add human friendly formatting for #XM, and #VE KVM: SVM: Enable shadow stack virtualization for SVM KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid KVM: SVM: Pass through shadow stack MSRs as appropriate KVM: SVM: Update dump_vmcb with shadow stack save area additions KVM: nSVM: Save/load CET Shadow Stack state to/from vmcb12/vmcb02 ...
2025-10-04Merge tag 'x86_tdx_for_6.18-rc1' of ↵Linus Torvalds4-34/+87
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 TDX updates from Dave Hansen: "The biggest change here is making TDX and kexec play nicely together. Before this, the memory encryption hardware (which doesn't respect cache coherency) could write back old cachelines on top of data in the new kernel, so kexec and TDX were made mutually exclusive. This removes the limitation. There is also some work to tighten up a hardware bug workaround and some MAINTAINERS updates. - Make TDX and kexec work together - Skip TDX bug workaround when the bug is not present - Update maintainers entries" * tag 'x86_tdx_for_6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/virt/tdx: Use precalculated TDVPR page physical address KVM/TDX: Explicitly do WBINVD when no more TDX SEAMCALLs x86/virt/tdx: Update the kexec section in the TDX documentation x86/virt/tdx: Remove the !KEXEC_CORE dependency x86/kexec: Disable kexec/kdump on platforms with TDX partial write erratum x86/virt/tdx: Mark memory cache state incoherent when making SEAMCALL x86/sme: Use percpu boolean to control WBINVD during kexec x86/kexec: Consolidate relocate_kernel() function parameters x86/tdx: Skip clearing reclaimed pages unless X86_BUG_TDX_PW_MCE is present x86/tdx: Tidy reset_pamt functions x86/tdx: Eliminate duplicate code in tdx_clear_page() MAINTAINERS: Add KVM mail list to the TDX entry MAINTAINERS: Add Rick Edgecombe as a TDX reviewer MAINTAINERS: Update the file list in the TDX entry.
2025-10-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-14/+30
Pull kvm updates from Paolo Bonzini: "This excludes the bulk of the x86 changes, which I will send separately. They have two not complex but relatively unusual conflicts so I will wait for other dust to settle. guest_memfd: - Add support for host userspace mapping of guest_memfd-backed memory for VM types that do NOT use support KVM_MEMORY_ATTRIBUTE_PRIVATE (which isn't precisely the same thing as CoCo VMs, since x86's SEV-MEM and SEV-ES have no way to detect private vs. shared). This lays the groundwork for removal of guest memory from the kernel direct map, as well as for limited mmap() for guest_memfd-backed memory. For more information see: - commit a6ad54137af9 ("Merge branch 'guest-memfd-mmap' into HEAD") - guest_memfd in Firecracker: https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding - direct map removal: https://lore.kernel.org/all/20250221160728.1584559-1-roypat@amazon.co.uk/ - mmap support: https://lore.kernel.org/all/20250328153133.3504118-1-tabba@google.com/ ARM: - Add support for FF-A 1.2 as the secure memory conduit for pKVM, allowing more registers to be used as part of the message payload. - Change the way pKVM allocates its VM handles, making sure that the privileged hypervisor is never tricked into using uninitialised data. - Speed up MMIO range registration by avoiding unnecessary RCU synchronisation, which results in VMs starting much quicker. - Add the dump of the instruction stream when panic-ing in the EL2 payload, just like the rest of the kernel has always done. This will hopefully help debugging non-VHE setups. - Add 52bit PA support to the stage-1 page-table walker, and make use of it to populate the fault level reported to the guest on failing to translate a stage-1 walk. - Add NV support to the GICv3-on-GICv5 emulation code, ensuring feature parity for guests, irrespective of the host platform. - Fix some really ugly architecture problems when dealing with debug in a nested VM. This has some bad performance impacts, but is at least correct. - Add enough infrastructure to be able to disable EL2 features and give effective values to the EL2 control registers. This then allows a bunch of features to be turned off, which helps cross-host migration. - Large rework of the selftest infrastructure to allow most tests to transparently run at EL2. This is the first step towards enabling NV testing. - Various fixes and improvements all over the map, including one BE fix, just in time for the removal of the feature. LoongArch: - Detect page table walk feature on new hardware - Add sign extension with kernel MMIO/IOCSR emulation - Improve in-kernel IPI emulation - Improve in-kernel PCH-PIC emulation - Move kvm_iocsr tracepoint out of generic code RISC-V: - Added SBI FWFT extension for Guest/VM with misaligned delegation and pointer masking PMLEN features - Added ONE_REG interface for SBI FWFT extension - Added Zicbop and bfloat16 extensions for Guest/VM - Enabled more common KVM selftests for RISC-V - Added SBI v3.0 PMU enhancements in KVM and perf driver s390: - Improve interrupt cpu for wakeup, in particular the heuristic to decide which vCPU to deliver a floating interrupt to. - Clear the PTE when discarding a swapped page because of CMMA; this bug was introduced in 6.16 when refactoring gmap code. x86 selftests: - Add #DE coverage in the fastops test (the only exception that's guest- triggerable in fastop-emulated instructions). - Fix PMU selftests errors encountered on Granite Rapids (GNR), Sierra Forest (SRF) and Clearwater Forest (CWF). - Minor cleanups and improvements x86 (guest side): - For the legacy PCI hole (memory between TOLUD and 4GiB) to UC when overriding guest MTRR for TDX/SNP to fix an issue where ACPI auto-mapping could map devices as WB and prevent the device drivers from mapping their devices with UC/UC-. - Make kvm_async_pf_task_wake() a local static helper and remove its export. - Use native qspinlocks when running in a VM with dedicated vCPU=>pCPU bindings even when PV_UNHALT is unsupported. Generic: - Remove a redundant __GFP_NOWARN from kvm_setup_async_pf() as __GFP_NOWARN is now included in GFP_NOWAIT. * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (178 commits) KVM: s390: Fix to clear PTE when discarding a swapped page KVM: arm64: selftests: Cover ID_AA64ISAR3_EL1 in set_id_regs KVM: arm64: selftests: Remove a duplicate register listing in set_id_regs KVM: arm64: selftests: Cope with arch silliness in EL2 selftest KVM: arm64: selftests: Add basic test for running in VHE EL2 KVM: arm64: selftests: Enable EL2 by default KVM: arm64: selftests: Initialize HCR_EL2 KVM: arm64: selftests: Use the vCPU attr for setting nr of PMU counters KVM: arm64: selftests: Use hyp timer IRQs when test runs at EL2 KVM: arm64: selftests: Select SMCCC conduit based on current EL KVM: arm64: selftests: Provide helper for getting default vCPU target KVM: arm64: selftests: Alias EL1 registers to EL2 counterparts KVM: arm64: selftests: Create a VGICv3 for 'default' VMs KVM: arm64: selftests: Add unsanitised helpers for VGICv3 creation KVM: arm64: selftests: Add helper to check for VGICv3 support KVM: arm64: selftests: Initialize VGICv3 only once KVM: arm64: selftests: Provide kvm_arch_vm_post_create() in library code KVM: selftests: Add ex_str() to print human friendly name of exception vectors selftests/kvm: remove stale TODO in xapic_state_test KVM: selftests: Handle Intel Atom errata that leads to PMU event overcount ...
2025-10-02Merge tag 'mm-nonmm-stable-2025-10-02-15-29' of ↵Linus Torvalds2-9/+63
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - "ida: Remove the ida_simple_xxx() API" from Christophe Jaillet completes the removal of this legacy IDR API - "panic: introduce panic status function family" from Jinchao Wang provides a number of cleanups to the panic code and its various helpers, which were rather ad-hoc and scattered all over the place - "tools/delaytop: implement real-time keyboard interaction support" from Fan Yu adds a few nice user-facing usability changes to the delaytop monitoring tool - "efi: Fix EFI boot with kexec handover (KHO)" from Evangelos Petrongonas fixes a panic which was happening with the combination of EFI and KHO - "Squashfs: performance improvement and a sanity check" from Phillip Lougher teaches squashfs's lseek() about SEEK_DATA/SEEK_HOLE. A mere 150x speedup was measured for a well-chosen microbenchmark - plus another 50-odd singleton patches all over the place * tag 'mm-nonmm-stable-2025-10-02-15-29' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (75 commits) Squashfs: reject negative file sizes in squashfs_read_inode() kallsyms: use kmalloc_array() instead of kmalloc() MAINTAINERS: update Sibi Sankar's email address Squashfs: add SEEK_DATA/SEEK_HOLE support Squashfs: add additional inode sanity checking lib/genalloc: fix device leak in of_gen_pool_get() panic: remove CONFIG_PANIC_ON_OOPS_VALUE ocfs2: fix double free in user_cluster_connect() checkpatch: suppress strscpy warnings for userspace tools cramfs: fix incorrect physical page address calculation kernel: prevent prctl(PR_SET_PDEATHSIG) from racing with parent process exit Squashfs: fix uninit-value in squashfs_get_parent kho: only fill kimage if KHO is finalized ocfs2: avoid extra calls to strlen() after ocfs2_sprintf_system_inode_name() kernel/sys.c: fix the racy usage of task_lock(tsk->group_leader) in sys_prlimit64() paths sched/task.h: fix the wrong comment on task_lock() nesting with tasklist_lock coccinelle: platform_no_drv_owner: handle also built-in drivers coccinelle: of_table: handle SPI device ID tables lib/decompress: use designated initializers for struct compress_format efi: support booting with kexec handover (KHO) ...
2025-09-30Merge tag 'x86_apic_for_v6.18_rc1' of ↵Linus Torvalds8-63/+508
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SEV and apic updates from Borislav Petkov: - Add functionality to provide runtime firmware updates for the non-x86 parts of an AMD platform like the security processor (ASP) firmware, modules etc, for example. The intent being that these updates are interim, live fixups before a proper BIOS update can be attempted - Add guest support for AMD's Secure AVIC feature which gives encrypted guests the needed protection against a malicious hypervisor generating unexpected interrupts and injecting them into such guest, thus interfering with its operation in an unexpected and negative manner. The advantage of this scheme is that the guest determines which interrupts and when to accept them vs leaving that to the benevolence (or not) of the hypervisor - Strictly separate the startup code from the rest of the kernel where former is executed from the initial 1:1 mapping of memory. The problem was that the toolchain-generated version of the code was being executed from a different mapping of memory than what was "assumed" during code generation, needing an ever-growing pile of fixups for absolute memory references which are invalid in the early, 1:1 memory mapping during boot. The major advantage of this is that there's no need to check the 1:1 mapping portion of the code for absolute relocations anymore and get rid of the RIP_REL_REF() macro sprinkling all over the place. For more info, see Ard's very detailed writeup on this [1] - The usual cleanups and fixes Link: https://lore.kernel.org/r/CAMj1kXEzKEuePEiHB%2BHxvfQbFz0sTiHdn4B%2B%2BzVBJ2mhkPkQ4Q@mail.gmail.com [1] * tag 'x86_apic_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (49 commits) x86/boot: Drop erroneous __init annotation from early_set_pages_state() crypto: ccp - Add AMD Seamless Firmware Servicing (SFS) driver crypto: ccp - Add new HV-Fixed page allocation/free API x86/sev: Add new dump_rmp parameter to snp_leak_pages() API x86/startup/sev: Document the CPUID flow in the boot #VC handler objtool: Ignore __pi___cfi_ prefixed symbols x86/sev: Zap snp_abort() x86/apic/savic: Do not use snp_abort() x86/boot: Get rid of the .head.text section x86/boot: Move startup code out of __head section efistub/x86: Remap inittext read-execute when needed x86/boot: Create a confined code area for startup code x86/kbuild: Incorporate boot/startup/ via Kbuild makefile x86/boot: Revert "Reject absolute references in .head.text" x86/boot: Check startup code for absence of absolute relocations objtool: Add action to check for absence of absolute relocations x86/sev: Export startup routines for later use x86/sev: Move __sev_[get|put]_ghcb() into separate noinstr object x86/sev: Provide PIC aliases for SEV related data objects x86/boot: Provide PIC aliases for 5-level paging related constants ...
2025-09-30Merge tag 'x86_cache_for_v6.18_rc1' of ↵Linus Torvalds4-75/+311
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 resource control updates from Borislav Petkov: "Add support on AMD for assigning QoS bandwidth counters to resources (RMIDs) with the ability for those resources to be tracked by the counters as long as they're assigned to them. Previously, due to hw limitations, bandwidth counts from untracked resources would get lost when those resources are not tracked. Refactor the code and user interfaces to be able to also support other, similar features on ARM, for example" * tag 'x86_cache_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits) fs/resctrl: Fix counter auto-assignment on mkdir with mbm_event enabled MAINTAINERS: resctrl: Add myself as reviewer x86/resctrl: Configure mbm_event mode if supported fs/resctrl: Introduce the interface to switch between monitor modes fs/resctrl: Disable BMEC event configuration when mbm_event mode is enabled fs/resctrl: Introduce the interface to modify assignments in a group fs/resctrl: Introduce mbm_L3_assignments to list assignments in a group fs/resctrl: Auto assign counters on mkdir and clean up on group removal fs/resctrl: Introduce mbm_assign_on_mkdir to enable assignments on mkdir fs/resctrl: Provide interface to update the event configurations fs/resctrl: Add event configuration directory under info/L3_MON/ fs/resctrl: Support counter read/reset with mbm_event assignment mode x86/resctrl: Implement resctrl_arch_reset_cntr() and resctrl_arch_cntr_read() x86/resctrl: Refactor resctrl_arch_rmid_read() fs/resctrl: Introduce counter ID read, reset calls in mbm_event mode fs/resctrl: Pass struct rdtgroup instead of individual members fs/resctrl: Add the functionality to unassign MBM events fs/resctrl: Add the functionality to assign MBM events x86,fs/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC fs/resctrl: Introduce event configuration field in struct mon_evt ...
2025-09-30Merge tag 'x86_cpu_for_v6.18_rc1' of ↵Linus Torvalds6-43/+116
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpuid updates from Borislav Petkov: - Make UMIP instruction detection more robust - Correct and cleanup AMD CPU topology detection; document the relevant CPUID leaves topology parsing precedence on AMD - Add support for running the kernel as guest on FreeBSD's Bhyve hypervisor - Cleanups and improvements * tag 'x86_cpu_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/umip: Fix decoding of register forms of 0F 01 (SGDT and SIDT aliases) x86/umip: Check that the instruction opcode is at least two bytes Documentation/x86/topology: Detail CPUID leaves used for topology enumeration x86/cpu/topology: Define AMD64_CPUID_EXT_FEAT MSR x86/cpu/topology: Check for X86_FEATURE_XTOPOLOGY instead of passing has_xtopology x86/cpu/cacheinfo: Simplify cacheinfo_amd_init_llc_id() using _cpuid4_info x86/cpu: Rename and move CPU model entry for Diamond Rapids x86/cpu: Detect FreeBSD Bhyve hypervisor
2025-09-30Merge tag 'x86_bugs_for_v6.18_rc1' of ↵Linus Torvalds1-240/+172
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mitigation updates from Borislav Petkov: - Add VMSCAPE to the attack vector controls infrastructure - A bunch of the usual cleanups and fixlets, some of them resulting from fuzzing the different mitigation options * tag 'x86_bugs_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/bugs: Report correct retbleed mitigation status x86/bugs: Fix reporting of LFENCE retpoline x86/bugs: Fix spectre_v2 forcing x86/bugs: Remove uses of cpu_mitigations_off() x86/bugs: Simplify SSB cmdline parsing x86/bugs: Use early_param() for spectre_v2 x86/bugs: Use early_param() for spectre_v2_user x86/bugs: Add attack vector controls for VMSCAPE x86/its: Move ITS indirect branch thunks to .text..__x86.indirect_thunk
2025-09-30Merge tag 'ras_core_for_v6.18_rc1' of ↵Linus Torvalds5-273/+233
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS updates from Borislav Petkov: - Unify and refactor the MCA arch side and better separate code - Cleanup and simplify the AMD RAS side, unify code, drop unused stuff * tag 'ras_core_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Add a clear_bank() helper x86/mce: Move machine_check_poll() status checks to helper functions x86/mce: Separate global and per-CPU quirks x86/mce: Do 'UNKNOWN' vendor check early x86/mce: Define BSP-only SMCA init x86/mce: Define BSP-only init x86/mce: Set CR4.MCE last during init x86/mce: Remove __mcheck_cpu_init_early() x86/mce: Cleanup bank processing on init x86/mce/amd: Put list_head in threshold_bank x86/mce/amd: Remove smca_banks_map x86/mce/amd: Remove return value for mce_threshold_{create,remove}_device() x86/mce/amd: Rename threshold restart function
2025-09-30Merge tag 'x86_microcode_for_v6.18_rc1' of ↵Linus Torvalds4-65/+150
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode loading updates from Borislav Petkov: - Add infrastructure to be able to debug the microcode loader in a guest - Refresh Intel old microcode revisions * tag 'x86_microcode_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode: Add microcode loader debugging functionality x86/microcode: Add microcode= cmdline parsing x86/microcode/intel: Refresh the revisions that determine old_microcode
2025-09-30Merge tag 'perf-core-2025-09-26' of ↵Linus Torvalds2-22/+653
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull performance events updates from Ingo Molnar: "Core perf code updates: - Convert mmap() related reference counts to refcount_t. This is in reaction to the recently fixed refcount bugs, which could have been detected earlier and could have mitigated the bug somewhat (Thomas Gleixner, Peter Zijlstra) - Clean up and simplify the callchain code, in preparation for sframes (Steven Rostedt, Josh Poimboeuf) Uprobes updates: - Add support to optimize usdt probes on x86-64, which gives a substantial speedup (Jiri Olsa) - Cleanups and fixes on x86 (Peter Zijlstra) PMU driver updates: - Various optimizations and fixes to the Intel PMU driver (Dapeng Mi) Misc cleanups and fixes: - Remove redundant __GFP_NOWARN (Qianfeng Rong)" * tag 'perf-core-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits) selftests/bpf: Fix uprobe_sigill test for uprobe syscall error value uprobes/x86: Return error from uprobe syscall when not called from trampoline perf: Skip user unwind if the task is a kernel thread perf: Simplify get_perf_callchain() user logic perf: Use current->flags & PF_KTHREAD|PF_USER_WORKER instead of current->mm == NULL perf: Have get_perf_callchain() return NULL if crosstask and user are set perf: Remove get_perf_callchain() init_nr argument perf/x86: Print PMU counters bitmap in x86_pmu_show_pmu_cap() perf/x86/intel: Add ICL_FIXED_0_ADAPTIVE bit into INTEL_FIXED_BITS_MASK perf/x86/intel: Change macro GLOBAL_CTRL_EN_PERF_METRICS to BIT_ULL(48) perf/x86: Add PERF_CAP_PEBS_TIMING_INFO flag perf/x86/intel: Fix IA32_PMC_x_CFG_B MSRs access error perf/x86/intel: Use early_initcall() to hook bts_init() uprobes: Remove redundant __GFP_NOWARN selftests/seccomp: validate uprobe syscall passes through seccomp seccomp: passthrough uprobe systemcall without filtering selftests/bpf: Fix uprobe syscall shadow stack test selftests/bpf: Change test_uretprobe_regs_change for uprobe and uretprobe selftests/bpf: Add uprobe_regs_equal test selftests/bpf: Add optimized usdt variant for basic usdt test ...
2025-09-30Merge tag 'kvm-x86-misc-6.18' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini1-0/+1
KVM x86 changes for 6.18 - Don't (re)check L1 intercepts when completing userspace I/O to fix a flaw where a misbehaving usersepace (a.k.a. syzkaller) could swizzle L1's intercepts and trigger a variety of WARNs in KVM. - Emulate PERF_CNTR_GLOBAL_STATUS_SET for PerfMonV2 guests, as the MSR is supposed to exist for v2 PMUs. - Allow Centaur CPU leaves (base 0xC000_0000) for Zhaoxin CPUs. - Clean up KVM's vector hashing code for delivering lowest priority IRQs. - Clean up the fastpath handler code to only handle IPIs and WRMSRs that are actually "fast", as opposed to handling those that KVM _hopes_ are fast, and in the process of doing so add fastpath support for TSC_DEADLINE writes on AMD CPUs. - Clean up a pile of PMU code in anticipation of adding support for mediated vPMUs. - Add support for the immediate forms of RDMSR and WRMSRNS, sans full emulator support (KVM should never need to emulate the MSRs outside of forced emulation and other contrived testing scenarios). - Clean up the MSR APIs in preparation for CET and FRED virtualization, as well as mediated vPMU support. - Rejecting a fully in-kernel IRQCHIP if EOIs are protected, i.e. for TDX VMs, as KVM can't faithfully emulate an I/O APIC for such guests. - KVM_REQ_MSR_FILTER_CHANGED into a generic RECALC_INTERCEPTS in preparation for mediated vPMU support, as KVM will need to recalculate MSR intercepts in response to PMU refreshes for guests with mediated vPMUs. - Misc cleanups and minor fixes.
2025-09-30Merge tag 'sched-core-2025-09-26' of ↵Linus Torvalds1-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "Core scheduler changes: - Make migrate_{en,dis}able() inline, to improve performance (Menglong Dong) - Move STDL_INIT() functions out-of-line (Peter Zijlstra) - Unify the SCHED_{SMT,CLUSTER,MC} Kconfig (Peter Zijlstra) Fair scheduling: - Defer throttling to when tasks exit to user-space, to reduce the chance & impact of throttle-preemption with held locks and other resources (Aaron Lu, Valentin Schneider) - Get rid of sched_domains_curr_level hack for tl->cpumask(), as the warning was getting triggered on certain topologies (Peter Zijlstra) Misc cleanups & fixes: - Header cleanups (Menglong Dong) - Fix race in push_dl_task() (Harshit Agarwal)" * tag 'sched-core-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Fix some typos in include/linux/preempt.h sched: Make migrate_{en,dis}able() inline rcu: Replace preempt.h with sched.h in include/linux/rcupdate.h arch: Add the macro COMPILE_OFFSETS to all the asm-offsets.c sched/fair: Do not balance task to a throttled cfs_rq sched/fair: Do not special case tasks in throttled hierarchy sched/fair: update_cfs_group() for throttled cfs_rqs sched/fair: Propagate load for throttled cfs_rq sched/fair: Get rid of throttled_lb_pair() sched/fair: Task based throttle time accounting sched/fair: Switch to task based throttle model sched/fair: Implement throttle task work and related helpers sched/fair: Add related data structure for task based throttle sched: Unify the SCHED_{SMT,CLUSTER,MC} Kconfig sched: Move STDL_INIT() functions out-of-line sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask() sched/deadline: Fix race in push_dl_task()
2025-09-30Merge tag 'kvm-x86-guest-6.18' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini1-14/+30
x86/kvm guest side changes for 6.18 - For the legacy PCI hole (memory between TOLUD and 4GiB) to UC when overriding guest MTRR for TDX/SNP to fix an issue where ACPI auto-mapping could map devices as WB and prevent the device drivers from mapping their devices with UC/UC-. - Make kvm_async_pf_task_wake() a local static helper and remove its export. - Use native qspinlocks when running in a VM with dedicated vCPU=>pCPU bindings even when PV_UNHALT is unsupported.
2025-09-29Merge tag 'hardening-v6.18-rc1' of ↵Linus Torvalds3-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening updates from Kees Cook: "One notable addition is the creation of the 'transitional' keyword for kconfig so CONFIG renaming can go more smoothly. This has been a long-standing deficiency, and with the renaming of CONFIG_CFI_CLANG to CONFIG_CFI (since GCC will soon have KCFI support), this came up again. The breadth of the diffstat is mainly this renaming. - Clean up usage of TRAILING_OVERLAP() (Gustavo A. R. Silva) - lkdtm: fortify: Fix potential NULL dereference on kmalloc failure (Junjie Cao) - Add str_assert_deassert() helper (Lad Prabhakar) - gcc-plugins: Remove TODO_verify_il for GCC >= 16 - kconfig: Fix BrokenPipeError warnings in selftests - kconfig: Add transitional symbol attribute for migration support - kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI" * tag 'hardening-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: lib/string_choices: Add str_assert_deassert() helper kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI kconfig: Add transitional symbol attribute for migration support kconfig: Fix BrokenPipeError warnings in selftests gcc-plugins: Remove TODO_verify_il for GCC >= 16 stddef: Introduce __TRAILING_OVERLAP() stddef: Remove token-pasting in TRAILING_OVERLAP() lkdtm: fortify: Fix potential NULL dereference on kmalloc failure
2025-09-29Merge tag 'kernel-6.18-rc1.clone3' of ↵Linus Torvalds3-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull copy_process updates from Christian Brauner: "This contains the changes to enable support for clone3() on nios2 which apparently is still a thing. The more exciting part of this is that it cleans up the inconsistency in how the 64-bit flag argument is passed from copy_process() into the various other copy_*() helpers" [ Fixed up rv ltl_monitor 32-bit support as per Sasha Levin in the merge ] * tag 'kernel-6.18-rc1.clone3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: nios2: implement architecture-specific portion of sys_clone3 arch: copy_thread: pass clone_flags as u64 copy_process: pass clone_flags as u64 across calltree copy_sighand: Handle architectures where sizeof(unsigned long) < sizeof(u64)
2025-09-24kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFIKees Cook3-4/+4
The kernel's CFI implementation uses the KCFI ABI specifically, and is not strictly tied to a particular compiler. In preparation for GCC supporting KCFI, rename CONFIG_CFI_CLANG to CONFIG_CFI (along with associated options). Use new "transitional" Kconfig option for old CONFIG_CFI_CLANG that will enable CONFIG_CFI during olddefconfig. Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20250923213422.1105654-3-kees@kernel.org Signed-off-by: Kees Cook <kees@kernel.org>
2025-09-22x86/topology: Implement topology_is_core_online() to address SMT regressionThomas Gleixner1-0/+13
Christian reported that commit a430c11f4015 ("intel_idle: Rescan "dead" SMT siblings during initialization") broke the use case in which both 'nosmt' and 'maxcpus' are on the kernel command line because it onlines primary threads, which were offline due to the maxcpus limit. The initially proposed fix to skip primary threads in the loop is inconsistent. While it prevents the primary thread to be onlined, it then onlines the corresponding hyperthread(s), which does not really make sense. The CPU iterator in cpuhp_smt_enable() contains a check which excludes all threads of a core, when the primary thread is offline. The default implementation is a NOOP and therefore not effective on x86. Implement topology_is_core_online() on x86 to address this issue. This makes the behaviour consistent between x86 and PowerPC. Fixes: a430c11f4015 ("intel_idle: Rescan "dead" SMT siblings during initialization") Fixes: f694481b1d31 ("ACPI: processor: Rescan "dead" SMT siblings during initialization") Closes: https://lore.kernel.org/linux-pm/724616a2-6374-4ba3-8ce3-ea9c45e2ae3b@arm.com/ Reported-by: Christian Loehle <christian.loehle@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org> Tested-by: Christian Loehle <christian.loehle@arm.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/12740505.O9o76ZdvQC@rafael.j.wysocki
2025-09-22x86/acpi/cstate: Remove open coded check for cpu_feature_enabled()Mario Limonciello (AMD)1-1/+1
acpi_processor_power_init_bm_check() has an open coded implementation of cpu_feature_enabled() to detect X86_FEATURE_ZEN. Switch to using cpu_feature_enabled(). Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-09-19x86/umip: Fix decoding of register forms of 0F 01 (SGDT and SIDT aliases)Sean Christopherson1-0/+11
Filter out the register forms of 0F 01 when determining whether or not to emulate in response to a potential UMIP violation #GP, as SGDT and SIDT only accept memory operands. The register variants of 0F 01 are used to encode instructions for things like VMX and SGX, i.e. not checking the Mod field would cause the kernel to incorrectly emulate on #GP, e.g. due to a CPL violation on VMLAUNCH. Fixes: 1e5db223696a ("x86/umip: Add emulation code for UMIP instructions") Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org
2025-09-19x86/umip: Check that the instruction opcode is at least two bytesSean Christopherson1-2/+2
When checking for a potential UMIP violation on #GP, verify the decoder found at least two opcode bytes to avoid false positives when the kernel encounters an unknown instruction that starts with 0f. Because the array of opcode.bytes is zero-initialized by insn_init(), peeking at bytes[1] will misinterpret garbage as a potential SLDT or STR instruction, and can incorrectly trigger emulation. E.g. if a VPALIGNR instruction 62 83 c5 05 0f 08 ff vpalignr xmm17{k5},xmm23,XMMWORD PTR [r8],0xff hits a #GP, the kernel emulates it as STR and squashes the #GP (and corrupts the userspace code stream). Arguably the check should look for exactly two bytes, but no three byte opcodes use '0f 00 xx' or '0f 01 xx' as an escape, i.e. it should be impossible to get a false positive if the first two opcode bytes match '0f 00' or '0f 01'. Go with a more conservative check with respect to the existing code to minimize the chances of breaking userspace, e.g. due to decoder weirdness. Analyzed by Nick Bray <ncbray@google.com>. Fixes: 1e5db223696a ("x86/umip: Add emulation code for UMIP instructions") Reported-by: Dan Snyder <dansnyder@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org
2025-09-17x86/cpu/topology: Define AMD64_CPUID_EXT_FEAT MSRK Prateek Nayak1-3/+4
Add defines for the 0xc001_1005 MSR (Core::X86::Msr::CPUID_ExtFeatures) used to toggle the extended CPUID features, instead of using literal numbers. Also define and use the bits necessary for an old TOPOEXT fixup on AMD Family 0x15 processors. No functional changes intended. [ bp: Massage, rename MSR to adhere to the documentation name. ] Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/20250901170418.4314-1-kprateek.nayak@amd.com
2025-09-17x86/cpu/topology: Check for X86_FEATURE_XTOPOLOGY instead of passing ↵K Prateek Nayak1-11/+8
has_xtopology cpu_parse_topology_ext() sets X86_FEATURE_XTOPOLOGY before returning true if any of the XTOPOLOGY leaf (0x80000026 / 0xb) could be parsed successfully. Instead of storing and passing around this return value using "has_xtopology" in parse_topology_amd(), check for X86_FEATURE_XTOPOLOGY directly in parse_8000_001e() to simplify the flow. No functional changes intended. Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/20250901170418.4314-1-kprateek.nayak@amd.com
2025-09-17x86/cpu/cacheinfo: Simplify cacheinfo_amd_init_llc_id() using _cpuid4_infoK Prateek Nayak1-27/+21
struct _cpuid4_info has the same layout as the CPUID leaf 0x8000001d. Use the encoded definition and amd_fill_cpuid4_info(), get_cache_id() helpers instead of open coding masks and shifts to calculate the llc_id. cacheinfo_amd_init_llc_id() is only called on AMD systems that support X86_FEATURE_TOPOEXT and amd_fill_cpuid4_info() uses the information from CPUID leaf 0x8000001d on all these systems which is consistent with the current open coded implementation. While at it, avoid reading cpu_data() every time get_cache_id() is called and instead pass the APIC ID necessary to return the _cpuid4_info.id from get_cache_id(). No functional changes intended. [ bp: do what Ahmed suggests: merge into one patch, make id4 ptr const. ] Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Ahmed S. Darwish <darwi@linutronix.de> Link: https://lore.kernel.org/20250821051910.7351-2-kprateek.nayak@amd.com
2025-09-16x86/bugs: Report correct retbleed mitigation statusDavid Kaplan1-1/+3
On Intel CPUs, the default retbleed mitigation is IBRS/eIBRS but this requires that a similar spectre_v2 mitigation is applied. If the user selects a different spectre_v2 mitigation (like spectre_v2=retpoline) a warning is printed but sysfs will still report 'Mitigation: IBRS' or 'Mitigation: Enhanced IBRS'. This is incorrect because retbleed is not mitigated, and IBRS is not actually set. Fix this by choosing RETBLEED_MITIGATION_NONE in this scenario so the kernel correctly reports the system as vulnerable to retbleed. Signed-off-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250915134706.3201818-1-david.kaplan@amd.com
2025-09-16x86/bugs: Fix reporting of LFENCE retpolineDavid Kaplan1-4/+1
The LFENCE retpoline mitigation is not secure but the kernel prints inconsistent messages about this fact. The dmesg log says 'Mitigation: LFENCE', implying the system is mitigated. But sysfs reports 'Vulnerable: LFENCE' implying the system (correctly) is not mitigated. Fix this by printing a consistent 'Vulnerable: LFENCE' string everywhere when this mitigation is selected. Signed-off-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250915134706.3201818-1-david.kaplan@amd.com
2025-09-16x86/bugs: Fix spectre_v2 forcingDavid Kaplan1-18/+18
There were two oddities with spectre_v2 command line options. First, any option other than 'off' or 'auto' would force spectre_v2 mitigations even if the CPU (hypothetically) wasn't vulnerable to spectre_v2. That was inconsistent with all the other bugs where mitigations are ignored unless an explicit 'force' option is specified. Second, even though spectre_v2 mitigations would be enabled in these cases, the X86_BUG_SPECTRE_V2 bit wasn't set. This is again inconsistent with the forcing behavior of other bugs and arguably incorrect as it doesn't make sense to enable a mitigation if the X86_BUG bit isn't set. Fix both issues by only forcing spectre_v2 mitigations when the 'spectre_v2=on' option is specified (which was already called SPECTRE_V2_CMD_FORCE) and setting the relevant X86_BUG_* bits in that case. This also allows for simplifying bhi_update_mitigation() because spectre_v2_cmd will now always be SPECTRE_V2_CMD_NONE if the CPU is immune to spectre_v2. Signed-off-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250915134706.3201818-1-david.kaplan@amd.com
2025-09-16Merge branch 'x86/urgent' into x86/apic, to resolve conflictIngo Molnar5-136/+316
Conflicts: arch/x86/include/asm/sev.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-09-15x86/bugs: Remove uses of cpu_mitigations_off()David Kaplan1-5/+5
cpu_mitigations_off() is no longer needed because all bugs use attack vector controls to select a mitigation, and cpu_mitigations_off() is equivalent to no attack vectors being selected. Remove the few remaining unnecessary uses of this function in this file. Signed-off-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
2025-09-15x86/bugs: Simplify SSB cmdline parsingDavid Kaplan1-80/+40
Simplify the SSB command line parsing by selecting a mitigation directly, as is done in most of the simpler vulnerabilities. Use early_param() instead of cmdline_find_option() for consistency with the other mitigation selections. Signed-off-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Link: https://lore.kernel.org/r/20250819192200.2003074-4-david.kaplan@amd.com