path: root/arch/x86/kvm
Age  Commit message  Author  Files  Lines
2025-09-11KVM: nSVM: Replace kzalloc() + copy_from_user() with memdup_user()Thorsten Blum1-11/+9
Replace kzalloc() followed by copy_from_user() with memdup_user() to improve and simplify svm_set_nested_state(). Return early if an error occurs instead of trying to allocate memory for 'save' when memory allocation for 'ctl' already failed. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Link: https://lore.kernel.org/r/20250903002951.118912-1-thorsten.blum@linux.dev Signed-off-by: Sean Christopherson <seanjc@google.com>
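A minimal sketch of the conversion described above, using hypothetical variable and label names rather than the actual svm_set_nested_state() diff:

    struct vmcb_control_area *ctl;
    struct vmcb_save_area *save;

    /* memdup_user() allocates and copies from userspace in one call. */
    ctl = memdup_user(&user_vmcb->control, sizeof(*ctl));
    if (IS_ERR(ctl))
        return PTR_ERR(ctl);        /* bail before allocating 'save' */

    save = memdup_user(&user_vmcb->save, sizeof(*save));
    if (IS_ERR(save)) {
        ret = PTR_ERR(save);
        goto out_free_ctl;          /* only 'ctl' needs freeing here */
    }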
2025-09-10Merge tag 'vmscape-for-linus-20250904' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-0/+9
Pull vmscape mitigation fixes from Dave Hansen: "Mitigate vmscape issue with indirect branch predictor flushes. vmscape is a vulnerability that essentially takes Spectre-v2 and attacks host userspace from a guest. It particularly affects hypervisors like QEMU. Even if a hypervisor may not have any sensitive data like disk encryption keys, guest-userspace may be able to attack the guest-kernel using the hypervisor as a confused deputy. There are many ways to mitigate vmscape using the existing Spectre-v2 defenses like IBRS variants or the IBPB flushes. This series focuses solely on IBPB because it works universally across vendors and all vulnerable processors. Further work doing vendor and model-specific optimizations can build on top of this if needed / wanted. Do the normal issue mitigation dance:
 - Add the CPU bug boilerplate
 - Add a list of vulnerable CPUs
 - Use IBPB to flush the branch predictors after running guests"
* tag 'vmscape-for-linus-20250904' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/vmscape: Add old Intel CPUs to affected list
  x86/vmscape: Warn when STIBP is disabled with SMT
  x86/bugs: Move cpu_bugs_smt_update() down
  x86/vmscape: Enable the mitigation
  x86/vmscape: Add conditional IBPB mitigation
  x86/vmscape: Enumerate VMSCAPE bug
  Documentation/hw-vuln: Add VMSCAPE documentation
2025-09-10KVM: TDX: Do not retry locally when the retry is caused by invalid memslotSean Christopherson1-0/+11
Avoid local retries within the TDX EPT violation handler if a retry is triggered by faulting in an invalid memslot, indicating that the memslot is undergoing a removal process. Faulting in a GPA from an invalid memslot will never succeed, and holding SRCU prevents memslot deletion from succeeding, i.e. retrying when the memslot is being actively deleted will lead to (breakable) deadlock. Opportunistically export kvm_vcpu_gfn_to_memslot() to allow for a per-vCPU lookup (which, strictly speaking, is unnecessary since TDX doesn't support SMM, but aligns the TDX code with the MMU code). Fixes: b0327bb2e7e0 ("KVM: TDX: Retry locally in TDX EPT violation handler on RET_PF_RETRY") Reported-by: Reinette Chatre <reinette.chatre@intel.com> Closes: https://lore.kernel.org/all/20250519023737.30360-1-yan.y.zhao@intel.com [Yan: Wrote patch log, comment, fixed a minor error, function export] Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> Link: https://lore.kernel.org/r/20250822070523.26495-1-yan.y.zhao@intel.com [sean: massage changelog, relocate and tweak comment] Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-10KVM: x86/mmu: Return -EAGAIN if userspace deletes/moves memslot during prefaultSean Christopherson1-2/+8
Return -EAGAIN if userspace attempts to delete or move a memslot while also prefaulting memory for that same memslot, i.e. force userspace to retry instead of trying to handle the scenario entirely within KVM. Unlike KVM_RUN, which needs to handle the scenario entirely within KVM because userspace has come to depend on such behavior, KVM_PRE_FAULT_MEMORY can return -EAGAIN without breaking userspace as this scenario can't have ever worked (and there's no sane use case for prefaulting to a memslot that's being deleted/moved). And also unlike KVM_RUN, the prefault path doesn't naturally guarantee forward progress. E.g. to handle such a scenario, KVM would need to drop and reacquire SRCU to break the deadlock between the memslot update (synchronizes SRCU) and the prefault (waits for the memslot update to complete). However, dropping SRCU creates more problems, as completing the memslot update will bump the memslot generation, which in turn will invalidate the MMU root. To handle that, prefaulting would need to handle pending KVM_REQ_MMU_FREE_OBSOLETE_ROOTS requests and do kvm_mmu_reload() prior to mapping each individual page. I.e. to fully handle this scenario, prefaulting would eventually need to look a lot like vcpu_enter_guest(). Given that there's no reasonable use case and practically zero risk of breaking userspace, punt the problem to userspace and avoid adding unnecessary complexity to the prefault path. Note, TDX's guest_memfd post-populate path is unaffected as slots_lock is held for the entire duration of populate(), i.e. any memslot modifications will be fully serialized against TDX's flavor of prefaulting. Reported-by: Reinette Chatre <reinette.chatre@intel.com> Closes: https://lore.kernel.org/all/20250519023737.30360-1-yan.y.zhao@intel.com Debugged-by: Yan Zhao <yan.y.zhao@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Link: https://lore.kernel.org/r/20250822070347.26451-1-yan.y.zhao@intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>
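The gist of the change, as a simplified sketch of the prefault path (the helpers and flag are the generic KVM ones; the exact placement in the real patch may differ):

    struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);

    /*
     * An invalid memslot means userspace is concurrently deleting or
     * moving it; retrying inside KVM would deadlock against the
     * SRCU-synchronized memslot update, so punt back to userspace.
     */
    if (!slot || slot->flags & KVM_MEMSLOT_INVALID)
        return -EAGAIN;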
2025-09-10KVM: x86: Move vector_hashing into lapic.cSean Christopherson3-11/+6
Move the vector_hashing module param into lapic.c now that all usage is contained within the local APIC emulation code. Opportunistically drop the accessor and append "_enabled" to the variable to help capture that it's a boolean module param. No functional change intended. Link: https://lore.kernel.org/r/20250821214209.3463350-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
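Roughly what the relocated boolean module param looks like in lapic.c (a sketch; the default value and permissions shown are assumptions):

    /* arch/x86/kvm/lapic.c */
    static bool vector_hashing_enabled = true;
    module_param_named(vector_hashing, vector_hashing_enabled, bool, 0444);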
2025-09-10KVM: x86: Make "lowest priority" helpers local to lapic.cSean Christopherson2-16/+12
Make various helpers for resolving lowest priority IRQs local to lapic.c now that kvm_irq_delivery_to_apic() lives in lapic.c as well. No functional change intended. Link: https://lore.kernel.org/r/20250821214209.3463350-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-10KVM: x86: Move kvm_irq_delivery_to_apic() from irq.c to lapic.cSean Christopherson4-61/+60
Move kvm_irq_delivery_to_apic() to lapic.c as it is specific to local APIC emulation. This will allow burying more local APIC code in lapic.c, e.g. the various "lowest priority" helpers. No functional change intended. Link: https://lore.kernel.org/r/20250821214209.3463350-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-10KVM: SVM: Sync TPR from LAPIC into VMCB::V_TPR even if AVIC is activeMaciej S. Szmigiero1-2/+1
Commit 3bbf3565f48c ("svm: Do not intercept CR8 when enable AVIC") inhibited pre-VMRUN sync of TPR from LAPIC into VMCB::V_TPR in sync_lapic_to_cr8() when AVIC is active. AVIC does automatically sync between these two fields; however, it does so only on explicit guest writes to one of these fields, not on a bare VMRUN. This meant that when AVIC is enabled, host changes to TPR in the LAPIC state might not get automatically copied into the V_TPR field of the VMCB. This is especially true when it is userspace setting the LAPIC state via the KVM_SET_LAPIC ioctl(), since userspace does not have access to the guest VMCB. Practice shows that it is the V_TPR that is actually used by the AVIC to decide whether to issue pending interrupts to the CPU (not TPR in TASKPRI), so any leftover value in V_TPR will cause serious interrupt delivery issues in the guest when AVIC is enabled. Fix this issue by doing pre-VMRUN TPR sync from LAPIC into VMCB::V_TPR even when AVIC is enabled. Fixes: 3bbf3565f48c ("svm: Do not intercept CR8 when enable AVIC") Cc: stable@vger.kernel.org Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org> Link: https://lore.kernel.org/r/c231be64280b1461e854e1ce3595d70cde3a2e9d.1756139678.git.maciej.szmigiero@oracle.com [sean: tag for stable@] Signed-off-by: Sean Christopherson <seanjc@google.com>
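A hedged sketch of sync_lapic_to_cr8() after the fix, i.e. without an early return when AVIC is active (identifiers follow the usual SVM code, but treat the snippet as illustrative rather than the exact diff):

    static void sync_lapic_to_cr8(struct kvm_vcpu *vcpu)
    {
        struct vcpu_svm *svm = to_svm(vcpu);
        u64 cr8;

        if (nested_svm_virtualize_tpr(vcpu))
            return;

        /*
         * Sync even when AVIC is active: AVIC only propagates TPR writes
         * made by the guest, so host-side LAPIC updates (e.g. via
         * KVM_SET_LAPIC) must still be copied into V_TPR before VMRUN.
         */
        cr8 = kvm_get_cr8(vcpu);
        svm->vmcb->control.int_ctl &= ~V_TPR_MASK;
        svm->vmcb->control.int_ctl |= cr8 & V_TPR_MASK;
    }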
2025-09-08KVM: SEV: Save the SEV policy if and only if LAUNCH_START succeedsSean Christopherson1-4/+2
Wait until LAUNCH_START fully succeeds to set a VM's SEV/SNP policy so that KVM doesn't keep a potentially stale policy. In practice, the issue is benign as the policy is only used to detect if the VMSA can be decrypted, and the VMSA only needs to be decrypted if LAUNCH_UPDATE and thus LAUNCH_START succeeded. Fixes: 962e2b6152ef ("KVM: SVM: Decrypt SEV VMSA in dump_vmcb() if debugging is enabled") Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Kim Phillips <kim.phillips@amd.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20250821213841.3462339-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
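Conceptually the fix is just reordering the assignment, along these lines (a sketch of the SNP LAUNCH_START path; names are approximations):

    rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_START, &start, error);
    if (rc)
        goto e_free_context;

    /* Record the policy only once LAUNCH_START has actually succeeded. */
    sev->policy = params.policy;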
2025-09-05KVM/TDX: Explicitly do WBINVD when no more TDX SEAMCALLsKai Huang1-0/+10
On TDX platforms, during kexec, the kernel needs to make sure there are no dirty cachelines of TDX private memory before booting to the new kernel to avoid silent memory corruption to the new kernel. To do this, the kernel has a percpu boolean to indicate whether the cache of a CPU may be in an incoherent state. During kexec, namely in stop_this_cpu(), the kernel does WBINVD if that percpu boolean is true. TDX turns on that percpu boolean on a CPU when the kernel does SEAMCALL, thus making sure the cache will be flushed during kexec. However, kexec has a race condition that, while remaining extremely rare, would be more likely in the presence of a relatively long operation such as WBINVD. In particular, the kexec-ing CPU invokes native_stop_other_cpus() to stop all remote CPUs before booting to the new kernel. native_stop_other_cpus() then sends a REBOOT vector IPI to remote CPUs and waits for them to stop; if that times out, it also sends NMIs to the still-alive CPUs and waits again for them to stop. If the race happens, kexec proceeds before all CPUs have processed the NMI and stopped[1], and the system hangs. But after tdx_disable_virtualization_cpu(), no more TDX activity can happen on this CPU. When kexec is enabled, flush the cache explicitly at that point; this moves the WBINVD to an earlier stage than stop_this_cpu(), avoiding a possibly lengthy operation at a time where it could cause this race. [1] https://lore.kernel.org/kvm/b963fcd60abe26c7ec5dc20b42f1a2ebbcc72397.1750934177.git.kai.huang@intel.com/ [Make the new function a stub for !CONFIG_KEXEC_CORE. - Paolo] Signed-off-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Tested-by: Farrah Chen <farrah.chen@intel.com> Link: https://lore.kernel.org/all/20250901160930.1785244-8-pbonzini%40redhat.com
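The shape of the change, as a rough sketch (the flush helper's name is hypothetical; the !CONFIG_KEXEC_CORE stub is the one mentioned above):

    void tdx_disable_virtualization_cpu(void)
    {
        /* ... existing teardown; no further SEAMCALLs happen on this CPU ... */

        /*
         * Flush the cache now instead of in stop_this_cpu(), so a slow
         * WBINVD doesn't widen the kexec NMI-stop race window.
         */
        tdx_cpu_flush_cache_for_kexec();    /* no-op stub if !CONFIG_KEXEC_CORE */
    }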
2025-08-27KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memoryAckerley Tng1-4/+4
Update the KVM MMU fault handler to service guest page faults for memory slots backed by guest_memfd with mmap support. For such slots, the MMU must always fault in pages directly from guest_memfd, bypassing the host's userspace_addr. This ensures that guest_memfd-backed memory is always handled through the guest_memfd specific faulting path, regardless of whether it's for private or non-private (shared) use cases. Additionally, rename kvm_mmu_faultin_pfn_private() to kvm_mmu_faultin_pfn_gmem(), as this function is now used to fault in pages from guest_memfd for both private and non-private memory, accommodating the new use cases. Co-developed-by: David Hildenbrand <david@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Ackerley Tng <ackerleytng@google.com> Co-developed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Fuad Tabba <tabba@google.com> [sean: drop the helper] Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-ID: <20250729225455.670324-17-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-08-27KVM: x86/mmu: Extend guest_memfd's max mapping level to shared mappingsSean Christopherson6-12/+18
Rework kvm_mmu_max_mapping_level() to consult guest_memfd for all mappings, not just private mappings, so that hugepage support plays nice with the upcoming support for backing non-private memory with guest_memfd. In addition to getting the max order from guest_memfd for gmem-only memslots, update TDX's hook to effectively ignore shared mappings, as TDX's restrictions on page size only apply to Secure EPT mappings. Do nothing for SNP, as RMP restrictions apply to both private and shared memory. Suggested-by: Ackerley Tng <ackerleytng@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Fuad Tabba <tabba@google.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-ID: <20250729225455.670324-16-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-08-27KVM: x86/mmu: Enforce guest_memfd's max order when recovering hugepagesSean Christopherson3-35/+47
Rework kvm_mmu_max_mapping_level() to provide the plumbing to consult guest_memfd (and relevant vendor code) when recovering hugepages, e.g. after disabling live migration. The flaw has existed since guest_memfd was originally added, but has gone unnoticed due to lack of guest_memfd support for hugepages or dirty logging. Don't actually call into guest_memfd at this time, as it's unclear as to what the API should be. Ideally, KVM would simply use kvm_gmem_get_pfn(), but invoking kvm_gmem_get_pfn() would lead to sleeping in atomic context if guest_memfd needed to allocate memory (mmu_lock is held). Luckily, the path isn't actually reachable, so just add a TODO and WARN to ensure the functionality is added alongside guest_memfd hugepage support, and punt the guest_memfd API design question to the future. Note, calling kvm_mem_is_private() in the non-fault path is safe, so long as mmu_lock is held, as hugepage recovery operates on shadow-present SPTEs, i.e. calling kvm_mmu_max_mapping_level() with @fault=NULL is mutually exclusive with kvm_vm_set_mem_attributes() changing the PRIVATE attribute of the gfn. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Fuad Tabba <tabba@google.com> Message-ID: <20250729225455.670324-15-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-08-27KVM: x86/mmu: Hoist guest_memfd max level/order helpers "up" in mmu.cSean Christopherson1-36/+36
Move kvm_max_level_for_order() and kvm_max_private_mapping_level() up in mmu.c so that they can be used by __kvm_mmu_max_mapping_level(). Opportunistically drop the "inline" from kvm_max_level_for_order(). No functional change intended. Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Ackerley Tng <ackerleytng@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Fuad Tabba <tabba@google.com> Message-ID: <20250729225455.670324-14-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-08-27KVM: x86/mmu: Rename .private_max_mapping_level() to .gmem_max_mapping_level()Ackerley Tng7-10/+10
Rename kvm_x86_ops.private_max_mapping_level() to .gmem_max_mapping_level() in anticipation of extending guest_memfd support to non-private memory. No functional change intended. Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Ackerley Tng <ackerleytng@google.com> Signed-off-by: Fuad Tabba <tabba@google.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Message-ID: <20250729225455.670324-13-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-08-27KVM: guest_memfd: Add plumbing to host to map guest_memfd pagesFuad Tabba1-0/+11
Introduce the core infrastructure to enable host userspace to mmap() guest_memfd-backed memory. This is needed for several evolving KVM use cases:
 * Non-CoCo VM backing: Allows VMMs like Firecracker to run guests entirely backed by guest_memfd, even for non-CoCo VMs [1]. This provides a unified memory management model and simplifies guest memory handling.
 * Direct map removal for enhanced security: This is an important step for direct map removal of guest memory [2]. By allowing host userspace to fault in guest_memfd pages directly, we can avoid maintaining host kernel direct maps of guest memory. This provides additional hardening against Spectre-like transient execution attacks by removing a potential attack surface within the kernel.
 * Future guest_memfd features: This also lays the groundwork for future enhancements to guest_memfd, such as supporting huge pages and enabling in-place sharing of guest memory with the host for CoCo platforms that permit it [3].
Enable the basic mmap and fault handling logic within guest_memfd, but hold off on allowing userspace to actually do mmap() until the architecture support is also in place. [1] https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding [2] https://lore.kernel.org/linux-mm/cc1bb8e9bc3e1ab637700a4d3defeec95b55060a.camel@amazon.com [3] https://lore.kernel.org/all/c1c9591d-218a-495c-957b-ba356c8f8e09@redhat.com/T/#u Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Shivank Garg <shivankg@amd.com> Acked-by: David Hildenbrand <david@redhat.com> Co-developed-by: Ackerley Tng <ackerleytng@google.com> Signed-off-by: Ackerley Tng <ackerleytng@google.com> Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20250729225455.670324-11-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-08-27KVM: x86: Enable KVM_GUEST_MEMFD for all 64-bit buildsFuad Tabba1-8/+4
Enable KVM_GUEST_MEMFD for all KVM x86 64-bit builds, i.e. for "default" VM types when running on 64-bit KVM. This will allow using guest_memfd to back non-private memory for all VM shapes, by supporting mmap() on guest_memfd. Opportunistically clean up various conditionals that become tautologies once x86 selects KVM_GUEST_MEMFD more broadly. Specifically, because SW protected VMs, SEV, and TDX are all 64-bit only, private memory no longer needs to take explicit dependencies on KVM_GUEST_MEMFD, because it is effectively a prerequisite. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20250729225455.670324-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-08-27KVM: Rename kvm_slot_can_be_private() to kvm_slot_has_gmem()Fuad Tabba2-4/+4
Rename kvm_slot_can_be_private() to kvm_slot_has_gmem() to improve clarity and accurately reflect its purpose. The function kvm_slot_can_be_private() was previously used to check if a given kvm_memory_slot is backed by guest_memfd. However, its name implied that the memory in such a slot was exclusively "private". As guest_memfd support expands to include non-private memory (e.g., shared host mappings), it's important to remove this association. The new name, kvm_slot_has_gmem(), states that the slot is backed by guest_memfd without making assumptions about the memory's privacy attributes. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Shivank Garg <shivankg@amd.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Co-developed-by: David Hildenbrand <david@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20250729225455.670324-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-08-27KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_HAVE_KVM_ARCH_GMEM_POPULATEFuad Tabba1-4/+10
The original name was vague regarding its functionality. This Kconfig option specifically enables and gates the kvm_gmem_populate() function, which is responsible for populating a GPA range with guest data. The new name, HAVE_KVM_ARCH_GMEM_POPULATE, describes the purpose of the option: to enable arch-specific guest_memfd population mechanisms. It also follows the same pattern as the other HAVE_KVM_ARCH_* configuration options. This improves clarity for developers and ensures the name accurately reflects the functionality it controls, especially as guest_memfd support expands beyond purely "private" memory scenarios. Temporarily keep KVM_GENERIC_PRIVATE_MEM as an x86-only config so as to minimize churn, and to hopefully make it easier to see what features require HAVE_KVM_ARCH_GMEM_POPULATE. On that note, omit GMEM_POPULATE for KVM_X86_SW_PROTECTED_VM, as regular ol' memset() suffices for software-protected VMs. As for KVM_GENERIC_PRIVATE_MEM, a future change will select KVM_GUEST_MEMFD for all 64-bit KVM builds, at which point the intermediate config will become obsolete and can/will be dropped. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Shivank Garg <shivankg@amd.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Co-developed-by: David Hildenbrand <david@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20250729225455.670324-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-08-27KVM: x86: Select TDX's KVM_GENERIC_xxx dependencies iff CONFIG_KVM_INTEL_TDX=ySean Christopherson1-2/+2
Select KVM_GENERIC_PRIVATE_MEM and KVM_GENERIC_MEMORY_ATTRIBUTES directly from KVM_INTEL_TDX, i.e. if and only if TDX support is fully enabled in KVM. There is no need to enable KVM's private memory support just because the core kernel's INTEL_TDX_HOST is enabled. Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Message-ID: <20250729225455.670324-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-08-27KVM: x86: Select KVM_GENERIC_PRIVATE_MEM directly from KVM_SW_PROTECTED_VMSean Christopherson1-1/+1
Now that KVM_SW_PROTECTED_VM doesn't have a hidden dependency on KVM_X86, select KVM_GENERIC_PRIVATE_MEM from within KVM_SW_PROTECTED_VM instead of conditionally selecting it from KVM_X86. No functional change intended. Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Message-ID: <20250729225455.670324-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-08-27KVM: x86: Have all vendor neutral sub-configs depend on KVM_X86, not just KVMSean Christopherson1-8/+8
Make all vendor neutral KVM x86 configs depend on KVM_X86, not just KVM, i.e. gate them on at least one vendor module being enabled and thus on kvm.ko actually being built. Depending on just KVM allows the user to select the configs even though they won't actually take effect, and more importantly, makes it all too easy to create unmet dependencies. E.g. KVM_GENERIC_PRIVATE_MEM can't be selected by KVM_SW_PROTECTED_VM, because the KVM_GENERIC_MMU_NOTIFIER dependency is selected by KVM_X86. Hiding all sub-configs when neither KVM_AMD nor KVM_INTEL is selected also helps communicate to the user that nothing "interesting" is going on, e.g.
  --- Virtualization
  <M>   Kernel-based Virtual Machine (KVM) support
  < >     KVM for Intel (and compatible) processors support
  < >     KVM for AMD processors support
Fixes: ea4290d77bda ("KVM: x86: leave kvm.ko out of the build if no vendor module is requested") Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Message-ID: <20250729225455.670324-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-08-22x86/tdx: Eliminate duplicate code in tdx_clear_page()Adrian Hunter1-22/+3
tdx_clear_page() and reset_tdx_pages() duplicate the TDX page clearing logic. Rename reset_tdx_pages() to tdx_quirk_reset_paddr() and create tdx_quirk_reset_page() to call tdx_quirk_reset_paddr() and be used in place of tdx_clear_page(). The new name reflects that, in fact, the clearing is necessary only for hardware with a certain quirk. That is dealt with in a subsequent patch but doing the rename here avoids additional churn. Note reset_tdx_pages() is slightly different from tdx_clear_page() because, more appropriately, it uses mb() in place of __mb(). Except when extra debugging is enabled (kcsan at present), mb() just calls __mb(). Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kirill A. Shutemov <kas@kernel.org> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Acked-by: Kai Huang <kai.huang@intel.com> Acked-by: Sean Christopherson <seanjc@google.com> Acked-by: Vishal Annapurve <vannapurve@google.com> Link: https://lore.kernel.org/all/20250819155811.136099-2-adrian.hunter%40intel.com
2025-08-21perf/x86/intel: Add ICL_FIXED_0_ADAPTIVE bit into INTEL_FIXED_BITS_MASKDapeng Mi1-1/+1
ICL_FIXED_0_ADAPTIVE is missing from INTEL_FIXED_BITS_MASK; add it. With the help of the updated INTEL_FIXED_BITS_MASK, intel_pmu_enable_fixed() can be optimized: the old fixed counter control bits can be unconditionally cleared with INTEL_FIXED_BITS_MASK and the new control bits then set based on the new configuration. Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Yi Lai <yi1.lai@intel.com> Link: https://lore.kernel.org/r/20250820023032.17128-7-dapeng1.mi@linux.intel.com
2025-08-21KVM: SVM: Enable Secure TSC for SNP guestsNikunj A Dadhania1-0/+28
Add support for Secure TSC, allowing userspace to configure the Secure TSC feature for SNP guests. Use the SNP specification's desired TSC frequency parameter during the SNP_LAUNCH_START command to set the mean TSC frequency in KHz for Secure TSC-enabled guests. Always use kvm->arch.default_tsc_khz as the TSC frequency that is passed to SNP guests in the SNP_LAUNCH_START command. The default value is the host TSC frequency. Userspace can optionally change the TSC frequency via the KVM_SET_TSC_KHZ ioctl before calling the SNP_LAUNCH_START ioctl. Introduce the read-only MSR GUEST_TSC_FREQ (0xc0010134) that returns the guest's effective TSC frequency in MHz when Secure TSC is enabled for SNP guests. Disable interception of this MSR when Secure TSC is enabled. Note that the GUEST_TSC_FREQ MSR is accessible only to the guest and not from the hypervisor context. Co-developed-by: Ketan Chaturvedi <Ketan.Chaturvedi@amd.com> Signed-off-by: Ketan Chaturvedi <Ketan.Chaturvedi@amd.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Nikunj A Dadhania <nikunj@amd.com> [sean: contain Secure TSC to sev.c] Link: https://lore.kernel.org/r/20250819234833.3080255-9-seanjc@google.com [sean: return -EINVAL if TSC frequency is '0'] Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-08-21KVM: SEV: Fold sev_es_vcpu_reset() into sev_vcpu_create()Sean Christopherson3-9/+2
Fold the remaining line of sev_es_vcpu_reset() into sev_vcpu_create() as there's no need for a dedicated RESET hook just to init a mutex, and the mutex should be initialized as early as possible anyways. No functional change intended. Reviewed-by: Nikunj A Dadhania <nikunj@amd.com> Link: https://lore.kernel.org/r/20250819234833.3080255-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-08-21KVM: SEV: Set RESET GHCB MSR value during sev_es_init_vmcb()Sean Christopherson1-13/+11
Set the RESET value for the GHCB "MSR" during sev_es_init_vmcb() instead of sev_es_vcpu_reset() to allow for dropping sev_es_vcpu_reset() entirely. Note, the call to sev_init_vmcb() from sev_migrate_from() also kinda sorta emulates a RESET, but sev_migrate_from() immediately overwrites ghcb_gpa with the source's current value, so whether or not stuffing the GHCB version is correct/desirable is moot. No functional change intended. Reviewed-by: Nikunj A Dadhania <nikunj@amd.com> Link: https://lore.kernel.org/r/20250819234833.3080255-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-08-21KVM: SEV: Move init of SNP guest state into sev_init_vmcb()Sean Christopherson3-16/+13
Move the initialization of SNP guest state from svm_vcpu_reset() into sev_init_vmcb() to reduce the number of paths that deal with INIT/RESET for SEV+ vCPUs from 4+ to 1. Plumb in @init_event as necessary. Opportunistically check for an SNP guest outside of sev_snp_init_protected_guest_state() so that sev_init_vmcb() is consistent with respect to checking for SEV-ES+ and SNP+ guests. No functional change intended. Reviewed-by: Nikunj A Dadhania <nikunj@amd.com> Link: https://lore.kernel.org/r/20250819234833.3080255-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-08-21KVM: SVM: Move SEV-ES VMSA allocation to a dedicated sev_vcpu_create() helperSean Christopherson3-18/+29
Add a dedicated sev_vcpu_create() helper to allocate the VMSA page for SEV-ES+ vCPUs, and to allow for consolidating a variety of related SEV+ code in the near future. No functional change intended. Reviewed-by: Nikunj A Dadhania <nikunj@amd.com> Link: https://lore.kernel.org/r/20250819234833.3080255-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-08-21KVM: SEV: Enforce minimum GHCB version requirement for SEV-SNP guestsNikunj A Dadhania1-2/+6
Require a minimum GHCB version of 2 when starting SEV-SNP guests through KVM_SEV_INIT2. When a VMM attempts to start an SEV-SNP guest with an incompatible GHCB version (less than 2), reject the request early rather than allowing the guest kernel to start with an incorrect protocol version and fail later with GHCB_SNP_UNSUPPORTED guest termination. Not enforcing the minimum version typically causes the guest to request termination with GHCB_SNP_UNSUPPORTED error code: kvm_amd: SEV-ES guest requested termination: 0x0:0x2 Fixes: 4af663c2f64a ("KVM: SEV: Allow per-guest configuration of GHCB protocol version") Cc: Thomas Lendacky <thomas.lendacky@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Michael Roth <michael.roth@amd.com> Signed-off-by: Nikunj A Dadhania <nikunj@amd.com> Link: https://lore.kernel.org/r/20250819234833.3080255-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
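The enforced check amounts to something like the following in the KVM_SEV_INIT2 path (a sketch; the exact guard and VM-type test are assumptions):

    /* SEV-SNP requires GHCB protocol version 2 or later. */
    if (vm_type == KVM_X86_SNP_VM && data->ghcb_version &&
        data->ghcb_version < 2)
        return -EINVAL;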
2025-08-21KVM: SEV: Drop GHCB_VERSION_DEFAULT and open code itNikunj A Dadhania1-9/+8
Remove the GHCB_VERSION_DEFAULT macro and open code it with '2'. The macro is used conditionally and is not a true default. The KVM ABI does not advertise/enumerate the default GHCB version. Any future change to this macro would silently alter the ABI and potentially break existing deployments that rely on the current behavior. Additionally, move the GHCB version assignment earlier in the code flow and update the comment to clarify that KVM_SEV_INIT2 defaults to version 2, while KVM_SEV_INIT forces version 1. No functional change intended. Cc: Thomas Lendacky <thomas.lendacky@amd.com> Cc: Michael Roth <michael.roth@amd.com> Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Nikunj A Dadhania <nikunj@amd.com> Link: https://lore.kernel.org/r/20250819234833.3080255-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-08-19KVM: x86: Zero XSTATE components on INIT by iterating over supported featuresChao Gao1-3/+9
Tweak the code a bit to facilitate resetting more xstate components in the future, e.g., CET's xstate-managed MSRs. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Tested-by: Mathias Krause <minipli@grsecurity.net> Tested-by: John Allen <john.allen@amd.com> Signed-off-by: Chao Gao <chao.gao@intel.com> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Link: https://lore.kernel.org/r/20250812025606.74625-6-chao.gao@intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-08-19KVM: x86: Manually clear MPX state only on INITSean Christopherson1-16/+30
Don't manually clear/zero MPX state on RESET, as the guest FPU state is zero allocated and KVM only does RESET during vCPU creation, i.e. the relevant state is guaranteed to be all zeroes. Opportunistically move the relevant code into a helper in anticipation of adding support for CET shadow stacks, which also has state that is zeroed on INIT. Signed-off-by: Yang Weijiang <weijiang.yang@intel.com> Tested-by: Mathias Krause <minipli@grsecurity.net> Tested-by: John Allen <john.allen@amd.com> Signed-off-by: Chao Gao <chao.gao@intel.com> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Link: https://lore.kernel.org/r/20250812025606.74625-5-chao.gao@intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-08-19KVM: x86: Add kvm_msr_{read,write}() helpersYang Weijiang2-4/+14
Wrap __kvm_{get,set}_msr() into two new helpers for KVM usage and use the helpers to replace existing usage of the raw functions. kvm_msr_{read,write}() are KVM-internal helpers, i.e. used when KVM needs to get/set a MSR value for emulating CPU behavior, i.e., host_initiated == %true in the helpers. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Tested-by: Mathias Krause <minipli@grsecurity.net> Tested-by: John Allen <john.allen@amd.com> Signed-off-by: Chao Gao <chao.gao@intel.com> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Link: https://lore.kernel.org/r/20250812025606.74625-4-chao.gao@intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>
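Conceptually the new helpers are thin wrappers that pin host_initiated to true, along these lines (a sketch, not necessarily the exact final signatures):

    int kvm_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data)
    {
        return __kvm_get_msr(vcpu, index, data, /*host_initiated=*/true);
    }

    int kvm_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data)
    {
        return __kvm_set_msr(vcpu, index, data, /*host_initiated=*/true);
    }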
2025-08-19KVM: x86: Use double-underscore read/write MSR helpers as appropriateSean Christopherson1-13/+16
Use the double-underscore helpers for emulating MSR reads and writes in the no-underscore versions to better capture the relationship between the two sets of APIs (the double-underscore versions don't honor userspace MSR filters). No functional change intended. Signed-off-by: Chao Gao <chao.gao@intel.com> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Link: https://lore.kernel.org/r/20250812025606.74625-3-chao.gao@intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-08-19KVM: x86: Rename kvm_{g,s}et_msr()* to show that they emulate guest accessesYang Weijiang3-22/+23
Rename kvm_{g,s}et_msr_with_filter() to kvm_emulate_msr_{read,write}(), and kvm_{g,s}et_msr() to __kvm_emulate_msr_{read,write}(), to make it more obvious that KVM uses these helpers to emulate guest behaviors, i.e., host_initiated == false in these helpers. Suggested-by: Sean Christopherson <seanjc@google.com> Suggested-by: Chao Gao <chao.gao@intel.com> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Tested-by: Mathias Krause <minipli@grsecurity.net> Tested-by: John Allen <john.allen@amd.com> Signed-off-by: Chao Gao <chao.gao@intel.com> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Link: https://lore.kernel.org/r/20250812025606.74625-2-chao.gao@intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-08-19KVM: x86: Advertise support for the immediate form of MSR instructionsXin Li3-2/+15
Advertise support for the immediate form of MSR instructions to userspace if the instructions are supported by the underlying CPU, and KVM is using VMX, i.e. is running on an Intel-compatible CPU. For SVM, explicitly clear X86_FEATURE_MSR_IMM to ensure KVM doesn't over- report support if AMD-compatible CPUs ever implement the immediate forms, as SVM will likely require explicit enablement in KVM. Signed-off-by: Xin Li (Intel) <xin@zytor.com> [sean: massage changelog] Link: https://lore.kernel.org/r/20250805202224.1475590-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-08-19KVM: VMX: Support the immediate form of WRMSRNS in the VM-Exit fastpathXin Li3-4/+17
Add support for handling "WRMSRNS with an immediate" VM-Exits in KVM's fastpath. On Intel, all writes to the x2APIC ICR and to the TSC Deadline MSR are non-serializing, i.e. it's highly likely guest kernels will switch to using WRMSRNS when possible. And in general, any MSR written via WRMSRNS is probably worth handling in the fastpath, as the entire point of WRMSRNS is to shave cycles in hot paths. Signed-off-by: Xin Li (Intel) <xin@zytor.com> [sean: rewrite changelog, split rename to separate patch] Link: https://lore.kernel.org/r/20250805202224.1475590-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-08-19KVM: x86: Add support for RDMSR/WRMSRNS w/ immediate on IntelXin Li4-12/+82
Add support for the immediate forms of RDMSR and WRMSRNS (currently Intel-only). The immediate variants are only valid in 64-bit mode, and use a single general purpose register for the data (the register is also encoded in the instruction, i.e. not implicit like regular RDMSR/WRMSR). The immediate variants are primarily motivated by performance, not code size: by having the MSR index in an immediate, it is available *much* earlier in the CPU pipeline, which allows hardware much more leeway about how a particular MSR is handled. Intel VMX support for the immediate forms of MSR accesses communicates exit information to the host as follows: 1) The immediate form of RDMSR uses VM-Exit Reason 84. 2) The immediate form of WRMSRNS uses VM-Exit Reason 85. 3) For both VM-Exit reasons 84 and 85, the Exit Qualification field is set to the MSR index that triggered the VM-Exit. 4) Bits 3 ~ 6 of the VM-Exit Instruction Information field are set to the register encoding used by the immediate form of the instruction, i.e. the destination register for RDMSR, and the source for WRMSRNS. 5) The VM-Exit Instruction Length field records the size of the immediate form of the MSR instruction. To deal with userspace RDMSR exits, stash the destination register in a new kvm_vcpu_arch field, similar to cui_linear_rip, pio, etc. Alternatively, the register could be saved in kvm_run.msr or re-retrieved from the VMCS, but the former would require sanitizing the value to ensure userspace doesn't clobber the value to an out-of-bounds index, and the latter would require a new one-off kvm_x86_ops hook. Don't bother adding support for the instructions in KVM's emulator, as the only way for RDMSR/WRMSR to be encountered is if KVM is emulating large swaths of code due to invalid guest state, and a vCPU cannot have invalid guest state while in 64-bit mode. Signed-off-by: Xin Li (Intel) <xin@zytor.com> [sean: minor tweaks, massage and expand changelog] Link: https://lore.kernel.org/r/20250805202224.1475590-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
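Based on the exit-information layout described above, a VMX handler for the immediate forms would decode roughly as follows (illustrative; the accessors are the usual VMX helpers and the bit layout is as stated in the changelog):

    u32 msr  = vmx_get_exit_qual(vcpu);                 /* MSR index */
    u32 info = vmcs_read32(VMX_INSTRUCTION_INFO);
    int reg  = (info >> 3) & 0xf;                       /* bits 3:6 = GPR encoding */

    /* RDMSR: 'reg' is the destination; WRMSRNS: 'reg' holds the data. */
    u64 data = kvm_register_read(vcpu, reg);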
2025-08-19KVM: x86: Rename handle_fastpath_set_msr_irqoff() to handle_fastpath_wrmsr()Xin Li4-5/+5
Rename the WRMSR fastpath API to drop "irqoff", as that information is redundant (the fastpath always runs with IRQs disabled), and to prepare for adding a fastpath for the immediate variant of WRMSRNS. No functional change intended. Signed-off-by: Xin Li (Intel) <xin@zytor.com> [sean: split to separate patch, write changelog] Link: https://lore.kernel.org/r/20250805202224.1475590-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-08-19KVM: x86: Rename local "ecx" variables to "msr" and "pmc" as appropriateSean Christopherson1-12/+12
Rename "ecx" variables in {RD,WR}MSR and RDPMC helpers to "msr" and "pmc" respectively, in anticipation of adding support for the immediate variants of RDMSR and WRMSRNS, and to better document what the variables hold (versus where the data originated). No functional change intended. Link: https://lore.kernel.org/r/20250805202224.1475590-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-08-19KVM: x86: Add a fastpath handler for INVDSean Christopherson4-0/+14
Add a fastpath handler for INVD so that the common fastpath logic can be trivially tested on both Intel and AMD. Under KVM, INVD is always: (a) intercepted, (b) available to the guest, and (c) emulated as a nop, with no side effects. Combined with INVD not having any inputs or outputs, i.e. no register constraints, INVD is the perfect instruction for exercising KVM's fastpath as it can be inserted into practically any guest-side code stream. Link: https://lore.kernel.org/r/20250805190526.1453366-19-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
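A sketch of what such a fastpath handler looks like, given that INVD is emulated as a nop (treat the exact function and return values as assumptions):

    static fastpath_t handle_fastpath_invd(struct kvm_vcpu *vcpu)
    {
        /* INVD has no operands and no side effects under KVM; just skip it. */
        if (!kvm_emulate_invd(vcpu))
            return EXIT_FASTPATH_NONE;

        return EXIT_FASTPATH_REENTER_GUEST;
    }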
2025-08-19KVM: x86: Push acquisition of SRCU in fastpath into kvm_pmu_trigger_event()Sean Christopherson2-14/+8
Acquire SRCU in the VM-Exit fastpath if and only if KVM needs to check the PMU event filter, to further trim the amount of code that is executed with SRCU protection in the fastpath. Counter-intuitively, holding SRCU can do more harm than good due to masking potential bugs, and introducing a new SRCU-protected asset to code reachable via kvm_skip_emulated_instruction() would be quite notable, i.e. definitely worth auditing. E.g. the primary user of kvm->srcu is KVM's memslots, accessing memslots all but guarantees guest memory may be accessed, accessing guest memory can fault, and page faults might sleep, which isn't allowed while IRQs are disabled. Not acquiring SRCU means the (hypothetical) illegal sleep would be flagged when running with PROVE_RCU=y, even if DEBUG_ATOMIC_SLEEP=n. Note, performance is NOT a motivating factor, as SRCU lock/unlock only adds ~15 cycles of latency to fastpath VM-Exits. I.e. overhead isn't a concern _if_ SRCU protection needs to be extended beyond PMU events, e.g. to honor userspace MSR filters. Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20250805190526.1453366-18-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
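The resulting structure inside kvm_pmu_trigger_event() looks roughly like this (a simplified sketch using helpers named elsewhere in this series):

    int idx;

    /* SRCU only protects the PMU event filter; take it just for this walk. */
    idx = srcu_read_lock(&vcpu->kvm->srcu);

    kvm_for_each_pmc(pmu, pmc, i, bitmap) {
        if (!pmc_is_event_allowed(pmc))
            continue;

        kvm_pmu_incr_counter(pmc);
    }

    srcu_read_unlock(&vcpu->kvm->srcu, idx);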
2025-08-19KVM: x86/pmu: Rename check_pmu_event_filter() to pmc_is_event_allowed()Sean Christopherson1-3/+3
Rename check_pmu_event_filter() to make its polarity more obvious, and to connect the dots to is_gp_event_allowed() and is_fixed_event_allowed(). No functional change intended. Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20250805190526.1453366-17-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-08-19KVM: x86/pmu: Drop redundant check on PMC being locally enabled for emulationSean Christopherson1-2/+1
Drop the check on a PMC being locally enabled when triggering emulated events, as the bitmap of passed-in PMCs only contains locally enabled PMCs. Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20250805190526.1453366-16-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-08-19KVM: x86/pmu: Drop redundant check on PMC being globally enabled for emulationSean Christopherson1-1/+1
When triggering PMC events in response to emulation, drop the redundant checks on a PMC being globally and locally enabled, as the passed in bitmap contains only PMCs that are locally enabled (and counting the right event), and the local copy of the bitmap has already been masked with global_ctrl. No true functional change intended. Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20250805190526.1453366-15-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-08-19KVM: x86/pmu: Open code pmc_event_is_allowed() in its callersSean Christopherson1-8/+4
Open code pmc_event_is_allowed() in its callers, as kvm_pmu_trigger_event() only needs to check the event filter (both global and local enables are consulted outside of the loop). No functional change intended. Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20250805190526.1453366-14-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-08-19KVM: x86/pmu: Rename pmc_speculative_in_use() to pmc_is_locally_enabled()Sean Christopherson3-5/+5
Rename pmc_speculative_in_use() to pmc_is_locally_enabled() to better capture what it actually tracks, and to show its relationship to pmc_is_globally_enabled(). While neither AMD nor Intel refer to event selectors or the fixed counter control MSR as "local", it's the obvious name to pair with "global". As for "speculative", there's absolutely nothing speculative about the checks. E.g. for PMUs without PERF_GLOBAL_CTRL, from the guest's perspective, the counters are "in use" without any qualifications. No functional change intended. Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20250805190526.1453366-13-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-08-19KVM: x86/pmu: Calculate set of to-be-emulated PMCs at time of WRMSRsSean Christopherson2-21/+58
Calculate and track PMCs that are counting instructions/branches retired when the PMC's event selector (or fixed counter control) is modified, instead of evaluating the event selector on-demand. Immediately recalc a PMC's configuration on writes to avoid false negatives/positives when KVM skips an emulated WRMSR, which is guaranteed to occur before the main run loop processes KVM_REQ_PMU. Out of an abundance of caution, and because it's relatively cheap, recalc reprogrammed PMCs in kvm_pmu_handle_event() as well. Recalculating in response to KVM_REQ_PMU _should_ be unnecessary, but for now be paranoid to avoid introducing easily-avoidable bugs in edge cases. The code can be removed in the future if necessary, e.g. in the unlikely event that the overhead of recalculating to-be-emulated PMCs is noticeable. Note! Deliberately don't check the PMU event filters, as doing so could result in KVM consuming stale information. Tracking which PMCs are counting branches/instructions will allow grabbing SRCU in the fastpath VM-Exit handlers if and only if a PMC event might be triggered (to consult the event filters), and will also allow the upcoming mediated PMU to do the right thing with respect to counting instructions (the mediated PMU won't be able to update PMCs in the VM-Exit fastpath). Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20250805190526.1453366-12-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-08-19KVM: x86/pmu: Add wrappers for counting emulated instructions/branchesSean Christopherson4-15/+24
Add wrappers for triggering instruction retired and branch retired PMU events in anticipation of reworking the internal mechanisms to track which PMCs need to be evaluated, e.g. to avoid having to walk and check every PMC. Opportunistically bury "struct kvm_pmu_emulated_event_selectors" in pmu.c. No functional change intended. Link: https://lore.kernel.org/r/20250805190526.1453366-11-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
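In effect, the wrappers reduce call sites to something like the following (a sketch; the event-selector struct is the kvm_pmu_eventsel mentioned above, now private to pmu.c):

    void kvm_pmu_instruction_retired(struct kvm_vcpu *vcpu)
    {
        kvm_pmu_trigger_event(vcpu, kvm_pmu_eventsel.INSTRUCTIONS_RETIRED);
    }

    void kvm_pmu_branch_retired(struct kvm_vcpu *vcpu)
    {
        kvm_pmu_trigger_event(vcpu, kvm_pmu_eventsel.BRANCH_INSTRUCTIONS_RETIRED);
    }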