linux - Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/

Age	Commit message (Collapse)	Author	Lines
9 days	Merge tag 'usb-7.1-rc1' of ↵	Linus Torvalds	-6/+10
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB / Thunderbolt updates from Greg KH: "Here is the big set of USB and Thunderbolt changes for 7.1-rc1. Lots of little things in here, nothing major, just constant improvements, updates, and new features. Highlights are: - new USB power supply driver support. These changes did touch outside of drivers/usb/ but got acks from the relevant mantainers for them. - dts file updates and conversions - string function conversions into "safer" ones - new device quirks - xhci driver updates - usb gadget driver minor fixes - typec driver additions and updates - small number of thunderbolt driver changes - dwc3 driver updates and additions of new hardware support - other minor driver updates All of these have been in the linux-next tree for a while with no reported issues" * tag 'usb-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (176 commits) usb: dwc3: starfive: Add JHB100 USB 2.0 DRD controller dt-bindings: usb: dwc3: add support for StarFive JHB100 dt-bindings: usb: atmel,at91sam9rl-udc: convert to DT schema dt-bindings: usb: atmel,at91rm9200-udc: convert to DT schema dt-bindings: usb: generic-ehci: fix schema structure and add at91sam9g45 constraints dt-bindings: usb: generic-ohci: add AT91RM9200 OHCI binding support arm: dts: at91: remove unused #address-cells/#size-cells from sam9x60 udc node drivers/usb/host: Fix spelling error 'seperate' -> 'separate' usbip: tools: add hint when no exported devices are found USB: serial: iuu_phoenix: fix iuutool author name usb: gadget: f_ncm: validate minimum block_len in ncm_unwrap_ntb() usb: gadget: f_phonet: fix skb frags[] overflow in pn_rx_complete() usb: gadget: f_hid: Add missing error code usb: typec: cros_ec_ucsi: Load driver from OF and ACPI definitions dt-bindings: chrome: Add cros-ec-ucsi compatibility to typec binding USB: of: Simplify with scoped for each OF child loop usbip: validate number_of_packets in usbip_pack_ret_submit() usb: gadget: renesas_usb3: validate endpoint index in standard request handlers usb: core: config: reverse the size check of the SSP isoc endpoint descriptor usb: typec: ucsi: Set usb mode on partner change ...
9 days	Merge tag 'mm-stable-2026-04-18-02-14' of ↵	Linus Torvalds	-138/+263
	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull more MM updates from Andrew Morton: - "Eliminate Dying Memory Cgroup" (Qi Zheng and Muchun Song) Address the longstanding "dying memcg problem". A situation wherein a no-longer-used memory control group will hang around for an extended period pointlessly consuming memory - "fix unexpected type conversions and potential overflows" (Qi Zheng) Fix a couple of potential 32-bit/64-bit issues which were identified during review of the "Eliminate Dying Memory Cgroup" series - "kho: history: track previous kernel version and kexec boot count" (Breno Leitao) Use Kexec Handover (KHO) to pass the previous kernel's version string and the number of kexec reboots since the last cold boot to the next kernel, and print it at boot time - "liveupdate: prevent double preservation" (Pasha Tatashin) Teach LUO to avoid managing the same file across different active sessions - "liveupdate: Fix module unloading and unregister API" (Pasha Tatashin) Address an issue with how LUO handles module reference counting and unregistration during module unloading - "zswap pool per-CPU acomp_ctx simplifications" (Kanchana Sridhar) Simplify and clean up the zswap crypto compression handling and improve the lifecycle management of zswap pool's per-CPU acomp_ctx resources - "mm/damon/core: fix damon_call()/damos_walk() vs kdmond exit race" (SeongJae Park) Address unlikely but possible leaks and deadlocks in damon_call() and damon_walk() - "mm/damon/core: validate damos_quota_goal->nid" (SeongJae Park) Fix a couple of root-only wild pointer dereferences - "Docs/admin-guide/mm/damon: warn commit_inputs vs other params race" (SeongJae Park) Update the DAMON documentation to warn operators about potential races which can occur if the commit_inputs parameter is altered at the wrong time - "Minor hmm_test fixes and cleanups" (Alistair Popple) Bugfixes and a cleanup for the HMM kernel selftests - "Modify memfd_luo code" (Chenghao Duan) Cleanups, simplifications and speedups to the memfd_lou code - "mm, kvm: allow uffd support in guest_memfd" (Mike Rapoport) Support for userfaultfd in guest_memfd - "selftests/mm: skip several tests when thp is not available" (Chunyu Hu) Fix several issues in the selftests code which were causing breakage when the tests were run on CONFIG_THP=n kernels - "mm/mprotect: micro-optimization work" (Pedro Falcato) A couple of nice speedups for mprotect() - "MAINTAINERS: update KHO and LIVE UPDATE entries" (Pratyush Yadav) Document upcoming changes in the maintenance of KHO, LUO, memfd_luo, kexec, crash, kdump and probably other kexec-based things - they are being moved out of mm.git and into a new git tree * tag 'mm-stable-2026-04-18-02-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (121 commits) MAINTAINERS: add page cache reviewer mm/vmscan: avoid false-positive -Wuninitialized warning MAINTAINERS: update Dave's kdump reviewer email address MAINTAINERS: drop include/linux/liveupdate from LIVE UPDATE MAINTAINERS: drop include/linux/kho/abi/ from KHO MAINTAINERS: update KHO and LIVE UPDATE maintainers MAINTAINERS: update kexec/kdump maintainers entries mm/migrate_device: remove dead migration entry check in migrate_vma_collect_huge_pmd() selftests: mm: skip charge_reserved_hugetlb without killall userfaultfd: allow registration of ranges below mmap_min_addr mm/vmstat: fix vmstat_shepherd double-scheduling vmstat_update mm/hugetlb: fix early boot crash on parameters without '=' separator zram: reject unrecognized type= values in recompress_store() docs: proc: document ProtectionKey in smaps mm/mprotect: special-case small folios when applying permissions mm/mprotect: move softleaf code out of the main function mm: remove '!root_reclaim' checking in should_abort_scan() mm/sparse: fix comment for section map alignment mm/page_io: use sio->len for PSWPIN accounting in sio_read_complete() selftests/mm: transhuge_stress: skip the test when thp not available ...
10 days	Merge tag 'memblock-v7.1-rc1' of ↵	Linus Torvalds	-9/+51
	git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock Pull memblock updates from Mike Rapoport: - improve debuggability of reserve_mem kernel parameter handling with print outs in case of a failure and debugfs info showing what was actually reserved - Make memblock_free_late() and free_reserved_area() use the same core logic for freeing the memory to buddy and ensure it takes care of updating memblock arrays when ARCH_KEEP_MEMBLOCK is enabled. * tag 'memblock-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock: x86/alternative: delay freeing of smp_locks section memblock: warn when freeing reserved memory before memory map is initialized memblock, treewide: make memblock_free() handle late freeing memblock: make free_reserved_area() update memblock if ARCH_KEEP_MEMBLOCK=y memblock: extract page freeing from free_reserved_area() into a helper memblock: make free_reserved_area() more robust mm: move free_reserved_area() to mm/memblock.c powerpc: opal-core: pair alloc_pages_exact() with free_pages_exact() powerpc: fadump: pair alloc_pages_exact() with free_pages_exact() memblock: reserve_mem: fix end caclulation in reserve_mem_release_by_name() memblock: move reserve_bootmem_range() to memblock.c and make it static memblock: Add reserve_mem debugfs info memblock: Print out errors on reserve_mem parser
10 days	Merge tag 'perf-tools-for-v7.1-2026-04-17' of ↵	Linus Torvalds	-1676/+6797
	git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools updates from Namhyung Kim: "perf report: - Add 'comm_nodigit' sort key to combine similar threads that only have different numbers in the comm. In the following example, the 'comm_nodigit' will have samples from all threads starting with "bpfrb/" into an entry "bpfrb/<N>". $ perf report -s comm_nodigit,comm -H ... # # Overhead CommandNoDigit / Command # ........... ........................ # 20.30% swapper 20.30% swapper 13.37% chrome 13.37% chrome 10.07% bpfrb/<N> 7.47% bpfrb/0 0.70% bpfrb/1 0.47% bpfrb/3 0.46% bpfrb/2 0.25% bpfrb/4 0.23% bpfrb/5 0.20% bpfrb/6 0.14% bpfrb/10 0.07% bpfrb/7 - Support flat layout for symfs. The --symfs option is to specify the location of debugging symbol files. The default 'hierarchy' layout would search the symbol file using the same path of the original file under the symfs root. The new 'flat' layout would search only in the root directory. - Update 'simd' sort key for ARM SIMD flags to cover ASE/SME and more predicate flags. perf stat: - Add --pmu-filter option to select specific PMUs. This would be useful when you measure metrics from multiple instance of uncore PMUs with similar names. # perf stat -M cpa_p0_avg_bw Performance counter stats for 'system wide': 19,417,779,115 hisi_sicl0_cpa0/cpa_cycles/ # 0.00 cpa_p0_avg_bw 0 hisi_sicl0_cpa0/cpa_p0_wr_dat/ 0 hisi_sicl0_cpa0/cpa_p0_rd_dat_64b/ 0 hisi_sicl0_cpa0/cpa_p0_rd_dat_32b/ 19,417,751,103 hisi_sicl10_cpa0/cpa_cycles/ # 0.00 cpa_p0_avg_bw 0 hisi_sicl10_cpa0/cpa_p0_wr_dat/ 0 hisi_sicl10_cpa0/cpa_p0_rd_dat_64b/ 0 hisi_sicl10_cpa0/cpa_p0_rd_dat_32b/ 19,417,730,679 hisi_sicl2_cpa0/cpa_cycles/ # 0.31 cpa_p0_avg_bw 75,635,749 hisi_sicl2_cpa0/cpa_p0_wr_dat/ 18,520,640 hisi_sicl2_cpa0/cpa_p0_rd_dat_64b/ 0 hisi_sicl2_cpa0/cpa_p0_rd_dat_32b/ 19,417,674,227 hisi_sicl8_cpa0/cpa_cycles/ # 0.00 cpa_p0_avg_bw 0 hisi_sicl8_cpa0/cpa_p0_wr_dat/ 0 hisi_sicl8_cpa0/cpa_p0_rd_dat_64b/ 0 hisi_sicl8_cpa0/cpa_p0_rd_dat_32b/ 19.417734480 seconds time elapsed With --pmu-filter, users can select only hisi_sicl2_cpa0 PMU. # perf stat --pmu-filter hisi_sicl2_cpa0 -M cpa_p0_avg_bw Performance counter stats for 'system wide': 6,234,093,559 cpa_cycles # 0.60 cpa_p0_avg_bw 50,548,465 cpa_p0_wr_dat 7,552,182 cpa_p0_rd_dat_64b 0 cpa_p0_rd_dat_32b 6.234139320 seconds time elapsed Data type profiling: - Quality improvements by tracking register state more precisely - Ensure array members to get the type - Handle more cases for global variables Vendor event/metric updates: - Update various Intel events and metrics - Add NVIDIA Tegra 410 Olympus events Internal changes: - Verify perf.data header for maliciously crafted files - Update perf test to cover more usages and make them robust - Move a couple of copied kernel headers not to annoy objtool build - Fix a bug in map sorting in name order - Remove some unused codes Misc: - Fix module symbol resolution with non-zero text address - Add -t/--threads option to `perf bench mem mmap` - Track duration of exit() syscall by `perf trace -s` - Add core.addr2line-timeout and core.addr2line-disable-warn config items" tag 'perf-tools-for-v7.1-2026-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (131 commits) perf loongarch: Fix build failure with CONFIG_LIBDW_DWARF_UNWIND perf annotate: Use jump__delete when freeing LoongArch jumps perf test: Fixes for check branch stack sampling perf test: Fix inet_pton probe failure and unroll call graph perf build: fix "argument list too long" in second location perf header: Add sanity checks to HEADER_BPF_BTF processing perf header: Sanity check HEADER_BPF_PROG_INFO perf header: Sanity check HEADER_PMU_CAPS perf header: Sanity check HEADER_HYBRID_TOPOLOGY perf header: Sanity check HEADER_CACHE perf header: Sanity check HEADER_GROUP_DESC perf header: Sanity check HEADER_PMU_MAPPINGS perf header: Sanity check HEADER_MEM_TOPOLOGY perf header: Sanity check HEADER_NUMA_TOPOLOGY perf header: Sanity check HEADER_CPU_TOPOLOGY perf header: Sanity check HEADER_NRCPUS and HEADER_CPU_DOMAIN_INFO perf header: Bump up the max number of command line args allowed perf header: Validate nr_domains when reading HEADER_CPU_DOMAIN_INFO perf sample: Fix documentation typo perf arm_spe: Improve SIMD flags setting ...
11 days	selftests: mm: skip charge_reserved_hugetlb without killall	Cao Ruichuang	-0/+5
	charge_reserved_hugetlb.sh tears down background writers with killall from psmisc. Minimal Ubuntu images do not always provide that tool, so the selftest fails in cleanup for an environment reason rather than for the hugetlb behavior it is trying to cover. Skip the test when killall is unavailable, similar to the existing root check, so these environments report the dependency clearly instead of failing the test. Link: https://lore.kernel.org/20260410044139.67480-1-create0818@163.com Signed-off-by: Cao Ruichuang <create0818@163.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 days	selftests/mm: transhuge_stress: skip the test when thp not available	Chunyu Hu	-0/+4
	The test requires thp, skip the test when thp is not available to avoid false positive. Tested with thp disabled kernel. Before the fix: # -------------------------------- # running ./transhuge-stress -d 20 # -------------------------------- # TAP version 13 # 1..1 # transhuge-stress: allocate 1453 transhuge pages, using 2907 MiB virtual memory and 11 MiB of ram # Bail out! MADV_HUGEPAGE# Planned tests != run tests (1 != 0) # # Totals: pass:0 fail:0 xfail:0 xpass:0 skip:0 error:0 # [FAIL] not ok 60 transhuge-stress -d 20 # exit=1 After the fix: # -------------------------------- # running ./transhuge-stress -d 20 # -------------------------------- # TAP version 13 # 1..0 # SKIP Transparent Hugepages not available # [SKIP] ok 5 transhuge-stress -d 20 # SKIP Link: https://lore.kernel.org/20260402014543.1671131-7-chuhu@redhat.com Signed-off-by: Chunyu Hu <chuhu@redhat.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Li Wang <liwang@redhat.com> Cc: Nico Pache <npache@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 days	selftests/mm: split_huge_page_test: skip the test when thp is not available	Chunyu Hu	-0/+4
	When thp is not enabled on some kernel config such as realtime kernel, the test will report failure. Fix the false positive by skipping the test directly when thp is not enabled. Tested with thp disabled kernel: Before The fix: # -------------------------------------------------- # running ./split_huge_page_test /tmp/xfs_dir_Ywup9p # -------------------------------------------------- # TAP version 13 # Bail out! Reading PMD pagesize failed # # Totals: pass:0 fail:0 xfail:0 xpass:0 skip:0 error:0 # [FAIL] not ok 61 split_huge_page_test /tmp/xfs_dir_Ywup9p # exit=1 After the fix: # -------------------------------------------------- # running ./split_huge_page_test /tmp/xfs_dir_YHPUPl # -------------------------------------------------- # TAP version 13 # 1..0 # SKIP Transparent Hugepages not available # [SKIP] ok 6 split_huge_page_test /tmp/xfs_dir_YHPUPl # SKIP Link: https://lore.kernel.org/20260402014543.1671131-6-chuhu@redhat.com Signed-off-by: Chunyu Hu <chuhu@redhat.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Li Wang <liwang@redhat.com> Cc: Nico Pache <npache@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 days	selftests/mm/vm_util: robust write_file()	Chunyu Hu	-3/+12
	Add three more checks for buflen and numwritten. The buflen should be at least two, that means at least one char and the null-end. The error case check is added by checking numwriten < 0 instead of numwritten < 1. And the truncate case is checked. The test will exit if any of these conditions aren't met. Additionally, add more print information when a write failure occurs or a truncated write happens, providing clearer diagnostics. Link: https://lore.kernel.org/20260402014543.1671131-5-chuhu@redhat.com Signed-off-by: Chunyu Hu <chuhu@redhat.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Lorenzo Stoakes <ljs@kernel.org> Cc: Nico Pache <npache@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 days	selftests/mm: move write_file helper to vm_util	Chunyu Hu	-48/+20
	thp_settings provides write_file() helper for safely writing to a file and exit when write failure happens. It's a very low level helper and many sub tests need such a helper, not only thp tests. split_huge_page_test also defines a write_file locally. The two have minior differences in return type and used exit api. And there would be conflicts if split_huge_page_test wanted to include thp_settings.h because of different prototype, making it less convenient. It's possisble to merge the two, although some tests don't use the kselftest infrastrucutre for testing. It would also work when using the ksft_exit_msg() to exit in my test, as the counters are all zero. Output will be like: TAP version 13 1..62 Bail out! /proc/sys/vm/drop_caches1 open failed: No such file or directory # Totals: pass:0 fail:0 xfail:0 xpass:0 skip:0 error:0 So here we just keep the version in split_huge_page_test, and move it into the vm_util. This makes it easy to maitain and user could just include one vm_util.h when they don't need thp setting helpers. Keep the prototype of void return as the function will exit on any error, return value is not necessary, and will simply the callers like write_num() and write_string(). Link: https://lore.kernel.org/20260402014543.1671131-4-chuhu@redhat.com Signed-off-by: Chunyu Hu <chuhu@redhat.com> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Suggested-by: Mike Rapoport <rppt@kernel.org> Cc: Nico Pache <npache@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 days	selftests/mm: soft-dirty: skip two tests when thp is not available	Chunyu Hu	-1/+3
	The test_hugepage test contain two sub tests. If just reporting one skip when thp not available, there will be error in the log because the test count don't match the test plan. Change to skip two tests by running the ksft_test_result_skip twice in this case. Without the fix (run test on thp disabled kernel): ./run_vmtests.sh -t soft_dirty # -------------------- # running ./soft-dirty # -------------------- # TAP version 13 # 1..19 # ok 1 Test test_simple # ok 2 Test test_vma_reuse dirty bit of allocated page # ok 3 Test test_vma_reuse dirty bit of reused address page # ok 4 # SKIP Transparent Hugepages not available # ok 5 Test test_mprotect-anon dirty bit of new written page # ok 6 Test test_mprotect-anon soft-dirty clear after clear_refs # ok 7 Test test_mprotect-anon soft-dirty clear after marking RO # ok 8 Test test_mprotect-anon soft-dirty clear after marking RW # ok 9 Test test_mprotect-anon soft-dirty after rewritten # ok 10 Test test_mprotect-file dirty bit of new written page # ok 11 Test test_mprotect-file soft-dirty clear after clear_refs # ok 12 Test test_mprotect-file soft-dirty clear after marking RO # ok 13 Test test_mprotect-file soft-dirty clear after marking RW # ok 14 Test test_mprotect-file soft-dirty after rewritten # ok 15 Test test_merge-anon soft-dirty after remap merge 1st pg # ok 16 Test test_merge-anon soft-dirty after remap merge 2nd pg # ok 17 Test test_merge-anon soft-dirty after mprotect merge 1st pg # ok 18 Test test_merge-anon soft-dirty after mprotect merge 2nd pg # # 1 skipped test(s) detected. Consider enabling relevant config options to improve coverage. # # Planned tests != run tests (19 != 18) # # Totals: pass:17 fail:0 xfail:0 xpass:0 skip:1 error:0 # [FAIL] not ok 52 soft-dirty # exit=1 With the fix (run test on thp disabled kernel): ./run_vmtests.sh -t soft_dirty # -------------------- # running ./soft-dirty # TAP version 13 # -------------------- # running ./soft-dirty # -------------------- # TAP version 13 # 1..19 # ok 1 Test test_simple # ok 2 Test test_vma_reuse dirty bit of allocated page # ok 3 Test test_vma_reuse dirty bit of reused address page # # Transparent Hugepages not available # ok 4 # SKIP Test test_hugepage huge page allocation # ok 5 # SKIP Test test_hugepage huge page dirty bit # ok 6 Test test_mprotect-anon dirty bit of new written page # ok 7 Test test_mprotect-anon soft-dirty clear after clear_refs # ok 8 Test test_mprotect-anon soft-dirty clear after marking RO # ok 9 Test test_mprotect-anon soft-dirty clear after marking RW # ok 10 Test test_mprotect-anon soft-dirty after rewritten # ok 11 Test test_mprotect-file dirty bit of new written page # ok 12 Test test_mprotect-file soft-dirty clear after clear_refs # ok 13 Test test_mprotect-file soft-dirty clear after marking RO # ok 14 Test test_mprotect-file soft-dirty clear after marking RW # ok 15 Test test_mprotect-file soft-dirty after rewritten # ok 16 Test test_merge-anon soft-dirty after remap merge 1st pg # ok 17 Test test_merge-anon soft-dirty after remap merge 2nd pg # ok 18 Test test_merge-anon soft-dirty after mprotect merge 1st pg # ok 19 Test test_merge-anon soft-dirty after mprotect merge 2nd pg # # 2 skipped test(s) detected. Consider enabling relevant config options to improve coverage. # # Totals: pass:17 fail:0 xfail:0 xpass:0 skip:2 error:0 # [PASS] ok 1 soft-dirty hwpoison_inject # SUMMARY: PASS=1 SKIP=0 FAIL=0 1..1 Link: https://lore.kernel.org/20260402014543.1671131-3-chuhu@redhat.com Signed-off-by: Chunyu Hu <chuhu@redhat.com> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Li Wang <liwang@redhat.com> Cc: Nico Pache <npache@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 days	selftests/mm/guard-regions: skip collapse test when thp not enabled	Chunyu Hu	-0/+4
	Patch series "selftests/mm: skip several tests when thp is not available", v8. There are several tests requires transprarent hugepages, when run on thp disabled kernel such as realtime kernel, there will be false negative. Mark those tests as skip when thp is not available. This patch (of 6): When thp is not available, just skip the collape tests to avoid the false negative. Without the change, run with a thp disabled kernel: ./run_vmtests.sh -t madv_guard -n 1 <snip/> # RUN guard_regions.anon.collapse ... # guard-regions.c:2217:collapse:Expected madvise(ptr, size, MADV_NOHUGEPAGE) (-1) == 0 (0) # collapse: Test terminated by assertion # FAIL guard_regions.anon.collapse not ok 2 guard_regions.anon.collapse <snip/> # RUN guard_regions.shmem.collapse ... # guard-regions.c:2217:collapse:Expected madvise(ptr, size, MADV_NOHUGEPAGE) (-1) == 0 (0) # collapse: Test terminated by assertion # FAIL guard_regions.shmem.collapse not ok 32 guard_regions.shmem.collapse <snip/> # RUN guard_regions.file.collapse ... # guard-regions.c:2217:collapse:Expected madvise(ptr, size, MADV_NOHUGEPAGE) (-1) == 0 (0) # collapse: Test terminated by assertion # FAIL guard_regions.file.collapse not ok 62 guard_regions.file.collapse <snip/> # FAILED: 87 / 90 tests passed. # 17 skipped test(s) detected. Consider enabling relevant config options to improve coverage. # Totals: pass:70 fail:3 xfail:0 xpass:0 skip:17 error:0 With this change, run with thp disabled kernel: ./run_vmtests.sh -t madv_guard -n 1 <snip/> # RUN guard_regions.anon.collapse ... # SKIP Transparent Hugepages not available # OK guard_regions.anon.collapse ok 2 guard_regions.anon.collapse # SKIP Transparent Hugepages not available <snip/> # RUN guard_regions.file.collapse ... # SKIP Transparent Hugepages not available # OK guard_regions.file.collapse ok 62 guard_regions.file.collapse # SKIP Transparent Hugepages not available <snip/> # RUN guard_regions.shmem.collapse ... # SKIP Transparent Hugepages not available # OK guard_regions.shmem.collapse ok 32 guard_regions.shmem.collapse # SKIP Transparent Hugepages not available <snip/> # PASSED: 90 / 90 tests passed. # 20 skipped test(s) detected. Consider enabling relevant config options to improve coverage. # Totals: pass:70 fail:0 xfail:0 xpass:0 skip:20 error:0 Link: https://lore.kernel.org/20260402014543.1671131-1-chuhu@redhat.com Link: https://lore.kernel.org/20260402014543.1671131-2-chuhu@redhat.com Signed-off-by: Chunyu Hu <chuhu@redhat.com> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Li Wang <liwang@redhat.com> Cc: Nico Pache <npache@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 days	selftests/mm: hmm-tests: don't hardcode THP size to 2MB	Alistair Popple	-67/+16
	Several HMM tests hardcode TWOMEG as the THP size. This is wrong on architectures where the PMD size is not 2MB such as arm64 with 64K base pages where THP is 512MB. Fix this by using read_pmd_pagesize() from vm_util instead. While here also replace the custom file_read_ulong() helper used to parse the default hugetlbfs page size from /proc/meminfo with the existing default_huge_page_size() from vm_util. Link: https://lore.kernel.org/20260331063445.3551404-3-apopple@nvidia.com Link: https://lore.kernel.org/linux-mm/8bd0396a-8997-4d2e-a13f-5aac033083d7@linux.dev/ Fixes: fee9f6d1b8df ("mm/hmm/test: add selftests for HMM") Fixes: 519071529d2a ("selftests/mm/hmm-tests: new tests for zone device THP migration") Signed-off-by: Alistair Popple <apopple@nvidia.com> Reported-by: Zenghui Yu <zenghui.yu@linux.dev> Closes: https://lore.kernel.org/linux-mm/8bd0396a-8997-4d2e-a13f-5aac033083d7@linux.dev/ Reviewed-by: Balbir Singh <balbirs@nvidia.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: David Hildenbrand <david@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: <stable@vger,kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 days	selftests/mm: skip hugetlb_dio tests when DIO alignment is incompatible	Li Wang	-22/+69
	hugetlb_dio test uses sub-page offsets (pagesize / 2) to verify that hugepages used as DIO user buffers are correctly unpinned at completion. However, on filesystems with a logical block size larger than half the page size (e.g., 4K-sector block devices), these unaligned DIO writes are rejected with -EINVAL, causing the test to fail unexpectedly. Add get_dio_alignment() to query the filesystem's required DIO alignment via statx(STATX_DIOALIGN) and skip individual test cases whose file offset or write size is not a multiple of that alignment. Aligned cases continue to run so the core coverage is preserved. While here, open the temporary file once in main() and share the fd across all test cases instead of reopening it in each invocation. === Reproduce Steps === # dd if=/dev/zero of=/tmp/test.img bs=1M count=512 # losetup --sector-size 4096 /dev/loop0 /tmp/test.img # mkfs.xfs /dev/loop0 # mkdir -p /mnt/dio_test # mount /dev/loop0 /mnt/dio_test // Modify test to open /mnt/dio_test and rebuild it: - fd = open("/tmp", O_TMPFILE \| O_RDWR \| O_DIRECT, 0664); + fd = open("/mnt/dio_test", O_TMPFILE \| O_RDWR \| O_DIRECT, 0664); # getconf PAGESIZE 4096 # echo 100 >/proc/sys/vm/nr_hugepages # ./hugetlb_dio TAP version 13 1..4 # No. Free pages before allocation : 100 # No. Free pages after munmap : 100 ok 1 free huge pages from 0-12288 Bail out! Error writing to file : Invalid argument (22) # Planned tests != run tests (4 != 1) # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0 Link: https://lore.kernel.org/20260401090520.24018-1-liwang@redhat.com Signed-off-by: Li Wang <liwang@redhat.com> Suggested-by: Mike Rapoport <rppt@kernel.org> Suggested-by: David Hildenbrand <david@kernel.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 days	tools/testing/selftests: add merge test for partial msealed range	Lorenzo Stoakes (Oracle)	-0/+88
	Commit 2697dd8ae721 ("mm/mseal: update VMA end correctly on merge") fixed an issue in the loop which iterates through VMAs applying mseal, which was triggered by mseal()'ing a range of VMAs where the second was mseal()'d and the first mergeable with it, once mseal()'d. Add a regression test to assert that this behaviour is correct. We place it in the merge selftests as this is strictly an issue with merging (via a vma_modify() invocation). It also asserts that mseal()'d ranges are correctly merged as you'd expect. The test is implemented such that it is skipped if mseal() is not available on the system. [rppt@kernel.org: fix inclusions, to fix handle_uprobe_upon_merged_vma()] Link: https://lore.kernel.org/ac_mCIUQWRAbuH8F@kernel.org [ljs@kernel.org: simplifications per Pedro] Link: https://lore.kernel.org/1c9c922d-5cb5-4cff-9273-b737cdb57ca1@lucifer.local Link: https://lore.kernel.org/20260331073627.50010-1-ljs@kernel.org Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Signed-off-by: Mike Rapoport <rppt@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 days	selftests: liveupdate: add test for double preservation	Pasha Tatashin	-0/+41
	Verify that a file can only be preserved once across all active sessions. Attempting to preserve it a second time, whether in the same or a different session, should fail with EBUSY. Link: https://lore.kernel.org/20260326163943.574070-4-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Cc: David Matlack <dmatlack@google.com> Cc: Pratyush Yadav <pratyush@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 days	Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf	Linus Torvalds	-58/+516
	Pull bpf fixes from Alexei Starovoitov: "Most of the diff stat comes from Xu Kuohai's fix to emit ENDBR/BTI, since all JITs had to be touched to move constant blinding out and pass bpf_verifier_env in. - Fix use-after-free in arena_vm_close on fork (Alexei Starovoitov) - Dissociate struct_ops program with map if map_update fails (Amery Hung) - Fix out-of-range and off-by-one bugs in arm64 JIT (Daniel Borkmann) - Fix precedence bug in convert_bpf_ld_abs alignment check (Daniel Borkmann) - Fix arg tracking for imprecise/multi-offset in BPF_ST/STX insns (Eduard Zingerman) - Copy token from main to subprogs to fix missing kallsyms (Eduard Zingerman) - Prevent double close and leak of btf objects in libbpf (Jiri Olsa) - Fix af_unix null-ptr-deref in sockmap (Michal Luczaj) - Fix NULL deref in map_kptr_match_type for scalar regs (Mykyta Yatsenko) - Avoid unnecessary IPIs. Remove redundant bpf_flush_icache() in arm64 and riscv JITs (Puranjay Mohan) - Fix out of bounds access. Validate node_id in arena_alloc_pages() (Puranjay Mohan) - Reject BPF-to-BPF calls and callbacks in arm32 JIT (Puranjay Mohan) - Refactor all JITs to pass bpf_verifier_env to emit ENDBR/BTI for indirect jump targets on x86-64, arm64 JITs (Xu Kuohai) - Allow UTF-8 literals in bpf_bprintf_prepare() (Yihan Ding)" * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (32 commits) bpf, arm32: Reject BPF-to-BPF calls and callbacks in the JIT bpf: Dissociate struct_ops program with map if map_update fails bpf: Validate node_id in arena_alloc_pages() libbpf: Prevent double close and leak of btf objects selftests/bpf: cover UTF-8 trace_printk output bpf: allow UTF-8 literals in bpf_bprintf_prepare() selftests/bpf: Reject scalar store into kptr slot bpf: Fix NULL deref in map_kptr_match_type for scalar regs bpf: Fix precedence bug in convert_bpf_ld_abs alignment check bpf, arm64: Emit BTI for indirect jump target bpf, x86: Emit ENDBR for indirect jump targets bpf: Add helper to detect indirect jump targets bpf: Pass bpf_verifier_env to JIT bpf: Move constants blinding out of arch-specific JITs bpf, sockmap: Take state lock for af_unix iter bpf, sockmap: Fix af_unix null-ptr-deref in proto update selftests/bpf: Extend bpf_iter_unix to attempt deadlocking bpf, sockmap: Fix af_unix iter deadlock bpf, sockmap: Annotate af_unix sock:: Sk_state data-races selftests/bpf: verify kallsyms entries for token-loaded subprograms ...
11 days	Merge tag 'cxl-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl	Linus Torvalds	-13/+557
	Pull CXL (Compute Express Link) updates from Dave Jiang: "The significant change of interest is the handling of soft reserved memory conflict between CXL and HMEM. In essence CXL will be the first to claim the soft reserved memory ranges that belongs to CXL and attempt to enumerate them with best effort. If CXL is not able to enumerate the ranges it will punt them to HMEM. There are also MAINTAINERS email changes from Dan Williams and Jonathan Cameron" * tag 'cxl-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (37 commits) MAINTAINERS: Update Jonathan Cameron's email address cxl/hdm: Add support for 32 switch decoders MAINTAINERS: Update address for Dan Williams tools/testing/cxl: Enable replay of user regions as auto regions cxl/region: Add a region sysfs interface for region lock status tools/testing/cxl: Test dax_hmem takeover of CXL regions tools/testing/cxl: Simulate auto-assembly failure dax/hmem: Parent dax_hmem devices dax/hmem: Fix singleton confusion between dax_hmem_work and hmem devices dax/hmem: Reduce visibility of dax_cxl coordination symbols cxl/region: Constify cxl_region_resource_contains() cxl/region: Limit visibility of cxl_region_contains_resource() dax/cxl: Fix HMEM dependencies cxl/region: Fix use-after-free from auto assembly failure cxl/core: Check existence of cxl_memdev_state in poison test cxl/core: use cleanup.h for devm_cxl_add_dax_region cxl/core/region: move dax region device logic into region_dax.c cxl/core/region: move pmem region driver logic into region_pmem.c dax/hmem, cxl: Defer and resolve Soft Reserved ownership cxl/region: Add helper to check Soft Reserved containment by CXL regions ...
11 days	Merge tag 'dma-mapping-7.1-2026-04-16' of ↵	Linus Torvalds	-3/+20
	git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping updates from Marek Szyprowski: - added support for batched cache sync, what improves performance of dma_map/unmap_sg() operations on ARM64 architecture (Barry Song) - introduced DMA_ATTR_CC_SHARED attribute for explicitly shared memory used in confidential computing (Jiri Pirko) - refactored spaghetti-like code in drivers/of/of_reserved_mem.c and its clients (Marek Szyprowski, shared branch with device-tree updates to avoid merge conflicts) - prepared Contiguous Memory Allocator related code for making dma-buf drivers modularized (Maxime Ripard) - added support for benchmarking dma_map_sg() calls to tools/dma utility (Qinxin Xia) * tag 'dma-mapping-7.1-2026-04-16' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: (24 commits) dma-buf: heaps: system: document system_cc_shared heap dma-buf: heaps: system: add system_cc_shared heap for explicitly shared memory dma-mapping: introduce DMA_ATTR_CC_SHARED for shared memory mm: cma: Export cma_alloc(), cma_release() and cma_get_name() dma: contiguous: Export dev_get_cma_area() dma: contiguous: Make dma_contiguous_default_area static dma: contiguous: Make dev_get_cma_area() a proper function dma: contiguous: Turn heap registration logic around of: reserved_mem: rework fdt_init_reserved_mem_node() of: reserved_mem: clarify fdt_scan_reserved_mem*() functions of: reserved_mem: rearrange code a bit of: reserved_mem: replace CMA quirks by generic methods of: reserved_mem: switch to ops based OF_DECLARE() of: reserved_mem: use -ENODEV instead of -ENOENT of: reserved_mem: remove fdt node from the structure dma-mapping: fix false kernel-doc comment marker dma-mapping: Support batch mode for dma_direct_{map,unmap}_sg dma-mapping: Separate DMA sync issuing and completion waiting arm64: Provide dcache_inval_poc_nosync helper arm64: Provide dcache_clean_poc_nosync helper ...
11 days	Merge tag 'trace-v7.1' of ↵	Linus Torvalds	-0/+34
	git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing updates from Steven Rostedt: - Fix printf format warning for bprintf sunrpc uses a trace_printk() that triggers a printf warning during the compile. Move the __printf() attribute around for when debugging is not enabled the warning will go away - Remove redundant check for EVENT_FILE_FL_FREED in event_filter_write() The FREED flag is checked in the call to event_file_file() and then checked again right afterward, which is unneeded - Clean up event_file_file() and event_file_data() helpers These helper functions played a different role in the past, but now with eventfs, the READ_ONCE() isn't needed. Simplify the code a bit and also add a warning to event_file_data() if the file or its data is not present - Remove updating file->private_data in tracing open All access to the file private data is handled by the helper functions, which do not use file->private_data. Stop updating it on open - Show ENUM names in function arguments via BTF in function tracing When showing the function arguments when func-args option is set for function tracing, if one of the arguments is found to be an enum, show the name of the enum instead of its number - Add new trace_call__##name() API for tracepoints Tracepoints are enabled via static_branch() blocks, where when not enabled, there's only a nop that is in the code where the execution will just skip over it. When tracing is enabled, the nop is converted to a direct jump to the tracepoint code. Sometimes more calculations are required to be performed to update the parameters of the tracepoint. In this case, trace_##name##_enabled() is called which is a static_branch() that gets enabled only when the tracepoint is enabled. This allows the extra calculations to also be skipped by the nop: if (trace_foo_enabled()) { x = bar(); trace_foo(x); } Where the x=bar() is only performed when foo is enabled. The problem with this approach is that there's now two static_branch() calls. One for checking if the tracepoint is enabled, and then again to know if the tracepoint should be called. The second one is redundant Introduce trace_call__foo() that will call the foo() tracepoint directly without doing a static_branch(): if (trace_foo_enabled()) { x = bar(); trace_call__foo(); } - Update various locations to use the new trace_call__##name() API - Move snapshot code out of trace.c Cleaning up trace.c to not be a "dump all", move the snapshot code out of it and into a new trace_snapshot.c file - Clean up some "%.s" to "%s" - Allow boot kernel command line options to be called multiple times Have options like: ftrace_filter=foo ftrace_filter=bar ftrace_filter=zoo Equal to: ftrace_filter=foo,bar,zoo - Fix ipi_raise event CPU field to be a CPU field The ipi_raise target_cpus field is defined as a __bitmask(). There is now a __cpumask() field definition. Update the field to use that - Have hist_field_name() use a snprintf() and not a series of strcat() It's safer to use snprintf() that a series of strcat() - Fix tracepoint regfunc balancing A tracepoint can define a "reg" and "unreg" function that gets called before the tracepoint is enabled, and after it is disabled respectively. But on error, after the "reg" func is called and the tracepoint is not enabled, the "unreg" function is not called to tear down what the "reg" function performed - Fix output that shows what histograms are enabled Event variables are displayed incorrectly in the histogram output Instead of "sched.sched_wakeup.$var", it is showing "$sched.sched_wakeup.var" where the '$' is in the incorrect location - Some other simple cleanups * tag 'trace-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (24 commits) selftests/ftrace: Add test case for fully-qualified variable references tracing: Fix fully-qualified variable reference printing in histograms tracepoint: balance regfunc() on func_add() failure in tracepoint_add_func() tracing: Rebuild full_name on each hist_field_name() call tracing: Report ipi_raise target CPUs as cpumask tracing: Remove duplicate latency_fsnotify() stub tracing: Preserve repeated trace_trigger boot parameters tracing: Append repeated boot-time tracing parameters tracing: Remove spurious default precision from show_event_trigger/filter formats cpufreq: Use trace_call__##name() at guarded tracepoint call sites tracing: Remove tracing_alloc_snapshot() when snapshot isn't defined tracing: Move snapshot code out of trace.c and into trace_snapshot.c mm: damon: Use trace_call__##name() at guarded tracepoint call sites btrfs: Use trace_call__##name() at guarded tracepoint call sites spi: Use trace_call__##name() at guarded tracepoint call sites i2c: Use trace_call__##name() at guarded tracepoint call sites kernel: Use trace_call__##name() at guarded tracepoint call sites tracepoint: Add trace_call__##name() API tracing: trace_mmap.h: fix a kernel-doc warning tracing: Pretty-print enum parameters in function arguments ...
11 days	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds	-341/+1876
	Pull kvm updates from Paolo Bonzini: "Arm: - Add support for tracing in the standalone EL2 hypervisor code, which should help both debugging and performance analysis. This uses the new infrastructure for 'remote' trace buffers that can be exposed by non-kernel entities such as firmware, and which came through the tracing tree - Add support for GICv5 Per Processor Interrupts (PPIs), as the starting point for supporting the new GIC architecture in KVM - Finally add support for pKVM protected guests, where pages are unmapped from the host as they are faulted into the guest and can be shared back from the guest using pKVM hypercalls. Protected guests are created using a new machine type identifier. As the elusive guestmem has not yet delivered on its promises, anonymous memory is also supported This is only a first step towards full isolation from the host; for example, the CPU register state and DMA accesses are not yet isolated. Because this does not really yet bring fully what it promises, it is hidden behind CONFIG_ARM_PKVM_GUEST + 'kvm-arm.mode=protected', and also triggers TAINT_USER when a VM is created. Caveat emptor - Rework the dreaded user_mem_abort() function to make it more maintainable, reducing the amount of state being exposed to the various helpers and rendering a substantial amount of state immutable - Expand the Stage-2 page table dumper to support NV shadow page tables on a per-VM basis - Tidy up the pKVM PSCI proxy code to be slightly less hard to follow - Fix both SPE and TRBE in non-VHE configurations so that they do not generate spurious, out of context table walks that ultimately lead to very bad HW lockups - A small set of patches fixing the Stage-2 MMU freeing in error cases - Tighten-up accepted SMC immediate value to be only #0 for host SMCCC calls - The usual cleanups and other selftest churn LoongArch: - Use CSR_CRMD_PLV for kvm_arch_vcpu_in_kernel() - Add DMSINTC irqchip in kernel support RISC-V: - Fix steal time shared memory alignment checks - Fix vector context allocation leak - Fix array out-of-bounds in pmu_ctr_read() and pmu_fw_ctr_read_hi() - Fix double-free of sdata in kvm_pmu_clear_snapshot_area() - Fix integer overflow in kvm_pmu_validate_counter_mask() - Fix shift-out-of-bounds in make_xfence_request() - Fix lost write protection on huge pages during dirty logging - Split huge pages during fault handling for dirty logging - Skip CSR restore if VCPU is reloaded on the same core - Implement kvm_arch_has_default_irqchip() for KVM selftests - Factored-out ISA checks into separate sources - Added hideleg to struct kvm_vcpu_config - Factored-out VCPU config into separate sources - Support configuration of per-VM HGATP mode from KVM user space s390: - Support for ESA (31-bit) guests inside nested hypervisors - Remove restriction on memslot alignment, which is not needed anymore with the new gmap code - Fix LPSW/E to update the bear (which of course is the breaking event address register) x86: - Shut up various UBSAN warnings on reading module parameter before they were initialized - Don't zero-allocate page tables that are used for splitting hugepages in the TDP MMU, as KVM is guaranteed to set all SPTEs in the page table and thus write all bytes - As an optimization, bail early when trying to unsync 4KiB mappings if the target gfn can just be mapped with a 2MiB hugepage x86 generic: - Copy single-chunk MMIO write values into struct kvm_vcpu (more precisely struct kvm_mmio_fragment) to fix use-after-free stack bugs where KVM would dereference stack pointer after an exit to userspace - Clean up and comment the emulated MMIO code to try to make it easier to maintain (not necessarily "easy", but "easier") - Move VMXON+VMXOFF and EFER.SVME toggling out of KVM (not all of VMX and SVM enabling) as it is needed for trusted I/O - Advertise support for AVX512 Bit Matrix Multiply (BMM) instructions - Immediately fail the build if a required #define is missing in one of KVM's headers that is included multiple times - Reject SET_GUEST_DEBUG with -EBUSY if there's an already injected exception, mostly to prevent syzkaller from abusing the uAPI to trigger WARNs, but also because it can help prevent userspace from unintentionally crashing the VM - Exempt SMM from CPUID faulting on Intel, as per the spec - Misc hardening and cleanup changes x86 (AMD): - Fix and optimize IRQ window inhibit handling for AVIC; make it per-vCPU so that KVM doesn't prematurely re-enable AVIC if multiple vCPUs have to-be-injected IRQs - Clean up and optimize the OSVW handling, avoiding a bug in which KVM would overwrite state when enabling virtualization on multiple CPUs in parallel. This should not be a problem because OSVW should usually be the same for all CPUs - Drop a WARN in KVM_MEMORY_ENCRYPT_REG_REGION where KVM complains about a "too large" size based purely on user input - Clean up and harden the pinning code for KVM_MEMORY_ENCRYPT_REG_REGION - Disallow synchronizing a VMSA of an already-launched/encrypted vCPU, as doing so for an SNP guest will crash the host due to an RMP violation page fault - Overhaul KVM's APIs for detecting SEV+ guests so that VM-scoped queries are required to hold kvm->lock, and enforce it by lockdep. Fix various bugs where sev_guest() was not ensured to be stable for the whole duration of a function or ioctl - Convert a pile of kvm->lock SEV code to guard() - Play nicer with userspace that does not enable KVM_CAP_EXCEPTION_PAYLOAD, for which KVM needs to set CR2 and DR6 as a response to ioctls such as KVM_GET_VCPU_EVENTS (even if the payload would end up in EXITINFO2 rather than CR2, for example). Only set CR2 and DR6 when consumption of the payload is imminent, but on the other hand force delivery of the payload in all paths where userspace retrieves CR2 or DR6 - Use vcpu->arch.cr2 when updating vmcb12's CR2 on nested #VMEXIT instead of vmcb02->save.cr2. The value is out of sync after a save/restore or after a #PF is injected into L2 - Fix a class of nSVM bugs where some fields written by the CPU are not synchronized from vmcb02 to cached vmcb12 after VMRUN, and so are not up-to-date when saved by KVM_GET_NESTED_STATE - Fix a class of bugs where the ordering between KVM_SET_NESTED_STATE and KVM_SET_{S}REGS could cause vmcb02 to be incorrectly initialized after save+restore - Add a variety of missing nSVM consistency checks - Fix several bugs where KVM failed to correctly update VMCB fields on nested #VMEXIT - Fix several bugs where KVM failed to correctly synthesize #UD or #GP for SVM-related instructions - Add support for save+restore of virtualized LBRs (on SVM) - Refactor various helpers and macros to improve clarity and (hopefully) make the code easier to maintain - Aggressively sanitize fields when copying from vmcb12, to guard against unintentionally allowing L1 to utilize yet-to-be-defined features - Fix several bugs where KVM botched rAX legality checks when emulating SVM instructions. There are remaining issues in that KVM doesn't handle size prefix overrides for 64-bit guests - Fail emulation of VMRUN/VMLOAD/VMSAVE if mapping vmcb12 fails instead of somewhat arbitrarily synthesizing #GP (i.e. don't double down on AMD's architectural but sketchy behavior of generating #GP for "unsupported" addresses) - Cache all used vmcb12 fields to further harden against TOCTOU bugs x86 (Intel): - Drop obsolete branch hint prefixes from the VMX instruction macros - Use ASM_INPUT_RM() in __vmcs_writel() to coerce clang into using a register input when appropriate - Code cleanups guest_memfd: - Don't mark guest_memfd folios as accessed, as guest_memfd doesn't support reclaim, the memory is unevictable, and there is no storage to write back to LoongArch selftests: - Add KVM PMU test cases s390 selftests: - Enable more memory selftests x86 selftests: - Add support for Hygon CPUs in KVM selftests - Fix a bug in the MSR test where it would get false failures on AMD/Hygon CPUs with exactly one of RDPID or RDTSCP - Add an MADV_COLLAPSE testcase for guest_memfd as a regression test for a bug where the kernel would attempt to collapse guest_memfd folios against KVM's will" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (373 commits) KVM: x86: use inlines instead of macros for is_sev_*guest x86/virt: Treat SVM as unsupported when running as an SEV+ guest KVM: SEV: Goto an existing error label if charging misc_cg for an ASID fails KVM: SVM: Move lock-protected allocation of SEV ASID into a separate helper KVM: SEV: use mutex guard in snp_handle_guest_req() KVM: SEV: use mutex guard in sev_mem_enc_unregister_region() KVM: SEV: use mutex guard in sev_mem_enc_ioctl() KVM: SEV: use mutex guard in snp_launch_update() KVM: SEV: Assert that kvm->lock is held when querying SEV+ support KVM: SEV: Document that checking for SEV+ guests when reclaiming memory is "safe" KVM: SEV: Hide "struct kvm_sev_info" behind CONFIG_KVM_AMD_SEV=y KVM: SEV: WARN on unhandled VM type when initializing VM KVM: LoongArch: selftests: Add PMU overflow interrupt test KVM: LoongArch: selftests: Add basic PMU event counting test KVM: LoongArch: selftests: Add cpucfg read/write helpers LoongArch: KVM: Add DMSINTC inject msi to vCPU LoongArch: KVM: Add DMSINTC device support LoongArch: KVM: Make vcpu_is_preempted() as a macro rather than function LoongArch: KVM: Move host CSR_GSTAT save and restore in context switch LoongArch: KVM: Move host CSR_EENTRY save and restore in context switch ...
12 days	Merge tag 'mm-nonmm-stable-2026-04-15-04-20' of ↵	Linus Torvalds	-77/+494
	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - "pid: make sub-init creation retryable" (Oleg Nesterov) Make creation of init in a new namespace more robust by clearing away some historical cruft which is no longer needed. Also some documentation fixups - "selftests/fchmodat2: Error handling and general" (Mark Brown) Fix and a cleanup for the fchmodat2() syscall selftest - "lib: polynomial: Move to math/ and clean up" (Andy Shevchenko) - "hung_task: Provide runtime reset interface for hung task detector" (Aaron Tomlin) Give administrators the ability to zero out /proc/sys/kernel/hung_task_detect_count - "tools/getdelays: use the static UAPI headers from tools/include/uapi" (Thomas Weißschuh) Teach getdelays to use the in-kernel UAPI headers rather than the system-provided ones - "watchdog/hardlockup: Improvements to hardlockup" (Mayank Rungta) Several cleanups and fixups to the hardlockup detector code and its documentation - "lib/bch: fix undefined behavior from signed left-shifts" (Josh Law) A couple of small/theoretical fixes in the bch code - "ocfs2/dlm: fix two bugs in dlm_match_regions()" (Junrui Luo) - "cleanup the RAID5 XOR library" (Christoph Hellwig) A quite far-reaching cleanup to this code. I can't do better than to quote Christoph: "The XOR library used for the RAID5 parity is a bit of a mess right now. The main file sits in crypto/ despite not being cryptography and not using the crypto API, with the generic implementations sitting in include/asm-generic and the arch implementations sitting in an asm/ header in theory. The latter doesn't work for many cases, so architectures often build the code directly into the core kernel, or create another module for the architecture code. Change this to a single module in lib/ that also contains the architecture optimizations, similar to the library work Eric Biggers has done for the CRC and crypto libraries later. After that it changes to better calling conventions that allow for smarter architecture implementations (although none is contained here yet), and uses static_call to avoid indirection function call overhead" - "lib/list_sort: Clean up list_sort() scheduling workarounds" (Kuan-Wei Chiu) Clean up this library code by removing a hacky thing which was added for UBIFS, which UBIFS doesn't actually need - "Fix bugs in extract_iter_to_sg()" (Christian Ehrhardt) Fix a few bugs in the scatterlist code, add in-kernel tests for the now-fixed bugs and fix a leak in the test itself - "kdump: Enable LUKS-encrypted dump target support in ARM64 and PowerPC" (Coiby Xu) Enable support of the LUKS-encrypted device dump target on arm64 and powerpc - "ocfs2: consolidate extent list validation into block read callbacks" (Joseph Qi) Cleanup, simplify, and make more robust ocfs2's validation of extent list fields (Kernel test robot loves mounting corrupted fs images!) * tag 'mm-nonmm-stable-2026-04-15-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (127 commits) ocfs2: validate group add input before caching ocfs2: validate bg_bits during freefrag scan ocfs2: fix listxattr handling when the buffer is full doc: watchdog: fix typos etc update Sean's email address ocfs2: use get_random_u32() where appropriate ocfs2: split transactions in dio completion to avoid credit exhaustion ocfs2: remove redundant l_next_free_rec check in __ocfs2_find_path() ocfs2: validate extent block list fields during block read ocfs2: remove empty extent list check in ocfs2_dx_dir_lookup_rec() ocfs2: validate dx_root extent list fields during block read ocfs2: fix use-after-free in ocfs2_fault() when VM_FAULT_RETRY ocfs2: handle invalid dinode in ocfs2_group_extend .get_maintainer.ignore: add Askar ocfs2: validate bg_list extent bounds in discontig groups checkpatch: exclude forward declarations of const structs tools/accounting: handle truncated taskstats netlink messages taskstats: set version in TGID exit notifications ocfs2/heartbeat: fix slot mapping rollback leaks on error paths arm64,ppc64le/kdump: pass dm-crypt keys to kdump kernel ...
12 days	libbpf: Prevent double close and leak of btf objects	Jiri Olsa	-10/+11
	Sashiko found possible double close of btf object fd [1], which happens when strdup in load_module_btfs fails at which point the obj->btf_module_cnt is already incremented. The error path close btf fd and so does later cleanup code in bpf_object_post_load_cleanup function. Also libbpf_ensure_mem failure leaves btf object not assigned and it's leaked. Replacing the err_out label with break to make the error path less confusing as suggested by Alan. Incrementing obj->btf_module_cnt only if there's no failure and releasing btf object in error path. Fixes: 91abb4a6d79d ("libbpf: Support attachment of BPF tracing programs to kernel modules") [1] https://sashiko.dev/#/patchset/20260324081846.2334094-1-jolsa%40kernel.org Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20260416100034.1610852-1-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
12 days	selftests/bpf: cover UTF-8 trace_printk output	Yihan Ding	-6/+32
	Extend trace_printk coverage to verify that UTF-8 literal text is emitted successfully and that '%' parsing still rejects non-ASCII bytes once format parsing starts. Use an explicitly invalid format string for the negative case so the ASCII-only parser expectation is visible from the test code itself. Signed-off-by: Yihan Ding <dingyihan@uniontech.com> Acked-by: Paul Chaignon <paul.chaignon@gmail.com> Link: https://lore.kernel.org/r/20260416120142.1420646-3-dingyihan@uniontech.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
12 days	bpf: allow UTF-8 literals in bpf_bprintf_prepare()	Yihan Ding	-1/+2
	bpf_bprintf_prepare() only needs ASCII parsing for conversion specifiers. Plain text can safely carry bytes >= 0x80, so allow UTF-8 literals outside '%' sequences while keeping ASCII control bytes rejected and format specifiers ASCII-only. This keeps existing parsing rules for format directives unchanged, while allowing helpers such as bpf_trace_printk() to emit UTF-8 literal text. Update test_snprintf_negative() in the same commit so selftests keep matching the new plain-text vs format-specifier split during bisection. Fixes: 48cac3f4a96d ("bpf: Implement formatted output helpers with bstr_printf") Signed-off-by: Yihan Ding <dingyihan@uniontech.com> Acked-by: Paul Chaignon <paul.chaignon@gmail.com> Link: https://lore.kernel.org/r/20260416120142.1420646-2-dingyihan@uniontech.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
12 days	selftests/bpf: Reject scalar store into kptr slot	Mykyta Yatsenko	-0/+15
	Verify that the verifier rejects a direct scalar write to a kptr map value slot without crashing. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Link: https://lore.kernel.org/r/20260416-kptr_crash-v1-2-5589356584b4@meta.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
12 days	Merge tag 'livepatching-for-7.1' of ↵	Linus Torvalds	-0/+230
	git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching Pull livepatching updates from Petr Mladek: - Add two new selftests * tag 'livepatching-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching: selftests/livepatch: add test for module function patching selftests: livepatch: test-ftrace: livepatch a traced function
12 days	Merge tag 'vfio-v7.1-rc1' of https://github.com/awilliam/linux-vfio	Linus Torvalds	-5/+17
	Pull VFIO updates from Alex Williamson: - Update QAT vfio-pci variant driver for Gen 5, 420xx devices (Vijay Sundar Selvamani, Suman Kumar Chakraborty, Giovanni Cabiddu) - Fix vfio selftest MMIO DMA mapping selftest (Alex Mastro) - Conversions to const struct class in support of class_create() deprecation (Jori Koolstra) - Improve selftest compiler compatibility by avoiding initializer on variable-length array (Manish Honap) - Define new uAPI for drivers supporting migration to advise user- space of new initial data for reducing target startup latency. Implemented for mlx5 vfio-pci variant driver (Yishai Hadas) - Enable vfio selftests on aarch64, not just cross-compiles reporting arm64 (Ted Logan) - Update vfio selftest driver support to include additional DSA devices (Yi Lai) - Unconditionally include debugfs root pointer in vfio device struct, avoiding a build failure seen in hisi_acc variant driver without debugfs otherwise (Arnd Bergmann) - Add support for the s390 ISM (Internal Shared Memory) device via a new variant driver. The device is unique in the size of its BAR space (256TiB) and lack of mmap support (Julian Ruess) - Enforce that vfio-pci drivers implement a name in their ops structure for use in sequestering SR-IOV VFs (Alex Williamson) - Prune leftover group notifier code (Paolo Bonzini) - Fix Xe vfio-pci variant driver to avoid migration support as a dependency in the reset path and missing release call (Michał Winiarski) * tag 'vfio-v7.1-rc1' of https://github.com/awilliam/linux-vfio: (23 commits) vfio/xe: Add a missing vfio_pci_core_release_dev() vfio/xe: Reorganize the init to decouple migration from reset vfio: remove dead notifier code vfio/pci: Require vfio_device_ops.name MAINTAINERS: add VFIO ISM PCI DRIVER section vfio/ism: Implement vfio_pci driver for ISM devices vfio/pci: Rename vfio_config_do_rw() to vfio_pci_config_rw_single() and export it vfio: unhide vdev->debug_root vfio/qat: add support for Intel QAT 420xx VFs vfio: selftests: Support DMR and GNR-D DSA devices vfio: selftests: Build tests on aarch64 vfio/mlx5: Add REINIT support to VFIO_MIG_GET_PRECOPY_INFO vfio/mlx5: consider inflight SAVE during PRE_COPY net/mlx5: Add IFC bits for migration state vfio: Adapt drivers to use the core helper vfio_check_precopy_ioctl vfio: Add support for VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 vfio: Define uAPI for re-init initial bytes during the PRE_COPY phase vfio: selftests: Fix VLA initialisation in vfio_pci_irq_set() vfio: uapi: fix comment typo vfio: mdev: replace mtty_dev->vd_class with a const struct class ...
13 days	Merge branch 'for-7.1/module-function-test' into for-linus	Petr Mladek	-0/+194

13 days	Merge tag 'v7.0-rc6' into perf-tools	Namhyung Kim	-36/+686
	To get the latest updates and fixes. Signed-off-by: Namhyung Kim <namhyung@kernel.org>
13 days	Merge tag 'trace-rtla-v7.1' of ↵	Linus Torvalds	-402/+736
	git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull RTLA updates from Steven Rostedt: - Simplify option parsing Auto-generate getopt_long() optstring for short options from long options array, avoiding the need to specify it manually and reducing the surface for mistakes. - Add unit tests Implement unit tests (make unit-tests) using libcheck, next to existing runtime tests (make check). Currently, three functions from utils.c are tested. - Add --stack-format option In addition to stopping stack pointer decoding (with -s/--stack option) on first unresolvable pointer, allow also skipping unresolvable pointers and displaying everything, configurable with a new option. - Unify number of CPUs into one global variable Use one global variable, nr_cpus, to store the number of CPUs instead of retrieving it and passing it at multiple places. - Fix behavior in various corner cases Make RTLA behave correctly in several corner cases: memory allocation failure, invalid value read from kernel side, thread creation failure, malformed time value input, and read/write failure or interruption by signal. - Improve string handling Simplify several places in the code that handle strings, including parsing of action arguments. A few new helper functions and variables are added for that purpose. - Get rid of magic numbers Few places handling paths use a magic number of 1024. Replace it with MAX_PATH and ARRAY_SIZE() macro. - Unify threshold handling Code that handles response to latency threshold is duplicated between tools, which has led to bugs in the past. Unify it into a new helper as much as possible. - Fix segfault on SIGINT during cleanup The SIGINT handler touches dynamically allocated memory. Detach it before freeing it during cleanup to prevent segmentation fault and discarding of output buffers. Also, properly document SIGINT handling while at it. * tag 'trace-rtla-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (28 commits) Documentation/rtla: Document SIGINT behavior rtla: Fix segfault on multiple SIGINTs rtla/utils: Fix loop condition in PID validation rtla/utils: Fix resource leak in set_comm_sched_attr() rtla/trace: Fix I/O handling in save_trace_to_file() rtla/trace: Fix write loop in trace_event_save_hist() rtla/timerlat: Simplify RTLA_NO_BPF environment variable check rtla: Use str_has_prefix() for option prefix check rtla: Enforce exact match for time unit suffixes rtla: Use str_has_prefix() for prefix checks rtla: Add str_has_prefix() helper function rtla: Handle pthread_create() failure properly rtla/timerlat: Add bounds check for softirq vector rtla: Simplify code by caching string lengths rtla: Replace magic number with MAX_PATH rtla: Introduce common_threshold_handler() helper rtla/actions: Simplify argument parsing rtla: Use strdup() to simplify code rtla: Exit on memory allocation failures during initialization tools/rtla: Remove unneeded nr_cpus from for_each_monitored_cpu ...
13 days	selftests/bpf: Extend bpf_iter_unix to attempt deadlocking	Michal Luczaj	-0/+10
	Updating a sockmap from a unix iterator prog may lead to a deadlock. Piggyback on the original selftest. Signed-off-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260414-unix-proto-update-null-ptr-deref-v4-3-2af6fe97918e@rbox.co
13 days	Merge tag 'trace-rv-v7.1' of ↵	Linus Torvalds	-245/+982
	git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull runtime verification updates from Steven Rostedt: - Refactor da_monitor header to share handlers across monitor types No functional changes, only less code duplication. - Add Hybrid Automata model class Add a new model class that extends deterministic automata by adding constraints on transitions and states. Those constraints can take into account wall-clock time and as such allow RV monitor to make assertions on real time. Add documentation and code generation scripts. - Add stall monitor as hybrid automaton example Add a monitor that triggers a violation when a task is stalling as an example of automaton working with real time variables. - Convert the opid monitor to a hybrid automaton The opid monitor can be heavily simplified if written as a hybrid automaton: instead of tracking preempt and interrupt enable/disable events, it can just run constraints on the preemption/interrupt states when events like wakeup and need_resched verify. - Add support for per-object monitors in DA/HA Allow writing deterministic and hybrid automata monitors for generic objects (e.g. any struct), by exploiting a hash table where objects are saved. This allows to track more than just tasks in RV. For instance it will be used to track deadline entities in deadline monitors. - Add deadline tracepoints and move some deadline utilities Prepare the ground for deadline monitors by defining events and exporting helpers. - Add nomiss deadline monitor Add first example of deadline monitor asserting all entities complete before their deadline. - Improve rvgen error handling Introduce AutomataError exception class and better handle expected exceptions while showing a backtrace for unexpected ones. - Improve python code quality in rvgen Refactor the rvgen generation scripts to align with python best practices: use f-strings instead of %, use len() instead of __len__(), remove semicolons, use context managers for file operations, fix whitespace violations, extract magic strings into constants, remove unused imports and methods. - Fix small bugs in rvgen The generator scripts presented some corner case bugs: logical error in validating what a correct dot file looks like, fix an isinstance() check, enforce a dot file has an initial state, fix type annotations and typos in comments. - rvgen refactoring Refactor automata.py to use iterator-based parsing and handle required arguments directly in argparse. - Allow epoll in rtapp-sleep monitor The epoll_wait call is now rt-friendly so it should be allowed in the sleep monitor as a valid sleep method. * tag 'trace-rv-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (32 commits) rv: Allow epoll in rtapp-sleep monitor rv/rvgen: fix _fill_states() return type annotation rv/rvgen: fix unbound loop variable warning rv/rvgen: enforce presence of initial state rv/rvgen: extract node marker string to class constant rv/rvgen: fix isinstance check in Variable.expand() rv/rvgen: make monitor arguments required in rvgen rv/rvgen: remove unused __get_main_name method rv/rvgen: remove unused sys import from dot2c rv/rvgen: refactor automata.py to use iterator-based parsing rv/rvgen: use class constant for init marker rv/rvgen: fix DOT file validation logic error rv/rvgen: fix PEP 8 whitespace violations rv/rvgen: fix typos in automata and generator docstring and comments rv/rvgen: use context managers for file operations rv/rvgen: remove unnecessary semicolons rv/rvgen: replace __len__() calls with len() rv/rvgen: replace % string formatting with f-strings rv/rvgen: remove bare except clauses in generator rv/rvgen: introduce AutomataError exception class ...
13 days	Merge tag 'ktest-v7.1' of ↵	Linus Torvalds	-42/+127
	git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest Pull ktest updates from Steven Rostedt: - Fix undef warning when WARNINGS_FILE is unset The check_buildlog() references WARNINGS_FILE even when it's not set. Perl triggers a warning in this case. Check if the WARNINGS_FILE is defined before checking if the file it represents exists. - Fix how LOG_FILE is resolved LOG_FILE is expanded immediately after the config file is parsed. If LOG_FILE depends on variables from the tests it will use stale values instead of using the test variables. Have LOG_FILE also resolve test variables. - Treat a undefined self reference variable as empty Variables can recursively include itself for appending. Currently, if the references itself and it is not defined, it leaves the variable in the define: "VAR = ${VAR} foo" keeps the ${VAR} around. Have it removed instead. - Fix clearing of variables per tests If a variable has a defined default, a test can not clear it by assigning the variable to empty. Fix this by clearing the variable for a test when the test config has that variable assigned to nothing. - Fix run_command() to catch stderr in the shell command parsing Switch to Perl list form open to use "sh -c" wrapper to run shell commands to have the log file catch shell parsing errors. - Fix console output during reboot cycle The POWER_CYCLE callback during reboot() can miss output from the next boot making ktest miss the boot string it was waiting for. - Add PRE_KTEST_DIE for PRE_KTEST failures If the command for PRE_KTEST fails, ktest does not fail (this was by design as this command was used to add patches that may or may not apply). Add PRE_KTEST_DIE value to force ktest to fail if PRE_KTEST fails. - Run POST_KTEST hooks on failure and cancellation PRE_KTEST always runs before a ktest test, have POST_KTEST always run after a test even if the test fails or is cancelled to do the teardown of PRE_KTEST. - Add a --dry-run mode Add --dry-run to parse the config, print the results and exit without running any of the tests. - Store failures from the dodie() path as well The STORE_FAILURES saves the logs on failure, but there's failure paths that miss storing. Perform STORE_FAILURES in dodie() to capture these failures too. * tag 'ktest-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest: ktest: Store failure logs also in fatal paths ktest: Add a --dry-run mode ktest: Run POST_KTEST hooks on failure and cancellation ktest: Add PRE_KTEST_DIE for PRE_KTEST failures ktest: Stop dropping console output during power-cycle reboot ktest: Run commands through list-form shell open ktest: Honor empty per-test option overrides ktest: Treat undefined self-reference as empty ktest: Resolve LOG_FILE in test option context ktest: Avoid undef warning when WARNINGS_FILE is unset
13 days	selftests/bpf: verify kallsyms entries for token-loaded subprograms	Eduard Zingerman	-23/+149
	Add a test that loads an XDP program with a global subprogram using a BPF token from a user namespace, then verifies that both the main program and the subprogram appear in /proc/kallsyms. This exercises the bpf_prog_kallsyms_add() path for subprograms and would have caught the missing aux->token copy in bpf_jit_subprogs(). load_kallsyms_local() filters out kallsyms with zero addresses. For a process with limited capabilities to read kallsym addresses the following sysctl variables have to be set to zero: - /proc/sys/kernel/perf_event_paranoid - /proc/sys/kernel/kptr_restrict Set these variables using sysctl_set() utility function extracted from unpriv_bpf_disabled.c to a separate c/header. Since the test modifies global system state, mark it as serial. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20260415-subprog-token-fix-v4-2-9bd000e8b068@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
13 days	Merge tag 'trace-ringbuffer-v7.1' of ↵	Linus Torvalds	-0/+498
	git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull ring-buffer updates from Steven Rostedt: - Add remote buffers for pKVM pKVM has a hypervisor component that is used to protect the guest from the host kernel. This hypervisor is a black box to the kernel as the kernel is to user space. The remote buffers are used to have a memory mapping between the hypervisor and the kernel where kernel may send commands to enable tracing within the hypervisor. Then the kernel will read this memory mapping just like user space can read the memory mapped ring buffer of the kernel tracing system. Since the hypervisor only has a single context, it doesn't need to worry about races between normal context, interrupt context and NMIs like the kernel does. The ring buffer it uses doesn't need to be as complex. The remote buffers are a simple version of the ring buffer that works in a single context. They are still per-CPU and use sub buffers. The data layout is the same as the kernel's ring buffer to share the same parsing. Currently, only ARM64 implements pKVM, but there's work to implement it also in x86. The remote buffer code is separated out from the ARM implementation so that it can be used in the future by x86. The ARM64 updates for pKVM is in the ARM/KVM tree and it merged in the remote buffers of this tree. - Make the backup instance non reusable The backup instance is a copy of the persistent ring buffer so that the persistent ring buffer could start recording again without using the data from the previous boot. The backup isn't for normal tracing. It is made read-only, and after it is consumed, it is automatically removed. - Have backup copy persistent instance before it starts recording To allow the persistent ring buffer to start recording from the kernel command line commands, move the copy of the backup instance to before the the command line options start recording. - Report header_page overwrite field as "char" and not "int' The rust parser of the header_page file was triggering a warning when it defined the overwrite variable as "int" but it was only a single byte in size. - Fix memory barriers for the trace_buffer CPU mask When a CPU comes online, the bit is set to allow readers to know that the CPU buffer is allocated. The bit is set after the allocation is done, and a smp_wmb() is performed after the allocation and before the setting of the bit. But instead of adding a smp_rmb() to all readers, since once a buffer is created for a CPU it is not deleted if that CPU goes offline, so this allocation is almost always done at boot up before any readers exist. If for the unlikely case where a CPU comes online for the first time after the system boot has finished, send an IPI to all CPUs to force the smp_rmb() for each CPU. - Show clock function being used in debugging ring buffer data When the ring buffer checks are enabled and the ring buffer detects an inconsistency in the times of the invents, print out the clock being used when the error occurred. There was a very hard to hit bug that would happen every so often and it ended up being only triggered when the jiffies clock was being used. If the bug showed the clock being used, it would have been much easier to find the problem (which was an internal function was being traced which caused the clock accounting to go off). * tag 'trace-ringbuffer-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (26 commits) ring-buffer: Prevent off-by-one array access in ring_buffer_desc_page() ring-buffer: Report header_page overwrite as char tracing: Allow backup to save persistent ring buffer before it starts tracing/Documentation: Add a section about backup instance tracing: Remove the backup instance automatically after read tracing: Make the backup instance non-reusable ring-buffer: Enforce read ordering of trace_buffer cpumask and buffers ring-buffer: Show what clock function is used on timestamp errors tracing: Check for undefined symbols in simple_ring_buffer tracing: load/unload page callbacks for simple_ring_buffer Documentation: tracing: Add tracing remotes tracing: selftests: Add trace remote tests tracing: Add a trace remote module for testing tracing: Introduce simple_ring_buffer ring-buffer: Export buffer_data_page and macros tracing: Add helpers to create trace remote events tracing: Add events/ root files to trace remotes tracing: Add events to trace remotes tracing: Add init callback to trace remotes tracing: Add non-consuming read to trace remotes ...
13 days	Merge tag 'iommu-updates-v7.1' of ↵	Linus Torvalds	-0/+27
	git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu updates from Joerg Roedel: "Core: - Support for RISC-V IO-page-table format in generic iommupt code ARM-SMMU Updates: - Introduction of an "invalidation array" for SMMUv3, which enables future scalability work and optimisations for devices with a large number of SMMUv3 instances - Update the conditions under which the SMMUv3 driver works around hardware errata for invalidation on MMU-700 implementations - Fix broken command filtering for the host view of NVIDIA's "cmdqv" SMMUv3 extension - MMU-500 device-tree binding additions for Qualcomm Eliza & Hawi SoCs Intel VT-d: - Support for dirty tracking on domains attached to PASID - Removal of unnecessary read()/write() wrappers - Improvements to the invalidation paths AMD Vi: - Race-condition fixed in debugfs code - Make log buffer allocation NUMA aware RISC-V: - IO-TLB flushing improvements - Minor fixes" * tag 'iommu-updates-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (48 commits) iommu/vt-d: Restore IOMMU_CAP_CACHE_COHERENCY dt-bindings: arm-smmu: qcom: Add compatible for Hawi SoC iommu/amd: Invalidate IRT cache for DMA aliases iommu/riscv: Remove overflows on the invalidation path iommu/amd: Fix clone_alias() to use the original device's devid iommu/vt-d: Remove the remaining pages along the invalidation path iommu/vt-d: Pass size_order to qi_desc_piotlb() not npages iommu/vt-d: Split piotlb invalidation into range and all iommu/vt-d: Remove dmar_writel() and dmar_writeq() iommu/vt-d: Remove dmar_readl() and dmar_readq() iommufd/selftest: Test dirty tracking on PASID iommu/vt-d: Support dirty tracking on PASID iommu/vt-d: Rename device_set_dirty_tracking() and pass dmar_domain pointer iommu/vt-d: Block PASID attachment to nested domain with dirty tracking iommu/dma: Always allow DMA-FQ when iommupt provides the iommu_domain iommu/riscv: Fix signedness bug iommu/amd: Fix illegal cap/mmio access in IOMMU debugfs iommu/amd: Fix illegal device-id access in IOMMU debugfs iommu/tegra241-cmdqv: Update uAPI to clarify HYP_OWN requirement iommu/tegra241-cmdqv: Set supports_cmd op in tegra241_vcmdq_hw_init() ...
13 days	Merge tag 'pci-v7.1-changes' of ↵	Linus Torvalds	-0/+8
	git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci updates from Bjorn Helgaas: "Enumeration: - Allow TLP Processing Hints to be enabled for RCiEPs (George Abraham P) - Enable AtomicOps only if we know the Root Port supports them (Gerd Bayer) - Don't enable AtomicOps for RCiEPs since none of them need Atomic Ops and we can't tell whether the Root Complex would support them (Gerd Bayer) - Leave Precision Time Measurement disabled until a driver enables it to avoid PCIe errors (Mika Westerberg) - Make pci_set_vga_state() fail if bridge doesn't support VGA routing, i.e., PCI_BRIDGE_CTL_VGA is not writable, and return errors to vga_get() callers including userspace via /dev/vga_arbiter (Simon Richter) - Validate max-link-speed from DT in j721e, brcmstb, mediatek-gen3, rzg3s drivers (where the actual controller constraints are known), and remove validation from the generic OF DT accessor (Hans Zhang) - Remove pc110pad driver (no longer useful after 486 CPU support removed) and no_pci_devices() (pc110pad was the last user) (Dmitry Torokhov, Heiner Kallweit) Resource management: - Prevent assigning space to unimplemented bridge windows; previously we mistakenly assumed prefetchable window existed and assigned space and put a BAR there (Ahmed Naseef) - Avoid shrinking bridge windows to fit in the initial Root Port window; fixes one problem with devices with large BARs connected via switches, e.g., Thunderbolt (Ilpo Järvinen) - Pass full extent of empty space, not just the aligned space, to resource_alignf callback so free space before the requested alignment can be used (Ilpo Järvinen) - Place small resources before larger ones for better utilization of address space (Ilpo Järvinen) - Fix alignment calculation for resource size larger than align, e.g., bridge windows larger than the 1MB required alignment (Ilpo Järvinen) Reset: - Update slot handling so all ARI functions are treated as being in the same slot. They're all reset by Secondary Bus Reset, but previously drivers of ARI functions that appeared to be on a non-zero device weren't notified and fatal hardware errors could result (Keith Busch) - Make sysfs reset_subordinate hotplug safe to avoid spurious hotplug events (Keith Busch) - Hide Secondary Bus Reset ('bus') from sysfs reset_methods if masked by CXL because it has no effect (Vidya Sagar) - Avoid FLR for AMD NPU device, where it causes the device to hang (Lizhi Hou) Error handling: - Clear only error bits in PCIe Device Status to avoid accidentally clearing Emergency Power Reduction Detected (Shuai Xue) - Check for AER errors even in devices without drivers (Lukas Wunner) - Initialize ratelimit info so DPC and EDR paths log AER error information (Kuppuswamy Sathyanarayanan) Power control: - Add UPD720201/UPD720202 USB 3.0 xHCI Host Controller .compatible so generic pwrctrl driver can control it (Neil Armstrong) Hotplug: - Set LED_HW_PLUGGABLE for NPEM hotplug-capable ports so LED core doesn't complain when setting brightness fails because the endpoint is gone (Richard Cheng) Peer-to-peer DMA: - Allow wildcards in list of host bridges that support peer-to-peer DMA between hierarchy domains and add all Google SoCs (Jacob Moroni) Endpoint framework: - Advertise dynamic inbound mapping support in pci-epf-test and update host pci_endpoint_test to skip doorbell testing if not advertised by endpoint (Koichiro Den) - Return 0, not remaining timeout, when MHI eDMA ops complete so mhi_ep_ring_add_element() doesn't interpret non-zero as failure (Daniel Hodges) - Remove vntb and ntb duplicate resource teardown that leads to oops when .allow_link() fails or .drop_link() is called (Koichiro Den) - Disable vntb delayed work before clearing BAR mappings and doorbells to avoid oops caused by doing the work after resources have been torn down (Koichiro Den) - Add a way to describe reserved subregions within BARs, e.g., platform-owned fixed register windows, and use it for the RK3588 BAR4 DMA ctrl window (Koichiro Den) - Add BAR_DISABLED for BARs that will never be available to an EPF driver, and change some BAR_RESERVED annotations to BAR_DISABLED (Niklas Cassel) - Add NTB .get_dma_dev() callback for cases where DMA API requires a different device, e.g., vNTB devices (Koichiro Den) - Add reserved region types for MSI-X Table and PBA so Endpoint controllers can them as describe hardware-owned regions in a BAR_RESERVED BAR (Manikanta Maddireddy) - Make Tegra194/234 BAR0 programmable and remove 1MB size limit (Manikanta Maddireddy) - Expose Tegra BAR2 (MSI-X) and BAR4 (DMA) as 64-bit BAR_RESERVED (Manikanta Maddireddy) - Add Tegra194 and Tegra234 device table entries to pci_endpoint_test (Manikanta Maddireddy) - Skip the BAR subrange selftest if there are not enough inbound window resources to run the test (Christian Bruel) New native PCIe controller drivers: - Add DT binding and driver for Andes QiLai SoC PCIe host controller (Randolph Lin) - Add DT binding and driver for ESWIN PCIe Root Complex (Senchuan Zhang) Baikal T-1 PCIe controller driver: - Remove driver since it never quite became usable (Andy Shevchenko) Cadence PCIe controller driver: - Implement byte/word config reads with dword (32-bit) reads because some Cadence controllers don't support sub-dword accesses (Aksh Garg) CIX Sky1 PCIe controller driver: - Add 'power-domains' to DT binding for SCMI power domain (Gary Yang) Freescale i.MX6 PCIe controller driver: - Add i.MX94 and i.MX943 to fsl,imx6q-pcie-ep DT binding (Richard Zhu) - Delay instead of polling for L2/L3 Ready after PME_Turn_off when suspending i.MX6SX because LTSSM registers are inaccessible (Richard Zhu) - Separate PERST# assertion (for resetting endpoints) from core reset (for resetting the RC itself) to prepare for new DTs with PERST# GPIO in per-Root Port nodes (Sherry Sun) - Retain Root Port MSI capability on i.MX7D, i.MX8MM, and i.MX8MQ so MSI from downstream devices will work (Richard Zhu) - Fix i.MX95 reference clock source selection when internal refclk is used (Franz Schnyder) Freescale Layerscape PCIe controller driver: - Allow building as a removable module (Sascha Hauer) MediaTek PCIe Gen3 controller driver: - Use dev_err_probe() to simplify error paths and make deferred probe messages visible in /sys/kernel/debug/devices_deferred (Chen-Yu Tsai) - Power off device if setup fails (Chen-Yu Tsai) - Integrate new pwrctrl API to enable power control for WiFi/BT adapters on mainboard or in PCIe or M.2 slots (Chen-Yu Tsai) NVIDIA Tegra194 PCIe controller driver: - Poll less aggressively and non-atomically for PME_TO_Ack during transition to L2 (Vidya Sagar) - Disable LTSSM after transition to Detect on surprise link down to stop toggling between Polling and Detect (Manikanta Maddireddy) - Don't force the device into the D0 state before L2 when suspending or shutting down the controller (Vidya Sagar) - Disable PERST# IRQ only in Endpoint mode because it's not registered in Root Port mode (Manikanta Maddireddy) - Handle 'nvidia,refclk-select' as optional (Vidya Sagar) - Disable direct speed change in Endpoint mode so link speed change is controlled by the host (Vidya Sagar) - Set LTR values before link up to avoid bogus LTR messages with 0 latency (Vidya Sagar) - Allow system suspend when the Endpoint link is down (Vidya Sagar) - Use DWC IP core version, not Tegra custom values, to avoid DWC core version check warnings (Manikanta Maddireddy) - Apply ECRC workaround to devices based on DesignWare 5.00a as well as 4.90a (Manikanta Maddireddy) - Disable PM Substate L1.2 in Endpoint mode to work around Tegra234 erratum (Vidya Sagar) - Delay post-PERST# cleanup until core is powered on to avoid CBB timeout (Manikanta Maddireddy) - Assert CLKREQ# so switches that forward it to their downstream side can bring up those links successfully (Vidya Sagar) - Calibrate pipe to UPHY for Endpoint mode to reset stale PLL state from any previous bad link state (Vidya Sagar) - Remove IRQF_ONESHOT flag from Endpoint interrupt registration so DMA driver and Endpoint controller driver can share the interrupt line (Vidya Sagar) - Enable DMA interrupt to support DMA in both Root Port and Endpoint modes (Vidya Sagar) - Enable hardware link retraining after link goes down in Endpoint mode (Vidya Sagar) - Add DT binding and driver support for core clock monitoring (Vidya Sagar) Qualcomm PCIe controller driver: - Advertise 'Hot-Plug Capable' and set 'No Command Completed Support' since Qcom Root Ports support hotplug events like DL_Up/Down and can accept writes to Slot Control without delays between writes (Krishna Chaitanya Chundru) Renesas R-Car PCIe controller driver: - Mark Endpoint BAR0 and BAR2 as Resizable (Koichiro Den) - Reduce EPC BAR alignment requirement to 4K (Koichiro Den) Renesas RZ/G3S PCIe controller driver: - Add RZ/G3E to DT binding and to driver (John Madieu) - Assert (not deassert) resets in probe error path (John Madieu) - Assert resets in suspend path in reverse order they were deasserted during probe (John Madieu) - Rework inbound window algorithm to prevent mapping more than intended region and enforce alignment on size, to prepare for RZ/G3E support (John Madieu) Rockchip DesignWare PCIe controller driver: - Add tracepoints for PCIe controller LTSSM transitions and link rate changes (Shawn Lin) - Trace LTSSM events collected by the dw-rockchip debug FIFO (Shawn Lin) SOPHGO PCIe controller driver: - Disable ASPM L0s and L1 on Sophgo 2042 PCIe Root Ports that advertise support for them (Yao Zi) Synopsys DesignWare PCIe controller driver: - Continue with system suspend even if an Endpoint doesn't respond with PME_TO_Ack message (Manivannan Sadhasivam) - Set Endpoint MSI-X Table Size in the correct function of a multi-function device when configuring MSI-X, not in Function 0 (Aksh Garg) - Set Max Link Width and Max Link Speed for all functions of a multi-function device, not just Function 0 (Aksh Garg) - Expose PCIe event counters in groups 5-7 in debugfs (Hans Zhang) Miscellaneous: - Warn only once about invalid ACS kernel parameter format (Richard Cheng) - Suppress FW_BUG warning when writing sysfs 'numa_node' with the current value (Li RongQing) - Drop redundant 'depends on PCI' from Kconfig (Julian Braha)" * tag 'pci-v7.1-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (165 commits) PCI/P2PDMA: Add Google SoCs to the P2P DMA host bridge list PCI/P2PDMA: Allow wildcard Device IDs in host bridge list PCI: sg2042: Avoid L0s and L1 on Sophgo 2042 PCIe Root Ports PCI: cadence: Add flags for disabling ASPM capability for broken Root Ports PCI: tegra194: Add core monitor clock support dt-bindings: PCI: tegra194: Add monitor clock support PCI: tegra194: Enable hardware hot reset mode in Endpoint mode PCI: tegra194: Enable DMA interrupt PCI: tegra194: Remove IRQF_ONESHOT flag during Endpoint interrupt registration PCI: tegra194: Calibrate pipe to UPHY for Endpoint mode PCI: tegra194: Assert CLKREQ# explicitly by default PCI: tegra194: Fix CBB timeout caused by DBI access before core power-on PCI: tegra194: Disable L1.2 capability of Tegra234 EP PCI: dwc: Apply ECRC workaround to DesignWare 5.00a as well PCI: tegra194: Use DWC IP core version PCI: tegra194: Free up Endpoint resources during remove() PCI: tegra194: Allow system suspend when the Endpoint link is not up PCI: tegra194: Set LTR message request before PCIe link up in Endpoint mode PCI: tegra194: Disable direct speed change for Endpoint mode PCI: tegra194: Use devm_gpiod_get_optional() to parse "nvidia,refclk-select" ...
13 days	Merge tag 'locking_futex_for_v7.1_rc1' of ↵	Linus Torvalds	-44/+16
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull futex selftest updates from Borislav Petkov: - Correct the version guard for the futex_numa_mpol test to require libnuma 2.0.18 instead of 2.0.16, which is the version that actually introduced numa_set_mempolicy_home_node() used by the test - Allow the futex_numa_mpol selftest to build and run on systems without libnuma installed with affected test gracefully being skipped instead of failing to compile - Use the proper assertion macros so that individual sub-test failures are correctly propagated and the test suite reports failure when something goes wrong * tag 'locking_futex_for_v7.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: selftests/futex: Bump up libnuma version check selftests/futex: Conditionally include libnuma support selftests/futex: Fix incorrect result reporting of futex_requeue test item
13 days	Merge tag 'mm-stable-2026-04-13-21-45' of ↵	Linus Torvalds	-412/+1747
	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "maple_tree: Replace big node with maple copy" (Liam Howlett) Mainly prepararatory work for ongoing development but it does reduce stack usage and is an improvement. - "mm, swap: swap table phase III: remove swap_map" (Kairui Song) Offers memory savings by removing the static swap_map. It also yields some CPU savings and implements several cleanups. - "mm: memfd_luo: preserve file seals" (Pratyush Yadav) File seal preservation to LUO's memfd code - "mm: zswap: add per-memcg stat for incompressible pages" (Jiayuan Chen) Additional userspace stats reportng to zswap - "arch, mm: consolidate empty_zero_page" (Mike Rapoport) Some cleanups for our handling of ZERO_PAGE() and zero_pfn - "mm/kmemleak: Improve scan_should_stop() implementation" (Zhongqiu Han) A robustness improvement and some cleanups in the kmemleak code - "Improve khugepaged scan logic" (Vernon Yang) Improve khugepaged scan logic and reduce CPU consumption by prioritizing scanning tasks that access memory frequently - "Make KHO Stateless" (Jason Miu) Simplify Kexec Handover by transitioning KHO from an xarray-based metadata tracking system with serialization to a radix tree data structure that can be passed directly to the next kernel - "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" (Thomas Ballasi and Steven Rostedt) Enhance vmscan's tracepointing - "mm: arch/shstk: Common shadow stack mapping helper and VM_NOHUGEPAGE" (Catalin Marinas) Cleanup for the shadow stack code: remove per-arch code in favour of a generic implementation - "Fix KASAN support for KHO restored vmalloc regions" (Pasha Tatashin) Fix a WARN() which can be emitted the KHO restores a vmalloc area - "mm: Remove stray references to pagevec" (Tal Zussman) Several cleanups, mainly udpating references to "struct pagevec", which became folio_batch three years ago - "mm: Eliminate fake head pages from vmemmap optimization" (Kiryl Shutsemau) Simplify the HugeTLB vmemmap optimization (HVO) by changing how tail pages encode their relationship to the head page - "mm/damon/core: improve DAMOS quota efficiency for core layer filters" (SeongJae Park) Improve two problematic behaviors of DAMOS that makes it less efficient when core layer filters are used - "mm/damon: strictly respect min_nr_regions" (SeongJae Park) Improve DAMON usability by extending the treatment of the min_nr_regions user-settable parameter - "mm/page_alloc: pcp locking cleanup" (Vlastimil Babka) The proper fix for a previously hotfixed SMP=n issue. Code simplifications and cleanups ensued - "mm: cleanups around unmapping / zapping" (David Hildenbrand) A bunch of cleanups around unmapping and zapping. Mostly simplifications, code movements, documentation and renaming of zapping functions - "support batched checking of the young flag for MGLRU" (Baolin Wang) Batched checking of the young flag for MGLRU. It's part cleanups; one benchmark shows large performance benefits for arm64 - "memcg: obj stock and slab stat caching cleanups" (Johannes Weiner) memcg cleanup and robustness improvements - "Allow order zero pages in page reporting" (Yuvraj Sakshith) Enhance free page reporting - it is presently and undesirably order-0 pages when reporting free memory. - "mm: vma flag tweaks" (Lorenzo Stoakes) Cleanup work following from the recent conversion of the VMA flags to a bitmap - "mm/damon: add optional debugging-purpose sanity checks" (SeongJae Park) Add some more developer-facing debug checks into DAMON core - "mm/damon: test and document power-of-2 min_region_sz requirement" (SeongJae Park) An additional DAMON kunit test and makes some adjustments to the addr_unit parameter handling - "mm/damon/core: make passed_sample_intervals comparisons overflow-safe" (SeongJae Park) Fix a hard-to-hit time overflow issue in DAMON core - "mm/damon: improve/fixup/update ratio calculation, test and documentation" (SeongJae Park) A batch of misc/minor improvements and fixups for DAMON - "mm: move vma_(kernel\|mmu)_pagesize() out of hugetlb.c" (David Hildenbrand) Fix a possible issue with dax-device when CONFIG_HUGETLB=n. Some code movement was required. - "zram: recompression cleanups and tweaks" (Sergey Senozhatsky) A somewhat random mix of fixups, recompression cleanups and improvements in the zram code - "mm/damon: support multiple goal-based quota tuning algorithms" (SeongJae Park) Extend DAMOS quotas goal auto-tuning to support multiple tuning algorithms that users can select - "mm: thp: reduce unnecessary start_stop_khugepaged()" (Breno Leitao) Fix the khugpaged sysfs handling so we no longer spam the logs with reams of junk when starting/stopping khugepaged - "mm: improve map count checks" (Lorenzo Stoakes) Provide some cleanups and slight fixes in the mremap, mmap and vma code - "mm/damon: support addr_unit on default monitoring targets for modules" (SeongJae Park) Extend the use of DAMON core's addr_unit tunable - "mm: khugepaged cleanups and mTHP prerequisites" (Nico Pache) Cleanups to khugepaged and is a base for Nico's planned khugepaged mTHP support - "mm: memory hot(un)plug and SPARSEMEM cleanups" (David Hildenbrand) Code movement and cleanups in the memhotplug and sparsemem code - "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup CONFIG_MIGRATION" (David Hildenbrand) Rationalize some memhotplug Kconfig support - "change young flag check functions to return bool" (Baolin Wang) Cleanups to change all young flag check functions to return bool - "mm/damon/sysfs: fix memory leak and NULL dereference issues" (Josh Law and SeongJae Park) Fix a few potential DAMON bugs - "mm/vma: convert vm_flags_t to vma_flags_t in vma code" (Lorenzo Stoakes) Convert a lot of the existing use of the legacy vm_flags_t data type to the new vma_flags_t type which replaces it. Mainly in the vma code. - "mm: expand mmap_prepare functionality and usage" (Lorenzo Stoakes) Expand the mmap_prepare functionality, which is intended to replace the deprecated f_op->mmap hook which has been the source of bugs and security issues for some time. Cleanups, documentation, extension of mmap_prepare into filesystem drivers - "mm/huge_memory: refactor zap_huge_pmd()" (Lorenzo Stoakes) Simplify and clean up zap_huge_pmd(). Additional cleanups around vm_normal_folio_pmd() and the softleaf functionality are performed. * tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits) mm: fix deferred split queue races during migration mm/khugepaged: fix issue with tracking lock mm/huge_memory: add and use has_deposited_pgtable() mm/huge_memory: add and use normal_or_softleaf_folio_pmd() mm: add softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio() mm/huge_memory: separate out the folio part of zap_huge_pmd() mm/huge_memory: use mm instead of tlb->mm mm/huge_memory: remove unnecessary sanity checks mm/huge_memory: deduplicate zap deposited table call mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE() mm/huge_memory: add a common exit path to zap_huge_pmd() mm/huge_memory: handle buggy PMD entry in zap_huge_pmd() mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc mm/huge: avoid big else branch in zap_huge_pmd() mm/huge_memory: simplify vma_is_specal_huge() mm: on remap assert that input range within the proposed VMA mm: add mmap_action_map_kernel_pages[_full]() uio: replace deprecated mmap hook with mmap_prepare in uio_info drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepare mm: allow handling of stacked mmap_prepare hooks in more drivers ...
13 days	selftests/bpf: Test small task local data allocation	Amery Hung	-4/+74
	Make sure task local data is working correctly for different allocation sizes. Existing task local data selftests allocate the maximum amount of data possible but miss the garbage data issue when only small amount of data is allocated. Therefore, test small data allocations as well. Signed-off-by: Amery Hung <ameryhung@gmail.com> Link: https://lore.kernel.org/r/20260413190259.358442-4-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
13 days	selftests/bpf: Fix tld_get_data() returning garbage data	Amery Hung	-4/+11
	BPF side tld_get_data() currently may return garbage when tld_data_u is not aligned to page_size. This can happen when small amount of memory is allocated for tld_data_u. The misalignment is supposed to be allowed and the BPF side will use tld_data_u->start to reference the tld_data_u in a page. However, since "start" is within tld_data_u, there is no way to know the correct "start" in the first place. As a result, BPF programs will see garbage data. The selftest did not catch this since it tries to allocate the maximum amount of data possible (i.e., a page) such that tld_data_u->start is always correct. Fix it by moving tld_data_u->start to tld_data_map->start. The original field is now renamed as unused instead of removing it because BPF side tld_get_data() views off = 0 returned from tld_fetch_key() as uninitialized. Signed-off-by: Amery Hung <ameryhung@gmail.com> Link: https://lore.kernel.org/r/20260413190259.358442-3-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
13 days	selftests/bpf: Prevent allocating data larger than a page	Amery Hung	-6/+15
	Fix a bug in the task local data library that may allocate more than a a page for tld_data_u. This may happen when users set a too large TLD_DYN_DATA_SIZE, so check it when creating dynamic TLD fields and fix the corresponding selftest. Signed-off-by: Amery Hung <ameryhung@gmail.com> Link: https://lore.kernel.org/r/20260413190259.358442-2-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
13 days	Merge tag 'sched_ext-for-7.1' of ↵	Linus Torvalds	-168/+1463
	git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext Pull sched_ext updates from Tejun Heo: - cgroup sub-scheduler groundwork Multiple BPF schedulers can be attached to cgroups and the dispatch path is made hierarchical. This involves substantial restructuring of the core dispatch, bypass, watchdog, and dump paths to be per-scheduler, along with new infrastructure for scheduler ownership enforcement, lifecycle management, and cgroup subtree iteration The enqueue path is not yet updated and will follow in a later cycle - scx_bpf_dsq_reenq() generalized to support any DSQ including remote local DSQs and user DSQs Built on top of this, SCX_ENQ_IMMED guarantees that tasks dispatched to local DSQs either run immediately or get reenqueued back through ops.enqueue(), giving schedulers tighter control over queueing latency Also useful for opportunistic CPU sharing across sub-schedulers - ops.dequeue() was only invoked when the core knew a task was in BPF data structures, missing scheduling property change events and skipping callbacks for non-local DSQ dispatches from ops.select_cpu() Fixed to guarantee exactly one ops.dequeue() call when a task leaves BPF scheduler custody - Kfunc access validation moved from runtime to BPF verifier time, removing runtime mask enforcement - Idle SMT sibling prioritization in the idle CPU selection path - Documentation, selftest, and tooling updates. Misc bug fixes and cleanups * tag 'sched_ext-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (134 commits) tools/sched_ext: Add explicit cast from void* in RESIZE_ARRAY() sched_ext: Make string params of __ENUM_set() const tools/sched_ext: Kick home CPU for stranded tasks in scx_qmap sched_ext: Drop spurious warning on kick during scheduler disable sched_ext: Warn on task-based SCX op recursion sched_ext: Rename scx_kf_allowed_on_arg_tasks() to scx_kf_arg_task_ok() sched_ext: Remove runtime kfunc mask enforcement sched_ext: Add verifier-time kfunc context filter sched_ext: Drop redundant rq-locked check from scx_bpf_task_cgroup() sched_ext: Decouple kfunc unlocked-context check from kf_mask sched_ext: Fix ops.cgroup_move() invocation kf_mask and rq tracking sched_ext: Track @p's rq lock across set_cpus_allowed_scx -> ops.set_cpumask sched_ext: Add select_cpu kfuncs to scx_kfunc_ids_unlocked sched_ext: Drop TRACING access to select_cpu kfuncs selftests/sched_ext: Fix wrong DSQ ID in peek_dsq error message sched_ext: Documentation: improve accuracy of task lifecycle pseudo-code selftests/sched_ext: Improve runner error reporting for invalid arguments sched_ext: Documentation: Fix scx_bpf_move_to_local kfunc name sched_ext: Documentation: Add ops.dequeue() to task lifecycle tools/sched_ext: Fix off-by-one in scx_sdt payload zeroing ...
13 days	Merge tag 'wq-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq	Linus Torvalds	-12/+8
	Pull workqueue updates from Tejun Heo: - New default WQ_AFFN_CACHE_SHARD affinity scope subdivides LLCs into smaller shards to improve scalability on machines with many CPUs per LLC - Misc: - system_dfl_long_wq for long unbound works - devm_alloc_workqueue() for device-managed allocation - sysfs exposure for ordered workqueues and the EFI workqueue - removal of HK_TYPE_WQ from wq_unbound_cpumask - various small fixes * tag 'wq-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (21 commits) workqueue: validate cpumask_first() result in llc_populate_cpu_shard_id() workqueue: use NR_STD_WORKER_POOLS instead of hardcoded value workqueue: avoid unguarded 64-bit division docs: workqueue: document WQ_AFFN_CACHE_SHARD affinity scope workqueue: add test_workqueue benchmark module tools/workqueue: add CACHE_SHARD support to wq_dump.py workqueue: set WQ_AFFN_CACHE_SHARD as the default affinity scope workqueue: add WQ_AFFN_CACHE_SHARD affinity scope workqueue: fix typo in WQ_AFFN_SMT comment workqueue: Remove HK_TYPE_WQ from affecting wq_unbound_cpumask workqueue: unlink pwqs from wq->pwqs list in alloc_and_link_pwqs() error path workqueue: Remove NULL wq WARN in __queue_delayed_work() workqueue: fix parse_affn_scope() prefix matching bug workqueue: devres: Add device-managed allocate workqueue workqueue: Add system_dfl_long_wq for long unbound works tools/workqueue/wq_dump.py: add NODE prefix to all node columns tools/workqueue/wq_dump.py: fix column alignment in node_nr/max_active section tools/workqueue/wq_dump.py: remove backslash separator from node_nr/max_active header efi: Allow to expose the workqueue via sysfs workqueue: Allow to expose ordered workqueues via sysfs ...
13 days	Merge tag 'sound-7.1-rc1' of ↵	Linus Torvalds	-1/+9
	git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound updates from Takashi Iwai: "Nothing too thrilling here, but we see lots of driver updates and bug fixes, including quirk additions and refactoring works, while there have been little changes in the core functionality. Here are some highlights: Core: - Add validation for the control API put callback - Fixes in compress-offload API timestamp handling - Continued ASoC core API cleanups ASoC: - Add support for bus keepers (for Apple devices in future) - Enhancements to the SDCA support, including retaskable jacks - Test improvements for Cirrus Logic drivers - Lots of fixes for the NXP, nVidia and Qualcomm - Support for AMD RPL DMIC, Cirrus Logic CS42L43 and CS47L47, nVidia machines with CPCAP and WM8962 USB-audio: - Quirks for Huawei Headset, Focusrite Novation, MV-Silicon, Studio 1824, Arturia AF16Rig, Hotone Audio, Feaulle Rainbow, PreSonus AudioBox, Moondrop Ju Jiu, Scarlett 18i20, etc - Extended mixer volume quirk handling - UAF and other fixes for us144mkii, 6fire and caiaq drivers HD-audio: - Add quirks or fixes for Acer, Lenovo, HP, ASUS machines - Fixes & cleanups of GPIO helper code Misc: - Add suspend/resume support for multiple legacy ISA and Apple drivers - Further regression fixes for ctxfi driver" * tag 'sound-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (359 commits) ALSA: control: Validate buf_len before strnlen() in snd_ctl_elem_init_enum_names() ALSA: usb-audio: Fix missing error handling for get_min_max() ALSA: hda/realtek - fixed speaker no sound update ALSA: hda/realtek: Add quirk for Acer PT316-51S headset mic ALSA: usb-audio: Exclude Scarlett 18i20 1st Gen from SKIP_IFACE_SETUP ALSA: hda/realtek: Add quirk for Legion S7 15IMH ALSA: hda/realtek: Add quirk for HP Spectre x360 14-ea ALSA: caiaq: take a reference on the USB device in create_card() ASoC: dt-bindings: rockchip: convert rk3399-gru-sound to DT Schema ALSA: sscape: Add suspend and resume support ALSA: sscape: Cache per-card resources for board reinitialization ALSA: usb-audio: Do not expose sticky mixers ALSA: usb-audio: Move volume control resolution check into a function ALSA: usb-audio: Add error checks against get_min_max() ALSA: usb-audio: Add quirk for PreSonus AudioBox USB ALSA: interwave: guard PM-only restore helpers with CONFIG_PM ALSA: usb-audio: Evaluate packsize caps at the right place ALSA: sc6000: Restore board setup across suspend ALSA: sc6000: Keep the programmed board state in card-private data ALSA: 6fire: fix use-after-free on disconnect ...
13 days	selftests/bpf: arg tracking for imprecise/multi-offset BPF_ST/STX	Eduard Zingerman	-0/+193
	Add test cases for clear_stack_for_all_offs and dst_is_local_fp handling of multi-offset and ARG_IMPRECISE stack pointers: - st_imm_join_with_multi_off: BPF_ST through multi-offset dst should join at_stack with none instead of overwriting both candidate slots. - st_imm_join_with_imprecise_off: BPF_ST through offset-imprecise dst should join at_stack with none instead of clearing all slots. - st_imm_join_with_single_off: a canary checking that BPF_ST with a known offset overwrites slot instead of joining. - imprecise_dst_spill_join: BPF_STX through ARG_IMPRECISE dst should be recognized as a local spill and join at_stack with the written value. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20260413-stacklive-fixes-v2-2-398e126e5cf3@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
13 days	selftests/bpf: Fix timer_start_deadlock failure due to hrtimer change	Shung-Hsi Yu	-4/+4
	Since commit f2e388a019e4 ("hrtimer: Reduce trace noise in hrtimer_start()"), hrtimer_cancel tracepoint is no longer called when a hrtimer is re-armed. So instead of a hrtimer_cancel followed by hrtimer_start tracepoint events, there is now only a since hrtimer_start tracepoint event with the new was_armed field set to 1, to indicated that the hrtimer was previously armed. Update timer_start_deadlock accordingly so it traces hrtimer_start tracepoint instead, with was_armed used as guard. Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Tested-by: Mykyta Yatsenko <yatsenko@meta.com> Acked-by: Mykyta Yatsenko <yatsenko@meta.com> Link: https://lore.kernel.org/r/20260415120329.129192-1-shung-hsi.yu@suse.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
14 days	tools/accounting: handle truncated taskstats netlink messages	Yiyang Chen	-8/+73
	procacct and getdelays use a fixed receive buffer for taskstats generic netlink messages. A multi-threaded process exit can emit a single PID+TGID notification large enough to exceed that buffer on newer kernels. Switch to recvmsg() so MSG_TRUNC is detected explicitly, increase the message buffer size, and report truncated datagrams clearly instead of misparsing them as fatal netlink errors. Also print the taskstats version in debug output to make version mismatches easier to diagnose while inspecting taskstats traffic. Link: https://lkml.kernel.org/r/520308bb4cbbaf8dc2c7296b5f60f11e12fb30a5.1774810498.git.cyyzero16@gmail.com Signed-off-by: Yiyang Chen <cyyzero16@gmail.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Dr. Thomas Orgis <thomas.orgis@uni-hamburg.de> Cc: Fan Yu <fan.yu9@zte.com.cn> Cc: Wang Yaxin <wang.yaxin@zte.com.cn> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
14 days	Merge tag 'kernel-7.1-rc1.misc' of ↵	Linus Torvalds	-1/+240
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull pid_namespace updates from Christian Brauner: - pid_namespace: make init creation more flexible Annotate ->child_reaper accesses with {READ,WRITE}_ONCE() to protect the unlocked readers from cpu/compiler reordering, and enforce that pid 1 in a pid namespace is always the first allocated pid (the set_tid path already required this). On top of that, allow opening pid_for_children before the pid namespace init has been created. This lets one process create the pid namespace and a different process create the init via setns(), which makes clone3(set_tid) usable in all cases evenly and is particularly useful to CRIU when restoring nested containers. A new selftest covers both the basic create-pidns-then-init flow and the cross-process variant, and a MAINTAINERS entry for the pid namespace code is added. - unrelated signal cleanup: update outdated comment for the removed freezable_schedule() * tag 'kernel-7.1-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: signal: update outdated comment for removed freezable_schedule() MAINTAINERS: add a pid namespace entry selftests: Add tests for creating pidns init via setns pid_namespace: allow opening pid_for_children before init was created pid: check init is created first after idr alloc pid_namespace: avoid optimization of accesses to ->child_reaper
14 days	Merge tag 'vfs-7.1-rc1.mount.v2' of ↵	Linus Torvalds	-106/+3663
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs mount updates from Christian Brauner: - Add FSMOUNT_NAMESPACE flag to fsmount() that creates a new mount namespace with the newly created filesystem attached to a copy of the real rootfs. This returns a namespace file descriptor instead of an O_PATH mount fd, similar to how OPEN_TREE_NAMESPACE works for open_tree(). This allows creating a new filesystem and immediately placing it in a new mount namespace in a single operation, which is useful for container runtimes and other namespace-based isolation mechanisms. This accompanies OPEN_TREE_NAMESPACE and avoids a needless detour via OPEN_TREE_NAMESPACE to get the same effect. Will be especially useful when you mount an actual filesystem to be used as the container rootfs. - Currently, creating a new mount namespace always copies the entire mount tree from the caller's namespace. For containers and sandboxes that intend to build their mount table from scratch this is wasteful: they inherit a potentially large mount tree only to immediately tear it down. This series adds support for creating a mount namespace that contains only a clone of the root mount, with none of the child mounts. Two new flags are introduced: - CLONE_EMPTY_MNTNS (0x400000000) for clone3(), using the 64-bit flag space - UNSHARE_EMPTY_MNTNS (0x00100000) for unshare() Both flags imply CLONE_NEWNS. The resulting namespace contains a single nullfs root mount with an immutable empty directory. The intended workflow is to then mount a real filesystem (e.g., tmpfs) over the root and build the mount table from there. - Allow MOVE_MOUNT_BENEATH to target the caller's rootfs, allowing to switch out the rootfs without pivot_root(2). The traditional approach to switching the rootfs involves pivot_root(2) or a chroot_fs_refs()-based mechanism that atomically updates fs->root for all tasks sharing the same fs_struct. This has consequences for fork(), unshare(CLONE_FS), and setns(). This series instead decomposes root-switching into individually atomic, locally-scoped steps: fd_tree = open_tree(-EBADF, "/newroot", OPEN_TREE_CLONE \| OPEN_TREE_CLOEXEC); fchdir(fd_tree); move_mount(fd_tree, "", AT_FDCWD, "/", MOVE_MOUNT_BENEATH \| MOVE_MOUNT_F_EMPTY_PATH); chroot("."); umount2(".", MNT_DETACH); Since each step only modifies the caller's own state, the fork/unshare/setns races are eliminated by design. A key step to making this possible is to remove the locked mount restriction. Originally MOVE_MOUNT_BENEATH doesn't support mounting beneath a mount that is locked. The locked mount protects the underlying mount from being revealed. This is a core mechanism of unshare(CLONE_NEWUSER \| CLONE_NEWNS). The mounts in the new mount namespace become locked. That effectively makes the new mount table useless as the caller cannot ever get rid of any of the mounts no matter how useless they are. We can lift this restriction though. We simply transfer the locked property from the top mount to the mount beneath. This works because what we care about is to protect the underlying mount aka the parent. The mount mounted between the parent and the top mount takes over the job of protecting the parent mount from the top mount mount. This leaves us free to remove the locked property from the top mount which can consequently be unmounted: unshare(CLONE_NEWUSER \| CLONE_NEWNS) and we inherit a clone of procfs on /proc then currently we cannot unmount it as: umount -l /proc will fail with EINVAL because the procfs mount is locked. After this series we can now do: mount --beneath -t tmpfs tmpfs /proc umount -l /proc after which a tmpfs mount has been placed beneath the procfs mount. The tmpfs mount has become locked and the procfs mount has become unlocked. This means you can safely modify an inherited mount table after unprivileged namespace creation. Afterwards we simply make it possible to move a mount beneath the rootfs allowing to upgrade the rootfs. Removing the locked restriction makes this very useful for containers created with unshare(CLONE_NEWUSER \| CLONE_NEWNS) to reshuffle an inherited mount table safely and MOVE_MOUNT_BENEATH makes it possible to switch out the rootfs instead of using the costly pivot_root(2). * tag 'vfs-7.1-rc1.mount.v2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: selftests/namespaces: remove unused utils.h include from listns_efault_test selftests/fsmount_ns: add missing TARGETS and fix cap test selftests/empty_mntns: fix wrong CLONE_EMPTY_MNTNS hex value in comment selftests/empty_mntns: fix statmount_alloc() signature mismatch selftests/statmount: remove duplicate wait_for_pid() mount: always duplicate mount selftests/filesystems: add MOVE_MOUNT_BENEATH rootfs tests move_mount: allow MOVE_MOUNT_BENEATH on the rootfs move_mount: transfer MNT_LOCKED selftests/filesystems: add clone3 tests for empty mount namespaces selftests/filesystems: add tests for empty mount namespaces namespace: allow creating empty mount namespaces selftests: add FSMOUNT_NAMESPACE tests selftests/statmount: add statmount_alloc() helper tools: update mount.h header mount: add FSMOUNT_NAMESPACE mount: simplify __do_loopback() mount: start iterating from start of rbtree