aboutsummaryrefslogtreecommitdiffstats
path: root/fs (follow)
AgeCommit message (Collapse)AuthorFilesLines
6 daysMerge tag 'v6.19rc8-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds2-1/+4
Pull smb client fixes from Steve French: "Two small client memory leak fixes" * tag 'v6.19rc8-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb/client: fix memory leak in SendReceive() smb/client: fix memory leak in smb2_open_file()
6 daysMerge tag 'for-6.19-rc8-tag' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fix from David Sterba: "A regression fix for a memory leak when raid56 is used" * tag 'for-6.19-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: raid56: fix memory leak of btrfs_raid_bio::stripe_uptodate_bitmap
6 daysbtrfs: raid56: fix memory leak of btrfs_raid_bio::stripe_uptodate_bitmapFilipe Manana1-0/+1
We allocate the bitmap but we never free it in free_raid_bio_pointers(). Fix this by adding a bitmap_free() call against the stripe_uptodate_bitmap of a raid bio. Fixes: 1810350b04ef ("btrfs: raid56: move sector_ptr::uptodate into a dedicated bitmap") Reported-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/linux-btrfs/20260126045315.GA31641@lst.de/ Reviewed-by: Qu Wenruo <wqu@suse.com> Tested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
7 dayssmb/client: fix memory leak in SendReceive()ChenXiaoSong1-1/+3
Reproducer: 1. server: supports SMB1, directories are exported read-only 2. client: mount -t cifs -o vers=1.0 //${server_ip}/export /mnt 3. client: dd if=/dev/zero of=/mnt/file bs=512 count=1000 oflag=direct 4. client: umount /mnt 5. client: sleep 1 6. client: modprobe -r cifs The error message is as follows: ============================================================================= BUG cifs_small_rq (Not tainted): Objects remaining on __kmem_cache_shutdown() ----------------------------------------------------------------------------- Object 0x00000000d34491e6 @offset=896 Object 0x00000000bde9fab3 @offset=4480 Object 0x00000000104a1f70 @offset=6272 Object 0x0000000092a51bb5 @offset=7616 Object 0x000000006714a7db @offset=13440 ... WARNING: mm/slub.c:1251 at __kmem_cache_shutdown+0x379/0x3f0, CPU#7: modprobe/712 ... Call Trace: <TASK> kmem_cache_destroy+0x69/0x160 cifs_destroy_request_bufs+0x39/0x40 [cifs] cleanup_module+0x43/0xfc0 [cifs] __se_sys_delete_module+0x1d5/0x300 __x64_sys_delete_module+0x1a/0x30 x64_sys_call+0x2299/0x2ff0 do_syscall_64+0x6e/0x270 entry_SYSCALL_64_after_hwframe+0x76/0x7e ... kmem_cache_destroy cifs_small_rq: Slab cache still has objects when called from cifs_destroy_request_bufs+0x39/0x40 [cifs] WARNING: mm/slab_common.c:532 at kmem_cache_destroy+0x142/0x160, CPU#7: modprobe/712 Link: https://lore.kernel.org/linux-cifs/9751f02d-d1df-4265-a7d6-b19761b21834@linux.dev/T/#mf14808c144448b715f711ce5f0477a071f08eaf6 Fixes: 6be09580df5c ("cifs: Make smb1's SendReceive() wrap cifs_send_recv()") Reported-by: Paulo Alcantara <pc@manguebit.org> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Signed-off-by: Steve French <stfrench@microsoft.com>
7 dayssmb/client: fix memory leak in smb2_open_file()ChenXiaoSong1-0/+1
Reproducer: 1. server: directories are exported read-only 2. client: mount -t cifs //${server_ip}/export /mnt 3. client: dd if=/dev/zero of=/mnt/file bs=512 count=1000 oflag=direct 4. client: umount /mnt 5. client: sleep 1 6. client: modprobe -r cifs The error message is as follows: ============================================================================= BUG cifs_small_rq (Not tainted): Objects remaining on __kmem_cache_shutdown() ----------------------------------------------------------------------------- Object 0x00000000d47521be @offset=14336 ... WARNING: mm/slub.c:1251 at __kmem_cache_shutdown+0x34e/0x440, CPU#0: modprobe/1577 ... Call Trace: <TASK> kmem_cache_destroy+0x94/0x190 cifs_destroy_request_bufs+0x3e/0x50 [cifs] cleanup_module+0x4e/0x540 [cifs] __se_sys_delete_module+0x278/0x400 __x64_sys_delete_module+0x5f/0x70 x64_sys_call+0x2299/0x2ff0 do_syscall_64+0x89/0x350 entry_SYSCALL_64_after_hwframe+0x76/0x7e ... kmem_cache_destroy cifs_small_rq: Slab cache still has objects when called from cifs_destroy_request_bufs+0x3e/0x50 [cifs] WARNING: mm/slab_common.c:532 at kmem_cache_destroy+0x16b/0x190, CPU#0: modprobe/1577 Link: https://lore.kernel.org/linux-cifs/9751f02d-d1df-4265-a7d6-b19761b21834@linux.dev/T/#mf14808c144448b715f711ce5f0477a071f08eaf6 Fixes: e255612b5ed9 ("cifs: Add fallback for SMB2 CREATE without FILE_READ_ATTRIBUTES") Reported-by: Paulo Alcantara <pc@manguebit.org> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Reviewed-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
9 daysMerge tag 'efi-fixes-for-v6.19-3' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fix from Ard Biesheuvel: - Fix regression in efivarfs error propagation * tag 'efi-fixes-for-v6.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: efivarfs: fix error propagation in efivar_entry_get()
11 daysMerge tag 'for-6.19-rc7-tag' of ↵Linus Torvalds4-26/+3
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - fix leaked folio refcount on s390x when using hw zlib compression acceleration - remove own threshold from ->writepages() which could collide with cgroup limits and lead to a deadlock when metadadata are not written because the amount is under the internal limit * tag 'for-6.19-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: zlib: fix the folio leak on S390 hardware acceleration btrfs: do not strictly require dirty metadata threshold for metadata writepages
14 daysMerge tag 'vfs-6.19-rc8.fixes' of ↵Linus Torvalds13-39/+75
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix the the buggy conversion of fuse_reverse_inval_entry() introduced during the creation rework - Disallow nfs delegation requests for directories by setting simple_nosetlease() - Require an opt-in for getting readdir flag bits outside of S_DT_MASK set in d_type - Fix scheduling delayed writeback work by only scheduling when the dirty time expiry interval is non-zero and cancel the delayed work if the interval is set to zero - Use rounded_jiffies_interval for dirty time work - Check the return value of sb_set_blocksize() for romfs - Wait for batched folios to be stable in __iomap_get_folio() - Use private naming for fuse hash size - Fix the stale dentry cleanup to prevent a race that causes a UAF * tag 'vfs-6.19-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: vfs: document d_dispose_if_unused() fuse: shrink once after all buckets have been scanned fuse: clean up fuse_dentry_tree_work() fuse: add need_resched() before unlocking bucket fuse: make sure dentry is evicted if stale fuse: fix race when disposing stale dentries fuse: use private naming for fuse hash size writeback: use round_jiffies_relative for dirtytime_work iomap: wait for batched folios to be stable in __iomap_get_folio romfs: check sb_set_blocksize() return value docs: clarify that dirtytime_expire_seconds=0 disables writeback writeback: fix 100% CPU usage when dirtytime_expire_interval is 0 readdir: require opt-in for d_type flags vboxsf: don't allow delegations to be set on directories ceph: don't allow delegations to be set on directories gfs2: don't allow delegations to be set on directories 9p: don't allow delegations to be set on directories smb/client: properly disallow delegations on directories nfs: properly disallow delegation requests on directories fuse: fix conversion of fuse_reverse_inval_entry() to start_removing()
2026-01-23Merge tag 'v6.19-rc6-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds2-9/+9
Pull smb server fixes from Steve French: - Use the original nents value for ib_dma_unmap_sg(), preventing potential memory corruption in the RDMA transport layer - Fix a naming discrepancy in the kernel-doc for ksmbd_vfs_kern_path_start_removing() as identified by sparse static analysis - Reset smb_direct_port to its default value during initialization to ensure the correct port is used when switching between different RDMA device types without module reload * tag 'v6.19-rc6-server-fixes' of git://git.samba.org/ksmbd: smb: server: reset smb_direct_port = SMB_DIRECT_PORT_INFINIBAND on init smb: server: fix comment for ksmbd_vfs_kern_path_start_removing() ksmbd: smbd: fix dma_unmap_sg() nents
2026-01-22smb: server: reset smb_direct_port = SMB_DIRECT_PORT_INFINIBAND on initStefan Metzmacher1-0/+1
This allows testing with different devices (iwrap vs. non-iwarp) without 'rmmod ksmbd && modprobe ksmbd', but instead 'ksmbd.control -s && ksmbd.mountd' is enough. In the long run we want to listen on iwarp and non-iwarp at the same time, but requires more changes, most likely also in the rdma layer. Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-01-22smb: server: fix comment for ksmbd_vfs_kern_path_start_removing()Stefan Metzmacher1-1/+1
This was found by sparse... Fixes: 1ead2213dd7d ("smb/server: use end_removing_noperm for for target of smb2_create_link()") Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: NeilBrown <neil@brown.name> Cc: Christian Brauner <brauner@kernel.org> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-01-22ksmbd: smbd: fix dma_unmap_sg() nentsThomas Fourier1-8/+7
The dma_unmap_sg() functions should be called with the same nents as the dma_map_sg(), not the value the map function returned. Fixes: 0626e6641f6b ("cifsd: add server handler for central processing and tranport layers") Cc: <stable@vger.kernel.org> Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-01-21btrfs: zlib: fix the folio leak on S390 hardware accelerationQu Wenruo1-0/+1
[BUG] After commit aa60fe12b4f4 ("btrfs: zlib: refactor S390x HW acceleration buffer preparation"), we no longer release the folio of the page cache of folio returned by btrfs_compress_filemap_get_folio() for S390 hardware acceleration path. [CAUSE] Before that commit, we call kumap_local() and folio_put() after handling each folio. Although the timing is not ideal (it release previous folio at the beginning of the loop, and rely on some extra cleanup out of the loop), it at least handles the folio release correctly. Meanwhile the refactored code is easier to read, it lacks the call to release the filemap folio. [FIX] Add the missing folio_put() for copy_data_into_buffer(). CC: linux-s390@vger.kernel.org # 6.18+ Fixes: aa60fe12b4f4 ("btrfs: zlib: refactor S390x HW acceleration buffer preparation") Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-01-21btrfs: do not strictly require dirty metadata threshold for metadata writepagesQu Wenruo3-26/+2
[BUG] There is an internal report that over 1000 processes are waiting at the io_schedule_timeout() of balance_dirty_pages(), causing a system hang and trigger a kernel coredump. The kernel is v6.4 kernel based, but the root problem still applies to any upstream kernel before v6.18. [CAUSE] From Jan Kara for his wisdom on the dirty page balance behavior first. This cgroup dirty limit was what was actually playing the role here because the cgroup had only a small amount of memory and so the dirty limit for it was something like 16MB. Dirty throttling is responsible for enforcing that nobody can dirty (significantly) more dirty memory than there's dirty limit. Thus when a task is dirtying pages it periodically enters into balance_dirty_pages() and we let it sleep there to slow down the dirtying. When the system is over dirty limit already (either globally or within a cgroup of the running task), we will not let the task exit from balance_dirty_pages() until the number of dirty pages drops below the limit. So in this particular case, as I already mentioned, there was a cgroup with relatively small amount of memory and as a result with dirty limit set at 16MB. A task from that cgroup has dirtied about 28MB worth of pages in btrfs btree inode and these were practically the only dirty pages in that cgroup. So that means the only way to reduce the dirty pages of that cgroup is to writeback the dirty pages of btrfs btree inode, and only after that those processes can exit balance_dirty_pages(). Now back to the btrfs part, btree_writepages() is responsible for writing back dirty btree inode pages. The problem here is, there is a btrfs internal threshold that if the btree inode's dirty bytes are below the 32M threshold, it will not do any writeback. This behavior is to batch as much metadata as possible so we won't write back those tree blocks and then later re-COW them again for another modification. This internal 32MiB is higher than the existing dirty page size (28MiB), meaning no writeback will happen, causing a deadlock between btrfs and cgroup: - Btrfs doesn't want to write back btree inode until more dirty pages - Cgroup/MM doesn't want more dirty pages for btrfs btree inode Thus any process touching that btree inode is put into sleep until the number of dirty pages is reduced. Thanks Jan Kara a lot for the analysis of the root cause. [ENHANCEMENT] Since kernel commit b55102826d7d ("btrfs: set AS_KERNEL_FILE on the btree_inode"), btrfs btree inode pages will only be charged to the root cgroup which should have a much larger limit than btrfs' 32MiB threshold. So it should not affect newer kernels. But for all current LTS kernels, they are all affected by this problem, and backporting the whole AS_KERNEL_FILE may not be a good idea. Even for newer kernels I still think it's a good idea to get rid of the internal threshold at btree_writepages(), since for most cases cgroup/MM has a better view of full system memory usage than btrfs' fixed threshold. For internal callers using btrfs_btree_balance_dirty() since that function is already doing internal threshold check, we don't need to bother them. But for external callers of btree_writepages(), just respect their requests and write back whatever they want, ignoring the internal btrfs threshold to avoid such deadlock on btree inode dirty page balancing. CC: stable@vger.kernel.org CC: Jan Kara <jack@suse.cz> Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-01-21Merge tag 'for-6.19-rc6-tag' of ↵Linus Torvalds5-2/+73
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - protect reading super block vs setting block size externally (found by syzbot) - make sure no transaction is started in read-only mode even with some rescue mount option combinations - fix checksum calculation of backup super blocks when block-group-tree is enabled - more extensive mount-time checks of device items that could be left after device replace and attempting degraded mount - fix build warning with -Wmaybe-uninitialized on loongarch64-gcc 12 * tag 'for-6.19-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: add extra device item checks at mount btrfs: fix missing fields in superblock backup with BLOCK_GROUP_TREE btrfs: reject new transactions if the fs is fully read-only btrfs: sync read disk super and set block size btrfs: fix Wmaybe-uninitialized warning in replay_one_buffer()
2026-01-20btrfs: add extra device item checks at mountQu Wenruo3-0/+48
[BUG] There is a bug report where after a dev-replace, the replace source device with devid 4 is properly erased (dump tree shows it's the old devid 4), but the target device is still using devid 0. When the user tries to mount the fs degraded, the mount failed with the following errors: BTRFS: device fsid 84a1ed4a-365c-45c3-a9ee-a7df525dc3c9 devid 5 transid 1394395 /dev/sda (8:0) scanned by btrfs (261) BTRFS: device fsid 84a1ed4a-365c-45c3-a9ee-a7df525dc3c9 devid 6 transid 1394395 /dev/sde (8:64) scanned by btrfs (261) BTRFS: device fsid 84a1ed4a-365c-45c3-a9ee-a7df525dc3c9 devid 0 transid 1394395 /dev/sdd (8:48) scanned by btrfs (261) BTRFS: device fsid 84a1ed4a-365c-45c3-a9ee-a7df525dc3c9 devid 3 transid 1394395 /dev/sdf (8:80) scanned by btrfs (261) BTRFS info (device sdd): first mount of filesystem 84a1ed4a-365c-45c3-a9ee-a7df525dc3c9 BTRFS info (device sdd): using crc32c (crc32c-intel) checksum algorithm BTRFS warning (device sdd): devid 4 uuid 01e2081c-9c2a-4071-b9f4-e1b27e571ff5 is missing BTRFS info (device sdd): bdev <missing disk> errs: wr 84994544, rd 15567, flush 65872, corrupt 0, gen 0 BTRFS info (device sdd): bdev /dev/sdd errs: wr 71489901, rd 0, flush 30001, corrupt 0, gen 0 BTRFS error (device sdd): replace without active item, run 'device scan --forget' on the target device BTRFS error (device sdd): failed to init dev_replace: -117 BTRFS error (device sdd): open_ctree failed: -117 [CAUSE] The devid 0 didn't get its devid updated is its own problem, here I'm only focusing on the mount failure itself. The mount is not caused by the missing device, as the fs has RAID1C3 for metadata and RAID10 for data, thus is completely able to tolerate one missing device. The device tree shows the dev-replace has properly finished: item 7 key (0 DEV_REPLACE 0) itemoff 15931 itemsize 72 src devid -1 cursor left 11091821199360 cursor right 11091821199360 mode ALWAYS state FINISHED write errors 0 uncorrectable read errors 0 ^^^^^^^^ And the chunk tree shows there is no devid 0: leaf 37980736602112 items 23 free space 12548 generation 1394388 owner CHUNK_TREE leaf 37980736602112 flags 0x1(WRITTEN) backref revision 1 fs uuid 84a1ed4a-365c-45c3-a9ee-a7df525dc3c9 chunk uuid d074c661-6311-4570-b59f-a5c83fd37f8e item 0 key (DEV_ITEMS DEV_ITEM 3) itemoff 16185 itemsize 98 devid 3 total_bytes 20000588955648 bytes_used 8282877984768 io_align 4096 io_width 4096 sector_size 4096 type 0 generation 0 start_offset 0 dev_group 0 seek_speed 0 bandwidth 0 uuid 0d596b69-fb0d-4031-b4af-a301d0868b8b fsid 84a1ed4a-365c-45c3-a9ee-a7df525dc3c9 ... Which shows the first device is devid 3. But there is indeed /dev/sdd with devid 0: superblock: bytenr=65536, device=/dev/sdd --------------------------------------------------------- csum_type 0 (crc32c) csum_size 4 csum 0xd4bed87e [match] bytenr 65536 flags 0x1 ( WRITTEN ) magic _BHRfS_M [match] fsid 84a1ed4a-365c-45c3-a9ee-a7df525dc3c9 ... uuid_tree_generation 1394388 dev_item.uuid ee6532ad-5442-45f7-87fb-7703e29ed934 dev_item.fsid 84a1ed4a-365c-45c3-a9ee-a7df525dc3c9 [match] dev_item.type 0 dev_item.total_bytes 20000588955648 dev_item.bytes_used 8292541661184 dev_item.io_align 0 dev_item.io_width 0 dev_item.sector_size 0 dev_item.devid 0 <<< So this means device scan will register sdd as devid 0 into the fs, then during btrfs_init_dev_replace(), we located the replace progress item, found the previous replace is finished, but we still need to check if the dev-replace target device (devid 0) exists. If that device exists, we error out showing that error message. But to be honest the end user may not really remember which device is the replace target device, thus not sure what to do in the next step. [ENHANCEMENT] To make the error more obvious, and tell the end user which devices should be unregistered: - Introduce BTRFS_DEV_STATE_ITEM_FOUND flag During device item read from the chunk tree, set the flag for each found device item. - Verify there is no device without the above flag during mount Even missing device should have that flag set. If we found a device without that flag set, it means it's an unexpected one and should be rejected. - More detailed error message on what to do next This will show all unexpected devices and tell the end user to use 'btrfs dev scan --forget' to forget them or remove them before mount. There is an example dmesg where a device of a valid filesystem is modified to have devid 0, then try degraded mount: BTRFS info (device dm-6): first mount of filesystem 7c873869-844c-4b39-bd75-a96148bf4656 BTRFS info (device dm-6): using crc32c checksum algorithm BTRFS warning (device dm-6): devid 3 uuid b4a9f35b-db42-4ac4-b55a-cbf81d3b9683 is missing BTRFS error (device dm-6): devid 0 path /dev/mapper/test-scratch3 is registered but not found in chunk tree BTRFS error (device dm-6): please remove above devices or use 'btrfs device scan --forget <dev>' to unregister them before mount BTRFS error (device dm-6): open_ctree failed: -117 Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-01-20btrfs: fix missing fields in superblock backup with BLOCK_GROUP_TREEMark Harmstone1-1/+1
When the BLOCK_GROUP_TREE compat_ro flag is set, the extent root and csum root fields are getting missed. This is because EXTENT_TREE_V2 treated these differently, and when they were split off this special-casing was mistakenly assigned to BGT rather than the rump EXTENT_TREE_V2. There's no reason why the existence of the block group tree should mean that we don't record the details of the last commit's extent root and csum root. Fix the code in backup_super_roots() so that the correct check gets made. Fixes: 1c56ab991903 ("btrfs: separate BLOCK_GROUP_TREE compat RO flag from EXTENT_TREE_V2") Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Mark Harmstone <mark@harmstone.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-01-20btrfs: reject new transactions if the fs is fully read-onlyQu Wenruo2-0/+21
[BUG] There is a bug report where a heavily fuzzed fs is mounted with all rescue mount options, which leads to the following warnings during unmount: BTRFS: Transaction aborted (error -22) Modules linked in: CPU: 0 UID: 0 PID: 9758 Comm: repro.out Not tainted 6.19.0-rc5-00002-gb71e635feefc #7 PREEMPT(full) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:find_free_extent_update_loop fs/btrfs/extent-tree.c:4208 [inline] RIP: 0010:find_free_extent+0x52f0/0x5d20 fs/btrfs/extent-tree.c:4611 Call Trace: <TASK> btrfs_reserve_extent+0x2cd/0x790 fs/btrfs/extent-tree.c:4705 btrfs_alloc_tree_block+0x1e1/0x10e0 fs/btrfs/extent-tree.c:5157 btrfs_force_cow_block+0x578/0x2410 fs/btrfs/ctree.c:517 btrfs_cow_block+0x3c4/0xa80 fs/btrfs/ctree.c:708 btrfs_search_slot+0xcad/0x2b50 fs/btrfs/ctree.c:2130 btrfs_truncate_inode_items+0x45d/0x2350 fs/btrfs/inode-item.c:499 btrfs_evict_inode+0x923/0xe70 fs/btrfs/inode.c:5628 evict+0x5f4/0xae0 fs/inode.c:837 __dentry_kill+0x209/0x660 fs/dcache.c:670 finish_dput+0xc9/0x480 fs/dcache.c:879 shrink_dcache_for_umount+0xa0/0x170 fs/dcache.c:1661 generic_shutdown_super+0x67/0x2c0 fs/super.c:621 kill_anon_super+0x3b/0x70 fs/super.c:1289 btrfs_kill_super+0x41/0x50 fs/btrfs/super.c:2127 deactivate_locked_super+0xbc/0x130 fs/super.c:474 cleanup_mnt+0x425/0x4c0 fs/namespace.c:1318 task_work_run+0x1d4/0x260 kernel/task_work.c:233 exit_task_work include/linux/task_work.h:40 [inline] do_exit+0x694/0x22f0 kernel/exit.c:971 do_group_exit+0x21c/0x2d0 kernel/exit.c:1112 __do_sys_exit_group kernel/exit.c:1123 [inline] __se_sys_exit_group kernel/exit.c:1121 [inline] __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1121 x64_sys_call+0x2210/0x2210 arch/x86/include/generated/asm/syscalls_64.h:232 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xe8/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x44f639 Code: Unable to access opcode bytes at 0x44f60f. RSP: 002b:00007ffc15c4e088 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 RAX: ffffffffffffffda RBX: 00000000004c32f0 RCX: 000000000044f639 RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000001 RBP: 0000000000000001 R08: ffffffffffffffc0 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004c32f0 R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001 </TASK> Since rescue mount options will mark the full fs read-only, there should be no new transaction triggered. But during unmount we will evict all inodes, which can trigger a new transaction, and triggers warnings on a heavily corrupted fs. [CAUSE] Btrfs allows new transaction even on a read-only fs, this is to allow log replay happen even on read-only mounts, just like what ext4/xfs do. However with rescue mount options, the fs is fully read-only and cannot be remounted read-write, thus in that case we should also reject any new transactions. [FIX] If we find the fs has rescue mount options, we should treat the fs as error, so that no new transaction can be started. Reported-by: Jiaming Zhang <r772577952@gmail.com> Link: https://lore.kernel.org/linux-btrfs/CANypQFYw8Nt8stgbhoycFojOoUmt+BoZ-z8WJOZVxcogDdwm=Q@mail.gmail.com/ Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-01-20btrfs: sync read disk super and set block sizeEdward Adam Davis1-0/+2
When the user performs a btrfs mount, the block device is not set correctly. The user sets the block size of the block device to 0x4000 by executing the BLKBSZSET command. Since the block size change also changes the mapping->flags value, this further affects the result of the mapping_min_folio_order() calculation. Let's analyze the following two scenarios: Scenario 1: Without executing the BLKBSZSET command, the block size is 0x1000, and mapping_min_folio_order() returns 0; Scenario 2: After executing the BLKBSZSET command, the block size is 0x4000, and mapping_min_folio_order() returns 2. do_read_cache_folio() allocates a folio before the BLKBSZSET command is executed. This results in the allocated folio having an order value of 0. Later, after BLKBSZSET is executed, the block size increases to 0x4000, and the mapping_min_folio_order() calculation result becomes 2. This leads to two undesirable consequences: 1. filemap_add_folio() triggers a VM_BUG_ON_FOLIO(folio_order(folio) < mapping_min_folio_order(mapping)) assertion. 2. The syzbot report [1] shows a null pointer dereference in create_empty_buffers() due to a buffer head allocation failure. Synchronization should be established based on the inode between the BLKBSZSET command and read cache page to prevent inconsistencies in block size or mapping flags before and after folio allocation. [1] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] RIP: 0010:create_empty_buffers+0x4d/0x480 fs/buffer.c:1694 Call Trace: folio_create_buffers+0x109/0x150 fs/buffer.c:1802 block_read_full_folio+0x14c/0x850 fs/buffer.c:2403 filemap_read_folio+0xc8/0x2a0 mm/filemap.c:2496 do_read_cache_folio+0x266/0x5c0 mm/filemap.c:4096 do_read_cache_page mm/filemap.c:4162 [inline] read_cache_page_gfp+0x29/0x120 mm/filemap.c:4195 btrfs_read_disk_super+0x192/0x500 fs/btrfs/volumes.c:1367 Reported-by: syzbot+b4a2af3000eaa84d95d5@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=b4a2af3000eaa84d95d5 Signed-off-by: Edward Adam Davis <eadavis@qq.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-01-20btrfs: fix Wmaybe-uninitialized warning in replay_one_buffer()Qiang Ma1-1/+1
Warning was found when compiling using loongarch64-gcc 12.3.1: $ make CFLAGS_tree-log.o=-Wmaybe-uninitialized In file included from fs/btrfs/ctree.h:21, from fs/btrfs/tree-log.c:12: fs/btrfs/accessors.h: In function 'replay_one_buffer': fs/btrfs/accessors.h:66:16: warning: 'inode_item' may be used uninitialized [-Wmaybe-uninitialized] 66 | return btrfs_get_##bits(eb, s, offsetof(type, member)); \ | ^~~~~~~~~~ fs/btrfs/tree-log.c:2803:42: note: 'inode_item' declared here 2803 | struct btrfs_inode_item *inode_item; | ^~~~~~~~~~ Initialize the inode_item to NULL, the compiler does not seem to see the relation between the first 'wc->log_key.type == BTRFS_INODE_ITEM_KEY' check and the other one that also checks the replay phase. Signed-off-by: Qiang Ma <maqianga@uniontech.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-01-19fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes()Joanne Koong2-2/+9
Above the while() loop in wait_sb_inodes(), we document that we must wait for all pages under writeback for data integrity. Consequently, if a mapping, like fuse, traditionally does not have data integrity semantics, there is no need to wait at all; we can simply skip these inodes. This restores fuse back to prior behavior where syncs are no-ops. This fixes a user regression where if a system is running a faulty fuse server that does not reply to issued write requests, this causes wait_sb_inodes() to wait forever. Link: https://lkml.kernel.org/r/20260105211737.4105620-2-joannelkoong@gmail.com Fixes: 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree") Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reported-by: Athul Krishna <athul.krishna.kr@protonmail.com> Reported-by: J. Neuschäfer <j.neuschaefer@gmx.net> Reviewed-by: Bernd Schubert <bschubert@ddn.com> Tested-by: J. Neuschäfer <j.neuschaefer@gmx.net> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Bernd Schubert <bschubert@ddn.com> Cc: Bonaccorso Salvatore <carnil@debian.org> Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-19efivarfs: fix error propagation in efivar_entry_get()Kohei Enju1-1/+1
efivar_entry_get() always returns success even if the underlying __efivar_entry_get() fails, masking errors. This may result in uninitialized heap memory being copied to userspace in the efivarfs_file_read() path. Fix it by returning the error from __efivar_entry_get(). Fixes: 2d82e6227ea1 ("efi: vars: Move efivar caching layer into efivarfs") Cc: <stable@vger.kernel.org> # v6.1+ Signed-off-by: Kohei Enju <kohei@enjuk.jp> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2026-01-18Merge tag 'ext4_for_linus-6.19-rc6' of ↵Linus Torvalds2-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: - Fix an inconsistency in structure size on 32-bit platforms caused by padding differences for the new EXT4_IOC_[GS]ET_TUNE_SB_PARAM ioctls - Fix a buffer leak on the error path when dropping the refcount an xattr value stored in an inode - Fix missing locking on the error path for the file defragmentation ioctl leading to a BUG * tag 'ext4_for_linus-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix iloc.bh leak in ext4_xattr_inode_update_ref ext4: add missing down_write_data_sem in mext_move_extent(). ext4: fix ext4_tune_sb_params padding
2026-01-18ext4: fix iloc.bh leak in ext4_xattr_inode_update_refYang Erkun1-0/+1
The error branch for ext4_xattr_inode_update_ref forget to release the refcount for iloc.bh. Find this when review code. Fixes: 57295e835408 ("ext4: guard against EA inode refcount underflow in xattr update") Signed-off-by: Yang Erkun <yangerkun@huawei.com> Reviewed-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20251213055706.3417529-1-yangerkun@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2026-01-18ext4: add missing down_write_data_sem in mext_move_extent().Julian Sun1-0/+2
Commit 962e8a01eab9 ("ext4: introduce mext_move_extent()") attempts to call ext4_swap_extents() on the failure path to recover the swapped extents, but fails to acquire locks for the two inode->i_data_sem, triggering the BUG_ON statement in ext4_swap_extents(). This issue can be fixed by calling ext4_double_down_write_data_sem() before ext4_swap_extents(). Signed-off-by: Julian Sun <sunjunchao@bytedance.com> Reported-by: syzbot+4ea6bd8737669b423aae@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/69368649.a70a0220.38f243.0093.GAE@google.com/ Fixes: 962e8a01eab9 ("ext4: introduce mext_move_extent()") Reviewed-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Link: https://patch.msgid.link/20251208123713.1971068-1-sunjunchao@bytedance.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-01-17Merge tag 'for-6.19-rc5-tag' of ↵Linus Torvalds8-68/+41
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - with large folios in use, fix partial incorrect update of a reflinked range - fix potential deadlock in iget when lookup fails and eviction is needed - in send, validate inline extent type while detecting file holes - fix memory leak after an error when creating a space info - remove zone statistics from sysfs again, the output size limitations make it unusable, we'll do it in another way in another release - test fixes: - return proper error codes from block remapping tests - fix tree root leaks in qgroup tests after errors * tag 'for-6.19-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: remove zoned statistics from sysfs btrfs: fix memory leaks in create_space_info() error paths btrfs: invalidate pages instead of truncate after reflinking btrfs: update the Kconfig string for CONFIG_BTRFS_EXPERIMENTAL btrfs: send: check for inline extents in range_is_hole_in_parent() btrfs: tests: fix return 0 on rmap test failure btrfs: tests: fix root tree leak in btrfs_test_qgroups() btrfs: release path before iget_failed() in btrfs_read_locked_inode()
2026-01-16vfs: document d_dispose_if_unused()Miklos Szeredi1-0/+10
Add a warning about the danger of using this function without proper locking preventing eviction. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Link: https://patch.msgid.link/20260114145344.468856-7-mszeredi@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-16fuse: shrink once after all buckets have been scannedMiklos Szeredi1-1/+1
In fuse_dentry_tree_work() move the shrink_dentry_list() out from the loop. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Link: https://patch.msgid.link/20260114145344.468856-6-mszeredi@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-16fuse: clean up fuse_dentry_tree_work()Miklos Szeredi1-14/+14
- Change time_after64() time_before64(), since the latter is exclusively used in this file to compare dentry/inode timeout with current time. - Move the break statement from the else branch to the if branch, reducing indentation. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Link: https://patch.msgid.link/20260114145344.468856-5-mszeredi@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-16fuse: add need_resched() before unlocking bucketMiklos Szeredi1-3/+5
In fuse_dentry_tree_work() no need to unlock/lock dentry_hash[i].lock on each iteration. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Link: https://patch.msgid.link/20260114145344.468856-4-mszeredi@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-16fuse: make sure dentry is evicted if staleMiklos Szeredi1-0/+4
d_dispose_if_unused() may find the dentry with a positive refcount, in which case it won't be put on the dispose list even though it has already timed out. "Reinstall" the d_delete() callback, which was optimized out in fuse_dentry_settime(). This will result in the dentry being evicted as soon as the refcount hits zero. Fixes: ab84ad597386 ("fuse: new work queue to periodically invalidate expired dentries") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Link: https://patch.msgid.link/20260114145344.468856-3-mszeredi@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-16fuse: fix race when disposing stale dentriesMiklos Szeredi1-9/+2
In fuse_dentry_tree_work() just before d_dispose_if_unused() the dentry could get evicted, resulting in UAF. Move unlocking dentry_hash[i].lock to after the dispose. To do this, fuse_dentry_tree_del_node() needs to be moved from fuse_dentry_prune() to fuse_dentry_release() to prevent an ABBA deadlock. The lock ordering becomes: -> dentry_bucket.lock -> dentry.d_lock Reported-by: Al Viro <viro@zeniv.linux.org.uk> Closes: https://lore.kernel.org/all/20251206014242.GO1712166@ZenIV/ Fixes: ab84ad597386 ("fuse: new work queue to periodically invalidate expired dentries") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Link: https://patch.msgid.link/20260114145344.468856-2-mszeredi@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-16Merge tag 'xfs-fixes-6.19-rc6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds5-39/+41
Pull xfs fixes from Carlos Maiolino: "Just a few obvious fixes and some 'cosmetic' changes" * tag 'xfs-fixes-6.19-rc6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: set max_agbno to allow sparse alloc of last full inode chunk xfs: Fix xfs_grow_last_rtg() xfs: improve the assert at the top of xfs_log_cover xfs: fix an overly long line in xfs_rtgroup_calc_geometry xfs: mark __xfs_rtgroup_extents static xfs: Fix the return value of xfs_rtcopy_summary() xfs: fix memory leak in xfs_growfs_check_rtgeom()
2026-01-16fuse: use private naming for fuse hash sizeJens Axboe1-7/+7
With a mix of include dependencies, the compiler warns that: fs/fuse/dir.c:35:9: warning: ?HASH_BITS? redefined 35 | #define HASH_BITS 5 | ^~~~~~~~~ In file included from ./include/linux/io_uring_types.h:5, from ./include/linux/bpf.h:34, from ./include/linux/security.h:35, from ./include/linux/fs_context.h:14, from fs/fuse/dir.c:13: ./include/linux/hashtable.h:28:9: note: this is the location of the previous definition 28 | #define HASH_BITS(name) ilog2(HASH_SIZE(name)) | ^~~~~~~~~ fs/fuse/dir.c:36:9: warning: ?HASH_SIZE? redefined 36 | #define HASH_SIZE (1 << HASH_BITS) | ^~~~~~~~~ ./include/linux/hashtable.h:27:9: note: this is the location of the previous definition 27 | #define HASH_SIZE(name) (ARRAY_SIZE(name)) | ^~~~~~~~~ Hence rename the HASH_SIZE/HASH_BITS in fuse, by prefixing them with FUSE_ instead. Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://patch.msgid.link/195c9525-281c-4302-9549-f3d9259416c6@kernel.dk Acked-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-15Merge tag 'nfs-for-6.19-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds15-101/+238
Pull NFS client fixes from Trond Myklebust: - Fix another deadlock involving nfs_release_folio() - localio: - Stop I/O upon hitting a fatal error - Deal with page offsets that are > PAGE_SIZE - Fix size read races in truncate, fallocate and copy offload - Several bugfixes for the NFSv4.x directory delegation client code - pNFS: - Fix a deadlock when returning delegations during open - Fix memory leaks in various error paths * tag 'nfs-for-6.19-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFS: Fix size read races in truncate, fallocate and copy offload NFS: Don't immediately return directory delegations when disabled NFS/localio: Deal with page bases that are > PAGE_SIZE NFS/localio: Stop further I/O upon hitting an error NFSv4.x: Directory delegations don't require any state recovery NFSv4: Don't free slots prematurely if requesting a directory delegation NFSv4: Fix nfs_clear_verifier_delegated() for delegated directories NFS: Fix directory delegation verifier checks pnfs/blocklayout: Fix memory leak in bl_parse_scsi() pnfs/flexfiles: Fix memory leak in nfs4_ff_alloc_deviceid_node() NFS: Fix a deadlock involving nfs_release_folio() pNFS: Fix a deadlock when returning a delegation during open()
2026-01-15NFS: Fix size read races in truncate, fallocate and copy offloadTrond Myklebust3-14/+27
If the pre-operation file size is read before locking the inode and quiescing O_DIRECT writes, then nfs_truncate_last_folio() might end up overwriting valid file data. Fixes: b1817b18ff20 ("NFS: Protect against 'eof page pollution'") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2026-01-14btrfs: remove zoned statistics from sysfsJohannes Thumshirn1-52/+0
Remove the newly introduced zoned statistics from sysfs, as sysfs can only show a single page this will truncate the output on a busy filesystem. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-01-14writeback: use round_jiffies_relative for dirtytime_workZhao Mengmeng1-2/+4
The dirtytime_work is a background housekeeping task that flushes dirty inodes, using round_jiffies_relative() will allow kernel to batch this work with other aligned system tasks, reducing power consumption. Signed-off-by: Zhao Mengmeng <zhaomengmeng@kylinos.cn> Link: https://patch.msgid.link/20260113082614.231580-1-zhaomzhao@126.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-14iomap: wait for batched folios to be stable in __iomap_get_folioChristoph Hellwig1-0/+1
__iomap_get_folio needs to wait for writeback to finish if the file requires folios to be stable for writes. For the regular path this is taken care of by __filemap_get_folio, but for the newly added batch lookup it has to be done manually. This fixes xfs/131 failures when running on PI-capable hardware. Fixes: 395ed1ef0012 ("iomap: optional zero range dirty folio processing") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260113153943.3323869-1-hch@lst.de Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-13Merge tag 'gfs2-for-6.19-rc6' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 revert from Andreas Gruenbacher: "Revert bad commit "gfs2: Fix use of bio_chain" I was originally assuming that there must be a bug in gfs2 because gfs2 chains bios in the opposite direction of what bio_chain_and_submit() expects. It turns out that the bio chains are set up in "reverse direction" intentionally so that the first bio's bi_end_io callback is invoked rather than the last bio's callback. We want the first bio's callback invoked for the following reason: The initial bio starts page aligned and covers one or more pages. When it terminates at a non-page-aligned offset, subsequent bios are added to handle the remaining portion of the final page. Upon completion of the bio chain, all affected pages need to be be marked as read, and only the first bio references all of these pages" * tag 'gfs2-for-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: Revert "gfs2: Fix use of bio_chain"
2026-01-13xfs: set max_agbno to allow sparse alloc of last full inode chunkBrian Foster1-5/+6
Sparse inode cluster allocation sets min/max agbno values to avoid allocating an inode cluster that might map to an invalid inode chunk. For example, we can't have an inode record mapped to agbno 0 or that extends past the end of a runt AG of misaligned size. The initial calculation of max_agbno is unnecessarily conservative, however. This has triggered a corner case allocation failure where a small runt AG (i.e. 2063 blocks) is mostly full save for an extent to the EOFS boundary: [2050,13]. max_agbno is set to 2048 in this case, which happens to be the offset of the last possible valid inode chunk in the AG. In practice, we should be able to allocate the 4-block cluster at agbno 2052 to map to the parent inode record at agbno 2048, but the max_agbno value precludes it. Note that this can result in filesystem shutdown via dirty trans cancel on stable kernels prior to commit 9eb775968b68 ("xfs: walk all AGs if TRYLOCK passed to xfs_alloc_vextent_iterate_ags") because the tail AG selection by the allocator sets t_highest_agno on the transaction. If the inode allocator spins around and finds an inode chunk with free inodes in an earlier AG, the subsequent dir name creation path may still fail to allocate due to the AG restriction and cancel. To avoid this problem, update the max_agbno calculation to the agbno prior to the last chunk aligned agbno in the AG. This is not necessarily the last valid allocation target for a sparse chunk, but since inode chunks (i.e. records) are chunk aligned and sparse allocs are cluster sized/aligned, this allows the sb_spino_align alignment restriction to take over and round down the max effective agbno to within the last valid inode chunk in the AG. Note that even though the allocator improvements in the aforementioned commit seem to avoid this particular dirty trans cancel situation, the max_agbno logic improvement still applies as we should be able to allocate from an AG that has been appropriately selected. The more important target for this patch however are older/stable kernels prior to this allocator rework/improvement. Cc: stable@vger.kernel.org # v4.2 Fixes: 56d1115c9bc7 ("xfs: allocate sparse inode chunks on full chunk allocation failure") Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-13xfs: Fix xfs_grow_last_rtg()Nirjhar Roy (IBM)1-1/+1
The last rtg should be able to grow when the size of the last is less than (and not equal to) sb_rgextents. xfs_growfs with realtime groups fails without this patch. The reason is that, xfs_growfs_rtg() tries to grow the last rt group even when the last rt group is at its maximal size i.e, sb_rgextents. It fails with the following messages: XFS (loop0): Internal error block >= mp->m_rsumblocks at line 253 of file fs/xfs/libxfs/xfs_rtbitmap.c. Caller xfs_rtsummary_read_buf+0x20/0x80 XFS (loop0): Corruption detected. Unmount and run xfs_repair XFS (loop0): Internal error xfs_trans_cancel at line 976 of file fs/xfs/xfs_trans.c. Caller xfs_growfs_rt_bmblock+0x402/0x450 XFS (loop0): Corruption of in-memory data (0x8) detected at xfs_trans_cancel+0x10a/0x1f0 (fs/xfs/xfs_trans.c:977). Shutting down filesystem. XFS (loop0): Please unmount the filesystem and rectify the problem(s) Signed-off-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-13xfs: improve the assert at the top of xfs_log_coverChristoph Hellwig1-3/+5
Move each condition into a separate assert so that we can see which on triggered. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-13xfs: fix an overly long line in xfs_rtgroup_calc_geometryChristoph Hellwig1-1/+2
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-13xfs: mark __xfs_rtgroup_extents staticChristoph Hellwig2-27/+25
__xfs_rtgroup_extents is not used outside of xfs_rtgroup.c, so mark it static. Move it and xfs_rtgroup_extents up in the file to avoid forward declarations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-13xfs: Fix the return value of xfs_rtcopy_summary()Nirjhar Roy (IBM)1-1/+1
xfs_rtcopy_summary() should return the appropriate error code instead of always returning 0. The caller of this function which is xfs_growfs_rt_bmblock() is already handling the error. Fixes: e94b53ff699c ("xfs: cache last bitmap block in realtime allocator") Signed-off-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org # v6.7 Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-13romfs: check sb_set_blocksize() return valueDeepanshu Kartikey1-1/+4
romfs_fill_super() ignores the return value of sb_set_blocksize(), which can fail if the requested block size is incompatible with the block device's configuration. This can be triggered by setting a loop device's block size larger than PAGE_SIZE using ioctl(LOOP_SET_BLOCK_SIZE, 32768), then mounting a romfs filesystem on that device. When sb_set_blocksize(sb, ROMBSIZE) is called with ROMBSIZE=4096 but the device has logical_block_size=32768, bdev_validate_blocksize() fails because the requested size is smaller than the device's logical block size. sb_set_blocksize() returns 0 (failure), but romfs ignores this and continues mounting. The superblock's block size remains at the device's logical block size (32768). Later, when sb_bread() attempts I/O with this oversized block size, it triggers a kernel BUG in folio_set_bh(): kernel BUG at fs/buffer.c:1582! BUG_ON(size > PAGE_SIZE); Fix by checking the return value of sb_set_blocksize() and failing the mount with -EINVAL if it returns 0. Reported-by: syzbot+9c4e33e12283d9437c25@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=9c4e33e12283d9437c25 Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com> Link: https://patch.msgid.link/20260113084037.1167887-1-kartikey406@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12NFS: Don't immediately return directory delegations when disabledAnna Schumaker1-1/+1
The function nfs_inode_evict_delegation() immediately and synchronously returns a delegation when called. This means we can't call it from nfs4_have_delegation(), since that function could be called under a lock. Instead we should mark the delegation for return and let the state manager handle it for us. Fixes: b6d2a520f463 ("NFS: Add a module option to disable directory delegations") Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2026-01-12btrfs: fix memory leaks in create_space_info() error pathsJiasheng Jiang1-2/+6
In create_space_info(), the 'space_info' object is allocated at the beginning of the function. However, there are two error paths where the function returns an error code without freeing the allocated memory: 1. When create_space_info_sub_group() fails in zoned mode. 2. When btrfs_sysfs_add_space_info_type() fails. In both cases, 'space_info' has not yet been added to the fs_info->space_info list, resulting in a memory leak. Fix this by adding an error handling label to kfree(space_info) before returning. Fixes: 2be12ef79fe9 ("btrfs: Separate space_info create/update") Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-01-12btrfs: invalidate pages instead of truncate after reflinkingFilipe Manana1-10/+13
Qu reported that generic/164 often fails because the read operations get zeroes when it expects to either get all bytes with a value of 0x61 or 0x62. The issue stems from truncating the pages from the page cache instead of invalidating, as truncating can zero page contents. This zeroing is not just in case the range is not page sized (as it's commented in truncate_inode_pages_range()) but also in case we are using large folios, they need to be split and the splitting fails. Stealing Qu's comment in the thread linked below: "We can have the following case: 0 4K 8K 12K 16K | | | | | |<---- Extent A ----->|<----- Extent B ------>| The page size is still 4K, but the folio we got is 16K. Then if we remap the range for [8K, 16K), then truncate_inode_pages_range() will get the large folio 0 sized 16K, then call truncate_inode_partial_folio(). Which later calls folio_zero_range() for the [8K, 16K) range first, then tries to split the folio into smaller ones to properly drop them from the cache. But if splitting failed (e.g. racing with other operations holding the filemap lock), the partially zeroed large folio will be kept, resulting the range [8K, 16K) being zeroed meanwhile the folio is still a 16K sized large one." So instead of truncating, invalidate the page cache range with a call to filemap_invalidate_inode(), which besides not doing any zeroing also ensures that while it's invalidating folios, no new folios are added. This helps ensure that buffered reads that happen while a reflink operation is in progress always get either the whole old data (the one before the reflink) or the whole new data, which is what generic/164 expects. Link: https://lore.kernel.org/linux-btrfs/7fb9b44f-9680-4c22-a47f-6648cb109ddf@suse.com/ Reported-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>