| Age | Commit message (Collapse) | Author | Files | Lines |
|
Pull smb client fixes from Steve French:
- Fix change notify packet validation check
- Refcount fix (e.g. rename error paths)
- Fix potential UAF due to missing locks on directory lease refcount
* tag 'v6.18rc4-SMB-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: validate change notify buffer before copy
smb: client: fix refcount leak in smb2_set_path_attr
smb: client: fix potential UAF in smb2_close_cached_fid()
|
|
Pull xfs fixes from Carlos Maiolino:
"This contain fixes for the RT and zoned allocator, and a few fixes for
atomic writes"
* tag 'xfs-fixes-6.18-rc5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: free xfs_busy_extents structure when no RT extents are queued
xfs: fix zone selection in xfs_select_open_zone_mru
xfs: fix a rtgroup leak when xfs_init_zone fails
xfs: fix various problems in xfs_atomic_write_cow_iomap_begin
xfs: fix delalloc write failures in software-provided atomic writes
|
|
SMB2_change_notify called smb2_validate_iov() but ignored the return
code, then kmemdup()ed using server provided OutputBufferOffset/Length.
Check the return of smb2_validate_iov() and bail out on error.
Discovered with help from the ZeroPath security tooling.
Signed-off-by: Joshua Rogers <linux@joshua.hu>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Cc: stable@vger.kernel.org
Fixes: e3e9463414f61 ("smb3: improve SMB3 change notification support")
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Pull smb server fixes from Steve French:
- More safely detect RDMA capable devices correctly
* tag 'v6.18-rc4-smb-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: detect RDMA capable netdevs include IPoIB
ksmbd: detect RDMA capable lower devices when bridge and vlan netdev is used
|
|
Pull fscrypt fix from Eric Biggers:
"Fix an UBSAN warning that started occurring when the block layer
started supporting logical_block_size > PAGE_SIZE"
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux:
fscrypt: fix left shift underflow when inode->i_blkbits > PAGE_SHIFT
|
|
kmemleak occasionally reports leaking xfs_busy_extents structure
from xfs_scrub calls after running xfs/528 (but attributed to following
tests), which seems to be caused by not freeing the xfs_busy_extents
structure when tr.queued is 0 and xfs_trim_rtgroup_extents breaks out
of the main loop. Free the structure in this case.
Fixes: a3315d11305f ("xfs: use rtgroup busy extent list for FITRIM")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
xfs_select_open_zone_mru needs to pass XFS_ZONE_ALLOC_OK to
xfs_try_use_zone because we only want to tightly pack into zones of the
same or a compatible temperature instead of any available zone.
This got broken in commit 0301dae732a5 ("xfs: refactor hint based zone
allocation"), which failed to update this particular caller when
switching to an enum. xfs/638 sometimes, but not reliably fails due to
this change.
Fixes: 0301dae732a5 ("xfs: refactor hint based zone allocation")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Drop the rtgrop reference when xfs_init_zone fails for a conventional
device.
Fixes: 4e4d52075577 ("xfs: add the zoned space allocator")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
I think there are several things wrong with this function:
A) xfs_bmapi_write can return a much larger unwritten mapping than what
the caller asked for. We convert part of that range to written, but
return the entire written mapping to iomap even though that's
inaccurate.
B) The arguments to xfs_reflink_convert_cow_locked are wrong -- an
unwritten mapping could be *smaller* than the write range (or even
the hole range). In this case, we convert too much file range to
written state because we then return a smaller mapping to iomap.
C) It doesn't handle delalloc mappings. This I covered in the patch
that I already sent to the list.
D) Reassigning count_fsb to handle the hole means that if the second
cmap lookup attempt succeeds (due to racing with someone else) we
trim the mapping more than is strictly necessary. The changing
meaning of count_fsb makes this harder to notice.
E) The tracepoint is kinda wrong because @length is mutated. That makes
it harder to chase the data flows through this function because you
can't just grep on the pos/bytecount strings.
F) We don't actually check that the br_state = XFS_EXT_NORM assignment
is accurate, i.e that the cow fork actually contains a written
mapping for the range we're interested in
G) Somewhat inadequate documentation of why we need to xfs_trim_extent
so aggressively in this function.
H) Not sure why xfs_iomap_end_fsb is used here, the vfs already clamped
the write range to s_maxbytes.
Fix these issues, and then the atomic writes regressions in generic/760,
generic/617, generic/091, generic/263, and generic/521 all go away for
me.
Cc: stable@vger.kernel.org # v6.16
Fixes: bd1d2c21d5d249 ("xfs: add xfs_atomic_write_cow_iomap_begin()")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
With the 20 Oct 2025 release of fstests, generic/521 fails for me on
regular (aka non-block-atomic-writes) storage:
QA output created by 521
dowrite: write: Input/output error
LOG DUMP (8553 total operations):
1( 1 mod 256): SKIPPED (no operation)
2( 2 mod 256): WRITE 0x7e000 thru 0x8dfff (0x10000 bytes) HOLE
3( 3 mod 256): READ 0x69000 thru 0x79fff (0x11000 bytes)
4( 4 mod 256): FALLOC 0x53c38 thru 0x5e853 (0xac1b bytes) INTERIOR
5( 5 mod 256): COPY 0x55000 thru 0x59fff (0x5000 bytes) to 0x25000 thru 0x29fff
6( 6 mod 256): WRITE 0x74000 thru 0x88fff (0x15000 bytes)
7( 7 mod 256): ZERO 0xedb1 thru 0x11693 (0x28e3 bytes)
with a warning in dmesg from iomap about XFS trying to give it a
delalloc mapping for a directio write. Fix the software atomic write
iomap_begin code to convert the reservation into a written mapping.
This doesn't fix the data corruption problems reported by generic/760,
but it's a start.
Cc: stable@vger.kernel.org # v6.16
Fixes: bd1d2c21d5d249 ("xfs: add xfs_atomic_write_cow_iomap_begin()")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
When simulating an nvme device on qemu with both logical_block_size and
physical_block_size set to 8 KiB, an error trace appears during
partition table reading at boot time. The issue is caused by
inode->i_blkbits being larger than PAGE_SHIFT, which leads to a left
shift of -1 and triggering a UBSAN warning.
[ 2.697306] ------------[ cut here ]------------
[ 2.697309] UBSAN: shift-out-of-bounds in fs/crypto/inline_crypt.c:336:37
[ 2.697311] shift exponent -1 is negative
[ 2.697315] CPU: 3 UID: 0 PID: 274 Comm: (udev-worker) Not tainted 6.18.0-rc2+ #34 PREEMPT(voluntary)
[ 2.697317] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 2.697320] Call Trace:
[ 2.697324] <TASK>
[ 2.697325] dump_stack_lvl+0x76/0xa0
[ 2.697340] dump_stack+0x10/0x20
[ 2.697342] __ubsan_handle_shift_out_of_bounds+0x1e3/0x390
[ 2.697351] bh_get_inode_and_lblk_num.cold+0x12/0x94
[ 2.697359] fscrypt_set_bio_crypt_ctx_bh+0x44/0x90
[ 2.697365] submit_bh_wbc+0xb6/0x190
[ 2.697370] block_read_full_folio+0x194/0x270
[ 2.697371] ? __pfx_blkdev_get_block+0x10/0x10
[ 2.697375] ? __pfx_blkdev_read_folio+0x10/0x10
[ 2.697377] blkdev_read_folio+0x18/0x30
[ 2.697379] filemap_read_folio+0x40/0xe0
[ 2.697382] filemap_get_pages+0x5ef/0x7a0
[ 2.697385] ? mmap_region+0x63/0xd0
[ 2.697389] filemap_read+0x11d/0x520
[ 2.697392] blkdev_read_iter+0x7c/0x180
[ 2.697393] vfs_read+0x261/0x390
[ 2.697397] ksys_read+0x71/0xf0
[ 2.697398] __x64_sys_read+0x19/0x30
[ 2.697399] x64_sys_call+0x1e88/0x26a0
[ 2.697405] do_syscall_64+0x80/0x670
[ 2.697410] ? __x64_sys_newfstat+0x15/0x20
[ 2.697414] ? x64_sys_call+0x204a/0x26a0
[ 2.697415] ? do_syscall_64+0xb8/0x670
[ 2.697417] ? irqentry_exit_to_user_mode+0x2e/0x2a0
[ 2.697420] ? irqentry_exit+0x43/0x50
[ 2.697421] ? exc_page_fault+0x90/0x1b0
[ 2.697422] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 2.697425] RIP: 0033:0x75054cba4a06
[ 2.697426] Code: 5d e8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 75 19 83 e2 39 83 fa 08 75 11 e8 26 ff ff ff 66 0f 1f 44 00 00 48 8b 45 10 0f 05 <48> 8b 5d f8 c9 c3 0f 1f 40 00 f3 0f 1e fa 55 48 89 e5 48 83 ec 08
[ 2.697427] RSP: 002b:00007fff973723a0 EFLAGS: 00000202 ORIG_RAX: 0000000000000000
[ 2.697430] RAX: ffffffffffffffda RBX: 00005ea9a2c02760 RCX: 000075054cba4a06
[ 2.697432] RDX: 0000000000002000 RSI: 000075054c190000 RDI: 000000000000001b
[ 2.697433] RBP: 00007fff973723c0 R08: 0000000000000000 R09: 0000000000000000
[ 2.697434] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[ 2.697434] R13: 00005ea9a2c027c0 R14: 00005ea9a2be5608 R15: 00005ea9a2be55f0
[ 2.697436] </TASK>
[ 2.697436] ---[ end trace ]---
This situation can happen for block devices because when
CONFIG_TRANSPARENT_HUGEPAGE is enabled, the maximum logical_block_size
is 64 KiB. set_init_blocksize() then sets the block device
inode->i_blkbits to 13, which is within this limit.
File I/O does not trigger this problem because for filesystems that do
not support the FS_LBS feature, sb_set_blocksize() prevents
sb->s_blocksize_bits from being larger than PAGE_SHIFT. During inode
allocation, alloc_inode()->inode_init_always() assigns inode->i_blkbits
from sb->s_blocksize_bits. Currently, only xfs_fs_type has the FS_LBS
flag, and since xfs I/O paths do not reach submit_bh_wbc(), it does not
hit the left-shift underflow issue.
Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
Fixes: 47dd67532303 ("block/bdev: lift block size restrictions to 64k")
Cc: stable@vger.kernel.org
[EB: use folio_pos() and consolidate the two shifts by i_blkbits]
Link: https://lore.kernel.org/r/20251105003642.42796-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Fix refcount leak in `smb2_set_path_attr` when path conversion fails.
Function `cifs_get_writable_path` returns `cfile` with its reference
counter `cfile->count` increased on success. Function `smb2_compound_op`
would decrease the reference counter for `cfile`, as stated in its
comment. By calling `smb2_rename_path`, the reference counter of `cfile`
would leak if `cifs_convert_path_to_utf16` fails in `smb2_set_path_attr`.
Fixes: 8de9e86c67ba ("cifs: create a helper to find a writeable handle by path name")
Acked-by: Henrique Carvalho <henrique.carvalho@suse.com>
Signed-off-by: Shuhao Fu <sfual@cse.ust.hk>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
find_or_create_cached_dir() could grab a new reference after kref_put()
had seen the refcount drop to zero but before cfid_list_lock is acquired
in smb2_close_cached_fid(), leading to use-after-free.
Switch to kref_put_lock() so cfid_release() is called with
cfid_list_lock held, closing that gap.
Fixes: ebe98f1447bb ("cifs: enable caching of directories for which a lease is held")
Cc: stable@vger.kernel.org
Reported-by: Jay Shin <jaeshin@redhat.com>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Current ksmbd_rdma_capable_netdev fails to mark certain RDMA-capable
inerfaces such as IPoIB as RDMA capable after reverting GUID matching code
due to layer violation.
This patch check the ARPHRD_INFINIBAND type safely identifies an IPoIB
interface without introducing a layer violation, ensuring RDMA
functionality is correctly enabled for these interfaces.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
If user set bridge interface as actual RDMA-capable NICs are lower devices,
ksmbd can not detect as RDMA capable. This patch can detect the RDMA
capable lower devices from bridge master or VLAN. With this change, ksmbd
can accept both TCP and RDMA connections through the same bridge IP
address, allowing mixed transport operation without requiring separate
interfaces.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix memory leak in qgroup relation ioctl when qgroup levels are
invalid
- don't write back dirty metadata on filesystem with errors
- properly log renamed links
- properly mark prealloc extent range beyond inode size as dirty (when
no-noles is not enabled)
* tag 'for-6.18-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: mark dirty extent range for out of bound prealloc extents
btrfs: set inode flag BTRFS_INODE_COPY_EVERYTHING when logging new name
btrfs: fix memory leak of qgroup_list in btrfs_add_qgroup_relation
btrfs: ensure no dirty metadata is written back for an fs with errors
|
|
Pull xfs fixes from Carlos Maiolino:
"Just a single bug fix (and documentation for the issue)"
* tag 'xfs-fixes-6.18-rc4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: document another racy GC case in xfs_zoned_map_extent
xfs: prevent gc from picking the same zone twice
|
|
Pull smb client fixes from Steve French:
- fix potential UAF in statfs
- DFS fix for expired referrals
- fix minor modinfo typo
- small improvement to reconnect for smbdirect
* tag '6.18-rc3-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: call smbd_destroy() in the same splace as kernel_sock_shutdown()/sock_release()
smb: client: handle lack of IPC in dfs_cache_refresh()
smb: client: fix potential cfid UAF in smb2_query_info_compound
cifs: fix typo in enable_gcm_256 module parameter
|
|
Besides blocks being invalidated, there is another case when the original
mapping could have changed between querying the rmap for GC and calling
xfs_zoned_map_extent. Document it there as it took us quite some time
to figure out what is going on while developing the multiple-GC
protection fix.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
When we are picking a zone for gc it might already be in the pipeline
which can lead to us moving the same data twice resulting in in write
amplification and a very unfortunate case where we keep on garbage
collecting the zone we just filled with migrated data stopping all
forward progress.
Fix this by introducing a count of on-going GC operations on a zone, and
skip any zone with ongoing GC when picking a new victim.
Fixes: 080d01c41 ("xfs: implement zoned garbage collection")
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Co-developed-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Tested-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
In btrfs_fallocate(), when the allocated range overlaps with a prealloc
extent and the extent starts after i_size, the range doesn't get marked
dirty in file_extent_tree. This results in persisting an incorrect
disk_i_size for the inode when not using the no-holes feature.
This is reproducible since commit 41a2ee75aab0 ("btrfs: introduce
per-inode file extent tree"), then became hidden since commit 3d7db6e8bd22
("btrfs: don't allocate file extent tree for non regular files") and then
visible again after commit 8679d2687c35 ("btrfs: initialize
inode::file_extent_tree after i_mode has been set"), which fixes the
previous commit.
The following reproducer triggers the problem:
$ cat test.sh
MNT=/mnt/test
DEV=/dev/vdb
mkdir -p $MNT
mkfs.btrfs -f -O ^no-holes $DEV
mount $DEV $MNT
touch $MNT/file1
fallocate -n -o 1M -l 2M $MNT/file1
umount $MNT
mount $DEV $MNT
len=$((1 * 1024 * 1024))
fallocate -o 1M -l $len $MNT/file1
du --bytes $MNT/file1
umount $MNT
mount $DEV $MNT
du --bytes $MNT/file1
umount $MNT
Running the reproducer gives the following result:
$ ./test.sh
(...)
2097152 /mnt/test/file1
1048576 /mnt/test/file1
The difference is exactly 1048576 as we assigned.
Fix by adding a call to btrfs_inode_set_file_extent_range() in
btrfs_fallocate_update_isize().
Fixes: 41a2ee75aab0 ("btrfs: introduce per-inode file extent tree")
Signed-off-by: austinchang <austinchang@synology.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
If we are logging a new name make sure our inode has the runtime flag
BTRFS_INODE_COPY_EVERYTHING set so that at btrfs_log_inode() we will find
new inode refs/extrefs in the subvolume tree and copy them into the log
tree.
We are currently doing it when adding a new link but we are missing it
when renaming.
An example where this makes a new name not persisted:
1) create symlink with name foo in directory A
2) fsync directory A, which persists the symlink
3) rename the symlink from foo to bar
4) fsync directory A to persist the new symlink name
Step 4 isn't working correctly as it's not logging the new name and also
leaving the old inode ref in the log tree, so after a power failure the
symlink still has the old name of "foo". This is because when we first
fsync directoy A we log the symlink's inode (as it's a new entry) and at
btrfs_log_inode() we set the log mode to LOG_INODE_ALL and then because
we are using that mode and the inode has the runtime flag
BTRFS_INODE_NEEDS_FULL_SYNC set, we clear that flag as well as the flag
BTRFS_INODE_COPY_EVERYTHING. That means the next time we log the inode,
during the rename through the call to btrfs_log_new_name() (calling
btrfs_log_inode_parent() and then btrfs_log_inode()), we will not search
the subvolume tree for new refs/extrefs and jump directory to the
'log_extents' label.
Fix this by making sure we set BTRFS_INODE_COPY_EVERYTHING on an inode
when we are about to log a new name. A test case for fstests will follow
soon.
Reported-by: Vyacheslav Kovalevsky <slava.kovalevskiy.2014@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/ac949c74-90c2-4b9a-b7fd-1ffc5c3175c7@gmail.com/
Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
When btrfs_add_qgroup_relation() is called with invalid qgroup levels
(src >= dst), the function returns -EINVAL directly without freeing the
preallocated qgroup_list structure passed by the caller. This causes a
memory leak because the caller unconditionally sets the pointer to NULL
after the call, preventing any cleanup.
The issue occurs because the level validation check happens before the
mutex is acquired and before any error handling path that would free
the prealloc pointer. On this early return, the cleanup code at the
'out' label (which includes kfree(prealloc)) is never reached.
In btrfs_ioctl_qgroup_assign(), the code pattern is:
prealloc = kzalloc(sizeof(*prealloc), GFP_KERNEL);
ret = btrfs_add_qgroup_relation(trans, sa->src, sa->dst, prealloc);
prealloc = NULL; // Always set to NULL regardless of return value
...
kfree(prealloc); // This becomes kfree(NULL), does nothing
When the level check fails, 'prealloc' is never freed by either the
callee or the caller, resulting in a 64-byte memory leak per failed
operation. This can be triggered repeatedly by an unprivileged user
with access to a writable btrfs mount, potentially exhausting kernel
memory.
Fix this by freeing prealloc before the early return, ensuring prealloc
is always freed on all error paths.
Fixes: 4addc1ffd67a ("btrfs: qgroup: preallocate memory before adding a relation")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Shardul Bankar <shardulsb08@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
[BUG]
During development of a minor feature (make sure all btrfs_bio::end_io()
is called in task context), I noticed a crash in generic/388, where
metadata writes triggered new works after btrfs_stop_all_workers().
It turns out that it can even happen without any code modification, just
using RAID5 for metadata and the same workload from generic/388 is going
to trigger the use-after-free.
[CAUSE]
If btrfs hits an error, the fs is marked as error, no new
transaction is allowed thus metadata is in a frozen state.
But there are some metadata modifications before that error, and they are
still in the btree inode page cache.
Since there will be no real transaction commit, all those dirty folios
are just kept as is in the page cache, and they can not be invalidated
by invalidate_inode_pages2() call inside close_ctree(), because they are
dirty.
And finally after btrfs_stop_all_workers(), we call iput() on btree
inode, which triggers writeback of those dirty metadata.
And if the fs is using RAID56 metadata, this will trigger RMW and queue
new works into rmw_workers, which is already stopped, causing warning
from queue_work() and use-after-free.
[FIX]
Add a special handling for write_one_eb(), that if the fs is already in
an error state, immediately mark the bbio as failure, instead of really
submitting them.
Then during close_ctree(), iput() will just discard all those dirty
tree blocks without really writing them back, thus no more new jobs for
already stopped-and-freed workqueues.
The extra discard in write_one_eb() also acts as an extra safenet.
E.g. the transaction abort is triggered by some extent/free space
tree corruptions, and since extent/free space tree is already corrupted
some tree blocks may be allocated where they shouldn't be (overwriting
existing tree blocks). In that case writing them back will further
corrupting the fs.
CC: stable@vger.kernel.org # 6.6+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
kernel_sock_shutdown()/sock_release()
With commit b0432201a11b ("smb: client: let destroy_mr_list() keep
smbdirect_mr_io memory if registered") the changes from commit
214bab448476 ("cifs: Call MID callback before destroying transport") and
commit 1d2a4f57cebd ("cifs:smbd When reconnecting to server, call
smbd_destroy() after all MIDs have been called") are no longer needed.
And it's better to use the same logic flow, so that
the chance of smbdirect related problems is smaller.
Fixes: 214bab448476 ("cifs: Call MID callback before destroying transport")
Fixes: 1d2a4f57cebd ("cifs:smbd When reconnecting to server, call smbd_destroy() after all MIDs have been called")
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
In very rare cases, DFS mounts could end up with SMB sessions without
any IPC connections. These mounts are only possible when having
unexpired cached DFS referrals, hence not requiring any IPC
connections during the mount process.
Try to establish those missing IPC connections when refreshing DFS
referrals. If the server is still rejecting it, then simply ignore
and leave expired cached DFS referral for any potential DFS failovers.
Reported-by: Jay Shin <jaeshin@redhat.com>
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Cc: David Howells <dhowells@redhat.com>
Cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Pull smb server fixes from Steve French:
- Improve check for malformed payload
- Fix free transport smbdirect potential race
- Fix potential race in credit allocation during smbdirect negotiation
* tag 'v6.18-rc3-smb-server-fixes' of git://git.samba.org/ksmbd:
smb: server: let smb_direct_cm_handler() call ib_drain_qp() after smb_direct_disconnect_rdma_work()
smb: server: call smb_direct_post_recv_credits() when the negotiation is done
ksmbd: transport_ipc: validate payload size before reading handle
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
"Regression fixes:
- Revert the patch that removed the cap on MAX_OPS_PER_COMPOUND
- Address a kernel build issue
Stable fixes:
- Fix crash when a client queries new attributes on forechannel
- Fix rare NFSD crash when tracing is enabled"
* tag 'nfsd-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
Revert "NFSD: Remove the cap on number of operations per NFSv4 COMPOUND"
nfsd: Avoid strlen conflict in nfsd4_encode_components_esc()
NFSD: Fix crash in nfsd4_read_release()
NFSD: Define actions for the new time_deleg FATTR4 attributes
|
|
When smb2_query_info_compound() retries, a previously allocated cfid may
have been freed in the first attempt.
Because cfid wasn't reset on replay, later cleanup could act on a stale
pointer, leading to a potential use-after-free.
Reinitialize cfid to NULL under the replay label.
Example trace (trimmed):
refcount_t: underflow; use-after-free.
WARNING: CPU: 1 PID: 11224 at ../lib/refcount.c:28 refcount_warn_saturate+0x9c/0x110
[...]
RIP: 0010:refcount_warn_saturate+0x9c/0x110
[...]
Call Trace:
<TASK>
smb2_query_info_compound+0x29c/0x5c0 [cifs f90b72658819bd21c94769b6a652029a07a7172f]
? step_into+0x10d/0x690
? __legitimize_path+0x28/0x60
smb2_queryfs+0x6a/0xf0 [cifs f90b72658819bd21c94769b6a652029a07a7172f]
smb311_queryfs+0x12d/0x140 [cifs f90b72658819bd21c94769b6a652029a07a7172f]
? kmem_cache_alloc+0x18a/0x340
? getname_flags+0x46/0x1e0
cifs_statfs+0x9f/0x2b0 [cifs f90b72658819bd21c94769b6a652029a07a7172f]
statfs_by_dentry+0x67/0x90
vfs_statfs+0x16/0xd0
user_statfs+0x54/0xa0
__do_sys_statfs+0x20/0x50
do_syscall_64+0x58/0x80
Cc: stable@kernel.org
Fixes: 4f1fffa237692 ("cifs: commands that are retried should have replay flag set")
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Acked-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
smb_direct_disconnect_rdma_work()
All handlers triggered by ib_drain_qp() should already see the
broken connection.
smb_direct_cm_handler() is called under a mutex of the rdma_cm,
we should make sure ib_drain_qp() and all rdma layer logic completes
and unlocks the mutex.
It means free_transport() will also already see the connection
as SMBDIRECT_SOCKET_DISCONNECTED, so we need to call
crdma_[un]lock_handler(sc->rdma.cm_id) around
ib_drain_qp(), rdma_destroy_qp(), ib_free_cq() and ib_dealloc_pd().
Otherwise we free resources while the ib_drain_qp() within
smb_direct_cm_handler() is still running.
We have to unlock before rdma_destroy_id() as it locks again.
Fixes: 141fa9824c0f ("ksmbd: call ib_drain_qp when disconnected")
Fixes: 4c564f03e23b ("smb: server: make use of common smbdirect_socket")
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
We now activate sc->recv_io.posted.refill_work and sc->idle.immediate_work
only after a successful negotiation, before sending the negotiation
response.
It means the queue_work(sc->workqueue, &sc->recv_io.posted.refill_work)
in put_recvmsg() of the negotiate request, is a no-op now.
It also means our explicit smb_direct_post_recv_credits() will
have queue_work(sc->workqueue, &sc->idle.immediate_work) as no-op.
This should make sure we don't have races and post any immediate
data_transfer message that tries to grant credits to the peer,
before we send the negotiation response, as that will grant
the initial credits to the peer.
Fixes: 0626e6641f6b ("cifsd: add server handler for central processing and tranport layers")
Fixes: 1cde0a74a7a8 ("smb: server: don't use delayed_work for post_recv_credits_work")
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
handle_response() dereferences the payload as a 4-byte handle without
verifying that the declared payload size is at least 4 bytes. A malformed
or truncated message from ksmbd.mountd can lead to a 4-byte read past the
declared payload size. Validate the size before dereferencing.
This is a minimal fix to guard the initial handle read.
Fixes: 0626e6641f6b ("cifsd: add server handler for central processing and tranport layers")
Cc: stable@vger.kernel.org
Reported-by: Qianchang Zhao <pioooooooooip@gmail.com>
Signed-off-by: Qianchang Zhao <pioooooooooip@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Fix typo in description of enable_gcm_256 module parameter
Suggested-by: Thomas Spear <speeddymon@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Remove dead code leftovers after a recent mitigations cleanup which
fail a Clang build
- Make sure a Retbleed mitigation message is printed only when
necessary
- Correct the last Zen1 microcode revision for which Entrysign sha256
check is needed
- Fix a NULL ptr deref when mounting the resctrl fs on a system which
supports assignable counters but where L3 total and local bandwidth
monitoring has been disabled at boot
* tag 'x86_urgent_for_v6.18_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/bugs: Remove dead code which might prevent from building
x86/bugs: Qualify RETBLEED_INTEL_MSG
x86/microcode: Fix Entrysign revision check for Zen1/Naples
x86,fs/resctrl: Fix NULL pointer dereference with events force-disabled in mbm_event mode
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core
Pull driver core fixes from Danilo Krummrich:
- In Device::parent(), do not make any assumptions on the device
context of the parent device
- Check visibility before changing ownership of a sysfs attribute
group
- In topology_parse_cpu_capacity(), replace an incorrect usage of
PTR_ERR_OR_ZERO() with IS_ERR_OR_NULL()
- In devcoredump, fix a circular locking dependency between
struct devcd_entry::mutex and kernfs
- Do not warn about a pending fw_devlink sync state
* tag 'driver-core-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
arch_topology: Fix incorrect error check in topology_parse_cpu_capacity()
rust: device: fix device context of Device::parent()
sysfs: check visibility before changing group attribute ownership
devcoredump: Fix circular locking dependency with devcd->mutex.
driver core: fw_devlink: Don't warn about sync_state() pending
|
|
Pull xfs fixes from Carlos Maiolino:
"The main highlight here is a fix for a bug brought in by the removal
of attr2 mount option, where some installations might actually have
'attr2' explicitly configured in fstab preventing system to boot by
not being able to remount the rootfs as RW.
Besides that there are a couple fix to the zonefs implementation,
changing XFS_ONLINE_SCRUB_STATS to depend on DEBUG_FS (was select
before), and some other minor changes"
* tag 'xfs-fixes-6.18-rc3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix locking in xchk_nlinks_collect_dir
xfs: loudly complain about defunct mount options
xfs: always warn about deprecated mount options
xfs: don't set bt_nr_sectors to a negative number
xfs: don't use __GFP_NOFAIL in xfs_init_fs_context
xfs: cache open zone in inode->i_private
xfs: avoid busy loops in GCD
xfs: XFS_ONLINE_SCRUB_STATS should depend on DEBUG_FS
xfs: do not tightly pack-write large files
xfs: Improve CONFIG_XFS_RT Kconfig help
|
|
Pull smb server fixes from Steve French:
"smbdirect (RDMA) fixes in order avoid potential submission queue
overflows:
- free transport teardown fix
- credit related fixes (five server related, one client related)"
* tag 'v6.18-rc2-smb-server-fixes' of git://git.samba.org/ksmbd:
smb: server: let free_transport() wait for SMBDIRECT_SOCKET_DISCONNECTED
smb: client: make use of smbdirect_socket.send_io.lcredits.*
smb: server: make use of smbdirect_socket.send_io.lcredits.*
smb: server: simplify sibling_list handling in smb_direct_flush_send_list/send_done
smb: server: smb_direct_disconnect_rdma_connection() already wakes all waiters on error
smb: smbdirect: introduce smbdirect_socket.send_io.lcredits.*
smb: server: allocate enough space for RW WRs and ib_drain_qp()
|
|
Pull smb client fixes from Steve French:
- add missing tracepoints
- smbdirect (RDMA) fix
- fix potential issue with credits underflow
- rename fix
- improvement to calc_signature and additional cleanup patch
* tag '6.18-rc2-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: #include cifsglob.h before trace.h to allow structs in tracepoints
cifs: Call the calc_signature functions directly
smb: client: get rid of d_drop() in cifs_do_rename()
cifs: Fix TCP_Server_Info::credits to be signed
cifs: Add a couple of missing smb3_rw_credits tracepoints
smb: client: allocate enough space for MR WRs and ib_drain_qp()
|
|
We should wait for the rdma_cm to become SMBDIRECT_SOCKET_DISCONNECTED!
At least on the client side (with similar code)
wait_event_interruptible() often returns with -ERESTARTSYS instead of
waiting for SMBDIRECT_SOCKET_DISCONNECTED.
We should use wait_event() here too, which makes the code be identical
in client and server, which will help when moving to common functions.
Fixes: b31606097de8 ("smb: server: move smb_direct_disconnect_rdma_work() into free_transport()")
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- in send, fix duplicated rmdir operations when using extrefs
(hardlinks), receive can fail with ENOENT
- fixup of error check when reading extent root in ref-verify and
damaged roots are allowed by mount option (found by smatch)
- fix freeing partially initialized fs info (found by syzkaller)
- fix use-after-free when printing ref_tracking status of delayed
inodes
* tag 'for-6.18-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: ref-verify: fix IS_ERR() vs NULL check in btrfs_build_ref_tree()
btrfs: fix delayed_node ref_tracker use after free
btrfs: send: fix duplicated rmdir operations when using extrefs
btrfs: directly free partially initialized fs_info in btrfs_check_leaked_roots()
|
|
Make cifs #include cifsglob.h in advance of #including trace.h so that the
structures defined in cifsglob.h can be accessed directly by the cifs
tracepoints rather than the callers having to manually pass in the bits and
pieces.
This should allow the tracepoints to be made more efficient to use as well
as easier to read in the code.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
As the SMB1 and SMB2/3 calc_signature functions are called from separate
sign and verify paths, just call them directly rather than using a function
pointer. The SMB3 calc_signature then jumps to the SMB2 variant if
necessary.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Enzo Matsumiya <ematsumiya@suse.de>
cc: Paulo Alcantara <pc@manguebit.org>
cc: Shyam Prasad N <sprasad@microsoft.com>
cc: Tom Talpey <tom@talpey.com>
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
There is no need to force a lookup by unhashing the moved dentry after
successfully renaming the file on server. The file metadata will be
re-fetched from server, if necessary, in the next call to
->d_revalidate() anyways.
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Reviewed-by: David Howells <dhowells@redhat.com>
Cc: stable@vger.kernel.org
Cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Fix TCP_Server_Info::credits to be signed, just as echo_credits and
oplock_credits are. This also fixes what ought to get at least a
compilation warning if not an outright error in *get_credits_field() as a
pointer to the unsigned server->credits field is passed back as a pointer
to a signed int.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-cifs@vger.kernel.org
Cc: stable@vger.kernel.org
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Acked-by: Pavel Shilovskiy <pshilovskiy@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
This makes the logic to prevent on overflow of
the send submission queue with ib_post_send() easier.
As we first get a local credit and then a remote credit
before we mark us as pending.
For now we'll keep the logic around smbdirect_socket.send_io.pending.*,
but that will likely change or be removed completely.
The server will get a similar logic soon, so
we'll be able to share the send code in future.
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
This introduces logic to prevent on overflow of
the send submission queue with ib_post_send() easier.
As we first get a local credit and then a remote credit
before we mark us as pending.
From reading the git history of the linux smbdirect
implementations in client and server) it was seen
that a peer granted more credits than we requested.
I guess that only happened because of bugs in our
implementation which was active as client and server.
I guess Windows won't do that.
So the local credits make sure we only use the amount
of credits we asked for.
Fixes: 0626e6641f6b ("cifsd: add server handler for central processing and tranport layers")
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
smb_direct_flush_send_list/send_done
We have a list handling that is much easier to understand:
1. Before smb_direct_flush_send_list() is called all
struct smbdirect_send_io messages are part of
send_ctx->msg_list
2. Before smb_direct_flush_send_list() calls
smb_direct_post_send() we remove the last
element in send_ctx->msg_list and move all
others into last->sibling_list. As only
last has IB_SEND_SIGNALED and gets a completion
vis send_done().
3. send_done() has an easy way to free all others
in sendmsg->sibling_list (if there are any).
And use list_for_each_entry_safe() instead of
a complex custom logic.
This will help us to share send_done() in common
code soon, as it will work fine for the client too,
where last->sibling_list is currently always an empty list.
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
waiters on error
There's no need to care about pending or credit counters when we
already disconnecting.
And all related wait_event conditions already check for broken
connections too.
This will simplify the code and makes the following changes simpler.
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
This will be used to implement a logic in order to make sure
we don't overflow the send submission queue for ib_post_send().
We will initialize the local credits with the
fixed sp->send_credit_target value, which matches
the reserved slots in the submission queue for ib_post_send().
We will be a local credit first and then wait for a remote credit,
if we managed to get both we are allowed to post an
IB_WR_SEND[_WITH_INV]. The local credit is given back to
the pool when we get the local ib_post_send() completion,
while remote credits are granted by the peer.
From reading the git history of the linux smbdirect
implementations in client and server) it was seen
that a peer granted more credits than we requested.
I guess that only happened because of bugs in our
implementation which was active as client and server.
I guess Windows won't do that.
So the local credits make sure we only use the amount
of credits we asked for.
The client already has some logic for this based on
smbdirect_socket.send_io.pending.count, but that
counts in the order direction and makes it complex it
share common logic for various credits classes.
That logic will be replaced soon.
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Make use of rdma_rw_mr_factor() to calculate the number of rw
credits and the number of pages per RDMA RW operation.
We get the same numbers for iWarp connections, tested
with siw.ko and irdma.ko (in iWarp mode).
siw:
CIFS: max_qp_rd_atom=128, max_fast_reg_page_list_len = 256
CIFS: max_sgl_rd=0, max_sge_rd=1
CIFS: responder_resources=32 max_frmr_depth=256 mr_io.type=0
CIFS: max_send_wr 384, device reporting max_cqe 3276800 max_qp_wr 32768
ksmbd: max_fast_reg_page_list_len = 256, max_sgl_rd=0, max_sge_rd=1
ksmbd: device reporting max_cqe 3276800 max_qp_wr 32768
ksmbd: Old sc->rw_io.credits: max = 9, num_pages = 256
ksmbd: New sc->rw_io.credits: max = 9, num_pages = 256, maxpages=2048
ksmbd: Info: rdma_send_wr 27 + max_send_wr 256 = 283
irdma (in iWarp mode):
CIFS: max_qp_rd_atom=127, max_fast_reg_page_list_len = 262144
CIFS: max_sgl_rd=0, max_sge_rd=13
CIFS: responder_resources=32 max_frmr_depth=2048 mr_io.type=0
CIFS: max_send_wr 384, device reporting max_cqe 1048574 max_qp_wr 4063
ksmbd: max_fast_reg_page_list_len = 262144, max_sgl_rd=0, max_sge_rd=13
ksmbd: device reporting max_cqe 1048574 max_qp_wr 4063
ksmbd: Old sc->rw_io.credits: max = 9, num_pages = 256
ksmbd: New sc->rw_io.credits: max = 9, num_pages = 256, maxpages=2048
ksmbd: rdma_send_wr 27 + max_send_wr 256 = 283
This means that we get the different correct numbers for ROCE,
tested with rdma_rxe.ko and irdma.ko (in RoCEv2 mode).
rxe:
CIFS: max_qp_rd_atom=128, max_fast_reg_page_list_len = 512
CIFS: max_sgl_rd=0, max_sge_rd=32
CIFS: responder_resources=32 max_frmr_depth=512 mr_io.type=0
CIFS: max_send_wr 384, device reporting max_cqe 32767 max_qp_wr 1048576
ksmbd: max_fast_reg_page_list_len = 512, max_sgl_rd=0, max_sge_rd=32
ksmbd: device reporting max_cqe 32767 max_qp_wr 1048576
ksmbd: Old sc->rw_io.credits: max = 9, num_pages = 256
ksmbd: New sc->rw_io.credits: max = 65, num_pages = 32, maxpages=2048
ksmbd: rdma_send_wr 65 + max_send_wr 256 = 321
irdma (in RoCEv2 mode):
CIFS: max_qp_rd_atom=127, max_fast_reg_page_list_len = 262144,
CIFS: max_sgl_rd=0, max_sge_rd=13
CIFS: responder_resources=32 max_frmr_depth=2048 mr_io.type=0
CIFS: max_send_wr 384, device reporting max_cqe 1048574 max_qp_wr 4063
ksmbd: max_fast_reg_page_list_len = 262144, max_sgl_rd=0, max_sge_rd=13
ksmbd: device reporting max_cqe 1048574 max_qp_wr 4063
ksmbd: Old sc->rw_io.credits: max = 9, num_pages = 256,
ksmbd: New sc->rw_io.credits: max = 159, num_pages = 13, maxpages=2048
ksmbd: rdma_send_wr 159 + max_send_wr 256 = 415
And rely on rdma_rw_init_qp() to setup ib_mr_pool_init() for
RW MRs. ib_mr_pool_destroy() will be called by rdma_rw_cleanup_mrs().
It seems the code was implemented before the rdma_rw_* layer
was fully established in the kernel.
While there also add additional space for ib_drain_qp().
This should make sure ib_post_send() will never fail
because the submission queue is full.
Fixes: ddbdc861e37c ("ksmbd: smbd: introduce read/write credits for RDMA read/write")
Fixes: 4c564f03e23b ("smb: server: make use of common smbdirect_socket")
Fixes: 177368b99243 ("smb: server: make use of common smbdirect_socket_parameters")
Fixes: 95475d8886bd ("smb: server: make use smbdirect_socket.rw_io.credits")
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|