path: root/drivers/md
2025-10-03  Merge tag 'for-6.18/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm  (Linus Torvalds; 34 files; -177/+5613)

Pull device mapper updates from Mikulas Patocka:

- a new dm-pcache target for read/write caching on persistent memory
- fix typos in docs
- misc small refactoring
- mark dm-error with DM_TARGET_PASSES_INTEGRITY
- dm-request-based: fix NULL pointer dereference and quiesce_depth out
  of sync
- dm-linear: optimize REQ_PREFLUSH
- dm-vdo: return error on corrupted metadata
- dm-integrity: support asynchronous hash interface

* tag 'for-6.18/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (27 commits)
  dm raid: use proper md_ro_state enumerators
  dm-integrity: prefer synchronous hash interface
  dm-integrity: enable asynchronous hash interface
  dm-integrity: rename internal_hash
  dm-integrity: add the "offset" argument
  dm-integrity: allocate the recalculate buffer with kmalloc
  dm-integrity: introduce integrity_kmap and integrity_kunmap
  dm-integrity: replace bvec_kmap_local with kmap_local_page
  dm-integrity: use internal variable for digestsize
  dm vdo: return error on corrupted metadata in start_restoring_volume functions
  dm vdo: Update code to use mem_is_zero
  dm: optimize REQ_PREFLUSH with data when using the linear target
  dm-pcache: use int type to store negative error codes
  dm: fix "writen"->"written"
  dm-pcache: cleanup: fix coding style report by checkpatch.pl
  dm-pcache: remove ctrl_lock for pcache_cache_segment
  dm: fix NULL pointer dereference in __dm_suspend()
  dm: fix queue start/stop imbalance under suspend/load/resume races
  dm-pcache: add persistent cache target in device-mapper
  dm error: mark as DM_TARGET_PASSES_INTEGRITY
  ...
2025-10-02  Merge tag 'for-6.18/block-20250929' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux  (Linus Torvalds; 27 files; -336/+2333)

Pull block updates from Jens Axboe:

- NVMe pull request via Keith:
  - FC target fixes (Daniel)
  - Authentication fixes and updates (Martin, Chris)
  - Admin controller handling (Kamaljit)
  - Target lockdep assertions (Max)
  - Keep-alive updates for discovery (Alastair)
  - Suspend quirk (Georg)

- MD pull request via Yu:
  - Add support for a lockless bitmap. A key feature of the new bitmap
    is that the IO fast path is lockless. If a user issues lots of
    write IO to the same bitmap bit in a short time, only the first
    write has the additional overhead of updating the bitmap bit; there
    is no additional overhead for the following writes. Because only
    written data needs to be resynced or recovered, creating a new
    array or replacing a disk no longer requires a full disk
    resync/recovery.

- Switch ->getgeo() and ->bios_param() to using struct gendisk rather
  than struct block_device.

- Rust block changes via Andreas. This series adds configuration via
  configfs and remote completion to the rnull driver. The series also
  includes a set of changes to the rust block device driver API: a few
  cleanup patches, and a few features supporting the rnull changes.
  The series removes the raw buffer formatting logic from
  `kernel::block` and improves the logic available in `kernel::string`
  to support the same use as the removed logic.

- floppy arch cleanups

- Reduce the number of dereferences needed for ublk commands

- Restrict supported sockets for nbd. Mostly done to eliminate a class
  of issues perpetually reported by syzbot that rely on nonsensical
  socket setups.

- A few s390 dasd block fixes

- Fix a few issues around atomic writes

- Improve DMA iteration for integrity requests

- Improve how iovecs are treated with regard to O_DIRECT alignment
  constraints. We used to require each segment to adhere to the
  constraints; now only the request as a whole needs to.

- Clean up and improve p2p support, enabling use of p2p for metadata
  payloads

- Improve locking of request lookup, using SRCU where appropriate

- Use page references properly for brd, avoiding very long RCU sections

- Fix ordering of recursively submitted IOs

- Clean up and improve updating nr_requests for a live device

- Various fixes and cleanups

* tag 'for-6.18/block-20250929' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (164 commits)
  s390/dasd: enforce dma_alignment to ensure proper buffer validation
  s390/dasd: Return BLK_STS_INVAL for EINVAL from do_dasd_request
  ublk: remove redundant zone op check in ublk_setup_iod()
  nvme: Use non zero KATO for persistent discovery connections
  nvmet: add safety check for subsys lock
  nvme-core: use nvme_is_io_ctrl() for I/O controller check
  nvme-core: do ioccsz/iorcsz validation only for I/O controllers
  nvme-core: add method to check for an I/O controller
  blk-cgroup: fix possible deadlock while configuring policy
  blk-mq: fix null-ptr-deref in blk_mq_free_tags() from error path
  blk-mq: Fix more tag iteration function documentation
  selftests: ublk: fix behavior when fio is not installed
  ublk: don't access ublk_queue in ublk_unmap_io()
  ublk: pass ublk_io to __ublk_complete_rq()
  ublk: don't access ublk_queue in ublk_need_complete_req()
  ublk: don't access ublk_queue in ublk_check_commit_and_fetch()
  ublk: don't pass ublk_queue to ublk_fetch()
  ublk: don't access ublk_queue in ublk_config_io_buf()
  ublk: don't access ublk_queue in ublk_check_fetch_buf()
  ublk: pass q_id and tag to __ublk_check_and_get_req()
  ...
2025-09-29  Merge tag 'dlm-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm  (Linus Torvalds; 1 file; -2/+2)

Pull dlm updates from David Teigland:
 "This adds a dlm_release_lockspace() flag to request that node-failure
  recovery be performed for the node leaving the lockspace. The
  implementation of this flag requires coordination with userland
  clustering components. It's been requested for use by GFS2"

* tag 'dlm-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
  dlm: check for undefined release_option values
  dlm: handle release_option as unsigned
  dlm: move to rinfo for all middle conversion cases
  dlm: handle invalid lockspace member remove
  dlm: add new flag DLM_RELEASE_RECOVER for dlm_lockspace_release
  dlm: add new configfs entry release_recover for lockspace members
  dlm: add new RELEASE_RECOVER uevent attribute for release_lockspace
  dlm: use defines for force values in dlm_release_lockspace
  dlm: check for defined force value in dlm_lockspace_release
2025-09-23  dm raid: use proper md_ro_state enumerators  (Heinz Mauelshagen; 1 file; -6/+7)

The dm-raid code was using hardcoded integer values to represent the
read-only/read-write state of RAID arrays instead of the proper
enumeration constants defined in the md_ro_state enumerator type.

Changes:
- Replace hardcoded integers with the appropriate md_ro_state
  enumerator values
- Add the missing MD_RDONLY setting in the post_suspend function (no
  failures have been attributed to this inconsistency; the fix ensures
  correct state transitions for completeness)

This improves code clarity and maintainability by using the defined
enumeration constants rather than magic numbers, ensuring the code
properly conforms to the established API interface.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
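A minimal before/after sketch of the change described above; the
enumerator names follow drivers/md/md.h, while the call site shown is
illustrative rather than the verbatim dm-raid code:

    enum md_ro_state {
        MD_RDWR,        /* array is read-write */
        MD_RDONLY,      /* array is read-only */
        MD_AUTO_READ,   /* read-only until the first write */
        MD_MAX_STATE,
    };

    /* before: magic number */
    mddev->ro = 1;

    /* after: named state */
    mddev->ro = MD_RDONLY;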
2025-09-23  dm-integrity: prefer synchronous hash interface  (Mikulas Patocka; 1 file; -23/+24)

The previous patch preferred the asynchronous interface for testing
purposes. However, the synchronous interface is faster, so it should be
preferred.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-09-23  dm-integrity: enable asynchronous hash interface  (Mikulas Patocka; 1 file; -44/+199)

This commit enables the asynchronous hash interface in dm-integrity.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-09-23  dm-integrity: rename internal_hash  (Mikulas Patocka; 1 file; -8/+11)

Rename "internal_hash" to "internal_shash" and introduce a boolean
value "internal_hash".

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-09-23  dm-integrity: add the "offset" argument  (Mikulas Patocka; 1 file; -14/+35)

Make sure that the "data" argument passed to integrity_sector_checksum
is always page-aligned and add an "offset" argument that specifies the
offset from the start of the page. This will enable us to use the
asynchronous hash interface later.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
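A sketch of what the described signature change implies; the parameter
lists here are assumptions for illustration, not the verbatim
dm-integrity prototypes:

    /* before: "data" may point anywhere inside a mapped page */
    static void integrity_sector_checksum(struct dm_integrity_c *ic,
                                          sector_t sector,
                                          const char *data, char *result);

    /* after: "data" is page-aligned and "offset" locates the bytes,
     * so an async hash engine can later be handed the page plus
     * offset instead of a kernel mapping */
    static void integrity_sector_checksum(struct dm_integrity_c *ic,
                                          sector_t sector,
                                          const char *data,
                                          unsigned int offset,
                                          char *result);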
2025-09-23  dm-integrity: allocate the recalculate buffer with kmalloc  (Mikulas Patocka; 1 file; -4/+4)

Allocate the recalculate buffer with kmalloc rather than vmalloc. This
will be needed later, for the simplification of the asynchronous hash
interface.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-09-23  dm-integrity: introduce integrity_kmap and integrity_kunmap  (Mikulas Patocka; 1 file; -20/+17)

This abstraction will be used later, for the asynchronous hash
interface.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
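A plausible shape for the abstraction, assuming it currently just wraps
kmap_local_page() so that a later patch can make the mapping optional
when an async hash engine consumes pages directly (names per the
subject, bodies assumed):

    static void *integrity_kmap(struct dm_integrity_c *ic,
                                struct page *page)
    {
        return kmap_local_page(page);
    }

    static void integrity_kunmap(struct dm_integrity_c *ic,
                                 const void *ptr)
    {
        kunmap_local(ptr);
    }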
2025-09-23  dm-integrity: replace bvec_kmap_local with kmap_local_page  (Mikulas Patocka; 1 file; -7/+6)

Replace bvec_kmap_local with kmap_local_page - it will be needed for
the upcoming patches that make kmap_local_page optional, depending on
whether the asynchronous hash interface is used or not.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-09-23  dm-integrity: use internal variable for digestsize  (Mikulas Patocka; 1 file; -11/+14)

Instead of calling digestsize() each time the digestsize for the
internal hash is needed, store the digestsize in a new field
internal_hash_digestsize within struct dm_integrity_c once and use
this value when needed.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
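A minimal sketch of the caching this describes; crypto_shash_digestsize()
is the real crypto API accessor, while the surrounding lines are
assumed:

    /* compute once, at target construction time */
    ic->internal_hash_digestsize =
            crypto_shash_digestsize(ic->internal_hash);

    /* the IO path then reads the cached field instead of calling
     * the accessor for every sector */
    unsigned int digest_size = ic->internal_hash_digestsize;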
2025-09-23  dm vdo: return error on corrupted metadata in start_restoring_volume functions  (Ivan Abramov; 1 file; -2/+2)

The return values of VDO_ASSERT calls that validate metadata are not
acted upon. Return UDS_CORRUPT_DATA in case of an error.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: a4eb7e255517 ("dm vdo: implement the volume index")
Signed-off-by: Ivan Abramov <i.abramov@mt-integration.ru>
Reviewed-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-09-23  dm vdo: Update code to use mem_is_zero  (Bruce Johnston; 1 file; -14/+3)

Remove the function that would check if data was all zeroes. Use the
built-in kernel function mem_is_zero() instead.

Signed-off-by: Bruce Johnston <bjohnsto@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
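The shape of this kind of conversion; the open-coded helper below is an
assumed stand-in for whatever vdo removed, while mem_is_zero() is the
real helper from <linux/string.h>:

    /* before: hand-rolled scan (illustrative) */
    static bool is_all_zero(const u8 *buf, size_t len)
    {
        size_t i;

        for (i = 0; i < len; i++)
            if (buf[i])
                return false;
        return true;
    }

    /* after: one call to the generic helper */
    if (mem_is_zero(buf, len))
        return true;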
2025-09-19  Merge tag 'block-6.17-20250918' of git://git.kernel.dk/linux  (Linus Torvalds; 5 files; -0/+5)

Pull block fixes from Jens Axboe:
 "A set of fixes for an issue with md array assembly and drbd for
  devices supporting write zeroes"

* tag 'block-6.17-20250918' of git://git.kernel.dk/linux:
  drbd: init queue_limits->max_hw_wzeroes_unmap_sectors parameter
  md: init queue_limits->max_hw_wzeroes_unmap_sectors parameter
2025-09-18  dm: optimize REQ_PREFLUSH with data when using the linear target  (Mikulas Patocka; 2 files; -8/+25)

If the table has only linear targets and there is just one underlying
device, we can optimize REQ_PREFLUSH with data - we don't have to split
it into two bios (a flush and a write); we can pass it to the linear
target directly.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
2025-09-17  Merge tag 'for-6.17/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm  (Linus Torvalds; 3 files; -6/+12)

Pull device mapper fixes from Mikulas Patocka:

- fix integer overflow in dm-stripe
- limit tag size in dm-integrity to 255 bytes
- fix 'alignment inconsistency' warning in dm-raid

* tag 'for-6.17/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm-raid: don't set io_min and io_opt for raid1
  dm-integrity: limit MAX_TAG_SIZE to 255
  dm-stripe: fix a possible integer overflow
2025-09-17  dm-raid: don't set io_min and io_opt for raid1  (Mikulas Patocka; 1 file; -2/+4)

These commands

    modprobe brd rd_size=1048576
    vgcreate vg /dev/ram*
    lvcreate -m4 -L10 -n lv vg

trigger the following warnings:

    device-mapper: table: 252:10: adding target device (start sect 0 len 24576) caused an alignment inconsistency
    device-mapper: table: 252:10: adding target device (start sect 0 len 24576) caused an alignment inconsistency

The warnings are caused by the fact that io_min is 512 while the
physical block size is 4096. For a chunk-less raid such as raid1,
io_min shouldn't be set to zero, because the block layer would raise it
to 512 and trigger the warning.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: stable@vger.kernel.org
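A sketch of the fix's likely shape in the target's io_hints hook, using
the real blk_limits_io_min()/blk_limits_io_opt() helpers; the guard and
the data-stripes helper are assumptions for illustration:

    unsigned int chunk_size_bytes = to_bytes(rs->md.chunk_sectors);

    /* raid1 has no chunk size: leave io_min/io_opt at their defaults
     * instead of forcing io_min below the physical block size */
    if (chunk_size_bytes) {
        blk_limits_io_min(limits, chunk_size_bytes);
        blk_limits_io_opt(limits,
                          chunk_size_bytes * mddev_data_stripes(rs));
    }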
2025-09-17  md: init queue_limits->max_hw_wzeroes_unmap_sectors parameter  (Zhang Yi; 5 files; -0/+5)

The parameter max_hw_wzeroes_unmap_sectors in queue_limits should be
equal to max_write_zeroes_sectors if it is set to a non-zero value.
However, the stacked md drivers call md_init_stacking_limits() to
initialize this parameter to UINT_MAX but only adjust
max_write_zeroes_sectors when setting limits. Therefore, this
discrepancy triggers a value check failure in blk_validate_limits().

    $ modprobe scsi_debug num_parts=2 dev_size_mb=8 lbprz=1 lbpws=1
    $ mdadm --create /dev/md0 --level=0 --raid-device=2 /dev/sda1 /dev/sda2
    mdadm: Defaulting to version 1.2 metadata
    mdadm: RUN_ARRAY failed: Invalid argument

Fix this failure by explicitly setting max_hw_wzeroes_unmap_sectors to
max_write_zeroes_sectors. Since the linear and raid0 drivers support
write zeroes, they can support the unmap write zeroes operation if all
of the backend devices support it. However, the raid1/10/5 drivers
don't support write zeroes, so we have to set it to zero.

Fixes: 0c40d7cb5ef3 ("block: introduce max_{hw|user}_wzeroes_unmap_sectors to queue limits")
Reported-by: John Garry <john.g.garry@oracle.com>
Closes: https://lore.kernel.org/linux-block/803a2183-a0bb-4b7a-92f1-afc5097630d2@oracle.com/
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Tested-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Li Nan <linan122@huawei.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/linux-raid/20250910111107.3247530-2-yi.zhang@huaweicloud.com
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
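The described assignments, as a minimal sketch over struct queue_limits
(the surrounding per-level set_limits functions are assumed):

    /* linear/raid0: write zeroes is supported, so mirror the limit */
    lim.max_hw_wzeroes_unmap_sectors = lim.max_write_zeroes_sectors;

    /* raid1/raid10/raid5: write zeroes unsupported, keep both at 0 */
    lim.max_write_zeroes_sectors = 0;
    lim.max_hw_wzeroes_unmap_sectors = 0;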
2025-09-10  md/md-llbitmap: Use DIV_ROUND_UP_SECTOR_T  (Nathan Chancellor; 1 file; -8/+8)

When building for 32-bit platforms, there are several link (if builtin)
or modpost (if a module) errors due to dividends of type 'sector_t' in
DIV_ROUND_UP:

    arm-linux-gnueabi-ld: drivers/md/md-llbitmap.o: in function `llbitmap_resize':
    drivers/md/md-llbitmap.c:1017:(.text+0xae8): undefined reference to `__aeabi_uldivmod'
    arm-linux-gnueabi-ld: drivers/md/md-llbitmap.c:1020:(.text+0xb10): undefined reference to `__aeabi_uldivmod'
    arm-linux-gnueabi-ld: drivers/md/md-llbitmap.o: in function `llbitmap_end_discard':
    drivers/md/md-llbitmap.c:1114:(.text+0xf14): undefined reference to `__aeabi_uldivmod'
    arm-linux-gnueabi-ld: drivers/md/md-llbitmap.o: in function `llbitmap_start_discard':
    drivers/md/md-llbitmap.c:1097:(.text+0x1808): undefined reference to `__aeabi_uldivmod'
    arm-linux-gnueabi-ld: drivers/md/md-llbitmap.o: in function `llbitmap_read_sb':
    drivers/md/md-llbitmap.c:867:(.text+0x2080): undefined reference to `__aeabi_uldivmod'
    arm-linux-gnueabi-ld: drivers/md/md-llbitmap.o:drivers/md/md-llbitmap.c:895: more undefined references to `__aeabi_uldivmod' follow

Use DIV_ROUND_UP_SECTOR_T instead of DIV_ROUND_UP, which exists to
handle this exact situation.

Fixes: 5ab829f1971d ("md/md-llbitmap: introduce new lockless bitmap")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
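Why this fixes the link errors, sketched with the real macro from the
generic kernel math header; the variable names are assumed:

    /* sector_t is u64; on 32-bit ARM, DIV_ROUND_UP emits a 64-bit
     * division that gcc lowers to the libgcc call __aeabi_uldivmod,
     * which the kernel does not provide */
    pages = DIV_ROUND_UP(nr_sectors, sectors_per_page);        /* breaks */

    /* DIV_ROUND_UP_SECTOR_T expands to DIV_ROUND_UP_ULL on 32-bit,
     * which divides via do_div() instead of a libgcc helper */
    pages = DIV_ROUND_UP_SECTOR_T(nr_sectors, sectors_per_page);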
2025-09-10  md/raid0: convert raid0_make_request() to use bio_submit_split_bioset()  (Yu Kuai; 1 file; -11/+2)

Currently, raid0_make_request() will remap the original bio to
underlying disks to prevent reordered IO. Now that
bio_submit_split_bioset() puts the original bio at the head of
current->bio_list, it's safe to convert to this helper and bios will
still be ordered.

CC: Jan Kara <jack@suse.cz>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
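The conversion pattern this series of md patches shares, as a hedged
sketch; the helper's exact signature and return convention are
assumptions based on the descriptions in the series:

    if (sectors < bio_sectors(bio)) {
        /* split off the front; the remainder goes to the head of
         * current->bio_list, so ordering is preserved */
        bio = bio_submit_split_bioset(bio, sectors,
                                      &mddev->bio_set);
        if (!bio)
            return true;  /* split failed: original bio already ended */
    }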
2025-09-10  md/md-linear: convert to use bio_submit_split_bioset()  (Yu Kuai; 1 file; -12/+3)

Unify bio split code, prepare to fix reordered split IO.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-09-10  md/raid5: convert to use bio_submit_split_bioset()  (Yu Kuai; 1 file; -6/+4)

Unify bio split code, prepare to fix ordering of split IO.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-09-10  md/raid10: convert read/write to use bio_submit_split_bioset()  (Yu Kuai; 1 file; -29/+13)

Unify bio split code and prepare to fix ordering of split IO. The error
path is modified a bit, however no functional changes are intended:

- bio_submit_split_bioset() can fail the original bio directly on a
  split error; set R10BIO_Uptodate in this case to notify
  raid_end_bio_io() that the original bio has already been returned.
- Setting R10BIO_Uptodate and setting the error value to -EIO is
  useless now; for an r10_bio without R10BIO_Uptodate, -EIO will be
  returned for the original bio.

Discard is not handled, because discard is only split for the unaligned
head and tail; this can be considered a slow path, and the reordering
here does not matter much.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-09-10  md/raid10: add a new r10bio flag R10BIO_Returned  (Yu Kuai; 2 files; -3/+7)

The new helper bio_submit_split_bioset() can fail the original bio on
split errors; prepare to handle this case in raid_end_bio_io(). The
flag name mirrors the corresponding r1bio flag name.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-09-10  md/raid1: convert to use bio_submit_split_bioset()  (Yu Kuai; 2 files; -28/+14)

Unify bio split code and prepare to fix ordering of split IO. Note that
bio_submit_split_bioset() can fail the original bio directly on a split
error; set R1BIO_Returned in this case to notify raid_end_bio_io() that
the original bio has already been returned.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-09-10  md/raid0: convert raid0_handle_discard() to use bio_submit_split_bioset()  (Yu Kuai; 1 file; -13/+6)

Unify bio split code and prepare to fix ordering of split IO.

Note that commit 319ff40a5427 ("md/raid0: Fix performance regression
for large sequential writes") already fixes the ordering of split IO by
remapping the bio to underlying disks before resubmitting it, given
that md_submit_bio() already split it by sectors and
raid0_make_request() will split at most once for unaligned IO. This is
a bit hacky and we'll convert it to the general solution later.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-09-10  md: fix missing blktrace bio split events  (Yu Kuai; 5 files; -0/+19)

If a bio is split by internal handling like chunksize or badblocks, the
corresponding trace_block_split() is missing, leaving blktrace unable
to catch BIO split events and making it harder to analyze the BIO
sequence.

Cc: stable@vger.kernel.org
Fixes: 4b1faf931650 ("block: Kill bio_pair_split()")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
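A sketch of where such a tracepoint call lands in an md-style split
path; the surrounding code is illustrative, while trace_block_split()
is the real tracepoint that block/blk-merge.c emits:

    struct bio *split = bio_split(bio, sectors, GFP_NOIO,
                                  &mddev->bio_set);

    bio_chain(split, bio);
    trace_block_split(split, bio->bi_iter.bi_sector);
    submit_bio_noacct(bio);    /* resubmit the remainder */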
2025-09-09  Merge tag 'md-6.18-20250909' of gitolite.kernel.org:pub/scm/linux/kernel/git/mdraid/linux into for-6.18/block  (Jens Axboe; 13 files; -219/+2252)

Pull MD changes from Yu Kuai:
 "Redundant data is used to enhance data fault tolerance, and the
  storage method for redundant data varies depending on the RAID level.
  It's important to maintain the consistency of redundant data.

  A bitmap is used to record which data blocks have been synchronized
  and which ones need to be resynchronized or recovered. Each bit in
  the bitmap represents a segment of data in the array. When a bit is
  set, it indicates that the multiple redundant copies of that data
  segment may not be consistent. Data synchronization can be performed
  based on the bitmap after power failure or re-adding a disk. If there
  is no bitmap, a full disk synchronization is required.

  Due to known performance issues with md-bitmap and its unreasonable
  implementations:

  - self-managed IO submitting like filemap_write_page();
  - a global spin_lock;

  I have decided not to continue optimizing the current bitmap
  implementation. This new bitmap is designed without locking in the
  IO fast path and can be used with fast disks.

  Key features of the new bitmap:

  - The IO fast path is lockless; if a user issues lots of write IO to
    the same bitmap bit in a short time, only the first write has the
    additional overhead of updating the bitmap bit, with no additional
    overhead for the following writes.
  - Only written data is resynced or recovered, which means that when
    creating a new array or replacing a disk, there is no need to do a
    full disk resync/recovery."

* tag 'md-6.18-20250909' of gitolite.kernel.org:pub/scm/linux/kernel/git/mdraid/linux: (24 commits)
  md/md-llbitmap: introduce new lockless bitmap
  md/md-bitmap: make method bitmap_ops->daemon_work optional
  md: add a new recovery_flag MD_RECOVERY_LAZY_RECOVER
  md/md-bitmap: add a new method blocks_synced() in bitmap_operations
  md/md-bitmap: add a new method skip_sync_blocks() in bitmap_operations
  md/md-bitmap: delay registration of bitmap_ops until creating bitmap
  md/md-bitmap: add a new sysfs api bitmap_type
  md: add a new mddev field 'bitmap_id'
  md/md-bitmap: support discard for bitmap ops
  md: factor out a helper raid_is_456()
  md: add a new parameter 'offset' to md_super_write()
  md/md-bitmap: introduce CONFIG_MD_BITMAP
  md: check before referencing mddev->bitmap_ops
  md/dm-raid: check before referencing mddev->bitmap_ops
  md/raid5: check before referencing mddev->bitmap_ops
  md/raid10: check before referencing mddev->bitmap_ops
  md/raid1: check before referencing mddev->bitmap_ops
  md/raid1: check bitmap before behind write
  md/md-bitmap: handle the case bitmap is not enabled before end_sync()
  md/md-bitmap: handle the case bitmap is not enabled before start_sync()
  ...
2025-09-09  block: remove the bi_inline_vecs variable sized array from struct bio  (Christoph Hellwig; 3 files; -7/+7)

Bios are embedded into other structures, and at least sparse is unhappy
about embedding structures with variable sized arrays. There's no real
need for the array anyway; we can replace it with a helper pointing to
the memory just behind the bio, and with the previous cleanups there
are very few sites doing anything special with it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
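The likely shape of that helper, sketched under the assumption that
inline bvecs are allocated immediately after the bio itself:

    /* inline vecs live directly behind the struct bio allocation */
    static inline struct bio_vec *bio_inline_vecs(struct bio *bio)
    {
        return (struct bio_vec *)(bio + 1);
    }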
2025-09-09  block: add a bio_init_inline helper  (Christoph Hellwig; 10 files; -13/+11)

Just a simple wrapper around bio_init for callers that want to
initialize a bio with inline bvecs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
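A sketch of the wrapper, assuming it pairs with the inline-vec helper
above and feeds it to the real bio_init() signature:

    static inline void bio_init_inline(struct bio *bio,
                                       struct block_device *bdev,
                                       unsigned short max_vecs,
                                       blk_opf_t opf)
    {
        bio_init(bio, bdev, bio_inline_vecs(bio), max_vecs, opf);
    }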
2025-09-08  dm-integrity: limit MAX_TAG_SIZE to 255  (Mikulas Patocka; 1 file; -1/+1)

MAX_TAG_SIZE was 0x1a8 and it may be truncated in the
"bi->metadata_size = ic->tag_size" assignment. We need to limit it to
255.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-09-06  md/md-llbitmap: introduce new lockless bitmap  (Yu Kuai; 7 files; -12/+1676)

Redundant data is used to enhance data fault tolerance, and the storage
method for redundant data varies depending on the RAID level. It's
important to maintain the consistency of redundant data.

A bitmap is used to record which data blocks have been synchronized and
which ones need to be resynchronized or recovered. Each bit in the
bitmap represents a segment of data in the array. When a bit is set, it
indicates that the multiple redundant copies of that data segment may
not be consistent. Data synchronization can be performed based on the
bitmap after power failure or re-adding a disk. If there is no bitmap,
a full disk synchronization is required.

Due to known performance issues with md-bitmap and its unreasonable
implementations:

- self-managed IO submitting like filemap_write_page();
- a global spin_lock;

I have decided not to continue optimizing the current bitmap
implementation. This new bitmap is designed without locking in the IO
fast path and can be used with fast disks.

For designs and details, see the comments in drivers/md/md-llbitmap.c.

Link: https://lore.kernel.org/linux-raid/20250829080426.1441678-12-yukuai1@huaweicloud.com
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Li Nan <linan122@huawei.com>
2025-09-06  md/md-bitmap: make method bitmap_ops->daemon_work optional  (Yu Kuai; 1 file; -1/+1)

daemon_work() is called by the daemon thread. On the one hand, the
daemon thread doesn't have a strict wake-up time; on the other hand,
too much work is put on the daemon thread, like handling sync IO,
handling failed or special normal IO, handling recovery, and so on.
Hence the daemon thread may be too busy to clear dirty bits in time.

Make bitmap_ops->daemon_work() optional; following patches will use a
separate async work to clear dirty bits for the new bitmap.

Link: https://lore.kernel.org/linux-raid/20250829080426.1441678-11-yukuai1@huaweicloud.com
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Li Nan <linan122@huawei.com>
2025-09-06  md: add a new recovery_flag MD_RECOVERY_LAZY_RECOVER  (Yu Kuai; 3 files; -5/+63)

This flag is used by llbitmap in later patches to skip the raid456
initial recovery and delay building the initial xor data until the
first write.

Link: https://lore.kernel.org/linux-raid/20250829080426.1441678-10-yukuai1@huaweicloud.com
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
2025-09-06  md/md-bitmap: add a new method blocks_synced() in bitmap_operations  (Yu Kuai; 2 files; -4/+12)

Currently, raid456 must perform a whole-array initial recovery to build
the initial xor data, so that IO to the array won't have to read all
the blocks in the underlying disks. This behavior affects IO
performance a lot, and nowadays there are huge disks and the initial
recovery can take a long time. Hence llbitmap will support lazy initial
recovery in following patches.

This method is used to check whether data blocks are synced or not; if
not, IO will still have to read all blocks for raid456.

Link: https://lore.kernel.org/linux-raid/20250829080426.1441678-9-yukuai1@huaweicloud.com
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
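A guess at the method's shape in struct bitmap_operations; the exact
parameters are an assumption based on the description above:

    struct bitmap_operations {
        ...
        /* return true if the blocks at 'offset' already hold valid
         * redundant data; if not, raid456 must read every disk */
        bool (*blocks_synced)(struct mddev *mddev, sector_t offset);
        ...
    };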
2025-09-06  md/md-bitmap: add a new method skip_sync_blocks() in bitmap_operations  (Yu Kuai; 2 files; -0/+8)

This method is used to check if blocks can be skipped before calling
into pers->sync_request(). llbitmap will use this method to skip resync
for unwritten/clean data blocks, and recovery/check/repair for
unwritten data blocks.

Link: https://lore.kernel.org/linux-raid/20250829080426.1441678-8-yukuai1@huaweicloud.com
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Xiao Ni <xni@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Li Nan <linan122@huawei.com>
2025-09-06  md/md-bitmap: delay registration of bitmap_ops until creating bitmap  (Yu Kuai; 1 file; -37/+53)

Currently bitmap_ops is registered while allocating mddev; this is fine
when there is only one bitmap_ops. Delay setting bitmap_ops until the
bitmap is created, so that the user can choose which bitmap to use
before running the array.

Link: https://lore.kernel.org/linux-raid/20250721171557.34587-7-yukuai@kernel.org
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Li Nan <linan122@huawei.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
2025-09-06  md/md-bitmap: add a new sysfs api bitmap_type  (Yu Kuai; 1 file; -0/+81)

The api will be used by mdadm to set bitmap_type while creating a new
array or assembling an array, preparing to add a new bitmap. Currently
available options are:

    $ cat /sys/block/md0/md/bitmap_type
    none [bitmap]

Link: https://lore.kernel.org/linux-raid/20250829080426.1441678-6-yukuai1@huaweicloud.com
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Xiao Ni <xni@redhat.com>
Reviewed-by: Li Nan <linan122@huawei.com>
2025-09-06  md: add a new mddev field 'bitmap_id'  (Yu Kuai; 2 files; -6/+33)

Prepare to store the bitmap id selected by the user; also refactor
mddev_set_bitmap_ops a bit in case the value is invalid.

Link: https://lore.kernel.org/linux-raid/20250829080426.1441678-5-yukuai1@huaweicloud.com
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Li Nan <linan122@huawei.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
2025-09-06  md/md-bitmap: support discard for bitmap ops  (Yu Kuai; 4 files; -8/+23)

Use two new methods {start, end}_discard in bitmap_ops and a new field
'rw' in struct md_io_clone to handle discard IO, preparing to support
the new md bitmap. Since all bitmap functions to handle write IO are
the same, also add a typedef to make the code cleaner.

Link: https://lore.kernel.org/linux-raid/20250829080426.1441678-4-yukuai1@huaweicloud.com
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Li Nan <linan122@huawei.com>
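A hedged sketch of what that typedef plus the new methods could look
like in bitmap_operations; the signatures are assumptions derived from
the description:

    /* all write-style bitmap hooks share one signature */
    typedef void (md_bitmap_fn)(struct mddev *mddev, sector_t offset,
                                unsigned long sectors);

    struct bitmap_operations {
        ...
        md_bitmap_fn *start_write, *end_write;
        md_bitmap_fn *start_discard, *end_discard;
        ...
    };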
2025-09-06  md: factor out a helper raid_is_456()  (Yu Kuai; 2 files; -8/+7)

There are no functional changes; the helper will be used by llbitmap in
following patches.

Link: https://lore.kernel.org/linux-raid/20250829080426.1441678-3-yukuai1@huaweicloud.com
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Li Nan <linan122@huawei.com>
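A plausible one-liner for such a helper; the exact check in md may
differ (e.g. testing the personality rather than the level):

    static bool raid_is_456(struct mddev *mddev)
    {
        return mddev->level == 4 || mddev->level == 5 ||
               mddev->level == 6;
    }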
2025-09-06  md: add a new parameter 'offset' to md_super_write()  (Yu Kuai; 3 files; -24/+36)

The parameter is always set to 0 for now; following patches will use
this helper to write llbitmap to underlying disks, allowing writing of
dirty sectors instead of the whole page. Also rename md_super_write to
md_write_metadata, since there is nothing super-block specific about
it.

Link: https://lore.kernel.org/linux-raid/20250829080426.1441678-2-yukuai1@huaweicloud.com
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Li Nan <linan122@huawei.com>
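An assumed prototype for the renamed helper, extrapolated from
md_super_write()'s historical arguments plus the new offset parameter:

    /* write 'size' bytes of 'page', starting 'offset' bytes into it,
     * at device sector 'sector' (previously always the whole page) */
    void md_write_metadata(struct mddev *mddev, struct md_rdev *rdev,
                           sector_t sector, int size, struct page *page,
                           unsigned int offset);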
2025-09-06  md/md-bitmap: introduce CONFIG_MD_BITMAP  (Yu Kuai; 6 files; -19/+85)

Now that all implementations are internal, it's sensible to add a
config option for md-bitmap, and it's a good way to provide isolation.

Link: https://lore.kernel.org/linux-raid/20250707012711.376844-16-yukuai1@huaweicloud.com
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
2025-09-06  md: check before referencing mddev->bitmap_ops  (Yu Kuai; 1 file; -20/+48)

Prepare to introduce CONFIG_MD_BITMAP.

Link: https://lore.kernel.org/linux-raid/20250707012711.376844-15-yukuai1@huaweicloud.com
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
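The guard pattern this group of "check before referencing" patches
describes, as a sketch; the op being called is illustrative, not the
exact md call site:

    /* bitmap_ops may be NULL once CONFIG_MD_BITMAP can be disabled,
     * so every dereference needs a guard first */
    if (!mddev->bitmap_ops)
        return;
    mddev->bitmap_ops->flush(mddev);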
2025-09-06  md/dm-raid: check before referencing mddev->bitmap_ops  (Yu Kuai; 1 file; -7/+11)

Prepare to introduce CONFIG_MD_BITMAP.

Link: https://lore.kernel.org/linux-raid/20250707012711.376844-14-yukuai1@huaweicloud.com
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
2025-09-06  md/raid5: check before referencing mddev->bitmap_ops  (Yu Kuai; 1 file; -7/+12)

Prepare to introduce CONFIG_MD_BITMAP.

Link: https://lore.kernel.org/linux-raid/20250707012711.376844-13-yukuai1@huaweicloud.com
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
2025-09-06  md/raid10: check before referencing mddev->bitmap_ops  (Yu Kuai; 1 file; -7/+13)

Prepare to introduce CONFIG_MD_BITMAP.

Link: https://lore.kernel.org/linux-raid/20250707012711.376844-12-yukuai1@huaweicloud.com
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
2025-09-06  md/raid1: check before referencing mddev->bitmap_ops  (Yu Kuai; 1 file; -9/+13)

Prepare to introduce CONFIG_MD_BITMAP.

Link: https://lore.kernel.org/linux-raid/20250707012711.376844-11-yukuai1@huaweicloud.com
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
2025-09-06  md/raid1: check bitmap before behind write  (Yu Kuai; 2 files; -23/+28)

Behind writes rely on the bitmap: the number of in-flight IOs is
recorded in bitmap->behind_writes, and callers rely on
bitmap_wait_behind_writes() to wait for the IO to be done. However,
callers currently don't check whether the bitmap is enabled before
calling into the behind methods. Hence, if a behind write starts
without a bitmap, readers will not wait for the slow write IO to be
done, and old data can be read in some corner cases.

Link: https://lore.kernel.org/linux-raid/20250707012711.376844-10-yukuai1@huaweicloud.com
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Xiao Ni <xni@redhat.com>