summaryrefslogtreecommitdiffstats
path: root/fs/bcachefs/btree_trans_commit.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2025-06-16bcachefs: Kill unused tracepointsKent Overstreet1-2/+3
Dead code cleanup. Link: https://lore.kernel.org/linux-bcachefs/20250612224059.39fddd07@batman.local.home/ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-15bcachefs: Delay calculation of trans->journal_u64sAlan Huang1-4/+7
When there is commit error that need split btree leaf, fsck might change the value of trans->journal_entries.u64s, when retry commit, the value of trans->journal_u64s would be incorrect, which will lead to trans->journal_res.u64s underflow, and then out of bounds write will occur: [ 464.496970][T11969] Call trace: [ 464.496973][T11969] show_stack+0x3c/0x88 (C) [ 464.496995][T11969] dump_stack_lvl+0xf8/0x178 [ 464.497014][T11969] dump_stack+0x20/0x30 [ 464.497031][T11969] __bch2_trans_log_str+0x344/0x350 [ 464.497048][T11969] bch2_trans_log_str+0x3c/0x60 [ 464.497065][T11969] __bch2_fsck_err+0x11bc/0x1390 [ 464.497083][T11969] bch2_check_discard_freespace_key+0xad4/0x10d0 [ 464.497100][T11969] bch2_bucket_alloc_freelist+0x99c/0x1130 [ 464.497117][T11969] bch2_bucket_alloc_trans+0x79c/0xcb8 [ 464.497133][T11969] bch2_bucket_alloc_set_trans+0x378/0xc20 [ 464.497151][T11969] __open_bucket_add_buckets+0x7fc/0x1c00 [ 464.497168][T11969] open_bucket_add_buckets+0x184/0x3a8 [ 464.497185][T11969] bch2_alloc_sectors_start_trans+0xa04/0x1da0 [ 464.497203][T11969] bch2_btree_reserve_get+0x6e0/0xef0 [ 464.497220][T11969] bch2_btree_update_start+0x1618/0x2600 [ 464.497239][T11969] bch2_btree_split_leaf+0xcc/0x730 [ 464.497258][T11969] bch2_trans_commit_error+0x22c/0xc30 [ 464.497276][T11969] __bch2_trans_commit+0x207c/0x4e30 [ 464.497292][T11969] bch2_journal_replay+0x9e0/0x1420 [ 464.497305][T11969] __bch2_run_recovery_passes+0x458/0xf98 [ 464.497318][T11969] bch2_run_recovery_passes+0x280/0x478 [ 464.497331][T11969] bch2_fs_recovery+0x24f0/0x3a28 [ 464.497344][T11969] bch2_fs_start+0xb80/0x1248 [ 464.497358][T11969] bch2_fs_get_tree+0xe94/0x1708 [ 464.497377][T11969] vfs_get_tree+0x84/0x2d0 Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-15bcachefs: Add missing EBUG_ONAlan Huang1-0/+2
Just like the EBUG_ON in bch2_journal_add_entry(). Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-02bcachefs: bch_err_throw()Kent Overstreet1-7/+8
Add a tracepoint for any time we return an error and unwind. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30bcachefs: bch2_check_fix_ptrs() can now repair btree rootsKent Overstreet1-4/+17
This is straightforward enough: check_fix_ptrs() currently only runs before we go RW, so updating the btree root pointer in c->btree_roots suffices - it'll be written out in the first journal write we do. For that, do_bch2_trans_commit_to_journal_replay() now handles JSET_ENTRY_btree_root entries. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: Split out accounting in transaction commitKent Overstreet1-20/+31
There can be a lot of rendundancy in accounting updates within a single btree transaction. Split out accounting updates so that they can be deduped, in the next commit. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: btree_trans_subbufKent Overstreet1-15/+16
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: Debug params are now static_keysKent Overstreet1-2/+2
We'd like users to be able to debug without building custom kernels, so this will help us get rid of CONFIG_BCACHEFS_DEBUG, at least for most things. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: bch_fs.writes -> enumerated_refsKent Overstreet1-2/+3
Drop the single-purpose write ref code in bcachefs.h, and convert to enumarated refs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: replace strncpy() with memcpy_and_pad in journal_transaction_nameRoxana Nicolescu1-1/+3
Strncpy is now deprecated. The buffer destination is not required to be NULL-terminated, but we also want to zero out the rest of the buffer as it is already done in other places. Link: https://github.com/KSPP/linux/issues/90 Signed-off-by: Roxana Nicolescu <nicolescu.roxana@protonmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24bcachefs: kmsan assertsKent Overstreet1-0/+1
Catching these early makes them a lot easier to track down. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24bcachefs: Kill JOURNAL_ERRORS()Kent Overstreet1-14/+16
Convert these to standard error codes, which means we can pass them outside the journal code, they're easier to pass to tracepoints, etc. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14bcachefs: rework bch2_trans_commit_run_triggers()Kent Overstreet1-56/+33
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-02-12bcachefs: CONFIG_BCACHEFS_INJECT_TRANSACTION_RESTARTSKent Overstreet1-0/+4
Incorrectly handled transaction restarts can be a source of heisenbugs; add a mode where we randomly inject them to shake them out. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-21bcachefs: "Journal stuck" timeout now takes into account device latencyKent Overstreet1-1/+1
If a block device (e.g. your typical consumer SSD) is taking multiple seconds for IOs (typically flushes), we don't want to emit the "journal stuck" message prematurely. Also, make sure to drop the btree_trans srcu lock if we're blocking for more than a second. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09bcachefs: bch2_trans_unlock_write()Kent Overstreet1-3/+3
New helper for dropping all write locks; which is distinct from the helper the transaction commit path uses, which is faster and only touches updates. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09bcachefs: bch2_btree_node_write_trans()Kent Overstreet1-1/+1
Avoiding screwing up path->lock_seq. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Don't run overwrite triggers before insertKent Overstreet1-44/+37
This breaks when the trigger is inserting updates for the same btree, as the inode trigger now does. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Plumb bkey_validate_context to journal_entry_validateKent Overstreet1-26/+18
This lets us print the exact location in the journal if it was found in the journal, or correctly print if it was found in the superblock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: struct bkey_validate_contextKent Overstreet1-1/+6
Add a new parameter to bkey validate functions, and use it to improve invalid bkey error messages: we can now print the btree and depth it came from, or if it came from the journal, or is a btree root. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: bch2_trans_verify_not_unlocked_or_in_restart()Kent Overstreet1-6/+3
Fold two asserts into one. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Kill BCH_TRANS_COMMIT_lazy_rwKent Overstreet1-26/+5
We unconditionally go read-write, if we're going to do so, before journal replay: lazy_rw is obsolete. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Add assert for use of journal replay keys for updatesKent Overstreet1-0/+2
The journal replay keys mechanism can only be used for updates in early recovery, when still single threaded. Add some asserts to make sure we never accidentally use it elsewhere. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: kill btree_trans_restart_nounlock()Kent Overstreet1-1/+1
Redundant, the normal btree_trans_restart() doesn't unlock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Pull disk accounting hooks out of trans_commit.cKent Overstreet1-29/+6
Also, fix a minor bug in the revert path, where we weren't checking the journal entry type correctly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-02bcachefs: Fix trans_commit disk accounting revertKent Overstreet1-1/+2
We only are applying JSET_ENTRY_TYPE_write_buffer_keys, revert path was missed. Fixes: a3581ca35d2b ("bcachefs: Fix BCH_TRANS_COMMIT_skip_accounting_apply") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27bcachefs: Fix BCH_TRANS_COMMIT_skip_accounting_applyKent Overstreet1-16/+20
This was added to avoid double-counting accounting keys in journal replay. But applied incorrectly (easily done since it applies to the transaction commit, not a particular update), it leads to skipping in-mem accounting for real accounting updates, and failure to give them a version number - which leads to journal replay becoming very confused the next time around. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27bcachefs: rename version -> bversionKent Overstreet1-4/+4
give bversions a more distinct name, to aid in grepping Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27bcachefs: Move transaction commit path validation to as late as possibleKent Overstreet1-34/+34
In order to check for accounting keys with version=0, we need to run validation after they've been assigned version numbers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27bcachefs: bch_accounting_modeKent Overstreet1-2/+2
Minor refactoring - replace multiple bool arguments with an enum; prep work for fixing a bug in accounting read. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09bcachefs: Remove unused parameterAlan Huang1-1/+1
iter here is unused, remove it. Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-13bcachefs: Kill __bch2_accounting_mem_mod()Kent Overstreet1-2/+2
The next patch will be adding a disk accounting counter type which is not kept in the in-memory eytzinger tree. As prep, fold __bch2_accounting_mem_mod() into bch2_accounting_mem_mod_locked() so that we can check for that counter type and bail out without calling bpos_to_disk_accounting_pos() twice. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-13bcachefs: Make bkey_fsck_err() a wrapper around fsck_err()Kent Overstreet1-58/+14
bkey_fsck_err() was added as an interface that looks like fsck_err(), but previously all it did was ensure that the appropriate error counter was incremented in the superblock. This is a cleanup and bugfix patch that converts it to a wrapper around fsck_err(). This is needed to fix an issue with the upgrade path to disk_accounting_v3, where the "silent fix" error list now includes bkey_fsck errors; fsck_err() handles this in a unified way, and since we need to change printing of bkey fsck errors from the caller to the inner bkey_fsck_err() calls, this ends up being a pretty big change. Als,, rename .invalid() methods to .validate(), for clarity, while we're changing the function signature anyways (to drop the printbuf argument). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-13bcachefs: Add a time_stat for blocked on key cache flushKent Overstreet1-0/+4
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-13bcachefs: Add hysteresis to waiting on btree key cache flushKent Overstreet1-1/+1
This helps ensure key cache reclaim isn't contending with threads waiting for the key cache to be helped, and fixes a severe performance bug. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: bch2_btree_key_cache_drop() now evictsKent Overstreet1-6/+5
As part of improving btree key cache coherency, the bkey_cached.valid flag is going away. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Kill gc_pos_btree_node()Kent Overstreet1-1/+1
gc_pos is now based on keys, not nodes, for invariantness w.r.t. splits and merges Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: btree_types bitmask cleanupsKent Overstreet1-28/+22
Make things more consistent and ensure that we're using u64 bitfields - key types and btree ids are already around 32 bits. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Delete old assertion for online fsckKent Overstreet1-8/+1
the order in which btree_gc walks keys have changed, so we no longer have the sort of issues with online fsck this assertion was warning about. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Convert gc to new accountingKent Overstreet1-2/+2
Rewrite fsck/gc for the new accounting scheme. This adds a second set of in-memory accounting counters for gc to use; like with other parts of gc we run all trigger in TRIGGER_GC mode, then compare what we calculated to existing in-memory accounting at the end. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Disk space accounting rewriteKent Overstreet1-15/+50
Main part of the disk accounting rewrite. This is a wholesale rewrite of the existing disk space accounting, which relies on percepu counters that are sharded by journal buffer, and rolled up and added to each journal write. With the new scheme, every set of counters is a distinct key in the accounting btree; this fixes scaling limitations of the old scheme, where counters took up space in each journal entry and required multiple percpu counters. Now, in memory accounting requires a single set of percpu counters - not multiple for each in flight journal buffer - and in the future we'll probably also have counters that don't use in memory percpu counters, they're not strictly required. An accounting update is now a normal btree update, using the btree write buffer path. At transaction commit time, we apply accounting updates to the in memory counters, which are percpu counters indexed in an eytzinger tree by the accounting key. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Accumulate accounting keys in journal replayKent Overstreet1-5/+21
Until accounting keys hit the btree, they are deltas, not new versions of the existing key; this means we have to teach journal replay to accumulate them. Additionally, the journal doesn't track precisely which entries have been flushed to the btree; it only tracks a range of entries that may possibly still need to be flushed. That means we need to compare accounting keys against the version in the btree and only flush updates that are newer. There's another wrinkle with the write buffer: if the write buffer starts flushing accounting keys before journal replay has finished flushing accounting keys, journal replay will see the version number from the new updates and updates from the journal will be lost. To avoid this, journal replay has to flush accounting keys first, and we'll be adding a flag so that write buffer flush knows to hold accounting keys until then. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Use try_cmpxchg() family of functions instead of cmpxchg()Uros Bizjak1-4/+4
Use try_cmpxchg() family of functions instead of cmpxchg (*ptr, old, new) == old. x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg (and related move instruction in front of cmpxchg). Also, try_cmpxchg() implicitly assigns old *ptr value to "old" when cmpxchg fails. There is no need to re-read the value in the loop. No functional change intended. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-09bcachefs: s/bkey_invalid_flags/bch_validate_flagsKent Overstreet1-5/+5
We're about to start using bch_validate_flags for superblock section validation - it's no longer bkey specific. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: x-macroize journal flags enumsKent Overstreet1-1/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: bch2_trans_verify_not_unlocked()Kent Overstreet1-0/+7
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: bch2_trans_commit_flags_to_text()Kent Overstreet1-0/+21
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: iter/update/trigger/str_hash flag cleanupKent Overstreet1-12/+12
Combine iter/update/trigger/str_hash flags into a single enum, and x-macroize them for a to_text() function later. These flags are all for a specific iter/key/update context, so it makes sense to group them together - iter/update/trigger flags were already given distinct bits, this cleans up and unifies that handling. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: prt_printf() now respects \r\n\tKent Overstreet1-4/+2
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: Fix deadlock in journal replayKent Overstreet1-3/+4
btree_key_can_insert_cached() should be checking the watermark - BCH_TRANS_COMMIT_journal_replay really means nonblocking mode when watermark < reclaim, it was being used incorrectly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>