aboutsummaryrefslogtreecommitdiffstats
path: root/refs (follow)
AgeCommit message (Collapse)AuthorFilesLines
2025-01-15reftable: write correct max_update_index to headerKarthik Nayak2-10/+11
In 297c09eabb (refs: allow multiple reflog entries for the same refname, 2024-12-16), the reftable backend learned to handle multiple reflog entries within the same transaction. This was done modifying the `update_index` for reflogs with multiple indices. During writing the logs, the `max_update_index` of the writer was modified to ensure the limits were raised to the modified `update_index`s. However, since ref entries are written before the modification to the `max_update_index`, if there are multiple blocks to be written, the reftable backend writes the header with the old `max_update_index`. When all logs are finally written, the footer will be written with the new `min_update_index`. This causes a mismatch between the header and the footer and causes the reftable file to be corrupted. The existing tests only spawn a single block and since headers are lazily written with the first block, the tests didn't capture this bug. To fix the issue, the appropriate `max_update_index` limit must be set even before the first block is written. Add a `max_index` field to the transaction which holds the `max_index` within all its updates, then propagate this value to the reftable backend, wherein this is used to the set the `max_update_index` correctly. Add a test which creates a few thousand reference updates with multiple reflog entries, which should trigger the bug. Reported-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-23Merge branch 'kn/reflog-migration'Junio C Hamano3-53/+140
"git refs migrate" learned to also migrate the reflog data across backends. * kn/reflog-migration: refs: mark invalid refname message for translation refs: add support for migrating reflogs refs: allow multiple reflog entries for the same refname refs: introduce the `ref_transaction_update_reflog` function refs: add `committer_info` to `ref_transaction_add_update()` refs: extract out refname verification in transactions refs/files: add count field to ref_lock refs: add `index` field to `struct ref_udpate` refs: include committer info in `ref_update` struct
2024-12-23Merge branch 'ps/build-sign-compare'Junio C Hamano4-2/+5
Start working to make the codebase buildable with -Wsign-compare. * ps/build-sign-compare: t/helper: don't depend on implicit wraparound scalar: address -Wsign-compare warnings builtin/patch-id: fix type of `get_one_patchid()` builtin/blame: fix type of `length` variable when emitting object ID gpg-interface: address -Wsign-comparison warnings daemon: fix type of `max_connections` daemon: fix loops that have mismatching integer types global: trivial conversions to fix `-Wsign-compare` warnings pkt-line: fix -Wsign-compare warning on 32 bit platform csum-file: fix -Wsign-compare warning on 32-bit platform diff.h: fix index used to loop through unsigned integer config.mak.dev: drop `-Wno-sign-compare` global: mark code units that generate warnings with `-Wsign-compare` compat/win32: fix -Wsign-compare warning in "wWinMain()" compat/regex: explicitly ignore "-Wsign-compare" warnings git-compat-util: introduce macros to disable "-Wsign-compare" warnings
2024-12-19Merge branch 'bf/set-head-symref'Junio C Hamano3-15/+33
When "git fetch $remote" notices that refs/remotes/$remote/HEAD is missing and discovers what branch the other side points with its HEAD, refs/remotes/$remote/HEAD is updated to point to it. * bf/set-head-symref: fetch set_head: handle mirrored bare repositories fetch: set remote/HEAD if it does not exist refs: add create_only option to refs_update_symref_extended refs: add TRANSACTION_CREATE_EXISTS error remote set-head: better output for --auto remote set-head: refactor for readability refs: atomically record overwritten ref in update_symref refs: standardize output of refs_read_symbolic_ref t/t5505-remote: test failure of set-head t/t5505-remote: set default branch to main
2024-12-16refs: allow multiple reflog entries for the same refnameKarthik Nayak2-7/+30
The reference transaction only allows a single update for a given reference to avoid conflicts. This, however, isn't an issue for reflogs. There are no conflicts to be resolved in reflogs and when migrating reflogs between backends we'd have multiple reflog entries for the same refname. So allow multiple reflog updates within a single transaction. Also the reflog creation logic isn't exposed to the end user. While this might change in the future, currently, this reduces the scope of issues to think about. In the reftable backend, the writer sorts all updates based on the update_index before writing to the block. When there are multiple reflogs for a given refname, it is essential that the order of the reflogs is maintained. So add the `index` value to the `update_index`. The `index` field is only set when multiple reflog entries for a given refname are added and as such in most scenarios the old behavior remains. This is required to add reflog migration support to `git refs migrate`. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-16refs: introduce the `ref_transaction_update_reflog` functionKarthik Nayak1-8/+16
Introduce a new function `ref_transaction_update_reflog`, for clients to add a reflog update to a transaction. While the existing function `ref_transaction_update` also allows clients to add a reflog entry, this function does a few things more, It: - Enforces that only a reflog entry is added and does not update the ref itself. - Allows the users to also provide the committer information. This means clients can add reflog entries with custom committer information. The `transaction_refname_valid()` function also modifies the error message selectively based on the type of the update. This change also affects reflog updates which go through `ref_transaction_update()`. A follow up commit will utilize this function to add reflog support to `git refs migrate`. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-16refs: add `committer_info` to `ref_transaction_add_update()`Karthik Nayak3-8/+13
The `ref_transaction_add_update()` creates the `ref_update` struct. To facilitate addition of reflogs in the next commit, the function needs to accommodate setting the `committer_info` field in the struct. So modify the function to also take `committer_info` as an argument and set it accordingly. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-16refs/files: add count field to ref_lockKarthik Nayak1-19/+39
When refs are updated in the files-backend, a lock is obtained for the corresponding file path. This is the case even for reflogs, i.e. a lock is obtained on the reference path instead of the reflog path. This works, since generally, reflogs are updated alongside the ref. The upcoming patches will add support for reflog updates in ref transaction. This means, in a particular transaction we want to have ref updates and reflog updates. For a given ref in a given transaction there can be at most one update. But we can theoretically have multiple reflog updates for a given ref in a given transaction. A great example of this would be when migrating reflogs from one backend to another. There we would batch all the reflog updates for a given reference in a single transaction. The current flow does not support this, because currently refs & reflogs are treated as a single entity and capture the lock together. To separate this, add a count field to ref_lock. With this, multiple updates can hold onto a single ref_lock and the lock will only be released when all of them release the lock. This patch only adds the `count` field to `ref_lock` and adds the logic to increment and decrement the lock. In a follow up commit, we'll separate the reflog update logic from ref updates and utilize this functionality. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-16refs: add `index` field to `struct ref_udpate`Karthik Nayak2-2/+18
The reftable backend, sorts its updates by refname before applying them, this ensures that the references are stored sorted. When migrating reflogs from one backend to another, the order of the reflogs must be maintained. Add a new `index` field to the `ref_update` struct to facilitate this. This field is used in the reftable backend's sort comparison function `transaction_update_cmp`, to ensure that indexed fields maintain their order. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-16refs: include committer info in `ref_update` structKarthik Nayak3-11/+26
The reference backends obtain the committer information from `git_committer_info(0)` when adding a reflog. The upcoming patches introduce support for migrating reflogs between the reference backends. This requires an interface to creating reflogs, including custom committer information. Add a new field `committer_info` to the `ref_update` struct, which is then used by the reference backends. If there is no `committer_info` provided, the reference backends default to using `git_committer_info(0)`. The field itself cannot be set to `git_committer_info(0)` since the values are dynamic and must be obtained right when the reflog is being committed. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-10Merge branch 'ps/reftable-iterator-reuse'Junio C Hamano1-148/+261
Optimize reading random references out of the reftable backend by allowing reuse of iterator objects. * ps/reftable-iterator-reuse: refs/reftable: reuse iterators when reading refs reftable/merged: drain priority queue on reseek reftable/stack: add mechanism to notify callers on reload refs/reftable: refactor reflog expiry to use reftable backend refs/reftable: refactor reading symbolic refs to use reftable backend refs/reftable: read references via `struct reftable_backend` refs/reftable: figure out hash via `reftable_stack` reftable/stack: add accessor for the hash ID refs/reftable: handle reloading stacks in the reftable backend refs/reftable: encapsulate reftable stack
2024-12-10Merge branch 'ps/reftable-detach'Junio C Hamano1-1/+18
Isolates the reftable subsystem from the rest of Git's codebase by using fewer pieces of Git's infrastructure. * ps/reftable-detach: reftable/system: provide thin wrapper for lockfile subsystem reftable/stack: drop only use of `get_locked_file_path()` reftable/system: provide thin wrapper for tempfile subsystem reftable/stack: stop using `fsync_component()` directly reftable/system: stop depending on "hash.h" reftable: explicitly handle hash format IDs reftable/system: move "dir.h" to its only user
2024-12-06global: trivial conversions to fix `-Wsign-compare` warningsPatrick Steinhardt1-4/+1
We have a bunch of loops which iterate up to an unsigned boundary using a signed index, which generates warnigs because we compare a signed and unsigned value in the loop condition. Address these sites for trivial cases and enable `-Wsign-compare` warnings for these code units. This patch only adapts those code units where we can drop the `DISABLE_SIGN_COMPARE_WARNINGS` macro in the same step. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06global: mark code units that generate warnings with `-Wsign-compare`Patrick Steinhardt4-0/+6
Mark code units that generate warnings with `-Wsign-compare`. This allows for a structured approach to get rid of all such warnings over time in a way that can be easily measured. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06Merge branch 'sj/refs-symref-referent-fix'Junio C Hamano1-1/+2
A double-free that may not trigger in practice by luck has been corrected in the reference resolution code. * sj/refs-symref-referent-fix: ref-cache: fix invalid free operation in `free_ref_entry`
2024-12-04Merge branch 'sj/ref-contents-check'Junio C Hamano5-22/+193
"git fsck" learned to issue warnings on "curiously formatted" ref contents that have always been taken valid but something Git wouldn't have written itself (e.g., missing terminating end-of-line after the full object name). * sj/ref-contents-check: ref: add symlink ref content check for files backend ref: check whether the target of the symref is a ref ref: add basic symref content check for files backend ref: add more strict checks for regular refs ref: port git-fsck(1) regular refs check for files backend ref: support multiple worktrees check for refs ref: initialize ref name outside of check functions ref: check the full refname instead of basename ref: initialize "fsck_ref_report" with zero
2024-11-27ref-cache: fix invalid free operation in `free_ref_entry`shejialuo1-1/+2
In cfd971520e (refs: keep track of unresolved reference value in iterators, 2024-08-09), we added a new field "referent" into the "struct ref" structure. In order to free the "referent", we unconditionally freed the "referent" by simply adding a "free" statement. However, this is a bad usage. Because when ref entry is either directory or loose ref, we will always execute the following statement: free(entry->u.value.referent); This does not make sense. We should never access the "entry->u.value" field when "entry" is a directory. However, the change obviously doesn't break the tests. Let's analysis why. The anonymous union in the "ref_entry" has two members: one is "struct ref_value", another is "struct ref_dir". On a 64-bit machine, the size of "struct ref_dir" is 32 bytes, which is smaller than the 48-byte size of "struct ref_value". And the offset of "referent" field in "struct ref_value" is 40 bytes. So, whenever we create a new "ref_entry" for a directory, we will leave the offset from 40 bytes to 48 bytes untouched, which means the value for this memory is zero (NULL). It's OK to free a NULL pointer, but this is merely a coincidence of memory layout. To fix this issue, we now ensure that "free(entry->u.value.referent)" is only called when "entry->flag" indicates that it represents a loose reference and not a directory to avoid the invalid memory operation. Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-26refs/reftable: reuse iterators when reading refsPatrick Steinhardt1-3/+29
When reading references the reftable backend has to: 1. Create a new ref iterator. 2. Seek the iterator to the record we're searching for. 3. Read the record. We cannot really avoid the last two steps, but re-creating the iterator every single time we want to read a reference is kind of expensive and a waste of resources. We couldn't help it in the past though because it was not possible to reuse iterators. But starting with 5bf96e0c39 (reftable/generic: move seeking of records into the iterator, 2024-05-13) we have split up the iterator lifecycle such that creating the iterator and seeking are two different concerns. Refactor the code such that we cache iterators in the reftable backend. This cache is invalidated whenever the respective stack is reloaded such that we know to recreate the iterator in that case. This leads to a sizeable speedup when creating many refs, which requires a lot of random reference reads: Benchmark 1: update-ref: create many refs (refcount = 100000, revision = master) Time (mean ± σ): 1.793 s ± 0.010 s [User: 0.954 s, System: 0.835 s] Range (min … max): 1.781 s … 1.811 s 10 runs Benchmark 2: update-ref: create many refs (refcount = 100000, revision = HEAD) Time (mean ± σ): 1.680 s ± 0.013 s [User: 0.846 s, System: 0.831 s] Range (min … max): 1.664 s … 1.702 s 10 runs Summary update-ref: create many refs (refcount = 100000, revision = HEAD) ran 1.07 ± 0.01 times faster than update-ref: create many refs (refcount = 100000, revision = master) While 7% is not a huge win, you have to consider that the benchmark is _writing_ data, so _reading_ references is only one part of what we do. Flame graphs show that we spend around 40% of our time reading refs, so the speedup when reading refs is approximately ~2.5x that. I could not find better benchmarks where we perform a lot of random ref reads. You can also see a sizeable impact on memory usage when creating 100k references. Before this change: HEAP SUMMARY: in use at exit: 19,112,538 bytes in 200,170 blocks total heap usage: 8,400,426 allocs, 8,200,256 frees, 454,367,048 bytes allocated After this change: HEAP SUMMARY: in use at exit: 674,416 bytes in 169 blocks total heap usage: 7,929,872 allocs, 7,929,703 frees, 281,509,985 bytes allocated As an additional factor, this refactoring opens up the possibility for more performance optimizations in how we re-seek iterators. Any change that allows us to optimize re-seeking by e.g. reusing data structures would thus also directly speed up random reads. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-26refs/reftable: refactor reflog expiry to use reftable backendPatrick Steinhardt1-8/+5
Refactor the callback function that expires reflog entries in the reftable backend to use `reftable_backend_read_ref()` instead of accessing the reftable stack directly. This ensures that the function will benefit from the new caching layer that we're about to introduce. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-26refs/reftable: refactor reading symbolic refs to use reftable backendPatrick Steinhardt1-7/+4
Refactor the callback function that reads symbolic references in the reftable backend to use `reftable_backend_read_ref()` instead of accessing the reftable stack directly. This ensures that the function will benefit from the new caching layer that we're about to introduce. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-26refs/reftable: read references via `struct reftable_backend`Patrick Steinhardt1-63/+59
Refactor `read_ref_without_reload()` to accept `struct reftable_backend` as parameter instead of `struct reftable_stack`. Rename the function to `reftable_backend_read_ref()` to clarify its scope and move it close to other functions operating on `struct reftable_backend`. This change allows us to implement an additional caching layer when reading refs where we can reuse reftable iterators. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-26refs/reftable: figure out hash via `reftable_stack`Patrick Steinhardt1-7/+19
The function `read_ref_without_reload()` accepts a ref store as input only so that we can figure out the hash function used by it. This is duplicate information though because the reftable stack knows about its hash function, too. Drop the superfluous parameter to simplify the calling convention a bit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-26refs/reftable: handle reloading stacks in the reftable backendPatrick Steinhardt1-58/+126
When accessing a stack we almost always have to reload the stack before reading data from it. This is mostly because Git does not have a notification mechanism for when underlying data has been changed, and thus we are forced to opportunistically reload the stack every single time to account for any changes that may have happened concurrently. Handle the reload internally in `backend_for()`. For one this forces callsites to think about whether or not they need to reload the stack. But second this makes the logic to access stacks more self-contained by letting the `struct reftable_backend` manage themselves. Update callsites where we don't reload the stack to document why we don't. In some cases it's unclear whether it is the right thing to do in the first place, but fixing that is outside of the scope of this patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-26refs/reftable: encapsulate reftable stackPatrick Steinhardt1-59/+76
The reftable ref store needs to keep track of multiple stacks, one for the main worktree and an arbitrary number of stacks for worktrees. This is done by storing pointers to `struct reftable_stack`, which we then access directly. Wrap the stack in a new `struct reftable_backend`. This will allow us to attach more data to each respective stack in subsequent commits. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-25refs: add TRANSACTION_CREATE_EXISTS errorBence Ferdinandy2-10/+20
Currently there is only one special error for transaction, for when there is a naming conflict, all other errors are dumped under a generic error. Add a new special error case for when the caller requests the reference to be updated only when it does not yet exist and the reference actually does exist. Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-25refs: standardize output of refs_read_symbolic_refBence Ferdinandy3-6/+12
When the symbolic reference we want to read with refs_read_symbolic_ref is actually not a symbolic reference, the files and the reftable backends return different values (1 and -1 respectively). Standardize the returned values so that 0 is success, -1 is a generic error and -2 is that the reference was actually non-symbolic. Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21ref: add symlink ref content check for files backendshejialuo1-4/+34
Besides the textual symref, we also allow symbolic links as the symref. So, we should also provide the consistency check as what we have done for textual symref. And also we consider deprecating writing the symbolic links. We first need to access whether symbolic links still be used. So, add a new fsck message "symlinkRef(INFO)" to tell the user be aware of this information. We have already introduced "files_fsck_symref_target". We should reuse this function to handle the symrefs which use legacy symbolic links. We should not check the trailing garbage for symbolic refs. Add a new parameter "symbolic_link" to disable some checks which should only be executed for textual symrefs. And we need to also generate the "referent" parameter for reusing "files_fsck_symref_target" by the following steps: 1. Use "strbuf_add_real_path" to resolve the symlink and get the absolute path "ref_content" which the symlink ref points to. 2. Generate the absolute path "abs_gitdir" of "gitdir" and combine "ref_content" and "abs_gitdir" to extract the relative path "relative_referent_path". 3. If "ref_content" is outside of "gitdir", we just set "referent" with "ref_content". Instead, we set "referent" with "relative_referent_path". Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21ref: check whether the target of the symref is a refshejialuo1-2/+12
Ideally, we want to the users use "git symbolic-ref" to create symrefs instead of writing raw contents into the filesystem. However, "git symbolic-ref" is strict with the refname but not strict with the referent. For example, we can make the "referent" located at the "$(gitdir)/logs/aaa" and manually write the content into this where we can still successfully parse this symref by using "git rev-parse". $ git init repo && cd repo && git commit --allow-empty -mx $ git symbolic-ref refs/heads/test logs/aaa $ echo $(git rev-parse HEAD) > .git/logs/aaa $ git rev-parse test We may need to add some restrictions for "referent" parameter when using "git symbolic-ref" to create symrefs because ideally all the nonpseudo-refs should be located under the "refs" directory and we may tighten this in the future. In order to tell the user we may tighten the above situation, create a new fsck message "symrefTargetIsNotARef" to notify the user that this may become an error in the future. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21ref: add basic symref content check for files backendshejialuo1-0/+40
We have code that checks regular ref contents, but we do not yet check the contents of symbolic refs. By using "parse_loose_ref_content" for symbolic refs, we will get the information of the "referent". We do not need to check the "referent" by opening the file. This is because if "referent" exists in the file system, we will eventually check its correctness by inspecting every file in the "refs" directory. If the "referent" does not exist in the filesystem, this is OK as it is seen as the dangling symref. So we just need to check the "referent" string content. A regular ref could be accepted as a textual symref if it begins with "ref:", followed by zero or more whitespaces, followed by the full refname, followed only by whitespace characters. However, we always write a single SP after "ref:" and a single LF after the refname. It may seem that we should report a fsck error message when the "referent" does not apply above rules and we should not be so aggressive because third-party reimplementations of Git may have taken advantage of the looser syntax. Put it more specific, we accept the following contents: 1. "ref: refs/heads/master " 2. "ref: refs/heads/master \n \n" 3. "ref: refs/heads/master\n\n" When introducing the regular ref content checks, we created two fsck infos "refMissingNewline" and "trailingRefContent" which exactly represents above situations. So we will reuse these two fsck messages to write checks to info the user about these situations. But we do not allow any other trailing garbage. The followings are bad symref contents which will be reported as fsck error by "git-fsck(1)". 1. "ref: refs/heads/master garbage\n" 2. "ref: refs/heads/master \n\n\n garbage " And we introduce a new "badReferentName(ERROR)" fsck message to report above errors by using "is_root_ref" and "check_refname_format" to check the "referent". Since both "is_root_ref" and "check_refname_format" don't work with whitespaces, we use the trimmed version of "referent" with these functions. In order to add checks, we will do the following things: 1. Record the untrimmed length "orig_len" and untrimmed last byte "orig_last_byte". 2. Use "strbuf_rtrim" to trim the whitespaces or newlines to make sure "is_root_ref" and "check_refname_format" won't be failed by them. 3. Use "orig_len" and "orig_last_byte" to check whether the "referent" misses '\n' at the end or it has trailing whitespaces or newlines. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21ref: add more strict checks for regular refsshejialuo2-4/+24
We have already used "parse_loose_ref_contents" function to check whether the ref content is valid in files backend. However, by using "parse_loose_ref_contents", we allow the ref's content to end with garbage or without a newline. Even though we never create such loose refs ourselves, we have accepted such loose refs. So, it is entirely possible that some third-party tools may rely on such loose refs being valid. We should not report an error fsck message at current. We should notify the users about such "curiously formatted" loose refs so that adequate care is taken before we decide to tighten the rules in the future. And it's not suitable either to report a warn fsck message to the user. We don't yet want the "--strict" flag that controls this bit to end up generating errors for such weirdly-formatted reference contents, as we first want to assess whether this retroactive tightening will cause issues for any tools out there. It may cause compatibility issues which may break the repository. So, we add the following two fsck infos to represent the situation where the ref content ends without newline or has trailing garbages: 1. refMissingNewline(INFO): A loose ref that does not end with newline(LF). 2. trailingRefContent(INFO): A loose ref has trailing content. It might appear that we can't provide the user with any warnings by using FSCK_INFO. However, in "fsck.c::fsck_vreport", we will convert FSCK_INFO to FSCK_WARN and we can still warn the user about these situations when using "git refs verify" without introducing compatibility issues. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21ref: port git-fsck(1) regular refs check for files backendshejialuo1-0/+47
"git-fsck(1)" implicitly checks the ref content by passing the callback "fsck_handle_ref" to the "refs.c::refs_for_each_rawref". Then, it will check whether the ref content (eventually "oid") is valid. If not, it will report the following error to the user. error: refs/heads/main: invalid sha1 pointer 0000... And it will also report above errors when there are dangling symrefs in the repository wrongly. This does not align with the behavior of the "git symbolic-ref" command which allows users to create dangling symrefs. As we have already introduced the "git refs verify" command, we'd better check the ref content explicitly in the "git refs verify" command thus later we could remove these checks in "git-fsck(1)" and launch a subprocess to call "git refs verify" in "git-fsck(1)" to make the "git-fsck(1)" more clean. Following what "git-fsck(1)" does, add a similar check to "git refs verify". Then add a new fsck error message "badRefContent(ERROR)" to represent that a ref has an invalid content. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21ref: support multiple worktrees check for refsshejialuo5-10/+26
We have already set up the infrastructure to check the consistency for refs, but we do not support multiple worktrees. However, "git-fsck(1)" will check the refs of worktrees. As we decide to get feature parity with "git-fsck(1)", we need to set up support for multiple worktrees. Because each worktree has its own specific refs, instead of just showing the users "refs/worktree/foo", we need to display the full name such as "worktrees/<id>/refs/worktree/foo". So we should know the id of the worktree to get the full name. Add a new parameter "struct worktree *" for "refs-internal.h::fsck_fn". Then change the related functions to follow this new interface. The "packed-refs" only exists in the main worktree, so we should only check "packed-refs" in the main worktree. Use "is_main_worktree" method to skip checking "packed-refs" in "packed_fsck" function. Then, enhance the "files-backend.c::files_fsck_refs_dir" function to add "worktree/<id>/" prefix when we are not in the main worktree. Last, add a new test to check the refname when there are multiple worktrees to exercise the code. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21ref: initialize ref name outside of check functionsshejialuo1-8/+13
We passes "refs_check_dir" to the "files_fsck_refs_name" function which allows it to create the checked ref name later. However, when we introduce a new check function, we have to allocate redundant memory and re-calculate the ref name. It's bad for us to allocate redundant memory and duplicate logic. Instead, we should allocate and calculate it only once and pass the ref name to the check functions. In order not to do repeat calculation, rename "refs_check_dir" to "refname". And in "files_fsck_refs_dir", create a new strbuf "refname", thus whenever we handle a new ref, calculate the name and call the check functions one by one. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21ref: check the full refname instead of basenameshejialuo1-2/+5
In "files-backend.c::files_fsck_refs_name", we validate the refname format by using "check_refname_format" to check the basename of the iterator with "REFNAME_ALLOW_ONELEVEL" flag. However, this is a bad implementation. Although we doesn't allow a single "@" in ".git" directory, we do allow "refs/heads/@". So, we will report an error wrongly when there is a "refs/heads/@" ref by using one level refname "@". Because we just check one level refname, we either cannot check the other parts of the full refname. And we will ignore the following errors: "refs/heads/ new-feature/test" "refs/heads/~new-feature/test" In order to fix the above problem, enhance "files_fsck_refs_name" to use the full name for "check_refname_format". Then, replace the tests which are related to "@" and add tests to exercise the above situations using for loop to avoid repetition. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21ref: initialize "fsck_ref_report" with zeroshejialuo1-1/+1
In "fsck.c::fsck_refs_error_function", we need to tell whether "oid" and "referent" is NULL. So, we need to always initialize these parameters to NULL instead of letting them point to anywhere when creating a new "fsck_ref_report" structure. The original code explicitly initializes the "path" member in the "struct fsck_ref_report" to NULL (which implicitly 0-initializes other members in the struct). It is more customary to use "{ 0 }" to express that we are 0-initializing everything. In order to align with the codebase, initialize "fsck_ref_report" with zero. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21refs: skip collision checks in initial transactionsPatrick Steinhardt2-9/+10
Reference transactions use `refs_verify_refname_available()` to check for colliding references. This check consists of two parts: - Checks for whether multiple ref updates in the same transaction conflict with each other. - Checks for whether existing refs conflict with any refs part of the transaction. While we generally cannot avoid the first check, the second check is superfluous in cases where the transaction is an initial one in an otherwise empty ref store. The check results in multiple ref reads as well as the creation of a ref iterator for every ref we're checking, which adds up quite fast when performing the check for many refs. Introduce a new flag that allows us to skip this check and wire it up in such that the backends pass it when running an initial transaction. This leads to significant speedups when migrating ref storage backends. From "files" to "reftable": Benchmark 1: migrate files:reftable (refcount = 100000, revision = HEAD~) Time (mean ± σ): 472.4 ms ± 6.7 ms [User: 175.9 ms, System: 285.2 ms] Range (min … max): 463.5 ms … 483.2 ms 10 runs Benchmark 2: migrate files:reftable (refcount = 100000, revision = HEAD) Time (mean ± σ): 86.1 ms ± 1.9 ms [User: 67.9 ms, System: 16.0 ms] Range (min … max): 82.9 ms … 90.9 ms 29 runs Summary migrate files:reftable (refcount = 100000, revision = HEAD) ran 5.48 ± 0.15 times faster than migrate files:reftable (refcount = 100000, revision = HEAD~) And from "reftable" to "files": Benchmark 1: migrate reftable:files (refcount = 100000, revision = HEAD~) Time (mean ± σ): 452.7 ms ± 3.4 ms [User: 209.9 ms, System: 235.4 ms] Range (min … max): 445.9 ms … 457.5 ms 10 runs Benchmark 2: migrate reftable:files (refcount = 100000, revision = HEAD) Time (mean ± σ): 95.2 ms ± 2.2 ms [User: 73.6 ms, System: 20.6 ms] Range (min … max): 91.7 ms … 100.8 ms 28 runs Summary migrate reftable:files (refcount = 100000, revision = HEAD) ran 4.76 ± 0.11 times faster than migrate reftable:files (refcount = 100000, revision = HEAD~) Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21refs/files: support symbolic and root refs in initial transactionPatrick Steinhardt1-10/+34
The "files" backend has implemented special logic when committing the first transactions in an otherwise empty ref store: instead of writing all refs as separate loose files, it instead knows to write them all into a "packed-refs" file directly. This is significantly more efficient than having to write each of the refs as separate "loose" ref. The only user of this optimization is git-clone(1), which only uses this mechanism to write regular refs. Consequently, the implementation does not know how to handle both symbolic and root refs. While fine in the context of git-clone(1), this keeps us from using the mechanism in more cases. Adapt the logic to also support symbolic and root refs by using a second transaction that we use for all of the refs that need to be written as loose refs. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21refs: introduce "initial" transaction flagPatrick Steinhardt5-38/+8
There are two different ways to commit a transaction: - `ref_transaction_commit()` can be used to commit a regular transaction and is what almost every caller wants. - `initial_ref_transaction_commit()` can be used when it is known that the ref store that the transaction is committed for is empty and when there are no concurrent processes. This is used when cloning a new repository. Implementing this via two separate functions has a couple of downsides. First, every reference backend needs to implement a separate callback even in the case where they don't special-case the initial transaction. Second, backends are basically forced to reimplement the whole logic for how to commit the transaction like the "files" backend does, even though backends may wish to only tweak certain behaviour of a "normal" commit. Third, it is awkward that callers must never prepare the transaction as this is somewhat different than how a transaction typically works. Refactor the code such that we instead mark initial transactions via a separate flag when starting the transaction. This addresses all of the mentioned painpoints, where the most important part is that it will allow backends to have way more leeway in how exactly they want to handle the initial transaction. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21refs/files: move logic to commit initial transactionPatrick Steinhardt1-101/+101
Move the logic to commit initial transactions such that we can start to call it in `files_transaction_finish()` in a subsequent commit without requiring a separate function declaration. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21refs: allow passing flags when setting up a transactionPatrick Steinhardt2-4/+8
Allow passing flags when setting up a transaction such that the behaviour of the transaction itself can be altered. This functionality will be used in a subsequent patch. Adapt callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-19Merge branch 'ps/reftable-detach' into ps/reftable-iterator-reuseJunio C Hamano1-1/+18
* ps/reftable-detach: reftable/system: provide thin wrapper for lockfile subsystem reftable/stack: drop only use of `get_locked_file_path()` reftable/system: provide thin wrapper for tempfile subsystem reftable/stack: stop using `fsync_component()` directly reftable/system: stop depending on "hash.h" reftable: explicitly handle hash format IDs reftable/system: move "dir.h" to its only user
2024-11-19reftable/stack: stop using `fsync_component()` directlyPatrick Steinhardt1-0/+7
We're executing `fsync_component()` directly in the reftable library so that we can fsync data to disk depending on "core.fsync". But as we're in the process of converting the reftable library to become standalone we cannot use that function in the library anymore. Refactor the code such that users of the library can inject a custom fsync function via the write options. This allows us to get rid of the dependency on "write-or-die.h". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-19reftable/system: stop depending on "hash.h"Patrick Steinhardt1-1/+11
We include "hash.h" in "reftable/system.h" such that we can use hash format IDs as well as the raw size of SHA1 and SHA256. As we are in the process of converting the reftable library to become standalone we of course cannot rely on those constants anymore. Introduce a new `enum reftable_hash` to replace internal uses of the hash format IDs and new constants that replace internal uses of the hash size. Adapt the reftable backend to set up the correct hash function. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-21global: Fix duplicate word typosSven Strickroth1-1/+1
Used regex to find these typos: (?<!struct )(?<=\s)([a-z]{1,}) \1(?=\s) Signed-off-by: Sven Strickroth <email@cs-ware.de> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-10Merge branch 'ps/reftable-alloc-failures'Junio C Hamano1-8/+31
The reftable library is now prepared to expect that the memory allocation function given to it may fail to allocate and to deal with such an error. * ps/reftable-alloc-failures: (26 commits) reftable/basics: fix segfault when growing `names` array fails reftable/basics: ban standard allocator functions reftable: introduce `REFTABLE_FREE_AND_NULL()` reftable: fix calls to free(3P) reftable: handle trivial allocation failures reftable/tree: handle allocation failures reftable/pq: handle allocation failures when adding entries reftable/block: handle allocation failures reftable/blocksource: handle allocation failures reftable/iter: handle allocation failures when creating indexed table iter reftable/stack: handle allocation failures in auto compaction reftable/stack: handle allocation failures in `stack_compact_range()` reftable/stack: handle allocation failures in `reftable_new_stack()` reftable/stack: handle allocation failures on reload reftable/reader: handle allocation failures in `reader_init_iter()` reftable/reader: handle allocation failures for unindexed reader reftable/merged: handle allocation failures in `merged_table_init_iter()` reftable/writer: handle allocation failures in `reftable_new_writer()` reftable/writer: handle allocation failures in `writer_index_hash()` reftable/record: handle allocation failures when decoding records ...
2024-10-02reftable/merged: handle allocation failures in `merged_table_init_iter()`Patrick Steinhardt1-8/+31
Handle allocation failures in `merged_table_init_iter()`. While at it, merge `merged_iter_init()` into the function. It only has a single caller and merging them makes it easier to handle allocation failures consistently. This change also requires us to adapt `reftable_stack_init_*_iterator()` to bubble up the new error codes of `merged_table_iter_init()`. Adapt callsites accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-30Merge branch 'ps/reftable-concurrent-writes'Junio C Hamano1-2/+11
Give timeout to the locking code to write to reftable. * ps/reftable-concurrent-writes: refs/reftable: reload locked stack when preparing transaction reftable/stack: allow locking of outdated stacks refs/reftable: introduce "reftable.lockTimeout"
2024-09-25Merge branch 'ps/reftable-exclude'Junio C Hamano1-3/+130
The reftable backend learned to more efficiently handle exclude patterns while enumerating the refs. * ps/reftable-exclude: refs/reftable: wire up support for exclude patterns reftable/reader: make table iterator reseekable t/unit-tests: introduce reftable library Makefile: stop listing test library objects twice builtin/receive-pack: fix exclude patterns when announcing refs refs: properly apply exclude patterns to namespaced refs
2024-09-24refs/reftable: reload locked stack when preparing transactionPatrick Steinhardt1-1/+2
When starting a reftable transaction we lock all stacks we are about to modify. While it may happen that the stack is out-of-date at this point in time we don't really care: transactional updates encode the expected state of a certain reference, so all that we really want to verify is that the _current_ value matches that expected state. Pass `REFTABLE_STACK_NEW_ADDITION_RELOAD` when locking the stack such that an out-of-date stack will be reloaded after having been locked. This change is safe because all verifications of the expected state happen after this step anyway. Add a testcase that verifies that many writers are now able to write to the stack concurrently without failures and with a deterministic end result. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-24reftable/stack: allow locking of outdated stacksPatrick Steinhardt1-2/+2
In `reftable_stack_new_addition()` we first lock the stack and then check whether it is still up-to-date. If it is not we return an error to the caller indicating that the stack is outdated. This is overly restrictive in our ref transaction interface though: we lock the stack right before we start to verify the transaction, so we do not really care whether it is outdated or not. What we really want is that the stack is up-to-date after it has been locked so that we can verify queued updates against its current state while we know that it is locked for concurrent modification. Introduce a new flag `REFTABLE_STACK_NEW_ADDITION_RELOAD` that alters the behaviour of `reftable_stack_init_addition()` in this case: when we notice that it is out-of-date we reload it instead of returning an error to the caller. This logic will be wired up in the reftable backend in the next commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>