| Age | Commit message (Collapse) | Author | Files | Lines |
|
Code clean-up.
* js/unreachable-workaround-for-no-symlink-head:
refs: forbid clang to complain about unreachable code
|
|
Deal more gracefully with directory / file conflicts when the files
backend is used for ref storage, by failing only the ones that are
involved in the conflict while allowing others.
* kn/refs-files-case-insensitive:
refs/files: handle D/F conflicts during locking
refs/files: handle F/D conflicts in case-insensitive FS
refs/files: use correct error type when lock exists
refs/files: catch conflicts on case-insensitive file-systems
|
|
"git refs migrate" to migrate the reflog entries from a refs
backend to another had a handful of bugs squashed.
* ps/reflog-migrate-fixes:
refs: fix invalid old object IDs when migrating reflogs
refs: stop unsetting REF_HAVE_OLD for log-only updates
refs/files: detect race when generating reflog entry for HEAD
refs: fix identity for migrated reflogs
ident: fix type of string length parameter
builtin/reflog: implement subcommand to write new entries
refs: export `ref_transaction_update_reflog()`
builtin/reflog: improve grouping of subcommands
Documentation/git-reflog: convert to use synopsis type
|
|
When `NO_SYMLINK_HEAD` is defined, `create_ref_symlink()` is hard-coded
as `(-1)`, and as a consequence the condition `!create_ref_symlink()`
always evaluates to false, rendering any code guarded by that condition
unreachable.
Therefore, clang is _technically_ correct when it complains about
unreachable code. It does completely miss the fact that this is okay
because on _other_ platforms, where `NO_SYMLINK_HEAD` is not defined,
the code isn't unreachable at all.
Let's use the same trick as in 82e79c63642c (git-compat-util: add
NOT_CONSTANT macro and use it in atfork_prepare(), 2025-03-17) to
appease clang while at the same time keeping the `-Wunreachable` flag
to potentially find _actually_ unreachable code.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The previous commit added the necessary validation and checks for F/D
conflicts in the files backend when working on case insensitive systems.
There is still a possibility for D/F conflicts. This is a different from
the F/D since for F/D conflicts, there would not be a conflict during
the lock creation phase:
refs/heads/foo.lock
refs/heads/foo/bar.lock
However there would be a conflict when the locks are committed, since we
cannot have 'refs/heads/foo/bar' and 'refs/heads/foo'. These kinds of
conflicts are checked and resolved in
`refs_verify_refnames_available()`, so the previous commit ensured that
for case-insensitive filesystems, we would lowercase the inputs to that
function.
For D/F conflicts, there is a conflict during the lock creation phase
itself:
refs/heads/foo/bar.lock
refs/heads/foo.lock
As in `lock_raw_ref()` after creating the lock, we also check for D/F
conflicts. This can occur in case-insensitive filesystems when trying to
fetch case-conflicted references like:
refs/heads/Foo/new
refs/heads/foo
D/F conflicts can also occur in case-sensitive filesystems, when the
repository already contains a directory with a lock file
'refs/heads/foo/bar.lock' and trying to fetch 'refs/heads/foo'. This
doesn't concern directories containing garbage files as those are
handled on a higher level.
To fix this, simply categorize the error as a name conflict. Also remove
this reference from the list of valid refnames for availability checks.
By categorizing the error and removing it from the list of valid
references, batched updates now knows to reject such reference updates
and apply the other reference updates.
Fix a small typo in `ref_transaction_maybe_set_rejected()` while here.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When using the files-backend on case-insensitive filesystems, there is
possibility of hitting F/D conflicts when creating references within a
single transaction, such as:
- 'refs/heads/foo'
- 'refs/heads/Foo/bar'
Ideally such conflicts are caught in `refs_verify_refnames_available()`
which is responsible for checking F/D conflicts within a given
transaction. This utility function is shared across the reference
backends. As such, it doesn't consider the issues of using a
case-insensitive file system, which only affects the files-backend.
While one solution would be to make the function aware of such issues,
this feels like leaking implementation details of file-backend specific
issues into the utility function. So opt for the more simpler option, of
lowercasing all references sent to this function when on a
case-insensitive filesystem and operating on the files-backend.
To do this, simply use a `struct strbuf` to convert the refname to
lowercase and append it to the list of refnames to be checked. Since we
use a `struct strbuf` and the memory is cleared right after, make sure
that the string list duplicates all provided string.
Without this change, the user would simply be left with a repository
with '.lock' files which were created in the 'prepare' phase of the
transaction, as the 'commit' phase would simply abort and not do the
necessary cleanup.
Reported-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When fetching references into a repository, if a lock for a particular
reference exists, then `lock_raw_ref()` throws:
- REF_TRANSACTION_ERROR_CASE_CONFLICT: when there is a conflict
because the transaction contains conflicting references while being
on a case-insensitive filesystem.
- REF_TRANSACTION_ERROR_GENERIC: for all other errors.
The latter causes the entire set of batched updates to fail, even in
case sensitive filessystems.
Instead, return a 'REF_TRANSACTION_ERROR_CREATE_EXISTS' error. This
allows batched updates to reject the individual update which conflicts
with the existing file, while updating the rest of the references.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
During the 'prepare' phase of a reference transaction in the files
backend, we create the lock files for references to be created. When
using batched updates on case-insensitive filesystems, the entire
batched updates would be aborted if there are conflicting names such as:
refs/heads/Foo
refs/heads/foo
This affects all commands which were migrated to use batched updates in
Git 2.51, including 'git-fetch(1)' and 'git-receive-pack(1)'. Before
that, reference updates would be applied serially with one transaction
used per update. When users fetched multiple references on
case-insensitive systems, subsequent references would simply overwrite
any earlier references. So when fetching:
refs/heads/foo: 5f34ec0bfeac225b1c854340257a65b106f70ea6
refs/heads/Foo: ec3053b0977e83d9b67fc32c4527a117953994f3
refs/heads/sample: 2eefd1150e06d8fca1ddfa684dec016f36bf4e56
The user would simply end up with:
refs/heads/foo: ec3053b0977e83d9b67fc32c4527a117953994f3
refs/heads/sample: 2eefd1150e06d8fca1ddfa684dec016f36bf4e56
This is buggy behavior since the user is never informed about the
overrides performed and missing references. Nevertheless, the user is
left with a working repository with a subset of the references. Since
Git 2.51, in such situations fetches would simply fail without updating
any references. Which is also buggy behavior and worse off since the
user is left without any references.
The error is triggered in `lock_raw_ref()` where the files backend
attempts to create a lock file. When a lock file already exists the
function returns a 'REF_TRANSACTION_ERROR_GENERIC'. When this happens,
the entire batched updates, not individual operation, is aborted as if
it were in a transaction.
Change this to return 'REF_TRANSACTION_ERROR_CASE_CONFLICT' instead to
aid the batched update mechanism to simply reject such errors. The
change only affects batched updates since batched updates will reject
individual updates with non-generic errors. So specifically this would
only affect:
1. git fetch
2. git receive-pack
3. git update-ref --batch-updates
This bubbles the error type up to `files_transaction_prepare()` which
tries to lock each reference update. So if the locking fails, we check
if the rejection type can be ignored, which is done by calling
`ref_transaction_maybe_set_rejected()`.
As the error type is now 'REF_TRANSACTION_ERROR_CASE_CONFLICT',
the specific reference update would simply be rejected, while other
updates in the transaction would continue to be applied. This allows
partial application of references in case-insensitive filesystems when
fetching colliding references.
While the earlier implementation allowed the last reference to be
applied overriding the initial references, this change would allow the
first reference to be applied while rejecting consequent collisions.
This should be an okay compromise since with the files backend, there is
no scenario possible where we would retain all colliding references.
Let's also be more proactive and notify users on case-insensitive
filesystems about such problems by providing a brief about the issue
while also recommending using the reftable backend, which doesn't have
the same issue.
Reported-by: Joe Drew <joe.drew@indexexchange.com>
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When migrating reflog entries between different storage formats we end
up with invalid old object IDs for the migrated entries: instead of
writing the old object ID of the to-be-migrated entry, we end up with
the all-zeroes object ID.
The root cause of this issue is that we don't know to use the old object
ID provided by the caller. Instead, we manually resolve the old object
ID by resolving the current value of its matching reference. But as that
reference does not yet exist in the target ref storage we always end up
resolving it to all-zeroes.
This issue got unnoticed as there is no user-facing command that would
even show the old object ID. While `git log -g` knows to show the new
object ID, we don't have any formatting directive to show the old object
ID.
Fix the bug by introducing a new flag `REF_LOG_USE_PROVIDED_OIDS`. If
set, backends are instructed to use the old and new object IDs provided
by the caller, without doing any manual resolving. Set this flag in
`ref_transaction_update_reflog()`.
Amend our tests in t1460-refs-migrate to use our test tool to read
reflog entries. This test tool prints out both old and new object ID of
each reflog entry, which fixes the test gap. Furthermore it also prints
the full identity used to write the reflog, which provides test coverage
for the previous commit in this patch series that fixed the identity for
migrated reflogs.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `REF_HAVE_OLD` flag indicates whether a given ref update has its old
object ID set. If so, the value of that field is used to verify whether
the current state of the reference matches this expected state. It is
thus an important part of mitigating races with a concurrent process
that updates the same set of references.
When writing reflogs though we explicitly unset that flag. This is a
sensible thing to do: the old state of reflog entry updates may not
necessarily match the current on-disk state of its accompanying ref, but
it's only intended to signal what old object ID we want to write into
the new reflog entry. For example when migrating refs we end up writing
many reflog entries for a single reference, and most likely those reflog
entries will have many different old object IDs.
But unsetting this flag also removes a useful signal, namely that the
caller _did_ provide an old object ID for a given reflog entry. This
signal will become useful in a subsequent commit, where we add a new
flag that tells the transaction to use the provided old and new object
IDs to write a reflog entry. The `REF_HAVE_OLD` flag is then used as a
signal to verify that the caller really did provide an old object ID.
Stop unsetting the flag so that we can use it as this described signal
in a subsequent commit. Skip checking the old object ID for log-only
updates so that we don't expect it to match the current on-disk state.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When updating a reference that is being pointed to HEAD we don't only
write a reflog message for that particular reference, but also generate
one for HEAD. This logic is handled by `split_head_update()`, where we:
1. Verify that the condition actually triggered. This is done by
reading HEAD at the start of the transaction so that we can then
check whether a given reference update refers to its target.
2. Queue a new log-only update for HEAD in case it did.
But the logic is unfortunately not free of races, as we do not lock the
HEAD reference after we have read its target. This can lead to the
following two scenarios:
- HEAD gets concurrently updated to point to one of the references we
have already processed. This causes us not writing a reflog message
even though we should have done so.
- HEAD gets concurrently updated to no longer point to a reference
anymore that we have already processed. This causes us to write a
reflog message even though we should _not_ have done so.
Improve the situation by introducing a new `REF_LOG_VIA_SPLIT` flag that
is specific to the "files" backend. If set, we will double check that
the HEAD reference still points to the reference that we are creating
the reflog entry for after we have locked HEAD. Furthermore, instead of
manually resolving the old object ID of that entry, we now use the same
old state as for the parent update.
If we detect such a racy update we abort the transaction. This is a bit
heavy-handed: the user didn't even ask us to write a reflog update for
"HEAD", so it might be surprising if we abort the transaction. That
being said:
- Normal users wouldn't typically hit this case as we only hit the
relevant code when committing to a branch that is being pointed to
by "HEAD" directly. Commands like git-commit(1) typically commit to
"HEAD" itself though.
- Scripted users that use git-update-ref(1) and related plumbing
commands are unlikely to hit this case either, as they would have to
update the pointed-to-branch at the same as "HEAD" is being updated,
which is an exceedingly rare event.
The alternative would be to instead drop the log-only update completely,
but that would require more logic that is hard to verify without adding
infrastructure specific for such a test. So we rather do the pragmatic
thing and don't worry too much about an edge case that is very unlikely
to happen.
Unfortunately, this change only helps with the second race. We cannot
reliably plug the first race without locking the HEAD reference at the
start of the transaction. Locking HEAD unconditionally would effectively
serialize all writes though, and that doesn't seem like an option. Also,
double checking its value at the end of the transaction is not an option
either, as its target may have flip-flopped during the transaction.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git for-each-ref" learns "--start-after" option to help
applications that want to page its output.
* kn/for-each-ref-skip:
ref-cache: set prefix_state when seeking
for-each-ref: introduce a '--start-after' option
ref-filter: remove unnecessary else clause
refs: selectively set prefix in the seek functions
ref-cache: remove unused function 'find_ref_entry()'
refs: expose `ref_iterator` via 'refs.h'
|
|
When a ref creation at refs/heads/foo/bar fails, the files backend
now removes refs/heads/foo/ if the directory is otherwise not used.
* ps/refs-files-remove-empty-parent:
refs/files: remove empty parent dirs when ref creation fails
|
|
The ref iterator exposes a `ref_iterator_seek()` function. The name
suggests that this would seek the iterator to a specific reference in
some ways similar to how `fseek()` works for the filesystem.
However, the function actually sets the prefix for refs iteration. So
further iteration would only yield references which match the particular
prefix. This is a bit confusing.
Let's add a 'flags' field to the function, which when set with the
'REF_ITERATOR_SEEK_SET_PREFIX' flag, will set the prefix for the
iteration in-line with the existing behavior. Otherwise, the reference
backends will simply seek to the specified reference and clears any
previously set prefix. This allows users to start iteration from a
specific reference.
In the packed and reftable backend, since references are available in a
sorted list, the changes are simply setting the prefix if needed. The
changes on the files-backend are a little more involved, since the files
backend uses the 'ref-cache' mechanism. We move out the existing logic
within `cache_ref_iterator_seek()` to `cache_ref_iterator_set_prefix()`
which is called when the 'REF_ITERATOR_SEEK_SET_PREFIX' flag is set. We
then parse the provided seek string and set the required levels and
their indexes to ensure that seeking is possible.
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git push" and "git fetch" are taught to update refs in batches to
gain performance.
* kn/fetch-push-bulk-ref-update:
receive-pack: handle reference deletions separately
refs/files: skip updates with errors in batched updates
receive-pack: use batched reference updates
send-pack: fix memory leak around duplicate refs
fetch: use batched reference updates
refs: add function to translate errors to strings
|
|
When creating a new reference in the "files" backend we first create the
directory hierarchy for that reference, then create the lockfile for
that reference, and finally rename the lockfile into place. When the
transaction gets aborted we prune the lockfile, but we don't clean up
the directory hierarchy that we may have created for the lockfile.
In some egde cases this can lead to lots of empty directories being
cluttered in the ".git/refs" directory that really serve no purpose at
all. We know to prune such empty directories when packing refs, but that
only patches over the issue.
Improve this by removing empty parents when cleaning up still-locked
references in `files_transaction_cleanup()`. This function is also
called when preparing or committing the transaction, so this change also
helps when not explicitly aborting the transaction.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The commit 23fc8e4f61 (refs: implement batch reference update support,
2025-04-08) introduced support for batched reference updates. This
allows users to batch updates together, while allowing some of the
updates to fail.
Under the hood, batched updates use the reference transaction mechanism.
Each update which fails is marked as such. Any failed updates must be
skipped over in the rest of the code, as they wouldn't apply any more.
In two of the loops within 'files_transaction_finish()' of the files
backend, the failed updates aren't skipped over. This can cause a
SEGFAULT otherwise. Add the missing skips and a test to validate the
same.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git verify-refs" (and hence "git fsck --reference") started
erroring out in a repository in which secondary worktrees were
prepared with Git 2.43 or lower.
* sj/ref-contents-check-fix:
fsck: ignore missing "refs" directory for linked worktrees
|
|
"git refs verify" doesn't work if there are worktrees created on Git
v2.43.0 or older versions. These versions don't automatically create the
"refs" directory, causing the error:
error: cannot open directory .git/worktrees/<worktree name>/refs:
No such file or directory
Since 8f4c00de95 (builtin/worktree: create refdb via ref backend,
2024-01-08), we automatically create the "refs" directory for new
worktrees. And in 7c78d819e6 (ref: support multiple worktrees check for
refs, 2024-11-20), we assume that all linked worktrees have this
directory and would wrongly report an error to the user, thus
introducing compatibility issue.
Check for ENOENT errno before reporting directory access errors for
linked worktrees to maintain backward compatibility.
Reported-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Code clean-up.
* ps/object-file-cleanup:
object-store: merge "object-store-ll.h" and "object-store.h"
object-store: remove global array of cached objects
object: split out functions relating to object store subsystem
object-file: drop `index_blob_stream()`
object-file: split up concerns of `HASH_*` flags
object-file: split out functions relating to object store subsystem
object-file: move `xmmap()` into "wrapper.c"
object-file: move `git_open_cloexec()` to "compat/open.c"
object-file: move `safe_create_leading_directories()` into "path.c"
object-file: move `mkdir_in_gitdir()` into "path.c"
|
|
Updating multiple references have only been possible in all-or-none
fashion with transactions, but it can be more efficient to batch
multiple updates even when some of them are allowed to fail in a
best-effort manner. A new "best effort batches of updates" mode
has been introduced.
* kn/non-transactional-batch-updates:
update-ref: add --batch-updates flag for stdin mode
refs: support rejection in batch updates during F/D checks
refs: implement batch reference update support
refs: introduce enum-based transaction error types
refs/reftable: extract code from the transaction preparation
refs/files: remove duplicate duplicates check
refs: move duplicate refname update check to generic layer
refs/files: remove redundant check in split_symref_update()
|
|
The object layer has been updated to take an explicit repository
instance as a parameter in more code paths.
* ps/object-wo-the-repository:
hash: stop depending on `the_repository` in `null_oid()`
hash: fix "-Wsign-compare" warnings
object-file: split out logic regarding hash algorithms
delta-islands: stop depending on `the_repository`
object-file-convert: stop depending on `the_repository`
pack-bitmap-write: stop depending on `the_repository`
pack-revindex: stop depending on `the_repository`
pack-check: stop depending on `the_repository`
environment: move access to "core.bigFileThreshold" into repo settings
pack-write: stop depending on `the_repository` and `the_hash_algo`
object: stop depending on `the_repository`
csum-file: stop depending on `the_repository`
|
|
The `safe_create_leading_directories()` function and its relatives are
located in "object-file.c", which is not a good fit as they provide
generic functionality not related to objects at all. Move them into
"path.c", which already hosts `safe_create_dir()` and its relative
`safe_create_dir_in_gitdir()`.
"path.c" is free of `the_repository`, but the moved functions depend on
`the_repository` to read the "core.sharedRepository" config. Adapt the
function signature to accept a repository as argument to fix the issue
and adjust callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* ps/object-wo-the-repository:
hash: stop depending on `the_repository` in `null_oid()`
hash: fix "-Wsign-compare" warnings
object-file: split out logic regarding hash algorithms
delta-islands: stop depending on `the_repository`
object-file-convert: stop depending on `the_repository`
pack-bitmap-write: stop depending on `the_repository`
pack-revindex: stop depending on `the_repository`
pack-check: stop depending on `the_repository`
environment: move access to "core.bigFileThreshold" into repo settings
pack-write: stop depending on `the_repository` and `the_hash_algo`
object: stop depending on `the_repository`
csum-file: stop depending on `the_repository`
|
|
The `refs_verify_refnames_available()` is used to batch check refnames
for F/D conflicts. While this is the more performant alternative than
its individual version, it does not provide rejection capabilities on a
single update level. For batched updates, this would mean a rejection of
the entire transaction whenever one reference has a F/D conflict.
Modify the function to call `ref_transaction_maybe_set_rejected()` to
check if a single update can be rejected. Since this function is only
internally used within 'refs/' and we want to pass in a `struct
ref_transaction *` as a variable. We also move and mark
`refs_verify_refnames_available()` to 'refs-internal.h' to be an
internal function.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Git supports making reference updates with or without transactions.
Updates with transactions are generally better optimized. But
transactions are all or nothing. This means, if a user wants to batch
updates to take advantage of the optimizations without the hard
requirement that all updates must succeed, there is no way currently to
do so. Particularly with the reftable backend where batching multiple
reference updates is more efficient than performing them sequentially.
Introduce batched update support with a new flag,
'REF_TRANSACTION_ALLOW_FAILURE'. Batched updates while different from
transactions, use the transaction infrastructure under the hood. When
enabled, this flag allows individual reference updates that would
typically cause the entire transaction to fail due to non-system-related
errors to be marked as rejected while permitting other updates to
proceed. System errors referred by 'REF_TRANSACTION_ERROR_GENERIC'
continue to result in the entire transaction failing. This approach
enhances flexibility while preserving transactional integrity where
necessary.
The implementation introduces several key components:
- Add 'rejection_err' field to struct `ref_update` to track failed
updates with failure reason.
- Add a new struct `ref_transaction_rejections` and a field within
`ref_transaction` to this struct to allow quick iteration over
rejected updates.
- Modify reference backends (files, packed, reftable) to handle
partial transactions by using `ref_transaction_set_rejected()`
instead of failing the entire transaction when
`REF_TRANSACTION_ALLOW_FAILURE` is set.
- Add `ref_transaction_for_each_rejected_update()` to let callers
examine which updates were rejected and why.
This foundational change enables batched update support throughout the
reference subsystem. A following commit will expose this capability to
users by adding a `--batch-updates` flag to 'git-update-ref(1)',
providing both a user-facing feature and a testable implementation.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Replace preprocessor-defined transaction errors with a strongly-typed
enum `ref_transaction_error`. This change:
- Improves type safety and function signature clarity.
- Makes error handling more explicit and discoverable.
- Maintains existing error cases, while adding new error cases for
common scenarios.
This refactoring paves the way for more comprehensive error handling
which we will utilize in the upcoming commits to add batch reference
update support.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Within the files reference backend's transaction's 'finish' phase, a
verification step is currently performed wherein the refnames list is
sorted and examined for multiple updates targeting the same refname.
It has been observed that this verification is redundant, as an
identical check is already executed during the transaction's 'prepare'
stage. Since the refnames list remains unmodified following the
'prepare' stage, this secondary verification can be safely eliminated.
The duplicate check has been removed accordingly, and the
`ref_update_reject_duplicates()` function has been marked as static, as
its usage is now confined to 'refs.c'.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Move the tracking of refnames in `affected_refnames` from individual
backends into the generic layer in 'refs.c'. This centralizes the
duplicate refname detection that was previously handled separately by
each backend.
Make some changes to accommodate this move:
- Add a `string_list` field `refnames` to `ref_transaction` to contain
all the references in a transaction. This field is updated whenever
a new update is added via `ref_transaction_add_update`, so manual
additions in reference backends are dropped.
- Modify the backends to use this field internally as needed. The
backends need to check if an update for refname already exists when
splitting symrefs or adding an update for 'HEAD'.
- In the reftable backend, within `reftable_be_transaction_prepare()`,
move the `string_list_has_string()` check above
`ref_transaction_add_update()`. Since `ref_transaction_add_update()`
automatically adds the refname to `transaction->refnames`,
performing the check after will always return true, so we perform
the check before adding the update.
This helps reduce duplication of functionality between the backends and
makes it easier to make changes in a more centralized manner.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In `split_symref_update()`, there were two checks for duplicate
refnames:
- At the start, `string_list_has_string()` ensures the refname is not
already in `affected_refnames`, preventing duplicates from being
added.
- After adding the refname, another check verifies whether the newly
inserted item has a `util` value.
The second check is unnecessary because the first one guarantees that
`string_list_insert()` will never encounter a preexisting entry.
The `item->util` field is assigned to validate that a rename doesn't
already exist in the list. The validation is done after the first check.
As this check is removed, clean up the validation and the assignment of
this field in `split_head_update()` and `files_transaction_prepare()`.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The code paths to check whether a refname X is available (by seeing
if another ref X/Y exists, etc.) have been optimized.
* ps/refname-avail-check-optim:
refs: reuse iterators when determining refname availability
refs/iterator: implement seeking for files iterators
refs/iterator: implement seeking for packed-ref iterators
refs/iterator: implement seeking for ref-cache iterators
refs/iterator: implement seeking for reftable iterators
refs/iterator: implement seeking for merged iterators
refs/iterator: provide infrastructure to re-seek iterators
refs/iterator: separate lifecycle from iteration
refs: stop re-verifying common prefixes for availability
refs/files: batch refname availability checks for initial transactions
refs/files: batch refname availability checks for normal transactions
refs/reftable: batch refname availability checks
refs: introduce function to batch refname availability checks
builtin/update-ref: skip ambiguity checks when parsing object IDs
object-name: allow skipping ambiguity checks in `get_oid()` family
object-name: introduce `repo_get_oid_with_flags()`
|
|
Implement seeking for "files" iterators. As we simply use a ref-cache
iterator under the hood the implementation is straight-forward. Note
that we do not implement seeking on reflog iterators, same as with the
"reftable" backend.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The ref and reflog iterators have their lifecycle attached to iteration:
once the iterator reaches its end, it is automatically released and the
caller doesn't have to care about that anymore. When the iterator should
be released before it has been exhausted, callers must explicitly abort
the iterator via `ref_iterator_abort()`.
This lifecycle is somewhat unusual in the Git codebase and creates two
problems:
- Callsites need to be very careful about when exactly they call
`ref_iterator_abort()`, as calling the function is only valid when
the iterator itself still is. This leads to somewhat awkward calling
patterns in some situations.
- It is impossible to reuse iterators and re-seek them to a different
prefix. This feature isn't supported by any iterator implementation
except for the reftable iterators anyway, but if it was implemented
it would allow us to optimize cases where we need to search for
specific references repeatedly by reusing internal state.
Detangle the lifecycle from iteration so that we don't deallocate the
iterator anymore once it is exhausted. Instead, callers are now expected
to always call a newly introduce `ref_iterator_free()` function that
deallocates the iterator and its internal state.
Note that the `dir_iterator` is somewhat special because it does not
implement the `ref_iterator` interface, but is only used to implement
other iterators. Consequently, we have to provide `dir_iterator_free()`
instead of `dir_iterator_release()` as the allocated structure itself is
managed by the `dir_iterator` interfaces, as well, and not freed by
`ref_iterator_free()` like in all the other cases.
While at it, drop the return value of `ref_iterator_abort()`, which
wasn't really required by any of the iterator implementations anyway.
Furthermore, stop calling `base_ref_iterator_free()` in any of the
backends, but instead call it in `ref_iterator_free()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The "files" backend explicitly carves out special logic for its initial
transaction so that it can avoid writing out every single reference as
a loose reference. While the assumption is that there shouldn't be any
preexisting references, we still have to verify that none of the newly
written references will conflict with any other new reference in the
same transaction.
Refactor the initial transaction to use batched refname availability
checks. This does not yet have an effect on performance as we still call
`refs_verify_refname_available()` in a loop. But this will change in
subsequent commits and then impact performance when cloning a repository
with many references or when migrating references to the "files" format.
This will improve performance when cloning a repository with many
references or when migrating references from any format to the "files"
format once the availability checks have learned to optimize checks for
many references in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Same as the "reftable" backend that we have adapted in the preceding
commit to use batched refname availability checks we can also do so for
the "files" backend. Things are a bit more intricate here though, as we
call `refs_verify_refname_available()` in a set of different contexts:
1. `lock_raw_ref()` when it hits either EEXISTS or EISDIR when creating
a new reference, mostly to create a nice, user-readable error
message. This is nothing we have to care about too much, as we only
hit this code path at most once when we hit a conflict.
2. `lock_raw_ref()` when it _could_ create the lockfile to check
whether it is conflicting with any packed refs. In the general case,
this code path will be hit once for every (successful) reference
update.
3. `lock_ref_oid_basic()`, but it is only executed when copying or
renaming references or when expiring reflogs. It will thus not be
called in contexts where we have many references queued up.
4. `refs_refname_ref_available()`, but again only when copying or
renaming references. It is thus not interesting due to the same
reason as the previous case.
5. `files_transaction_finish_initial()`, which is only executed when
creating a new repository or migrating references.
So out of these, only (2) and (5) are viable candidates to use the
batched checks.
Adapt `lock_raw_ref()` accordingly by queueing up reference names that
need to be checked for availability and then checking them after we have
processed all updates. This check is done before we (optionally) lock
the `packed-refs` file, which is somewhat flawed because it means that
the `packed-refs` could still change after the availability check and
thus create an undetected conflict. But unconditionally locking the file
would change semantics that users are likely to rely on, so we keep the
current locking sequence intact, even if it's suboptmial.
The refactoring of `files_transaction_finish_initial()` will be done in
the next commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `null_oid()` function returns the object ID that only consists of
zeroes. Naturally, this ID also depends on the hash algorithm used, as
the number of zeroes is different between SHA1 and SHA256. Consequently,
the function returns the hash-algorithm-specific null object ID.
This is currently done by depending on `the_hash_algo`, which implicitly
makes us depend on `the_repository`. Refactor the function to instead
pass in the hash algorithm for which we want to retrieve the null object
ID. Adapt callsites accordingly by passing in `the_repository`, thus
bubbling up the dependency on that global variable by one layer.
There are a couple of trivial exceptions for subsystems that already got
rid of `the_repository`. These subsystems instead use the repository
that is available via the calling context:
- "builtin/grep.c"
- "grep.c"
- "refs/debug.c"
There are also two non-trivial exceptions:
- "diff-no-index.c": Here we know that we may not have a repository
initialized at all, so we cannot rely on `the_repository`. Instead,
we adapt `diff_no_index()` to get a `struct git_hash_algo` as
parameter. The only caller is located in "builtin/diff.c", where we
know to call `repo_set_hash_algo()` in case we're running outside of
a Git repository. Consequently, it is fine to continue passing
`the_repository->hash_algo` even in this case.
- "builtin/ls-files.c": There is an in-flight patch series that drops
`USE_THE_REPOSITORY_VARIABLE` in this file, which causes a semantic
conflict because we use `null_oid()` in `show_submodule()`. The
value is passed to `repo_submodule_init()`, which may use the object
ID to resolve a tree-ish in the superproject from which we want to
read the submodule config. As such, the object ID should refer to an
object in the superproject, and consequently we need to use its hash
algorithm.
This means that we could in theory just not bother about this edge
case at all and just use `the_repository` in "diff-no-index.c". But
doing so would feel misdesigned.
Remove the `USE_THE_REPOSITORY_VARIABLE` preprocessor define in
"hash.c".
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
With the preceding refactorings we now only have a couple of implicit
users of `the_repository` left in the "path" subsystem, all of which
depend on global state via `calc_shared_perm()`. Make the dependency on
`the_repository` explicit by passing the repo as a parameter instead and
adjust callers accordingly.
Note that this change bubbles up into a couple of subsystems that were
previously declared as free from `the_repository`. Instead of marking
all of them as `the_repository`-dependent again, we instead use the
repository that is available in the calling context. There are three
exceptions though with "copy.c", "pack-write.c" and "tempfile.c".
Adjusting these would require us to adapt callsites all over the place,
so this is left for a future iteration.
Mark "path.c" as free from `the_repository`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
reflog entries for symbolic ref updates were broken, which has been
corrected.
* kn/reflog-symref-fix:
refs: fix creation of reflog entries for symrefs
|
|
The commit 297c09eabb (refs: allow multiple reflog entries for the
same refname, 2024-12-16) added logic to exit early in
`lock_ref_for_update()` after obtaining the required lock. This was
added as a performance optimization on a false assumption that no
further processing was required for reflog-only updates.
However the assumption was wrong. For a symref's reflog entry, the
update needs to be populated with the old_oid value, but the early
exit skipped this necessary step.
This caused a bug in Git 2.48 in the files backend where target
references of symrefs being updated would create a corrupted reflog
entry for the symref since the old_oid is not populated.
Everything the early exit skipped in the code path is necessary for
both regular and symbolic ref, so eliminate the mistaken
optimization, and also add a test to ensure that such an issue
doesn't arise in the future.
Reported-by: Nika Layzell <nika@thelayzells.com>
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git refs migrate" learned to also migrate the reflog data across
backends.
* kn/reflog-migration:
refs: mark invalid refname message for translation
refs: add support for migrating reflogs
refs: allow multiple reflog entries for the same refname
refs: introduce the `ref_transaction_update_reflog` function
refs: add `committer_info` to `ref_transaction_add_update()`
refs: extract out refname verification in transactions
refs/files: add count field to ref_lock
refs: add `index` field to `struct ref_udpate`
refs: include committer info in `ref_update` struct
|
|
Start working to make the codebase buildable with -Wsign-compare.
* ps/build-sign-compare:
t/helper: don't depend on implicit wraparound
scalar: address -Wsign-compare warnings
builtin/patch-id: fix type of `get_one_patchid()`
builtin/blame: fix type of `length` variable when emitting object ID
gpg-interface: address -Wsign-comparison warnings
daemon: fix type of `max_connections`
daemon: fix loops that have mismatching integer types
global: trivial conversions to fix `-Wsign-compare` warnings
pkt-line: fix -Wsign-compare warning on 32 bit platform
csum-file: fix -Wsign-compare warning on 32-bit platform
diff.h: fix index used to loop through unsigned integer
config.mak.dev: drop `-Wno-sign-compare`
global: mark code units that generate warnings with `-Wsign-compare`
compat/win32: fix -Wsign-compare warning in "wWinMain()"
compat/regex: explicitly ignore "-Wsign-compare" warnings
git-compat-util: introduce macros to disable "-Wsign-compare" warnings
|
|
When "git fetch $remote" notices that refs/remotes/$remote/HEAD is
missing and discovers what branch the other side points with its
HEAD, refs/remotes/$remote/HEAD is updated to point to it.
* bf/set-head-symref:
fetch set_head: handle mirrored bare repositories
fetch: set remote/HEAD if it does not exist
refs: add create_only option to refs_update_symref_extended
refs: add TRANSACTION_CREATE_EXISTS error
remote set-head: better output for --auto
remote set-head: refactor for readability
refs: atomically record overwritten ref in update_symref
refs: standardize output of refs_read_symbolic_ref
t/t5505-remote: test failure of set-head
t/t5505-remote: set default branch to main
|
|
The reference transaction only allows a single update for a given
reference to avoid conflicts. This, however, isn't an issue for reflogs.
There are no conflicts to be resolved in reflogs and when migrating
reflogs between backends we'd have multiple reflog entries for the same
refname.
So allow multiple reflog updates within a single transaction. Also the
reflog creation logic isn't exposed to the end user. While this might
change in the future, currently, this reduces the scope of issues to
think about.
In the reftable backend, the writer sorts all updates based on the
update_index before writing to the block. When there are multiple
reflogs for a given refname, it is essential that the order of the
reflogs is maintained. So add the `index` value to the `update_index`.
The `index` field is only set when multiple reflog entries for a given
refname are added and as such in most scenarios the old behavior
remains.
This is required to add reflog migration support to `git refs migrate`.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Introduce a new function `ref_transaction_update_reflog`, for clients to
add a reflog update to a transaction. While the existing function
`ref_transaction_update` also allows clients to add a reflog entry, this
function does a few things more, It:
- Enforces that only a reflog entry is added and does not update the
ref itself.
- Allows the users to also provide the committer information. This
means clients can add reflog entries with custom committer
information.
The `transaction_refname_valid()` function also modifies the error
message selectively based on the type of the update. This change also
affects reflog updates which go through `ref_transaction_update()`.
A follow up commit will utilize this function to add reflog support to
`git refs migrate`.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `ref_transaction_add_update()` creates the `ref_update` struct. To
facilitate addition of reflogs in the next commit, the function needs to
accommodate setting the `committer_info` field in the struct. So modify
the function to also take `committer_info` as an argument and set it
accordingly.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When refs are updated in the files-backend, a lock is obtained for the
corresponding file path. This is the case even for reflogs, i.e. a lock
is obtained on the reference path instead of the reflog path. This
works, since generally, reflogs are updated alongside the ref.
The upcoming patches will add support for reflog updates in ref
transaction. This means, in a particular transaction we want to have ref
updates and reflog updates. For a given ref in a given transaction there
can be at most one update. But we can theoretically have multiple reflog
updates for a given ref in a given transaction. A great example of this
would be when migrating reflogs from one backend to another. There we
would batch all the reflog updates for a given reference in a single
transaction.
The current flow does not support this, because currently refs & reflogs
are treated as a single entity and capture the lock together. To
separate this, add a count field to ref_lock. With this, multiple
updates can hold onto a single ref_lock and the lock will only be
released when all of them release the lock.
This patch only adds the `count` field to `ref_lock` and adds the logic
to increment and decrement the lock. In a follow up commit, we'll
separate the reflog update logic from ref updates and utilize this
functionality.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The reference backends obtain the committer information from
`git_committer_info(0)` when adding a reflog. The upcoming patches
introduce support for migrating reflogs between the reference backends.
This requires an interface to creating reflogs, including custom
committer information.
Add a new field `committer_info` to the `ref_update` struct, which is
then used by the reference backends. If there is no `committer_info`
provided, the reference backends default to using
`git_committer_info(0)`. The field itself cannot be set to
`git_committer_info(0)` since the values are dynamic and must be
obtained right when the reflog is being committed.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Mark code units that generate warnings with `-Wsign-compare`. This
allows for a structured approach to get rid of all such warnings over
time in a way that can be easily measured.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git fsck" learned to issue warnings on "curiously formatted" ref
contents that have always been taken valid but something Git
wouldn't have written itself (e.g., missing terminating end-of-line
after the full object name).
* sj/ref-contents-check:
ref: add symlink ref content check for files backend
ref: check whether the target of the symref is a ref
ref: add basic symref content check for files backend
ref: add more strict checks for regular refs
ref: port git-fsck(1) regular refs check for files backend
ref: support multiple worktrees check for refs
ref: initialize ref name outside of check functions
ref: check the full refname instead of basename
ref: initialize "fsck_ref_report" with zero
|
|
Currently there is only one special error for transaction, for when
there is a naming conflict, all other errors are dumped under a generic
error. Add a new special error case for when the caller requests the
reference to be updated only when it does not yet exist and the
reference actually does exist.
Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|