| Age | Commit message (Collapse) | Author | Files | Lines |
|
The code paths to check whether a refname X is available (by seeing
if another ref X/Y exists, etc.) have been optimized.
* ps/refname-avail-check-optim:
refs: reuse iterators when determining refname availability
refs/iterator: implement seeking for files iterators
refs/iterator: implement seeking for packed-ref iterators
refs/iterator: implement seeking for ref-cache iterators
refs/iterator: implement seeking for reftable iterators
refs/iterator: implement seeking for merged iterators
refs/iterator: provide infrastructure to re-seek iterators
refs/iterator: separate lifecycle from iteration
refs: stop re-verifying common prefixes for availability
refs/files: batch refname availability checks for initial transactions
refs/files: batch refname availability checks for normal transactions
refs/reftable: batch refname availability checks
refs: introduce function to batch refname availability checks
builtin/update-ref: skip ambiguity checks when parsing object IDs
object-name: allow skipping ambiguity checks in `get_oid()` family
object-name: introduce `repo_get_oid_with_flags()`
|
|
The refname exclusion logic in the packed-ref backend has been
broken for some time, which confused upload-pack to advertise
different set of refs. This has been corrected.
* tb/refs-exclude-fixes:
refs.c: stop matching non-directory prefixes in exclude patterns
refs.c: remove empty '--exclude' patterns
|
|
The default branch name advice message is displayed when
`repo_default_branch_name()` is invoked and the `init.defaultBranch`
config is not set. In this scenario, the advice message is always shown
even if the `--no-advice` option is used.
Adapt `repo_default_branch_name()` to allow the default branch name
advice message to be disabled with the `--no-advice` option and
corresponding configuration.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When verifying whether refnames are available we have to verify whether
any reference exists that is nested under the current reference. E.g.
given a reference "refs/heads/foo", we must make sure that there is no
other reference "refs/heads/foo/*".
This check is performed using a ref iterator with the prefix set to the
nested reference namespace. Until now it used to not be possible to
reseek iterators, so we always had to reallocate the iterator for every
single reference we're about to check. This keeps us from reusing state
that the iterator may have and that may make it work more efficiently.
Refactor the logic to reseek iterators. This leads to a sizeable speedup
with the "reftable" backend:
Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
Time (mean ± σ): 39.8 ms ± 0.9 ms [User: 29.7 ms, System: 9.8 ms]
Range (min … max): 38.4 ms … 42.0 ms 62 runs
Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
Time (mean ± σ): 31.9 ms ± 1.1 ms [User: 27.0 ms, System: 4.5 ms]
Range (min … max): 29.8 ms … 34.3 ms 74 runs
Summary
update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
1.25 ± 0.05 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
The "files" backend doesn't really show a huge impact:
Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
Time (mean ± σ): 392.3 ms ± 7.1 ms [User: 59.7 ms, System: 328.8 ms]
Range (min … max): 384.6 ms … 404.5 ms 10 runs
Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
Time (mean ± σ): 387.7 ms ± 7.4 ms [User: 54.6 ms, System: 329.6 ms]
Range (min … max): 377.0 ms … 397.7 ms 10 runs
Summary
update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
1.01 ± 0.03 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
This is mostly because it is way slower to begin with because it has to
create a separate file for each new reference, so the milliseconds we
shave off by reseeking the iterator doesn't really translate into a
significant relative improvement.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The ref and reflog iterators have their lifecycle attached to iteration:
once the iterator reaches its end, it is automatically released and the
caller doesn't have to care about that anymore. When the iterator should
be released before it has been exhausted, callers must explicitly abort
the iterator via `ref_iterator_abort()`.
This lifecycle is somewhat unusual in the Git codebase and creates two
problems:
- Callsites need to be very careful about when exactly they call
`ref_iterator_abort()`, as calling the function is only valid when
the iterator itself still is. This leads to somewhat awkward calling
patterns in some situations.
- It is impossible to reuse iterators and re-seek them to a different
prefix. This feature isn't supported by any iterator implementation
except for the reftable iterators anyway, but if it was implemented
it would allow us to optimize cases where we need to search for
specific references repeatedly by reusing internal state.
Detangle the lifecycle from iteration so that we don't deallocate the
iterator anymore once it is exhausted. Instead, callers are now expected
to always call a newly introduce `ref_iterator_free()` function that
deallocates the iterator and its internal state.
Note that the `dir_iterator` is somewhat special because it does not
implement the `ref_iterator` interface, but is only used to implement
other iterators. Consequently, we have to provide `dir_iterator_free()`
instead of `dir_iterator_release()` as the allocated structure itself is
managed by the `dir_iterator` interfaces, as well, and not freed by
`ref_iterator_free()` like in all the other cases.
While at it, drop the return value of `ref_iterator_abort()`, which
wasn't really required by any of the iterator implementations anyway.
Furthermore, stop calling `base_ref_iterator_free()` in any of the
backends, but instead call it in `ref_iterator_free()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
One of the checks done by `refs_verify_refnames_available()` is whether
any of the prefixes of a reference already exists. For example, given a
reference "refs/heads/main", we'd check whether "refs/heads" or "refs"
already exist, and if so we'd abort the transaction.
When updating multiple references at once, this check is performed for
each of the references individually. Consequently, because references
tend to have common prefixes like "refs/heads/" or refs/tags/", we
evaluate the availability of these prefixes repeatedly. Naturally this
is a waste of compute, as the availability of those prefixes should in
general not change in the middle of a transaction. And if it would,
backends would notice at a later point in time.
Optimize this pattern by storing prefixes in a `strset` so that we can
trivially track those prefixes that we have already checked. This leads
to a significant speedup with the "reftable" backend when creating many
references that all share a common prefix:
Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
Time (mean ± σ): 63.1 ms ± 1.8 ms [User: 41.0 ms, System: 21.6 ms]
Range (min … max): 60.6 ms … 69.5 ms 38 runs
Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
Time (mean ± σ): 40.0 ms ± 1.3 ms [User: 29.3 ms, System: 10.3 ms]
Range (min … max): 38.1 ms … 47.3 ms 61 runs
Summary
update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
1.58 ± 0.07 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
For the "files" backend we see an improvement, but a much smaller one:
Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
Time (mean ± σ): 395.8 ms ± 5.3 ms [User: 63.6 ms, System: 330.5 ms]
Range (min … max): 387.0 ms … 404.6 ms 10 runs
Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
Time (mean ± σ): 386.0 ms ± 4.0 ms [User: 51.5 ms, System: 332.8 ms]
Range (min … max): 380.8 ms … 392.6 ms 10 runs
Summary
update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
1.03 ± 0.02 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
This change also leads to a modest improvement when writing references
with "initial" semantics, for example when migrating references. The
following benchmarks are migrating 1m references from the "reftable" to
the "files" backend:
Benchmark 1: migrate reftable:files (refcount = 1000000, revision = HEAD~)
Time (mean ± σ): 836.6 ms ± 5.6 ms [User: 645.2 ms, System: 185.2 ms]
Range (min … max): 829.6 ms … 845.9 ms 10 runs
Benchmark 2: migrate reftable:files (refcount = 1000000, revision = HEAD)
Time (mean ± σ): 759.8 ms ± 5.1 ms [User: 574.9 ms, System: 178.9 ms]
Range (min … max): 753.1 ms … 768.8 ms 10 runs
Summary
migrate reftable:files (refcount = 1000000, revision = HEAD) ran
1.10 ± 0.01 times faster than migrate reftable:files (refcount = 1000000, revision = HEAD~)
And vice versa:
Benchmark 1: migrate files:reftable (refcount = 1000000, revision = HEAD~)
Time (mean ± σ): 870.7 ms ± 5.7 ms [User: 735.2 ms, System: 127.4 ms]
Range (min … max): 861.6 ms … 883.2 ms 10 runs
Benchmark 2: migrate files:reftable (refcount = 1000000, revision = HEAD)
Time (mean ± σ): 799.1 ms ± 8.5 ms [User: 661.1 ms, System: 130.2 ms]
Range (min … max): 787.5 ms … 812.6 ms 10 runs
Summary
migrate files:reftable (refcount = 1000000, revision = HEAD) ran
1.09 ± 0.01 times faster than migrate files:reftable (refcount = 1000000, revision = HEAD~)
The impact here is significantly smaller given that we don't perform any
reference reads with "initial" semantics, so the speedup only comes from
us doing less string list lookups.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `refs_verify_refname_available()` functions checks whether a
reference update can be committed or whether it would conflict with
either a prefix or suffix thereof. This function needs to be called once
per reference that one wants to check, which requires us to redo a
couple of checks every time the function is called.
Introduce a new function `refs_verify_refnames_available()` that does
the same, but for a list of references. For now, the new function uses
the exact same implementation, except that we loop through all refnames
provided by the caller. This will be tuned in subsequent commits.
The existing `refs_verify_refname_available()` function is reimplemented
on top of the new function. As such, the diff is best viewed with the
`--ignore-space-change option`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `null_oid()` function returns the object ID that only consists of
zeroes. Naturally, this ID also depends on the hash algorithm used, as
the number of zeroes is different between SHA1 and SHA256. Consequently,
the function returns the hash-algorithm-specific null object ID.
This is currently done by depending on `the_hash_algo`, which implicitly
makes us depend on `the_repository`. Refactor the function to instead
pass in the hash algorithm for which we want to retrieve the null object
ID. Adapt callsites accordingly by passing in `the_repository`, thus
bubbling up the dependency on that global variable by one layer.
There are a couple of trivial exceptions for subsystems that already got
rid of `the_repository`. These subsystems instead use the repository
that is available via the calling context:
- "builtin/grep.c"
- "grep.c"
- "refs/debug.c"
There are also two non-trivial exceptions:
- "diff-no-index.c": Here we know that we may not have a repository
initialized at all, so we cannot rely on `the_repository`. Instead,
we adapt `diff_no_index()` to get a `struct git_hash_algo` as
parameter. The only caller is located in "builtin/diff.c", where we
know to call `repo_set_hash_algo()` in case we're running outside of
a Git repository. Consequently, it is fine to continue passing
`the_repository->hash_algo` even in this case.
- "builtin/ls-files.c": There is an in-flight patch series that drops
`USE_THE_REPOSITORY_VARIABLE` in this file, which causes a semantic
conflict because we use `null_oid()` in `show_submodule()`. The
value is passed to `repo_submodule_init()`, which may use the object
ID to resolve a tree-ish in the superproject from which we want to
read the submodule config. As such, the object ID should refer to an
object in the superproject, and consequently we need to use its hash
algorithm.
This means that we could in theory just not bother about this edge
case at all and just use `the_repository` in "diff-no-index.c". But
doing so would feel misdesigned.
Remove the `USE_THE_REPOSITORY_VARIABLE` preprocessor define in
"hash.c".
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In the packed-refs backend, our implementation of '--exclude' (dating
back to 59c35fac54 (refs/packed-backend.c: implement jump lists to avoid
excluded pattern(s), 2023-07-10)) considers, for example:
$ git for-each-ref --exclude=refs/heads/ba
to exclude "refs/heads/bar", "refs/heads/baz", and so on.
The files backend, which does not implement '--exclude' (and relies on
the caller to cull out results that don't match) naturally will
enumerate "refs/heads/bar" and so on.
So in the above example, 'for-each-ref' will try and see if
"refs/heads/ba" matches "refs/heads/bar" (since the files backend simply
enumerated every loose reference), and, realizing that it does not
match, output the reference as expected. (A caller that did want to
exclude "refs/heads/bar" and "refs/heads/baz" might instead run "git
for-each-ref --exclude='refs/heads/ba*'").
This can lead to strange behavior, like seeing a different set of
references advertised via 'upload-pack' depending on what set of
references were loose versus packed.
So there is a subtle bug with '--exclude' which is that in the
packed-refs backend we will consider "refs/heads/bar" to be a pattern
match against "refs/heads/ba" when we shouldn't. Likewise, the reftable
backend (which in this case is bug-compatible with the packed backend)
exhibits the same broken behavior.
There are a few ways to fix this. One is to tighten the rules in
cmp_record_to_refname(), which is used to determine the start/end-points
of the jump list used by the packed backend. In this new "strict" mode,
the comparison function would handle the case where we've reached the
end of the pattern by introducing a new check like so:
while (1) {
if (*r1 == '\n')
return *r2 ? -1 : 0;
if (!*r2)
if (strict && *r1 != '/') /* <- here */
return 1;
return start ? 1 : -1;
if (*r1 != *r2)
return (unsigned char)*r1 < (unsigned char)*r2 ? -1 : +1;
r1++;
r2++;
}
(eliding out the rest of cmp_record_to_refname()). Equivalently, we
could teach refs/packed-backend::populate_excluded_jump_list() to append
a trailing '/' if one does not already exist, forcing an exclude pattern
like "refs/heads/ba" to only match "refs/heads/ba/abc" and so forth.
But since the same problem exists in reftable, we can fix both at once
by performing this pre-processing step one layer up in refs.c at the
common entrypoint for the two, which is 'refs_ref_iterator_begin()'.
Since that solution is both the simplest and only requires modification
in one spot, let's normalize exclude patterns so that they end with a
trailing slash. This causes us to unify the behavior between all three
backends.
There is some minor test fallout in the "overlapping excluded regions"
test, which happens to use 'refs/ba' as an exclude pattern, and expects
references under the "refs/heads/bar/*" and "refs/heads/baz/*"
hierarchies to be excluded from the results.
But that test fallout is expected, because the test was codifying the
buggy behavior to begin with, and should have never been written that
way. Split that into its own test (since the range is no longer
overlapping under the stricter interpretation of --exclude patterns
presented here). Create a new test which does have overlapping
regions by using a refs/heads/bar/4/... hierarchy and excluding both
"refs/heads/bar" and "refs/heads/bar/4".
Reported-by: SURA <surak8806@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In 59c35fac54 (refs/packed-backend.c: implement jump lists to avoid
excluded pattern(s), 2023-07-10), the packed-refs backend learned how to
construct "jump lists" to avoid enumerating sections of the packed-refs
file that we know the caller is going to throw out anyway.
This process works by finding the start- and end-points (that is, where
in the packed-refs file corresponds to the range we're going to ignore)
for each exclude pattern, then constructing a jump list based on that.
At enumeration time we'll consult the jump list to skip past everything
in the range(s) found in the previous step, saving time when excluding a
large portion of references.
But when there is a --exclude pattern which is just the empty string,
the behavior is a little funky. When we try and exclude the empty
string, the matched range covers the entire packed-refs file, meaning
that we won't output any packed references. But the empty pattern
doesn't actually match any references to begin with! For example, on my
copy of git.git I can do:
$ git for-each-ref '' | wc -l
0
So "git for-each-ref --exclude=''" shouldn't actually remove anything
from the output, and ought to be equivalent to "git for-each-ref". But
it's not, and in fact:
$ git for-each-ref | wc -l
2229
$ git for-each-ref --exclude='' | wc -l
480
But why does the '--exclude' version output only some of the references
in the repository? Here's a hint:
$ find .git/refs -type f | wc -l
480
Indeed, because the files backend doesn't implement[^1] the same jump
list concept as the packed backend we get the correct result for the
loose references, but none of the packed references.
Since the empty string exclude pattern doesn't match anything, we can
discard them before the packed-refs backend has a chance to even see it
(and likewise for reftable, which also implements a similar concept
since 1869525066 (refs/reftable: wire up support for exclude patterns,
2024-09-16)).
This approach (copying only some of the patterns into a strvec at the
refs.c layer) may seem heavy-handed, but it's setting us up to fix
another bug in the following commit where the fix will involve modifying
the incoming patterns.
[^1]: As noted in 59c35fac54. We technically could avoid opening and
enumerating the contents of, for e.g., "$GIT_DIR/refs/heads/foo/" if
we knew that we were excluding anything under the 'refs/heads/foo'
hierarchy. But the --exclude stuff is all best-effort anyway, since
the caller is expected to cull out any results that they don't want.
Noticed-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The path.[ch] API takes an explicit repository parameter passed
throughout the callchain, instead of relying on the_repository
singleton instance.
* ps/path-sans-the-repository:
path: adjust last remaining users of `the_repository`
environment: move access to "core.sharedRepository" into repo settings
environment: move access to "core.hooksPath" into repo settings
repo-settings: introduce function to clear struct
path: drop `git_path()` in favor of `repo_git_path()`
rerere: let `rerere_path()` write paths into a caller-provided buffer
path: drop `git_common_path()` in favor of `repo_common_path()`
worktree: return allocated string from `get_worktree_git_dir()`
path: drop `git_path_buf()` in favor of `repo_git_path_replace()`
path: drop `git_pathdup()` in favor of `repo_git_path()`
path: drop unused `strbuf_git_path()` function
path: refactor `repo_submodule_path()` family of functions
submodule: refactor `submodule_to_gitdir()` to accept a repo
path: refactor `repo_worktree_path()` family of functions
path: refactor `repo_git_path()` family of functions
path: refactor `repo_common_path()` family of functions
|
|
"git refs migrate" can optionally be told not to migrate the reflog.
* kn/ref-migrate-skip-reflog:
builtin/refs: add '--no-reflog' flag to drop reflogs
|
|
The "git refs migrate" subcommand converts the backend used for ref
storage. It always migrates reflog data as well as refs. Introduce an
option to exclude reflogs from migration, allowing them to be discarded
when they are unnecessary.
This is particularly useful in server-side repositories, where reflogs
are typically not expected. However, some repositories may still have
them due to historical reasons, such as bugs, misconfigurations, or
administrative decisions to enable reflogs for debugging. In such
repositories, it would be optimal to drop reflogs during the migration.
To address this, introduce the '--no-reflog' flag, which prevents reflog
migration. When this flag is used, reflogs from the original reference
backend are migrated. Since only the new reference backend remains in
the repository, all previous reflogs are permanently discarded.
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Code clean-up.
* kn/reflog-migration-fix-followup:
reftable: prevent 'update_index' changes after adding records
refs: use 'uint64_t' for 'ref_update.index'
refs: mark `ref_transaction_update_reflog()` as static
|
|
The `submodule_to_gitdir()` function implicitly uses `the_repository` to
resolve submodule paths. Refactor the function to instead accept a repo
as parameter to remove the dependency on global state.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The functions provided by the "path" subsystem to derive repository
paths for the commondir, gitdir, worktrees and submodules are quite
inconsistent. Some functions have a `strbuf_` prefix, others have
different return values, some don't provide a variant working on top of
`strbuf`s.
We're thus about to refactor all of these family of functions so that
they follow a common pattern:
- `repo_*_path()` returns an allocated string.
- `repo_*_path_append()` appends the path to the caller-provided
buffer while returning a constant pointer to the buffer. This
clarifies whether the buffer is being appended to or rewritten,
which otherwise wasn't immediately obvious.
- `repo_*_path_replace()` replaces contents of the buffer with the
computed path, again returning a pointer to the buffer contents.
The returned constant pointer isn't being used anywhere yet, but it will
be used in subsequent commits. Its intent is to allow calling patterns
like the following somewhat contrived example:
if (!stat(&st, repo_common_path_replace(repo, &buf, ...)) &&
!unlink(repo_common_path_replace(repo, &buf, ...)))
...
Refactor the commondir family of functions accordingly and adapt all
callers.
Note that `repo_common_pathv()` is converted into an internal
implementation detail. It is only used to implement `the_repository`
compatibility shims and will eventually be removed from the public
interface.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Fix bugs in an earlier attempt to fix "git refs migration".
* kn/reflog-migration-fix-fix:
refs/reftable: fix uninitialized memory access of `max_index`
reftable: write correct max_update_index to header
|
|
The "git refs migrate" command did not migrate the reflog for
refs/stash, which is the contents of the stashes, which has been
corrected.
* ps/reflog-migration-with-logall-fix:
refs: fix migration of reflogs respecting "core.logAllRefUpdates"
|
|
In 246cebe320 (refs: add support for migrating reflogs, 2024-12-16) we
have added support to git-refs(1) to migrate reflogs between reference
backends. It was reported [1] though that not we don't migrate reflogs
for a subset of references, most importantly "refs/stash".
This issue is caused by us still honoring "core.logAllRefUpdates" when
trying to migrate reflogs: we do queue the updates, but depending on the
value of that config we may decide to just skip writing the reflog entry
altogether. And given that:
- The default for "core.logAllRefUpdates" is to only create reflogs
for branches, remotes, note refs and "HEAD"
- "refs/stash" is neither of these ref types.
We end up skipping the reflog creation for that particular reference.
Fix the bug by setting `REF_FORCE_CREATE_REFLOG`, which instructs the
ref backends to create the reflog entry regardless of the config or any
preexisting state.
[1]: <Z5BTQRlsOj1sygun@tapette.crustytoothpaste.net>
Reported-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The 'ref_update.index' variable is used to store an index for a given
reference update. This index is used to order the updates in a
predetermined order, while the default ordering is alphabetical as per
the refname.
For large repositories with millions of references, it should be safer
to use 'uint64_t'. Let's do that. This also is applied for all other
code sections where we store 'index' and pass it around.
Reported-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `ref_transaction_update_reflog()` function is only used within
'refs.c', so mark it as static.
Reported-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* kn/reflog-migration-fix:
reftable: write correct max_update_index to header
|
|
In 297c09eabb (refs: allow multiple reflog entries for the same refname,
2024-12-16), the reftable backend learned to handle multiple reflog
entries within the same transaction. This was done modifying the
`update_index` for reflogs with multiple indices. During writing the
logs, the `max_update_index` of the writer was modified to ensure the
limits were raised to the modified `update_index`s.
However, since ref entries are written before the modification to the
`max_update_index`, if there are multiple blocks to be written, the
reftable backend writes the header with the old `max_update_index`. When
all logs are finally written, the footer will be written with the new
`min_update_index`. This causes a mismatch between the header and the
footer and causes the reftable file to be corrupted. The existing tests
only spawn a single block and since headers are lazily written with the
first block, the tests didn't capture this bug.
To fix the issue, the appropriate `max_update_index` limit must be set
even before the first block is written. Add a `max_index` field to the
transaction which holds the `max_index` within all its updates, then
propagate this value to the reftable backend, wherein this is used to
the set the `max_update_index` correctly.
Add a test which creates a few thousand reference updates with multiple
reflog entries, which should trigger the bug.
Reported-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git refs migrate" learned to also migrate the reflog data across
backends.
* kn/reflog-migration:
refs: mark invalid refname message for translation
refs: add support for migrating reflogs
refs: allow multiple reflog entries for the same refname
refs: introduce the `ref_transaction_update_reflog` function
refs: add `committer_info` to `ref_transaction_add_update()`
refs: extract out refname verification in transactions
refs/files: add count field to ref_lock
refs: add `index` field to `struct ref_udpate`
refs: include committer info in `ref_update` struct
|
|
Start working to make the codebase buildable with -Wsign-compare.
* ps/build-sign-compare:
t/helper: don't depend on implicit wraparound
scalar: address -Wsign-compare warnings
builtin/patch-id: fix type of `get_one_patchid()`
builtin/blame: fix type of `length` variable when emitting object ID
gpg-interface: address -Wsign-comparison warnings
daemon: fix type of `max_connections`
daemon: fix loops that have mismatching integer types
global: trivial conversions to fix `-Wsign-compare` warnings
pkt-line: fix -Wsign-compare warning on 32 bit platform
csum-file: fix -Wsign-compare warning on 32-bit platform
diff.h: fix index used to loop through unsigned integer
config.mak.dev: drop `-Wno-sign-compare`
global: mark code units that generate warnings with `-Wsign-compare`
compat/win32: fix -Wsign-compare warning in "wWinMain()"
compat/regex: explicitly ignore "-Wsign-compare" warnings
git-compat-util: introduce macros to disable "-Wsign-compare" warnings
|
|
The error message produced by `transaction_refname_valid()` changes based
on whether the update is a ref update or a reflog update, with the use
of a ternary operator. This breaks translation since the sub-msg is not
marked for translation. Fix this by setting the entire message using a
`if {} else {}` block and marking each message for translation.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When "git fetch $remote" notices that refs/remotes/$remote/HEAD is
missing and discovers what branch the other side points with its
HEAD, refs/remotes/$remote/HEAD is updated to point to it.
* bf/set-head-symref:
fetch set_head: handle mirrored bare repositories
fetch: set remote/HEAD if it does not exist
refs: add create_only option to refs_update_symref_extended
refs: add TRANSACTION_CREATE_EXISTS error
remote set-head: better output for --auto
remote set-head: refactor for readability
refs: atomically record overwritten ref in update_symref
refs: standardize output of refs_read_symbolic_ref
t/t5505-remote: test failure of set-head
t/t5505-remote: set default branch to main
|
|
The `git refs migrate` command was introduced in
25a0023f28 (builtin/refs: new command to migrate ref storage formats,
2024-06-06) to support migrating from one reference backend to another.
One limitation of the command was that it didn't support migrating
repositories which contained reflogs. A previous commit, added support
for adding reflog updates in ref transactions. Using the added
functionality bake in reflog support for `git refs migrate`.
To ensure that the order of the reflogs is maintained during the
migration, we add the index for each reflog update as we iterate over
the reflogs from the old reference backend. This is to ensure that the
order is maintained in the new backend.
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Introduce a new function `ref_transaction_update_reflog`, for clients to
add a reflog update to a transaction. While the existing function
`ref_transaction_update` also allows clients to add a reflog entry, this
function does a few things more, It:
- Enforces that only a reflog entry is added and does not update the
ref itself.
- Allows the users to also provide the committer information. This
means clients can add reflog entries with custom committer
information.
The `transaction_refname_valid()` function also modifies the error
message selectively based on the type of the update. This change also
affects reflog updates which go through `ref_transaction_update()`.
A follow up commit will utilize this function to add reflog support to
`git refs migrate`.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `ref_transaction_add_update()` creates the `ref_update` struct. To
facilitate addition of reflogs in the next commit, the function needs to
accommodate setting the `committer_info` field in the struct. So modify
the function to also take `committer_info` as an argument and set it
accordingly.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Unless the `REF_SKIP_REFNAME_VERIFICATION` flag is set for an update,
the refname of the update is verified for:
- Ensuring it is not a pseudoref.
- Checking the refname format.
These checks will also be needed in a following commit where the
function to add reflog updates to the transaction is introduced. Extract
the code out into a new static function.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The reference backends obtain the committer information from
`git_committer_info(0)` when adding a reflog. The upcoming patches
introduce support for migrating reflogs between the reference backends.
This requires an interface to creating reflogs, including custom
committer information.
Add a new field `committer_info` to the `ref_update` struct, which is
then used by the reference backends. If there is no `committer_info`
provided, the reference backends default to using
`git_committer_info(0)`. The field itself cannot be set to
`git_committer_info(0)` since the values are dynamic and must be
obtained right when the reflog is being committed.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git tag" has been taught to refuse to create refs/tags/HEAD
as such a tag will be confusing in the context of UI provided by
the Git Porcelain commands.
* jc/forbid-head-as-tagname:
tag: "git tag" refuses to use HEAD as a tagname
t5604: do not expect that HEAD can be a valid tagname
refs: drop strbuf_ prefix from helpers
refs: move ref name helpers around
|
|
Mark code units that generate warnings with `-Wsign-compare`. This
allows for a structured approach to get rid of all such warnings over
time in a way that can be easily measured.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git fsck" learned to issue warnings on "curiously formatted" ref
contents that have always been taken valid but something Git
wouldn't have written itself (e.g., missing terminating end-of-line
after the full object name).
* sj/ref-contents-check:
ref: add symlink ref content check for files backend
ref: check whether the target of the symref is a ref
ref: add basic symref content check for files backend
ref: add more strict checks for regular refs
ref: port git-fsck(1) regular refs check for files backend
ref: support multiple worktrees check for refs
ref: initialize ref name outside of check functions
ref: check the full refname instead of basename
ref: initialize "fsck_ref_report" with zero
|
|
The migration procedure between two ref backends has been optimized.
* ps/ref-backend-migration-optim:
reftable: rename scratch buffer
refs: adapt `initial_transaction` flag to be unsigned
reftable/block: optimize allocations by using scratch buffer
reftable/block: rename `block_writer::buf` variable
reftable/writer: optimize allocations by using a scratch buffer
refs: don't normalize log messages with `REF_SKIP_CREATE_REFLOG`
refs: skip collision checks in initial transactions
refs: use "initial" transaction semantics to migrate refs
refs/files: support symbolic and root refs in initial transaction
refs: introduce "initial" transaction flag
refs/files: move logic to commit initial transaction
refs: allow passing flags when setting up a transaction
|
|
Even though the plumbing level allows you to create refs/tags/HEAD
and refs/heads/HEAD, doing so makes it confusing within the context
of the UI Git Porcelain commands provides. Just like we prevent a
branch from getting called "HEAD" at the Porcelain layer (i.e. "git
branch" command), teach "git tag" to refuse to create a tag "HEAD".
With a few new tests, we make sure that
- "git tag HEAD" and "git tag -a HEAD" are rejected
- "git update-ref refs/tags/HEAD" is still allowed (this is a
deliberate design decision to allow others to create their own UI
on top of Git infrastructure that may be different from our UI).
- "git tag -d HEAD" can remove refs/tags/HEAD to recover from an
mistake.
Helped-by: Jeff King <peff@peff.net>
Helped-by: Rubén Justo <rjusto@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The helper functions (strbuf_branchname, strbuf_check_branch_ref,
and strbuf_check_tag_ref) are about handling branch and tag names,
and it is a non-essential fact that these functions use strbuf to
hold these names. Rename them to make it clarify that these are
more about "ref".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
strbuf_branchname(), strbuf_check_{branch,tag}_ref() are helper
functions to deal with branch and tag names, and the fact that they
happen to use strbuf to hold the name of a branch or a tag is not
essential. These functions fit better in the refs API than strbuf
API, the latter of which is about string manipulations.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `initial_transaction` flag is tracked as a signed integer, but we
typically pass around flags via unsigned integers. Adapt the type
accordingly.
Suggested-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Allow the caller to specify that it only wants to update the symref if
it does not already exist. Silently ignore the error from the
transaction API if the symref already exists.
Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When updating a symref with update_symref it's currently not possible to
know for sure what was the previous value that was overwritten. Extend
refs_update_symref under a new function name, to record the value after
the ref has been locked if the caller of refs_update_symref_extended
requests it via a new variable in the function call. Make the return
value of the function notify the caller, if the previous value was
actually not a symbolic reference. Keep the original refs_update_symref
function with the same signature, but now as a wrapper around
refs_update_symref_extended.
Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We have already used "parse_loose_ref_contents" function to check
whether the ref content is valid in files backend. However, by
using "parse_loose_ref_contents", we allow the ref's content to end with
garbage or without a newline.
Even though we never create such loose refs ourselves, we have accepted
such loose refs. So, it is entirely possible that some third-party tools
may rely on such loose refs being valid. We should not report an error
fsck message at current. We should notify the users about such
"curiously formatted" loose refs so that adequate care is taken before
we decide to tighten the rules in the future.
And it's not suitable either to report a warn fsck message to the user.
We don't yet want the "--strict" flag that controls this bit to end up
generating errors for such weirdly-formatted reference contents, as we
first want to assess whether this retroactive tightening will cause
issues for any tools out there. It may cause compatibility issues which
may break the repository. So, we add the following two fsck infos to
represent the situation where the ref content ends without newline or
has trailing garbages:
1. refMissingNewline(INFO): A loose ref that does not end with
newline(LF).
2. trailingRefContent(INFO): A loose ref has trailing content.
It might appear that we can't provide the user with any warnings by
using FSCK_INFO. However, in "fsck.c::fsck_vreport", we will convert
FSCK_INFO to FSCK_WARN and we can still warn the user about these
situations when using "git refs verify" without introducing
compatibility issues.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We have already set up the infrastructure to check the consistency for
refs, but we do not support multiple worktrees. However, "git-fsck(1)"
will check the refs of worktrees. As we decide to get feature parity
with "git-fsck(1)", we need to set up support for multiple worktrees.
Because each worktree has its own specific refs, instead of just showing
the users "refs/worktree/foo", we need to display the full name such as
"worktrees/<id>/refs/worktree/foo". So we should know the id of the
worktree to get the full name. Add a new parameter "struct worktree *"
for "refs-internal.h::fsck_fn". Then change the related functions to
follow this new interface.
The "packed-refs" only exists in the main worktree, so we should only
check "packed-refs" in the main worktree. Use "is_main_worktree" method
to skip checking "packed-refs" in "packed_fsck" function.
Then, enhance the "files-backend.c::files_fsck_refs_dir" function to add
"worktree/<id>/" prefix when we are not in the main worktree.
Last, add a new test to check the refname when there are multiple
worktrees to exercise the code.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When the `REF_SKIP_CREATE_REFLOG` flag is set we skip the creation of
the reflog entry, but we still normalize the reflog message when we
queue the update. This is a waste of resources as the normalized message
will never get used in the first place.
Fix this issue by skipping the normalization in case the flag is set.
This leads to a surprisingly large speedup when migrating from the
"files" to the "reftable" backend:
Benchmark 1: migrate files:reftable (refcount = 1000000, revision = HEAD~)
Time (mean ± σ): 878.5 ms ± 14.9 ms [User: 726.5 ms, System: 139.2 ms]
Range (min … max): 858.4 ms … 941.3 ms 50 runs
Benchmark 2: migrate files:reftable (refcount = 1000000, revision = HEAD)
Time (mean ± σ): 831.1 ms ± 10.5 ms [User: 694.1 ms, System: 126.3 ms]
Range (min … max): 812.4 ms … 851.4 ms 50 runs
Summary
migrate files:reftable (refcount = 1000000, revision = HEAD) ran
1.06 ± 0.02 times faster than migrate files:reftable (refcount = 1000000, revision = HEAD~)
And an ever larger speedup when migrating the other way round:
Benchmark 1: migrate reftable:files (refcount = 1000000, revision = HEAD~)
Time (mean ± σ): 923.6 ms ± 11.6 ms [User: 705.5 ms, System: 208.1 ms]
Range (min … max): 905.3 ms … 946.5 ms 50 runs
Benchmark 2: migrate reftable:files (refcount = 1000000, revision = HEAD)
Time (mean ± σ): 818.5 ms ± 9.0 ms [User: 627.6 ms, System: 180.6 ms]
Range (min … max): 802.2 ms … 842.9 ms 50 runs
Summary
migrate reftable:files (refcount = 1000000, revision = HEAD) ran
1.13 ± 0.02 times faster than migrate reftable:files (refcount = 1000000, revision = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Reference transactions use `refs_verify_refname_available()` to check
for colliding references. This check consists of two parts:
- Checks for whether multiple ref updates in the same transaction
conflict with each other.
- Checks for whether existing refs conflict with any refs part of the
transaction.
While we generally cannot avoid the first check, the second check is
superfluous in cases where the transaction is an initial one in an
otherwise empty ref store. The check results in multiple ref reads as
well as the creation of a ref iterator for every ref we're checking,
which adds up quite fast when performing the check for many refs.
Introduce a new flag that allows us to skip this check and wire it up in
such that the backends pass it when running an initial transaction. This
leads to significant speedups when migrating ref storage backends. From
"files" to "reftable":
Benchmark 1: migrate files:reftable (refcount = 100000, revision = HEAD~)
Time (mean ± σ): 472.4 ms ± 6.7 ms [User: 175.9 ms, System: 285.2 ms]
Range (min … max): 463.5 ms … 483.2 ms 10 runs
Benchmark 2: migrate files:reftable (refcount = 100000, revision = HEAD)
Time (mean ± σ): 86.1 ms ± 1.9 ms [User: 67.9 ms, System: 16.0 ms]
Range (min … max): 82.9 ms … 90.9 ms 29 runs
Summary
migrate files:reftable (refcount = 100000, revision = HEAD) ran
5.48 ± 0.15 times faster than migrate files:reftable (refcount = 100000, revision = HEAD~)
And from "reftable" to "files":
Benchmark 1: migrate reftable:files (refcount = 100000, revision = HEAD~)
Time (mean ± σ): 452.7 ms ± 3.4 ms [User: 209.9 ms, System: 235.4 ms]
Range (min … max): 445.9 ms … 457.5 ms 10 runs
Benchmark 2: migrate reftable:files (refcount = 100000, revision = HEAD)
Time (mean ± σ): 95.2 ms ± 2.2 ms [User: 73.6 ms, System: 20.6 ms]
Range (min … max): 91.7 ms … 100.8 ms 28 runs
Summary
migrate reftable:files (refcount = 100000, revision = HEAD) ran
4.76 ± 0.11 times faster than migrate reftable:files (refcount = 100000, revision = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Until now, we couldn't use "initial" transaction semantics to migrate
refs because the "files" backend only supported writing regular refs via
the initial transaction because it simply mapped the transaction to a
"packed-refs" transaction. But with the preceding commit, the "files"
backend has learned to also write symbolic and root refs in the initial
transaction by creating a second transaction for all refs that need to
be written as loose refs.
Adapt the code to migrate refs to commit the transaction as an initial
transaction. This results in a signiticant speedup when migrating many
refs:
Benchmark 1: migrate reftable:files (refcount = 100000, revision = HEAD~)
Time (mean ± σ): 3.247 s ± 0.034 s [User: 0.485 s, System: 2.722 s]
Range (min … max): 3.216 s … 3.309 s 10 runs
Benchmark 2: migrate reftable:files (refcount = 100000, revision = HEAD)
Time (mean ± σ): 453.6 ms ± 1.9 ms [User: 214.6 ms, System: 230.5 ms]
Range (min … max): 451.5 ms … 456.4 ms 10 runs
Summary
migrate reftable:files (refcount = 100000, revision = HEAD) ran
7.16 ± 0.08 times faster than migrate reftable:files (refcount = 100000, revision = HEAD~)
As the reftable backend doesn't (yet) special-case initial transactions
there is no comparable speedup for that backend.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
There are two different ways to commit a transaction:
- `ref_transaction_commit()` can be used to commit a regular
transaction and is what almost every caller wants.
- `initial_ref_transaction_commit()` can be used when it is known that
the ref store that the transaction is committed for is empty and
when there are no concurrent processes. This is used when cloning a
new repository.
Implementing this via two separate functions has a couple of downsides.
First, every reference backend needs to implement a separate callback
even in the case where they don't special-case the initial transaction.
Second, backends are basically forced to reimplement the whole logic for
how to commit the transaction like the "files" backend does, even though
backends may wish to only tweak certain behaviour of a "normal" commit.
Third, it is awkward that callers must never prepare the transaction as
this is somewhat different than how a transaction typically works.
Refactor the code such that we instead mark initial transactions via a
separate flag when starting the transaction. This addresses all of the
mentioned painpoints, where the most important part is that it will
allow backends to have way more leeway in how exactly they want to
handle the initial transaction.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Allow passing flags when setting up a transaction such that the
behaviour of the transaction itself can be altered. This functionality
will be used in a subsequent patch.
Adapt callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The reference-transaction hook is invoked whenever there is a reference
update being performed. For each state of the transaction, we iterate
over the updates present and pass this information to the hook.
The `ref_update` structure is used to hold these updates within a
`transaction`. We use the same structure for holding reflog updates too.
Which means that the reference transaction hook is also obtaining
information about a reflog update. This is a bug, since:
- The hook is designed to work with reference updates and reflogs
updates are different.
- The hook doesn't have the required information to distinguish
reference updates from reflog updates.
This is particularly evident when the default branch (pointed by HEAD)
is updated, we see that the hook also receives information about HEAD
being changed. In reality, we only add a reflog update for HEAD, while
HEAD's values remains the same.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|