| Age | Commit message (Collapse) | Author | Files | Lines |
|
The "string-list" API function to find where a given string would
be inserted got updated so that it can use unrealistically huge
array index that would only fit in size_t but not int or ssize_t
to achieve unstated goal.
* sj/string-list:
refs: enable sign compare warnings check
string-list: change "string_list_find_insert_index" return type to "size_t"
string-list: replace negative index encoding with "exact_match" parameter
string-list: use bool instead of int for "exact_match"
|
|
The reftable backend learned to sanity check its on-disk data more
carefully.
* kn/reftable-consistency-checks:
refs/reftable: add fsck check for checking the table name
reftable: add code to facilitate consistency checks
fsck: order 'fsck_msg_type' alphabetically
Documentation/fsck-msgids: remove duplicate msg id
reftable: check for trailing newline in 'tables.list'
refs: move consistency check msg to generic layer
refs: remove unused headers
|
|
The files-backend prints a message before the consistency checks run.
Move this to the generic layer so both the files and reftable backend
can benefit from this message.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
After fixing the tricky compare warning introduced by calling
"string_list_find_insert_index", there are only two loop iterator type
mismatches. Fix them to enable compare warnings check.
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
As "string_list_find_insert_index" is a simple wrapper of
"get_entry_index" and the return type of "get_entry_index" is already
"size_t", we could simply change its return type to "size_t".
Update all callers to use size_t variables for storing the return value.
The tricky fix is the loop condition in "mailmap.c" to properly handle
"size_t" underflow by changing from `0 <= --i` to `i--`.
Remove "DISABLE_SIGN_COMPARE_WARNINGS" from "mailmap.c" as it's no
longer needed with the proper unsigned types.
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The "string_list_find_insert_index()" function is used to determine
the correct insertion index for a new string within the string list.
The function also doubles up to convey if the string is already
existing in the list, this is done by returning a negative index
"-1 -index". Users are expected to decode this information. This
approach has several limitations:
1. It requires the callers to look into the detail of the function to
understand how to decode the negative index encoding.
2. Using int for indices can cause overflow issues when dealing with
large string lists.
To address these limitations, change the function to return size_t for
the index value and use a separate bool parameter to indicate whether
the index refers to an existing entry or an insertion point.
In some cases, the callers of "string_list_find_insert_index" only need
the index position and don't care whether an exact match is found.
However, "get_entry_index" currently requires a non-NULL "exact_match"
parameter, forcing these callers to declare unnecessary variables.
Let's allow callers to pass NULL for the "exact_match" parameter when
they don't need this information, reducing unnecessary variable
declarations in calling code.
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git refs optimize" is added for not very well explained reason
despite it does the same thing as "git pack-refs"...
* ms/refs-optimize:
t: add test for git refs optimize subcommand
t0601: refactor tests to be shareable
builtin/refs: add optimize subcommand
doc: pack-refs: factor out common options
builtin/pack-refs: factor out core logic into a shared library
builtin/pack-refs: convert to use the generic refs_optimize() API
reftable-backend: implement 'optimize' action
files-backend: implement 'optimize' action
refs: add a generic 'optimize' API
|
|
Deal more gracefully with directory / file conflicts when the files
backend is used for ref storage, by failing only the ones that are
involved in the conflict while allowing others.
* kn/refs-files-case-insensitive:
refs/files: handle D/F conflicts during locking
refs/files: handle F/D conflicts in case-insensitive FS
refs/files: use correct error type when lock exists
refs/files: catch conflicts on case-insensitive file-systems
|
|
The existing `pack-refs` API is conceptually tied to the 'files'
backend, but its behavior is generic (e.g., it triggers compaction for
reftable). This naming is confusing.
Introduce a new generic refs_optimize() API that dispatches to a
backend-specific implementation via a new 'optimize' vtable method.
This lays the architectural groundwork for different reference backends
(like 'files' and 'reftable') to provide their own storage optimization
logic, which will be called from a single, generic entry point.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
It is likely that those who came to Git after 3.0 switched the
default initial branch name to 'main' would still try to follow
tutorials that were written before 3.0 happened and with the
assumption that the tool would call the initial branch 'master'.
To help these new users after 3.0 boundary, let's retain one part of
the hint we will be giving before the default changes, namely, how
to rename the branch an unconfigured Git has created just once.
We do this without telling them how to permanently configure the
default name of the initial branch, and that design choice is very
much deliberate. The whole point of switching the default name was
because we did not want to force individual users to configure their
default branch name but while the hard wired default was 'master',
they _had_ to configure it away from 'master' in order to conform to
the recent norm, and a hint that tells them how to do so is useful.
But once the default is renamed to 'main', that no longer is true.
A narrower audience who are new users that follow an instruction
that assumes the initial branch name is 'master' would only need to
learn "here is how to change the branch name to match the tutorial
you are following in the repository you created for practice", and
"here is how you keep creating repositories with the first branch
with a name everybody hates" is unnecessary.
It also needs to be noted that the advise token to squelch the
message is the same advice.defaultBranchName as before, which is
also very much deliberate. The users who do have that configured
are those who _have_ been using Git since before 3.0, and they are
not the target audience for the new advice message. Reusing the
same advise token ensures that they do not have to turn the message
off.
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The previous commit added the necessary validation and checks for F/D
conflicts in the files backend when working on case insensitive systems.
There is still a possibility for D/F conflicts. This is a different from
the F/D since for F/D conflicts, there would not be a conflict during
the lock creation phase:
refs/heads/foo.lock
refs/heads/foo/bar.lock
However there would be a conflict when the locks are committed, since we
cannot have 'refs/heads/foo/bar' and 'refs/heads/foo'. These kinds of
conflicts are checked and resolved in
`refs_verify_refnames_available()`, so the previous commit ensured that
for case-insensitive filesystems, we would lowercase the inputs to that
function.
For D/F conflicts, there is a conflict during the lock creation phase
itself:
refs/heads/foo/bar.lock
refs/heads/foo.lock
As in `lock_raw_ref()` after creating the lock, we also check for D/F
conflicts. This can occur in case-insensitive filesystems when trying to
fetch case-conflicted references like:
refs/heads/Foo/new
refs/heads/foo
D/F conflicts can also occur in case-sensitive filesystems, when the
repository already contains a directory with a lock file
'refs/heads/foo/bar.lock' and trying to fetch 'refs/heads/foo'. This
doesn't concern directories containing garbage files as those are
handled on a higher level.
To fix this, simply categorize the error as a name conflict. Also remove
this reference from the list of valid refnames for availability checks.
By categorizing the error and removing it from the list of valid
references, batched updates now knows to reject such reference updates
and apply the other reference updates.
Fix a small typo in `ref_transaction_maybe_set_rejected()` while here.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
During the 'prepare' phase of a reference transaction in the files
backend, we create the lock files for references to be created. When
using batched updates on case-insensitive filesystems, the entire
batched updates would be aborted if there are conflicting names such as:
refs/heads/Foo
refs/heads/foo
This affects all commands which were migrated to use batched updates in
Git 2.51, including 'git-fetch(1)' and 'git-receive-pack(1)'. Before
that, reference updates would be applied serially with one transaction
used per update. When users fetched multiple references on
case-insensitive systems, subsequent references would simply overwrite
any earlier references. So when fetching:
refs/heads/foo: 5f34ec0bfeac225b1c854340257a65b106f70ea6
refs/heads/Foo: ec3053b0977e83d9b67fc32c4527a117953994f3
refs/heads/sample: 2eefd1150e06d8fca1ddfa684dec016f36bf4e56
The user would simply end up with:
refs/heads/foo: ec3053b0977e83d9b67fc32c4527a117953994f3
refs/heads/sample: 2eefd1150e06d8fca1ddfa684dec016f36bf4e56
This is buggy behavior since the user is never informed about the
overrides performed and missing references. Nevertheless, the user is
left with a working repository with a subset of the references. Since
Git 2.51, in such situations fetches would simply fail without updating
any references. Which is also buggy behavior and worse off since the
user is left without any references.
The error is triggered in `lock_raw_ref()` where the files backend
attempts to create a lock file. When a lock file already exists the
function returns a 'REF_TRANSACTION_ERROR_GENERIC'. When this happens,
the entire batched updates, not individual operation, is aborted as if
it were in a transaction.
Change this to return 'REF_TRANSACTION_ERROR_CASE_CONFLICT' instead to
aid the batched update mechanism to simply reject such errors. The
change only affects batched updates since batched updates will reject
individual updates with non-generic errors. So specifically this would
only affect:
1. git fetch
2. git receive-pack
3. git update-ref --batch-updates
This bubbles the error type up to `files_transaction_prepare()` which
tries to lock each reference update. So if the locking fails, we check
if the rejection type can be ignored, which is done by calling
`ref_transaction_maybe_set_rejected()`.
As the error type is now 'REF_TRANSACTION_ERROR_CASE_CONFLICT',
the specific reference update would simply be rejected, while other
updates in the transaction would continue to be applied. This allows
partial application of references in case-insensitive filesystems when
fetching colliding references.
While the earlier implementation allowed the last reference to be
applied overriding the initial references, this change would allow the
first reference to be applied while rejecting consequent collisions.
This should be an okay compromise since with the files backend, there is
no scenario possible where we would retain all colliding references.
Let's also be more proactive and notify users on case-insensitive
filesystems about such problems by providing a brief about the issue
while also recommending using the reftable backend, which doesn't have
the same issue.
Reported-by: Joe Drew <joe.drew@indexexchange.com>
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Since 1296cbe4b46 (init: document `init.defaultBranch` better,
2020-12-11) "git-init.adoc" has advertised that the default name
of the initial branch may change in the future. The name "main"
is chosen to match the default used by the big Git forge web sites.
The advice printed when init.defaultBranch is not set is updated
to say that the default will change to "main" in Git 3.0. Building
with WITH_BREAKING_CHANGES enabled removes the advice and changes
the default branch name to "main". The code in guess_remote_head()
that looks for "refs/heads/master" is left unchanged as that is only
called when the remote server does not support the symref capability
in the v0 protocol or the symref extension to the ls-refs list in the
v2 protocol. Such an old server is more likely to be using "master"
as the default branch name.
With the exception of the "git-init.adoc" the documentation is left
unchanged. I had hoped to parameterize the name of the default branch
by using an asciidoc attribute. Unfortunately attribute expansion
is inhibited by backticks and we use backticks to mark up ref names
so that idea does not work. As the changes to git-init.adoc show
inserting ifdef's around each instance of the branch name "master"
is cumbersome and makes the documentation sources harder to read.
Apart from "git-init.adoc" there are some other files where "master" is
used as the name of the initial branch rather than as an example of a
branch name such as "user-manual.adoc" and "gitcore-tutorial.adoc". The
name appears a lot in those so updating it with ifdef's is not really
practical. We can update that document in the 3.0 release cycle. The
other documentation where master is used as an example branch name
can be gradually converted over time.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git remote rename origin upstream" failed to move origin/HEAD to
upstream/HEAD when origin/HEAD is unborn and performed other
renames extremely inefficiently, which has been corrected.
* ps/remote-rename-fix:
builtin/remote: only iterate through refs that are to be renamed
builtin/remote: rework how remote refs get renamed
builtin/remote: determine whether refs need renaming early on
builtin/remote: fix sign comparison warnings
refs: simplify logic when migrating reflog entries
refs: pass refname when invoking reflog entry callback
|
|
"git refs migrate" to migrate the reflog entries from a refs
backend to another had a handful of bugs squashed.
* ps/reflog-migrate-fixes:
refs: fix invalid old object IDs when migrating reflogs
refs: stop unsetting REF_HAVE_OLD for log-only updates
refs/files: detect race when generating reflog entry for HEAD
refs: fix identity for migrated reflogs
ident: fix type of string length parameter
builtin/reflog: implement subcommand to write new entries
refs: export `ref_transaction_update_reflog()`
builtin/reflog: improve grouping of subcommands
Documentation/git-reflog: convert to use synopsis type
|
|
When renaming a remote we also need to rename all references
accordingly. But while we only need to rename references that are
contained in the "refs/remotes/$OLDNAME/" namespace, we end up using
`refs_for_each_rawref()` that iterates through _all_ references. We know
to exit early in the callback in case we see an irrelevant reference,
but ultimately this is still a waste of compute as we knowingly iterate
through references that we won't ever care about.
Improve this by using `refs_for_each_rawref_in()`, which knows to only
iterate through (potentially broken) references in a given prefix.
The following benchmark renames a remote with a single reference in a
repository that has 100k unrelated references. This shows a sizeable
improvement with the "files" backend:
Benchmark 1: rename remote (refformat = files, revision = HEAD~)
Time (mean ± σ): 42.6 ms ± 0.9 ms [User: 29.1 ms, System: 8.4 ms]
Range (min … max): 40.1 ms … 43.3 ms 10 runs
Benchmark 2: rename remote (refformat = files, revision = HEAD)
Time (mean ± σ): 31.7 ms ± 4.0 ms [User: 19.6 ms, System: 6.9 ms]
Range (min … max): 27.1 ms … 36.0 ms 10 runs
Summary
rename remote (refformat = files, revision = HEAD) ran
1.35 ± 0.17 times faster than rename remote (refformat = files, revision = HEAD~)
The "reftable" backend shows roughly the same absolute improvement, but
given that it's already significantly faster than the "files" backend
this translates to a much larger relative improvement:
Benchmark 1: rename remote (refformat = reftable, revision = HEAD~)
Time (mean ± σ): 18.2 ms ± 0.5 ms [User: 12.7 ms, System: 3.0 ms]
Range (min … max): 17.3 ms … 21.4 ms 110 runs
Benchmark 2: rename remote (refformat = reftable, revision = HEAD)
Time (mean ± σ): 8.8 ms ± 0.5 ms [User: 3.8 ms, System: 2.9 ms]
Range (min … max): 7.5 ms … 9.9 ms 167 runs
Summary
rename remote (refformat = reftable, revision = HEAD) ran
2.07 ± 0.12 times faster than rename remote (refformat = reftable, revision = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When migrating reflog entries between two storage formats we have to do
so via two callback-driven functions:
- `migrate_one_reflog()` gets invoked via `refs_for_each_reflog()` to
first list all available reflogs.
- `migrate_one_reflog_entry()` gets invoked via
`refs_for_each_reflog_ent()` in `migrate_one_reflog()`.
Before the preceding commit we didn't have the refname available in
`migrate_one_reflog_entry()`, which made it necessary to have a separate
structure that we pass to the second callback so that we can propagate
the refname. Now that `refs_for_each_reflog_ent()` knows to pass the
refname to the callback though that indirection isn't necessary anymore.
There's one catch though: we do have an update index that is also stored
in the entry-specific callback data. This update index is required so
that we can tell the ref backend in which order it should persist the
reflog entries to disk.
But that purpose can be trivially achieved by just converting it into a
global counter that is used for all reflog entries, regardless of which
reference they are for. The ordering will remain the same as both the
update index and the refname is considered when sorting the entries.
Move the index into `struct migration_data` and drop the now-unused
`struct reflog_migration_data` to simplify the code a bit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
With `refs_for_each_reflog_ent()` callers can iterate through all the
reflog entries for a given reference. The callback that is being invoked
for each such entry does not receive the name of the reference that we
are currently iterating through. This isn't really a limiting factor, as
callers can simply pass the name via the callback data.
But this layout sometimes does make for a bit of an awkward calling
pattern. One example: when iterating through all reflogs, and for each
reflog we iterate through all refnames, we have to do some extra book
keeping to track which reference name we are currently yielding reflog
entries for.
Change the signature of the callback function so that the reference name
of the reflog gets passed through to it. Adapt callers accordingly and
start using the new parameter in trivial cases. The next commit will
refactor the reference migration logic to make use of this parameter so
that we can simplify its logic a bit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* ps/reflog-migrate-fixes:
refs: fix invalid old object IDs when migrating reflogs
refs: stop unsetting REF_HAVE_OLD for log-only updates
refs/files: detect race when generating reflog entry for HEAD
refs: fix identity for migrated reflogs
ident: fix type of string length parameter
builtin/reflog: implement subcommand to write new entries
refs: export `ref_transaction_update_reflog()`
builtin/reflog: improve grouping of subcommands
Documentation/git-reflog: convert to use synopsis type
|
|
When migrating reflog entries between different storage formats we end
up with invalid old object IDs for the migrated entries: instead of
writing the old object ID of the to-be-migrated entry, we end up with
the all-zeroes object ID.
The root cause of this issue is that we don't know to use the old object
ID provided by the caller. Instead, we manually resolve the old object
ID by resolving the current value of its matching reference. But as that
reference does not yet exist in the target ref storage we always end up
resolving it to all-zeroes.
This issue got unnoticed as there is no user-facing command that would
even show the old object ID. While `git log -g` knows to show the new
object ID, we don't have any formatting directive to show the old object
ID.
Fix the bug by introducing a new flag `REF_LOG_USE_PROVIDED_OIDS`. If
set, backends are instructed to use the old and new object IDs provided
by the caller, without doing any manual resolving. Set this flag in
`ref_transaction_update_reflog()`.
Amend our tests in t1460-refs-migrate to use our test tool to read
reflog entries. This test tool prints out both old and new object ID of
each reflog entry, which fixes the test gap. Furthermore it also prints
the full identity used to write the reflog, which provides test coverage
for the previous commit in this patch series that fixed the identity for
migrated reflogs.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `REF_HAVE_OLD` flag indicates whether a given ref update has its old
object ID set. If so, the value of that field is used to verify whether
the current state of the reference matches this expected state. It is
thus an important part of mitigating races with a concurrent process
that updates the same set of references.
When writing reflogs though we explicitly unset that flag. This is a
sensible thing to do: the old state of reflog entry updates may not
necessarily match the current on-disk state of its accompanying ref, but
it's only intended to signal what old object ID we want to write into
the new reflog entry. For example when migrating refs we end up writing
many reflog entries for a single reference, and most likely those reflog
entries will have many different old object IDs.
But unsetting this flag also removes a useful signal, namely that the
caller _did_ provide an old object ID for a given reflog entry. This
signal will become useful in a subsequent commit, where we add a new
flag that tells the transaction to use the provided old and new object
IDs to write a reflog entry. The `REF_HAVE_OLD` flag is then used as a
signal to verify that the caller really did provide an old object ID.
Stop unsetting the flag so that we can use it as this described signal
in a subsequent commit. Skip checking the old object ID for log-only
updates so that we don't expect it to match the current on-disk state.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When migrating reflog entries between different storage formats we must
reconstruct the identity of reflog entries. This is done by passing the
committer passed to the `migrate_one_reflog_entry()` callback function
to `fmt_ident()`.
This results in an invalid identity though: `fmt_ident()` expects the
caller to provide both name and mail of the author, but we pass the full
identity as mail. This leads to an identity like:
pks <Patrick Steinhardt ps@pks.im>
Fix the bug by splitting the identity line first. This allows us to
extract both the name and mail so that we can pass them to `fmt_ident()`
separately.
This commit does not yet add any tests as there is another bug in the
reflog migration that will be fixed in a subsequent commit. Once that
bug is fixed we'll make the reflog verification in t1450 stricter, and
that will catch both this bug here and the other bug.
Note that we also add two new `name` and `mail` string buffers to the
callback structures and splice them through to the callbacks. This is
done so that we can avoid allocating a new buffer every time we compute
the committer information.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In a subsequent commit we'll add another user that wants to write reflog
entries. This requires them to call `ref_transaction_update_reflog()`,
but that function is local to "refs.c".
Export the function to prepare for the change. While at it, drop the
`flags` field, as all callers are for now expected to use the same flags
anyway.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The config API had a set of convenience wrapper functions that
implicitly use the_repository instance; they have been removed and
inlined at the calling sites.
* ps/config-wo-the-repository: (21 commits)
config: fix sign comparison warnings
config: move Git config parsing into "environment.c"
config: remove unused `the_repository` wrappers
config: drop `git_config_set_multivar()` wrapper
config: drop `git_config_get_multivar_gently()` wrapper
config: drop `git_config_set_multivar_in_file_gently()` wrapper
config: drop `git_config_set_in_file_gently()` wrapper
config: drop `git_config_set()` wrapper
config: drop `git_config_set_gently()` wrapper
config: drop `git_config_set_in_file()` wrapper
config: drop `git_config_get_bool()` wrapper
config: drop `git_config_get_ulong()` wrapper
config: drop `git_config_get_int()` wrapper
config: drop `git_config_get_string()` wrapper
config: drop `git_config_get_string()` wrapper
config: drop `git_config_get_string_multi()` wrapper
config: drop `git_config_get_value()` wrapper
config: drop `git_config_get_value()` wrapper
config: drop `git_config_get()` wrapper
config: drop `git_config_clear()` wrapper
...
|
|
"git for-each-ref" learns "--start-after" option to help
applications that want to page its output.
* kn/for-each-ref-skip:
ref-cache: set prefix_state when seeking
for-each-ref: introduce a '--start-after' option
ref-filter: remove unnecessary else clause
refs: selectively set prefix in the seek functions
ref-cache: remove unused function 'find_ref_entry()'
refs: expose `ref_iterator` via 'refs.h'
|
|
In 036876a1067 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.
Follow through with that intent and remove `git_config_get_int()`. All
callsites are adjusted so that they use
`repo_config_get_int(the_repository, ...)` instead. While some callsites
might already have a repository available, this mechanical conversion is
the exact same as the current situation and thus cannot cause any
regression. Those sites should eventually be cleaned up in a later patch
series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git fetch --prune" used to be O(n^2) expensive when there are many
refs, which has been corrected.
* ph/fetch-prune-optim:
clean up interface for refs_warn_dangling_symrefs
refs: remove old refs_warn_dangling_symref
fetch-prune: optimize dangling-ref reporting
|
|
Code clean-up around object access API.
* ps/object-store:
odb: rename `read_object_with_reference()`
odb: rename `pretend_object_file()`
odb: rename `has_object()`
odb: rename `repo_read_object_file()`
odb: rename `oid_object_info()`
odb: trivial refactorings to get rid of `the_repository`
odb: get rid of `the_repository` when handling submodule sources
odb: get rid of `the_repository` when handling the primary source
odb: get rid of `the_repository` in `for_each()` functions
odb: get rid of `the_repository` when handling alternates
odb: get rid of `the_repository` in `odb_mkstemp()`
odb: get rid of `the_repository` in `assert_oid_type()`
odb: get rid of `the_repository` in `find_odb()`
odb: introduce parent pointers
object-store: rename files to "odb.{c,h}"
object-store: rename `object_directory` to `odb_source`
object-store: rename `raw_object_store` to `object_database`
|
|
The ref iterator exposes a `ref_iterator_seek()` function. The name
suggests that this would seek the iterator to a specific reference in
some ways similar to how `fseek()` works for the filesystem.
However, the function actually sets the prefix for refs iteration. So
further iteration would only yield references which match the particular
prefix. This is a bit confusing.
Let's add a 'flags' field to the function, which when set with the
'REF_ITERATOR_SEEK_SET_PREFIX' flag, will set the prefix for the
iteration in-line with the existing behavior. Otherwise, the reference
backends will simply seek to the specified reference and clears any
previously set prefix. This allows users to start iteration from a
specific reference.
In the packed and reftable backend, since references are available in a
sorted list, the changes are simply setting the prefix if needed. The
changes on the files-backend are a little more involved, since the files
backend uses the 'ref-cache' mechanism. We move out the existing logic
within `cache_ref_iterator_seek()` to `cache_ref_iterator_set_prefix()`
which is called when the 'REF_ITERATOR_SEEK_SET_PREFIX' flag is set. We
then parse the provided seek string and set the required levels and
their indexes to ensure that seeking is possible.
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The refs_warn_dangling_symrefs interface is a bit fragile as it passes
in printf-formatting strings with expectations about the number of
arguments. This patch series made it worse by adding a 2nd positional
argument. But there are only two call sites, and they both use almost
identical display options.
Make this safer by moving the format strings into the function that uses
them to make it easier to see when the arguments don't match. Pass a
prefix string and a dry_run flag so the decision logic can be handled
where needed.
Signed-off-by: Phil Hord <phil.hord@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The dangling warning function that takes a single ref to search for
is no longer used. Remove it.
Signed-off-by: Phil Hord <phil.hord@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When pruning during `git fetch` we check each pruned ref against the
ref_store one at a time to decide whether to report it as dangling.
This causes every local ref to be scanned for each ref being pruned.
If there are N refs in the repo and M refs being pruned, this code is
O(M*N). However, `git remote prune` uses a very similar function that
is only O(N*log(M)).
Remove the wasteful ref scanning for each pruned ref and use the faster
version already available in refs_warn_dangling_symrefs. Change the
message to include the original refname since the message is no longer
printed immediately after the line that did just print the refname.
In a repo with 126,000 refs, where I was pruning 28,000 refs, this
code made about 3.6 billion calls to strcmp and consumed 410 seconds
of CPU. (Invariably in that time, my remote would timeout and the
fetch would fail anyway.)
After this change, the same operation completes in under a second.
Signed-off-by: Phil Hord <phil.hord@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Rename `has_object()` to `odb_has_object()` to match other functions
related to the object database and our modern coding guidelines.
Introduce a compatibility wrapper so that any in-flight topics will
continue to compile.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In the preceding commits we have renamed the structures contained in
"object-store.h" to `struct object_database` and `struct odb_backend`.
As such, the code files "object-store.{c,h}" are confusingly named now.
Rename them to "odb.{c,h}" accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `object_directory` structure is used as an access point for a single
object directory like ".git/objects". While the structure isn't yet
fully self-contained, the intent is for it to eventually contain all
information required to access objects in one specific location.
While the name "object directory" is a good fit for now, this will
change over time as we continue with the agenda to make pluggable object
databases a thing. Eventually, objects may not be accessed via any kind
of directory at all anymore, but they could instead be backed by any
kind of durable storage mechanism. While it seems quite far-fetched for
now, it is thinkable that eventually this might even be some form of a
database, for example.
As such, the current name of this structure will become worse over time
as we evolve into the direction of pluggable ODBs. Immediate next steps
will start to carve out proper self-contained object directories, which
requires us to pass in these object directories as parameters. Based on
our modern naming schema this means that those functions should then be
named after their subsystem, which means that we would start to bake the
current name into the codebase more and more.
Let's preempt this by renaming the structure. There have been a couple
alternatives that were discussed:
- `odb_backend` was discarded because it led to the association that
one object database has a single backend, but the model is that one
alternate has one backend. Furthermore, "backend" is more about the
actual backing implementation and less about the high-level concept.
- `odb_alternate` was discarded because it is a bit of a stretch to
also call the main object directory an "alternate".
Instead, pick `odb_source` as the new name. It makes it sufficiently
clear that there can be multiple sources and does not cause confusion
when mixed with the already-existing "alternate" terminology.
In the future, this change allows us to easily introduce for example a
`odb_files_source` and other format-specific implementations.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The commit 76e760b999 (refs: introduce enum-based transaction error
types, 2025-04-08) introduced enum-based transaction error types. The
refs transaction logic was also modified to propagate these errors. For
clients of the ref transaction system, it would be beneficial to provide
human readable messages for these errors.
There is already an existing mapping in 'builtin/update-ref.c', move it
to 'refs.c' as `ref_transaction_error_msg()` and use the same within the
'builtin/update-ref.c'.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
As the comment of `repo_has_object_file()` and its `_with_flags()`
variant tells us, these functions are considered to be deprecated in
favor of `has_object()`. There are a couple of slight benefits in favor
of the replacement:
- The new function has a short-and-sweet name.
- More explicit defaults: `has_object()` doesn't fetch missing objects
via promisor remotes, and neither does it reload packfiles if an
object wasn't found by default. This ensures that it becomes
immediately obvious when a simple object existence check may result
in expensive actions.
Most importantly though, it is confusing that we have two sets of
functions that ultimately do the same thing, but with different
defaults.
Start sunsetting `repo_has_object_file()` and its `_with_flags()`
sibling by replacing all callsites with `has_object()`:
- `repo_has_object_file(...)` is equivalent to
`has_object(..., HAS_OBJECT_RECHECK_PACKED | HAS_OBJECT_FETCH_PROMISOR)`.
- `repo_has_object_file_with_flags(..., OBJECT_INFO_QUICK | OBJECT_INFO_SKIP_FETCH_OBJECT)`
is equivalent to `has_object(..., 0)`.
- `repo_has_object_file_with_flags(..., OBJECT_INFO_SKIP_FETCH_OBJECT)`
is equivalent to `has_object(..., HAS_OBJECT_RECHECK_PACKED)`.
- `repo_has_object_file_with_flags(..., OBJECT_INFO_QUICK)`
is equivalent to `has_object(..., HAS_OBJECT_FETCH_PROMISOR)`.
The replacements should be functionally equivalent.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* ps/object-file-cleanup:
object-store: merge "object-store-ll.h" and "object-store.h"
object-store: remove global array of cached objects
object: split out functions relating to object store subsystem
object-file: drop `index_blob_stream()`
object-file: split up concerns of `HASH_*` flags
object-file: split out functions relating to object store subsystem
object-file: move `xmmap()` into "wrapper.c"
object-file: move `git_open_cloexec()` to "compat/open.c"
object-file: move `safe_create_leading_directories()` into "path.c"
object-file: move `mkdir_in_gitdir()` into "path.c"
|
|
Comment fix.
* cj/refname-avail-check-optim-typofix:
refs: fix duplicated word in comment
|
|
Updating multiple references have only been possible in all-or-none
fashion with transactions, but it can be more efficient to batch
multiple updates even when some of them are allowed to fail in a
best-effort manner. A new "best effort batches of updates" mode
has been introduced.
* kn/non-transactional-batch-updates:
update-ref: add --batch-updates flag for stdin mode
refs: support rejection in batch updates during F/D checks
refs: implement batch reference update support
refs: introduce enum-based transaction error types
refs/reftable: extract code from the transaction preparation
refs/files: remove duplicate duplicates check
refs: move duplicate refname update check to generic layer
refs/files: remove redundant check in split_symref_update()
|
|
"git clone" still gave the message about the default branch name;
this message has been turned into an advice message that can be
turned off.
* jt/clone-guess-remote-head-fix:
advice: allow disabling default branch name advice
builtin/clone: suppress unexpected default branch advice
remote: allow `guess_remote_head()` to suppress advice
|
|
The object layer has been updated to take an explicit repository
instance as a parameter in more code paths.
* ps/object-wo-the-repository:
hash: stop depending on `the_repository` in `null_oid()`
hash: fix "-Wsign-compare" warnings
object-file: split out logic regarding hash algorithms
delta-islands: stop depending on `the_repository`
object-file-convert: stop depending on `the_repository`
pack-bitmap-write: stop depending on `the_repository`
pack-revindex: stop depending on `the_repository`
pack-check: stop depending on `the_repository`
environment: move access to "core.bigFileThreshold" into repo settings
pack-write: stop depending on `the_repository` and `the_hash_algo`
object: stop depending on `the_repository`
csum-file: stop depending on `the_repository`
|
|
The "object-store-ll.h" header has been introduced to keep transitive
header dependendcies and compile times at bay. Now that we have created
a new "object-store.c" file though we can easily move the last remaining
additional bit of "object-store.h", the `odb_path_map`, out of the
header.
Do so. As the "object-store.h" header is now equivalent to its low-level
alternative we drop the latter and inline it into the former.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Fix a typo in a comment in refs.c: "checking checking" → "checking".
Signed-off-by: Christian Fredrik Johnsen <christian@johnsen.no>
Acked-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* ps/object-wo-the-repository:
hash: stop depending on `the_repository` in `null_oid()`
hash: fix "-Wsign-compare" warnings
object-file: split out logic regarding hash algorithms
delta-islands: stop depending on `the_repository`
object-file-convert: stop depending on `the_repository`
pack-bitmap-write: stop depending on `the_repository`
pack-revindex: stop depending on `the_repository`
pack-check: stop depending on `the_repository`
environment: move access to "core.bigFileThreshold" into repo settings
pack-write: stop depending on `the_repository` and `the_hash_algo`
object: stop depending on `the_repository`
csum-file: stop depending on `the_repository`
|
|
The `refs_verify_refnames_available()` is used to batch check refnames
for F/D conflicts. While this is the more performant alternative than
its individual version, it does not provide rejection capabilities on a
single update level. For batched updates, this would mean a rejection of
the entire transaction whenever one reference has a F/D conflict.
Modify the function to call `ref_transaction_maybe_set_rejected()` to
check if a single update can be rejected. Since this function is only
internally used within 'refs/' and we want to pass in a `struct
ref_transaction *` as a variable. We also move and mark
`refs_verify_refnames_available()` to 'refs-internal.h' to be an
internal function.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Git supports making reference updates with or without transactions.
Updates with transactions are generally better optimized. But
transactions are all or nothing. This means, if a user wants to batch
updates to take advantage of the optimizations without the hard
requirement that all updates must succeed, there is no way currently to
do so. Particularly with the reftable backend where batching multiple
reference updates is more efficient than performing them sequentially.
Introduce batched update support with a new flag,
'REF_TRANSACTION_ALLOW_FAILURE'. Batched updates while different from
transactions, use the transaction infrastructure under the hood. When
enabled, this flag allows individual reference updates that would
typically cause the entire transaction to fail due to non-system-related
errors to be marked as rejected while permitting other updates to
proceed. System errors referred by 'REF_TRANSACTION_ERROR_GENERIC'
continue to result in the entire transaction failing. This approach
enhances flexibility while preserving transactional integrity where
necessary.
The implementation introduces several key components:
- Add 'rejection_err' field to struct `ref_update` to track failed
updates with failure reason.
- Add a new struct `ref_transaction_rejections` and a field within
`ref_transaction` to this struct to allow quick iteration over
rejected updates.
- Modify reference backends (files, packed, reftable) to handle
partial transactions by using `ref_transaction_set_rejected()`
instead of failing the entire transaction when
`REF_TRANSACTION_ALLOW_FAILURE` is set.
- Add `ref_transaction_for_each_rejected_update()` to let callers
examine which updates were rejected and why.
This foundational change enables batched update support throughout the
reference subsystem. A following commit will expose this capability to
users by adding a `--batch-updates` flag to 'git-update-ref(1)',
providing both a user-facing feature and a testable implementation.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Replace preprocessor-defined transaction errors with a strongly-typed
enum `ref_transaction_error`. This change:
- Improves type safety and function signature clarity.
- Makes error handling more explicit and discoverable.
- Maintains existing error cases, while adding new error cases for
common scenarios.
This refactoring paves the way for more comprehensive error handling
which we will utilize in the upcoming commits to add batch reference
update support.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Within the files reference backend's transaction's 'finish' phase, a
verification step is currently performed wherein the refnames list is
sorted and examined for multiple updates targeting the same refname.
It has been observed that this verification is redundant, as an
identical check is already executed during the transaction's 'prepare'
stage. Since the refnames list remains unmodified following the
'prepare' stage, this secondary verification can be safely eliminated.
The duplicate check has been removed accordingly, and the
`ref_update_reject_duplicates()` function has been marked as static, as
its usage is now confined to 'refs.c'.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Move the tracking of refnames in `affected_refnames` from individual
backends into the generic layer in 'refs.c'. This centralizes the
duplicate refname detection that was previously handled separately by
each backend.
Make some changes to accommodate this move:
- Add a `string_list` field `refnames` to `ref_transaction` to contain
all the references in a transaction. This field is updated whenever
a new update is added via `ref_transaction_add_update`, so manual
additions in reference backends are dropped.
- Modify the backends to use this field internally as needed. The
backends need to check if an update for refname already exists when
splitting symrefs or adding an update for 'HEAD'.
- In the reftable backend, within `reftable_be_transaction_prepare()`,
move the `string_list_has_string()` check above
`ref_transaction_add_update()`. Since `ref_transaction_add_update()`
automatically adds the refname to `transaction->refnames`,
performing the check after will always return true, so we perform
the check before adding the update.
This helps reduce duplication of functionality between the backends and
makes it easier to make changes in a more centralized manner.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|