| Age | Commit message (Collapse) | Author | Files | Lines |
|
In batch mode, git-cat-file(1) enumerates all objects and prints them
by iterating through both loose and packed objects. This works without
considering their reachability at all, and consequently most options to
filter objects as they exist in e.g. git-rev-list(1) are not applicable.
In some situations it may still be useful though to filter objects based
on properties that are inherent to them. This includes the object size
as well as its type.
Such a filter already exists in git-rev-list(1) with the `--filter=`
command line option. While this option supports a couple of filters that
are not applicable to our usecase, some of them are quite a neat fit.
Wire up the filter as an option for git-cat-file(1). This allows us to
reuse the same syntax as in git-rev-list(1) so that we don't have to
reinvent the wheel. For now, we die when any of the filter options has
been passed by the user, but they will be wired up in subsequent
commits.
Further note that the filters that we are about to introduce don't
significantly speed up the runtime of git-cat-file(1). While we can skip
emitting a lot of objects in case they are uninteresting to us, the
majority of time is spent reading the packfile, which is bottlenecked by
I/O and not the processor. This will change though once we start to make
use of bitmaps, which will allow us to skip reading the whole packfile.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We have multiple callsites that report the status of an object, for
example when the objec tis missing or its name is ambiguous. We're about
to add a couple more such callsites to report on "excluded" objects.
Prepare for this by introducing a new function `report_object_status()`
that encapsulates the functionality.
Note that this function also flushes stdout, which is a requirement so
that request-response style batched modes can learn about the status
before proceeding to the next object. We already flush correctly at all
existing callsites, even though the flush in `batch_one_object()` only
comes after the switch statement. That flush is now redundant, and we
could in theory deduplicate it by moving it into all branches that don't
use `report_object_status()`. But that doesn't quite feel sensible:
- The duplicate flush should ultimately just be a no-op for us and
thus shouldn't impact performance significantly.
- By keeping the flush in `report_object_status()` we ensure that all
future callers get semantics correct.
So let's just be pragmatic and live with the duplicated flush.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The usage strings for git-cat-file(1) that we pass to `parse_options()`
and `usage_msg_optf()` are stored in a variable called `usage`. This
variable shadows the declaration of `usage()`, which we'll want to use
in a subsequent commit.
Rename the variable to `builtin_catfile_usage`, which is in line with
how the variable is typically called in other builtins.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
An earlier code refactoring of the hash machinery missed a few
required calls to init_fn.
* jh/hash-init-fixes:
index-pack, unpack-objects: restore missing ->init_fn
|
|
"git repack" learned "--combine-cruft-below-size" option that
controls how cruft-packs are combined.
* tb/combine-cruft-below-size:
repack: begin combining cruft packs with `--combine-cruft-below-size`
repack: avoid combining cruft packs with `--max-cruft-size`
t/t7704-repack-cruft.sh: consolidate `write_blob()`
t/t7704-repack-cruft.sh: clarify wording in --max-cruft-size tests
t/t5329-pack-objects-cruft.sh: evict 'repack'-related tests
|
|
Using "git name-rev --stdin" as an example, improve the framework to
prepare tests to pretend to be in the future where the breaking
changes have already happened.
* jc/name-rev-stdin:
name-rev: remove "--stdin" support
t6120: further modernize
t6120: avoid hiding "git" exit status
t: introduce WITH_BREAKING_CHANGES prerequisite
t: extend test_lazy_prereq
t: document test_lazy_prereq
|
|
There are multiple places in loops, where a signed and an
unsigned data type are compared. Git uses a mix of signed and unsigned
types to store lengths of arrays. This sometimes leads to using a signed
index for an array whose length is stored in an unsigned variable or
vice versa.
get_ours_cache_pos is a special case where i, though derived from a
signed variable is never negative. Move this part to the caller side
and make i an unsigned argument of the function. Rename i to
pos to make it descriptive, now that it is a function argument.
Replace signed data types with unsigned data types and vice versa
wherever necessary. Where both signed and unsigned data types have been
used, define a new variable in the scope of the for loop for use as the
iterator. Remove #define DISABLE_SIGN_COMPARE_WARNINGS.
Signed-off-by: Arnav Bhate <bhatearnav@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Miscellaneous code clean-ups.
* en/random-cleanups:
merge-ort: remove extraneous word in comment
merge-ort: fix accidental strset<->strintmap
t7615: be more explicit about diff algorithm used
t6423: fix a comment that accidentally reversed two commits
stash: remove merge-recursive.h include
|
|
Certain "cruft" objects would have never been refreshed when there
are multiple cruft packs in the repository, which has been
corrected.
* tb/multi-cruft-pack-refresh-fix:
builtin/pack-objects.c: freshen objects from existing cruft packs
|
|
In protocol v2 where the refs advertisement is constrained, we try
to tell the server side not to limit the advertisement when there
is no specific need to, which has been the source of confusion and
recent bugs. Revamp the logic to simplify.
* jk/fetch-ref-prefix-cleanup:
fetch: use ref prefix list to skip ls-refs
fetch: avoid ls-refs only to ask for HEAD symref update
fetch: stop protecting additions to ref-prefix list
fetch: ask server to advertise HEAD for config-less fetch
refspec_ref_prefixes(): clean up refspec_item logic
t5516: beef up exact-oid ref prefixes test
t5516: drop NEEDSWORK about v2 reachability behavior
t5516: prefer "oid" to "sha1" in some test titles
t5702: fix typo in test name
|
|
First step of deprecating and removing merge-recursive.
* en/merge-ort-prepare-to-remove-recursive:
am: switch from merge_recursive_generic() to merge_ort_generic()
merge-ort: fix merge.directoryRenames=false
t3650: document bug when directory renames are turned off
merge-ort: support having merge verbosity be set to 0
merge-ort: allow rename detection to be disabled
merge-ort: add new merge_ort_generic() function
|
|
The code paths to check whether a refname X is available (by seeing
if another ref X/Y exists, etc.) have been optimized.
* ps/refname-avail-check-optim:
refs: reuse iterators when determining refname availability
refs/iterator: implement seeking for files iterators
refs/iterator: implement seeking for packed-ref iterators
refs/iterator: implement seeking for ref-cache iterators
refs/iterator: implement seeking for reftable iterators
refs/iterator: implement seeking for merged iterators
refs/iterator: provide infrastructure to re-seek iterators
refs/iterator: separate lifecycle from iteration
refs: stop re-verifying common prefixes for availability
refs/files: batch refname availability checks for initial transactions
refs/files: batch refname availability checks for normal transactions
refs/reftable: batch refname availability checks
refs: introduce function to batch refname availability checks
builtin/update-ref: skip ambiguity checks when parsing object IDs
object-name: allow skipping ambiguity checks in `get_oid()` family
object-name: introduce `repo_get_oid_with_flags()`
|
|
"git fast-export | git fast-import" learns to deal with commit and
tag objects with embedded signatures a bit better.
* cc/signed-fast-export-import:
fast-export, fast-import: add support for signed-commits
fast-export: do not modify memory from get_commit_buffer
git-fast-export.adoc: clarify why 'verbatim' may not be a good idea
fast-export: rename --signed-tags='warn' to 'warn-verbatim'
fast-export: fix missing whitespace after switch
git-fast-import.adoc: add missing LF in the BNF
|
|
The comma operator is a somewhat obscure C feature that is often used by
mistake and can even cause unintentional code flow. Better use a
semicolon instead.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A handful of built-in command implementations have been rewritten
to use the repository instance supplied by git.c:run_builtin(), its
caller.
* ua/some-builtins-wo-the-repository:
builtin/checkout-index: stop using `the_repository`
builtin/for-each-ref: stop using `the_repository`
builtin/ls-files: stop using `the_repository`
builtin/pack-refs: stop using `the_repository`
builtin/send-pack: stop using `the_repository`
builtin/verify-commit: stop using `the_repository`
builtin/verify-tag: stop using `the_repository`
config: teach repo_config to allow `repo` to be NULL
|
|
"git fsck" becomes more careful when checking the refs.
* sj/ref-consistency-checks-more:
builtin/fsck: add `git refs verify` child process
packed-backend: check whether the "packed-refs" is sorted
packed-backend: add "packed-refs" entry consistency check
packed-backend: check whether the refname contains NUL characters
packed-backend: add "packed-refs" header consistency check
packed-backend: check if header starts with "# pack-refs with: "
packed-backend: check whether the "packed-refs" is regular file
builtin/refs: get worktrees without reading head information
t0602: use subshell to ensure working directory unchanged
|
|
A post-processing filter for "diff --raw" output has been
introduced.
* jt/diff-pairs:
builtin/diff-pairs: allow explicit diff queue flush
builtin: introduce diff-pairs command
diff: add option to skip resolving diff statuses
diff: return diff_filepair from diff queue helpers
|
|
In 199f44cb2ead (builtin/clone: allow remote helpers to detect repo,
2024-02-27), clones started partially initializing the refdb before
executing the remote helpers by creating a HEAD file and "refs/"
directory. This has resulted in some scenarios where git-clone(1) now
prints the default branch name advice message where it previously did
not.
A side-effect of the HEAD file already existing, is that computation of
the default branch name is handled later in execution. This matters
because prior to 97abaab5f6 (refs: drop `git_default_branch_name()`,
2024-05-17), the default branch value would be computed during its first
execution and cached. Subsequent invocations would simply return the
cached value. Since the next `git_default_branch_name()` call site,
which is invoked through `guess_remote_head()`, is not configured to
suppress the advice message, computing the default branch name results
in the advice message being printed.
Configure `guess_remote_head()` to suppress the advice message,
restoring the previous behavior.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `repo_default_branch_name()` invoked through `guess_remote_head()`
is configured to always display the default branch advice message.
Adapt `guess_remote_head()` to accept flags and convert the `all`
parameter to a flag. Add the `REMOTE_GUESS_HEAD_QUIET` flag to to enable
suppression of advice messages. Call sites are updated accordingly.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The 'loose-objects' task of 'git maintenance run' first deletes loose
objects that exit within packfiles and then collects loose objects into
a packfile. This second step uses an implicit limit of fifty thousand
that cannot be modified by users.
Add a new config option that allows this limit to be adjusted or ignored
entirely.
While creating tests for this option, I noticed that actually there was
an off-by-one error due to the strict comparison in the limit check. I
considered making the limit check turn true on equality, but instead I
thought to use INT_MAX as a "no limit" barrier which should mean it's
never possible to hit the limit. Thus, a new decrement to the limit is
provided if the value is positive. (The restriction to positive values
is to avoid underflow if INT_MIN is configured.)
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The --no-quiet option for 'git maintenance run' is supposed to indicate
that progress should happen even while ignoring the value of isatty(2).
However, Git implicitly asks child processes to check isatty(2) since
these arguments are not passed through.
The pass through of --no-quiet will be useful in a test in the next
change.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Now that the pack-bitmap machinery has learned how to read and interact
with an incremental MIDX bitmap, teach the pack-bitmap-write.c machinery
(and relevant callers from within the MIDX machinery) to write such
bitmaps.
The details for doing so are mostly straightforward. The main changes
are as follows:
- find_object_pos() now makes use of an extra MIDX parameter which is
used to locate the bit positions of objects which are from previous
layers (and thus do not exist in the current layer's pack_order
field).
(Note also that the pack_order field is moved into struct
write_midx_context to further simplify the callers for
write_midx_bitmap()).
- bitmap_writer_build_type_index() first determines how many objects
precede the current bitmap layer and offsets the bits it sets in
each respective type-level bitmap by that amount so they can be OR'd
together.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
As part of the reference transaction commit phase, the transaction is
set to a closed state regardless of whether it was successful of not.
Attempting to abort a closed transaction via `ref_transaction_abort()`
results in a `BUG()`.
In c92abe71df (builtin/fetch: fix leaking transaction with `--atomic`,
2024-08-22), logic to free a transaction after the commit phase is moved
to the centralized exit path. In cases where the transaction commit
failed, this results in a closed transaction being aborted and signaling
a bug.
Free the transaction and set it to NULL when the commit fails. This
allows the exit path to correctly handle the error without attempting to
abort the transaction.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The previous commit changed the behavior of repack's '--max-cruft-size'
to specify a cruft pack-specific override for '--max-pack-size'.
Introduce a new flag, '--combine-cruft-below-size' which is a
replacement for the old behavior of '--max-cruft-size'. This new flag
does explicitly what it says: it combines together cruft packs which are
smaller than a given threshold, and leaves alone ones which are
larger.
This accomplishes the original intent of '--max-cruft-size', which was
to avoid repacking cruft packs larger than the given threshold.
The new behavior is slightly different. Instead of building up small
packs together until the threshold is met, '--combine-cruft-below-size'
packs up *all* cruft packs smaller than the threshold. This means that
we may make a pack much larger than the given threshold (e.g., if you
aggregate 5 packs which are each 99 MiB in size with a threshold of 100
MiB).
But that's OK: the point isn't to restrict the size of the cruft packs
we generate, it's to avoid working with ones that have already grown too
large. If repositories still want to limit the size of the generated
cruft pack(s), they may use '--max-cruft-size'.
There's some minor test fallout as a result of the slight differences in
behavior between the old meaning of '--max-cruft-size' and the behavior
of '--combine-cruft-below-size'. In the test which is now called
"--combine-cruft-below-size combines packs", we need to use the new flag
over the old one to exercise that test's intended behavior. The
remainder of the changes there are to improve the clarity of the
comments.
Suggested-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In 37dc6d8104 (builtin/repack.c: implement support for
`--max-cruft-size`, 2023-10-02), we exposed new functionality that
allowed repositories to specify the behavior of when we should combine
multiple cruft packs together.
This feature was designed to ensure that we never repacked cruft packs
which were larger than the given threshold in order to provide tighter
I/O bounds for repositories that have many unreachable objects. In
essence, specifying '--max-cruft-size=N' instructed 'repack' to
aggregate cruft packs together (in order of ascending size) until the
combine size grows past 'N', and then make a new cruft pack whose
contents includes the packs we rolled up.
But this isn't quite how it works in practice. Suppose for example that
we have two cruft packs which are each 100MiB in size. One might expect
specifying "--max-cruft-size=200M" would combine these two packs
together, and then avoid repacking them until a pruning GC takes place.
In reality, 'repack' would try and aggregate these together, but writing
a pack that is strictly smaller than 200 MiB (since pack-objects'
"--max-pack-size" provides a strict bound for packs containing more than
one object).
So instead we'll write out a pack that is, say, 199 MiB in size, and
then another 1 MiB pack containing the balance. If we later repack the
repository without adding any new unreachable objects, we'll repeat the
same exercise again, making the same 199 MiB and 1 MiB packs each time.
This happens because of a poor choice to bolt the '--max-cruft-size'
functionality onto pack-objects' '--max-pack-size', forcing us to
generate packs which are always smaller than the provided threshold and
thus subject to repacking.
The following commit will introduce a new flag that implements something
similar to the behavior above. Let's prepare for that by making repack's
'--max-cruft-size' flag behave as an cruft pack-specific override for
'--max-pack-size'.
Do so by temporarily repurposing the 'collapse_small_cruft_packs()'
function to instead generate a cruft pack using the same instructions as
if we didn't specify any maximum pack size. The calling code looks
something like:
if (args->max_pack_size && !cruft_expiration) {
collapse_small_cruft_packs(in, args->max_pack_size, existing);
} else {
for_each_string_list_item(item, &existing->non_kept_packs)
fprintf(in, "-%s.pack\n", item->string);
for_each_string_list_item(item, &existing->cruft_packs)
fprintf(in, "-%s.pack\n", item->string);
}
This patch makes collapse_small_cruft_packs() behave identically to the
'else' arm of the conditional above. This repurposing of
'collapse_small_cruft_packs()' is intentional, since it will set us up
nicely to introduce the new behavior in the following commit.
Naturally, there is some test fallout in the test which exercises the
old meaning of '--max-cruft-size'. Mark that test as failing for now to
be dealt with in the following commit. Likewise, add a new test which
explicitly tests the behavior of '--max-cruft-size' to place a hard
limit on the size of any generated cruft pack(s).
Note that this is a breaking change, as it alters the user-visible
behavior of '--max-cruft-size'. But I'm OK changing this behavior in
this instance, since the behavior wasn't accurate to begin with.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `--missing={print,print-info}` option for git-rev-list(1) prints
missing objects found while performing the object walk in the form:
$ git rev-list --missing=print-info <rev>
?<oid> [SP <token>=<value>]... LF
Add support for printing missing objects in a NUL-delimited format when
the `-z` option is enabled.
$ git rev-list -z --missing=print-info <rev>
<oid> NUL missing=yes NUL [<token>=<value> NUL]...
In this mode, values containing special characters or spaces are printed
as-is without being escaped or quoted. Instead of prefixing the missing
OID with '?', a separate `missing=yes` token/value pair is appended.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `--boundary` option for git-rev-list(1) prints boundary objects
found while performing the object walk in the form:
$ git rev-list --boundary <rev>
-<oid> LF
Add support for printing boundary objects in a NUL-delimited format when
the `-z` option is enabled.
$ git rev-list -z --boundary <rev>
<oid> NUL boundary=yes NUL
In this mode, instead of prefixing the boundary OID with '-', a separate
`boundary=yes` token/value pair is appended.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When walking objects, git-rev-list(1) prints each object entry on a
separate line. Some options, such as `--objects`, may print additional
information about tree and blob object on the same line in the form:
$ git rev-list --objects <rev>
<tree/blob oid> SP [<path>] LF
Note that in this form the SP is appended regardless of whether the tree
or blob object has path information available. Paths containing a
newline are also truncated at the newline.
Introduce the `-z` option for git-rev-list(1) which reformats the output
to use NUL-delimiters between objects and associated info in the
following form:
$ git rev-list -z --objects <rev>
<oid> NUL [path=<path> NUL]
In this form, the start of each record is signaled by an OID entry that
is all hexidecimal and does not contain any '='. Additional path info
from `--objects` is appended to the record as a token/value pair
`path=<path>` as-is without any truncation.
For now, the `--objects` flag is the only options that can be used in
combination with `-z`. In a subsequent commit, NUL-delimited support for
other options is added. Other options that do not make sense when used
in combination with `-z` are rejected.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Before invoking `setup_revisions()`, the `--missing` and
`--exclude-promisor-objects` options are parsed early. In a subsequent
commit, another option is added that must be parsed early.
Refactor the code to parse both options in a single early pass.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `show_object_with_name()` function only has a single call site.
Inline call to `show_object_with_name()` in `show_object()` so the
explicit function can be cleaned up and live closer to where it is used.
While at it, factor out the code that prints the OID and newline for
both objects with and without a name. In a subsequent commit,
`show_object()` is modified to support printing object information in a
NUL-delimited format.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
For similar reasons as in the previous refactoring of `refspec_init()`
into `refspec_init_fetch()` and `refspec_init_push()`, apply the same
refactoring to `refspec_item_init()`.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
There are two callers of this function, which ensures that a dispatched
call to refspec_item_init() does not fail.
In the following commit, we're going to add fetch/push-specific variants
of refspec_item_init(), which will turn one function into two. To avoid
introducing yet another pair of new functions (such as
refspec_item_init_push_or_die() and refspec_item_init_fetch_or_die()),
let's remove the thin wrapper entirely.
This duplicates a single line of code among two callers, but thins the
refspec.h API by one function, and prevents introducing two more in the
following commit.
Note that we still have a trailing Boolean argument in the function
`refspec_item_init()`. The following commit will address this.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Since 6d4c057859 (refspec: introduce struct refspec, 2018-05-16), we
have macros called REFSPEC_FETCH and REFSPEC_PUSH. This confusingly
suggests that we might introduce other modes in the future, which, while
possible, is highly unlikely.
But these values are treated as a Boolean, and stored in a struct field
called 'fetch'. So the following:
if (refspec->fetch == REFSPEC_FETCH) { ... }
, and
if (refspec->fetch) { ... }
are equivalent. Let's avoid renaming the Boolean values "true" and
"false" here and remove the two REFSPEC_ macros mentioned above.
Since this value is truly a Boolean and will only ever take on a value
of 0 or 1, we can declare it as a single bit unsigned field. In
practice this won't shrink the size of 'struct refspec', but it more
clearly indicates the intent.
Note that this introduces some awkwardness like:
refspec_item_init_or_die(&spec, refspec, 1);
, where it's unclear what the final "1" does. This will be addressed in
the following commits.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Commit 0578f1e66a ("global: adapt callers to use generic hash context helpers")
accidentally removed `->init_fn`, which is required for OpenSSL 3+ SHA1.
This fixes the following error on fetch:
fatal: fetch-pack: invalid index-pack output
Signed-off-by: Jensen Huang <hmz007@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When we are going to consider updating the refs/remotes/*/HEAD symref,
we have to ask the remote side where its HEAD points. But if we know
that the feature is disabled by config, we don't need to bother!
This saves a little bit of work and network communication for the
server. And even a little bit of effort on the client, as our local
set_head() function did a bit of work matching the remote HEAD before
realizing that we're not going to do anything with it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The new followRemoteHEAD feature is triggered for almost every fetch,
causing us to ask the server about the remote "HEAD" and to consider
updating our local tracking HEAD symref. This patch limits the feature
only to the case when we are fetching a remote using its configured
refspecs (typically into its refs/remotes/ hierarchy). There are two
reasons for this.
One is efficiency. E.g., the fixes in 6c915c3f85 (fetch: do not ask for
HEAD unnecessarily, 2024-12-06) and 20010b8c20 (fetch: avoid ls-refs
only to ask for HEAD symref update, 2025-03-08) were aimed at reducing
the work we do when we would not be able to update HEAD anyway. But they
do not quite cover all cases. The remaining one is:
git fetch origin refs/heads/foo:refs/remotes/origin/foo
which _sometimes_ can update HEAD, but usually not. And that leads us to
the second point, which is being simple and explainable.
The code for updating the tracking HEAD symref requires both that we
learned which ref the remote HEAD points at, and that the server
advertised that ref to us. But because the v2 protocol narrows the
server's advertisement, the command above would not typically update
HEAD at all, unless it happened to point to the "foo" branch. Or even
weirder, it probably _would_ update if the server is very old and
supports only the v0 protocol, which always gives a full advertisement.
This creates confusing behavior for the user: sometimes we may try to
update HEAD and sometimes not, depending on vague rules.
One option here would be to loosen the update code to accept the remote
HEAD even if the server did not advertise that ref. I think that could
work, but it may also lead to interesting corner cases (e.g., creating a
dangling symref locally, even though the branch is not unborn on the
server, if we happen not to have fetched it).
So let's instead simplify the rules: we'll only consider updating the
tracking HEAD symref when we're doing a full fetch of the remote's
configured refs. This is easy to implement; we can just set a flag at
the moment we realize we're using the configured refspecs. And we can
drop the special case code added by 6c915c3f85 and 20010b8c20, since
this covers those cases. The existing tests from those commits still
pass.
In t5505, an incidental call to "git fetch <remote> <refspec>" updated
HEAD, which caused us to adjust the test in 3f763ddf28 (fetch: set
remote/HEAD if it does not exist, 2024-11-22). We can now adjust that
back to how it was before the feature was added.
Even though t5505 is incidentally testing our new desired behavior,
we'll add an explicit test in t5510 to make sure it is covered.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Switch from merge-recursive to merge-ort. Adjust the following
testcases due to the switch:
* t4151: This test left an untracked file in the way of the merge.
merge-recursive could only sometimes tell when untracked files were
in the way, and by the time it discovers others, it has already made
too many changes to back out of the merge. So, instead of writing the
results to e.g. 'file1' it would instead write them to
'file1~branch1'. This is confusing for users, because they might not
notice 'file1~branch1' and accidentally add and commit 'file1'.
In contrast, merge-ort correctly notices the file in the way before
making any changes and aborts. Since this test didn't care about the
file in the way, just remove it before calling git-am.
* t4255: Usage of merge-ort allows us to change two known failures into
successes.
* t6427: As noted a few commits ago, the choice of conflict label for
diff3 markers for the ancestor commit was previously handled by
merge-recursive.c rather than by callers. Since that has now changed,
`git am` needs to specify that label. Although the previous conflict
label ("constructed merge base") was already fairly somewhat slanted
towards `git am`, let's use wording more along the lines of the
related command-line flag from `git apply` and function involved to
tie it more closely to `git am`.
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* tb/multi-cruft-pack-refresh-fix:
builtin/pack-objects.c: freshen objects from existing cruft packs
|
|
While 'git-reflog(1)' currently allows users to expire reflogs and
delete individual entries, it lacks functionality to completely remove
reflogs for specific references. This becomes problematic in
repositories where reflogs are not needed but continue to accumulate
entries despite setting 'core.logAllRefUpdates=false'.
Add a new 'drop' subcommand to git-reflog that allows users to delete
the entire reflog for a specified reference. Include an '--all' flag to
enable dropping all reflogs from all worktrees and an addon flag
'--single-worktree', to only drop all reflogs from the current worktree.
While here, remove an extraneous newline in the file.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The 'git reflog expire' prints the error message '<ref> points nowhere!'
when used with a non-existent ref. This message is a bit confusing and
vague. Modify the message to be more clear and direct.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
stash was modified to use merge_ort_nonrecursive() instead of
merge_recursive_generic() back in commit 874cf2a60444 (stash: apply
stash using 'merge_ort_nonrecursive()', 2022-05-10). That makes the
inclusion of merge-recursive.h unnecessary. In preparation for the
removal of merge-recursive.h, remove the unnecessary include.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Once an object is written into a cruft pack, we can only freshen it by
writing a new loose or packed copy of that object with a more recent
mtime.
Prior to 61568efa95 (builtin/pack-objects.c: support `--max-pack-size`
with `--cruft`, 2023-08-28), we typically had at most one cruft pack in
a repository at any given time. So freshening unreachable objects was
straightforward when already rewriting the cruft pack (and its *.mtimes
file).
But 61568efa95 changes things: 'pack-objects' now supports writing
multiple cruft packs when invoked with `--cruft` and the
`--max-pack-size` flag. Cruft packs are rewritten until they reach some
size threshold, at which point they are considered "frozen", and will
only be modified in a pruning GC, or if the threshold itself is
adjusted.
Prior to this patch, however, this process breaks down when we attempt
to freshen an object packed in an earlier cruft pack, and that cruft
pack is larger than the threshold and thus will survive the repack.
When this is the case, it is impossible to freshen objects in cruft
pack(s) when those cruft packs are larger than the threshold. This is
because we would avoid writing them in the new cruft pack entirely, for
a couple of reasons.
1. When enumerating packed objects via 'add_objects_in_unpacked_packs()'
we pass the SKIP_IN_CORE_KEPT_PACKS, which is used to avoid looping
over the packs we're going to retain (which are marked as kept
in-core by 'read_cruft_objects()').
This means that we will avoid enumerating additional packed copies
of objects found in any cruft packs which are larger than the given
size threshold. Thus there is no opportunity to call
'create_object_entry()' whatsoever.
2. We likewise will discard the loose copy (if one exists) of any
unreachable object packed in a cruft pack that is larger than the
threshold. Here our call path is 'add_unreachable_loose_objects()',
which uses the 'add_loose_object()' callback.
That function will eventually land us in 'want_object_in_pack()'
(via 'add_cruft_object_entry()'), and we'll discard the object as it
appears in one of the packs which we marked as kept in-core.
This means in effect that it is impossible to freshen an unreachable
object once it appears in a cruft pack larger than the given threshold.
Instead, we should pack an additional copy of an unreachable object we
want to freshen even if it appears in a cruft pack, provided that the
cruft copy has an mtime which is before the mtime of the copy we are
trying to pack/freshen. This is sub-optimal in the sense that it
requires keeping an additional copy of unreachable objects upon
freshening, but we don't have a better alternative without the ability
to make in-place modifications to existing *.mtimes files.
In order to implement this, we have to adjust the behavior of
'want_found_object()'. When 'pack-objects' is told that we're *not*
going to retain any cruft packs (i.e. the set of packs marked as kept
in-core does not contain a cruft pack), the behavior is unchanged.
But when there *is* at least one cruft pack that we're holding onto, it
is no longer sufficient to reject a copy of an object found in that
cruft pack for that reason alone. In this case, we only want to reject a
candidate object when copies of that object either:
- exists in a non-cruft pack that we are retaining, regardless of that
pack's mtime, or
- exists in a cruft pack with an mtime at least as recent as the copy
we are debating whether or not to pack, in which case freshening
would be redundant.
To do this, keep track of whether or not we have any cruft packs in our
in-core kept list with a new 'ignore_packed_keep_in_core_has_cruft'
flag. When we end up in this new special case, we replace a call to
'has_object_kept_pack()' to 'want_cruft_object_mtime()', and only reject
objects when we have a copy in an existing cruft pack with at least as
recent an mtime as our candidate (in which case "freshening" would be
redundant).
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The ref and reflog iterators have their lifecycle attached to iteration:
once the iterator reaches its end, it is automatically released and the
caller doesn't have to care about that anymore. When the iterator should
be released before it has been exhausted, callers must explicitly abort
the iterator via `ref_iterator_abort()`.
This lifecycle is somewhat unusual in the Git codebase and creates two
problems:
- Callsites need to be very careful about when exactly they call
`ref_iterator_abort()`, as calling the function is only valid when
the iterator itself still is. This leads to somewhat awkward calling
patterns in some situations.
- It is impossible to reuse iterators and re-seek them to a different
prefix. This feature isn't supported by any iterator implementation
except for the reftable iterators anyway, but if it was implemented
it would allow us to optimize cases where we need to search for
specific references repeatedly by reusing internal state.
Detangle the lifecycle from iteration so that we don't deallocate the
iterator anymore once it is exhausted. Instead, callers are now expected
to always call a newly introduce `ref_iterator_free()` function that
deallocates the iterator and its internal state.
Note that the `dir_iterator` is somewhat special because it does not
implement the `ref_iterator` interface, but is only used to implement
other iterators. Consequently, we have to provide `dir_iterator_free()`
instead of `dir_iterator_release()` as the allocated structure itself is
managed by the `dir_iterator` interfaces, as well, and not freed by
`ref_iterator_free()` like in all the other cases.
While at it, drop the return value of `ref_iterator_abort()`, which
wasn't really required by any of the iterator implementations anyway.
Furthermore, stop calling `base_ref_iterator_free()` in any of the
backends, but instead call it in `ref_iterator_free()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Most of the commands in git-update-ref(1) accept an old and/or new
object ID to update a specific reference to. These object IDs get parsed
via `repo_get_oid()`, which not only handles plain object IDs, but also
those that have a suffix like "~" or "^2". More surprisingly though, it
even knows to resolve arbitrary revisions, despite the fact that its
manpage does not mention this fact even once.
One consequence of this is that we also check for ambiguous references:
when parsing a full object ID where the DWIM mechanism would also cause
us to resolve it as a branch, we'd end up printing a warning. While this
check makes sense to have in general, it is arguably less useful in the
context of git-update-ref(1). This is due to multiple reasons:
- The manpage is explicitly structured around object IDs. So if we see
a fully blown object ID, the intent should be quite clear in
general.
- The command is part of our plumbing layer and not a tool that users
would generally use in interactive workflows. As such, the warning
will likely not be visible to anybody in the first place.
- Users can and should use the fully-qualified refname in case there
is any potential for ambiguity. And given that this command is part
of our plumbing layer, one should always try to be as defensive as
possible and use fully-qualified refnames.
Furthermore, this check can be quite expensive when updating lots of
references via `--stdin`, because we try to read multiple references per
object ID that we parse according to the DWIM rules. This effect can be
seen both with the "files" and "reftable" backend.
The issue is not unique to git-update-ref(1), but was also an issue in
git-cat-file(1), where it was addressed by disabling the ambiguity check
in 25fba78d36b (cat-file: disable object/refname ambiguity check for
batch mode, 2013-07-12).
Disable the warning in git-update-ref(1), which provides a significant
speedup with both backends. The user-visible outcome is unchanged even
when ambiguity exists, except that we don't show the warning anymore.
The following benchmark creates 10000 new references with a 100000
preexisting refs with the "files" backend:
Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
Time (mean ± σ): 467.3 ms ± 5.1 ms [User: 100.0 ms, System: 365.1 ms]
Range (min … max): 461.9 ms … 479.3 ms 10 runs
Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
Time (mean ± σ): 394.1 ms ± 5.8 ms [User: 63.3 ms, System: 327.6 ms]
Range (min … max): 384.9 ms … 405.7 ms 10 runs
Summary
update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
1.19 ± 0.02 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
And with the "reftable" backend:
Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
Time (mean ± σ): 146.9 ms ± 2.2 ms [User: 90.4 ms, System: 56.0 ms]
Range (min … max): 142.7 ms … 150.8 ms 19 runs
Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
Time (mean ± σ): 63.2 ms ± 1.1 ms [User: 41.0 ms, System: 21.8 ms]
Range (min … max): 61.1 ms … 66.6 ms 41 runs
Summary
update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
2.32 ± 0.05 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
Note that the absolute improvement with both backends is roughly in the
same ballpark, but the relative improvement for the "reftable" backend
is more significant because writing the new table to disk is faster in
the first place.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
As part of Git 3.0, remove the hidden synonym for "--annotate-stdin"
for real. As this does not change the fact that it used to be
called "--stdin" in older version of Git, keep that passage in the
documentation for "--annotate-stdin".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
fast-export has a --signed-tags= option that controls how to handle tag
signatures. However, there is no equivalent for commit signatures; it
just silently strips the signature out of the commit (analogously to
--signed-tags=strip).
While signatures are generally problematic for fast-export/fast-import
(because hashes are likely to change), if they're going to support tag
signatures, there's no reason to not also support commit signatures.
So, implement a --signed-commits= option that mirrors the --signed-tags=
option.
On the fast-export side, try to be as much like signed-tags as possible,
in both implementation and in user-interface. This will change the
default behavior to '--signed-commits=abort' from what is now
'--signed-commits=strip'. In order to provide an escape hatch for users
of third-party tools that call fast-export and do not yet know of the
--signed-commits= option, add an environment variable
'FAST_EXPORT_SIGNED_COMMITS_NOABORT=1' that changes the default to
'--signed-commits=warn-strip'.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
fast-export's helper function find_encoding() takes a `const char *`, but
modifies that memory despite the `const`. Ultimately, this memory came
from get_commit_buffer(), and you're not supposed to modify the memory
that you get from get_commit_buffer().
So, get rid of find_encoding() in favor of commit.h:find_commit_header(),
which gives back a string length, rather than mutating the memory to
insert a '\0' terminator.
Because find_commit_header() detects the "\n\n" string that separates the
headers and the commit message, move the call to be above the
`message = strstr(..., "\n\n")` call. This helps readability, and allows
for the value of `encoding` to be used for a better value of "..." so that
the same memory doesn't need to be checked twice. Introduce a
`commit_buffer_cursor` variable to avoid writing an awkward
`encoding ? encoding + encoding_len : committer_end` expression.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The --signed-tags= option takes one of five arguments specifying how to
handle signed tags during export. Among these arguments, 'strip' is to
'warn-strip' as 'verbatim' is to 'warn' (the unmentioned argument is
'abort', which stops the fast-export process entirely). That is,
signatures are either stripped or copied verbatim while exporting, with
or without a warning.
Match the pattern and rename 'warn' to 'warn-verbatim' to make it clear
that it instructs fast-export to copy signatures verbatim.
To maintain backwards compatibility, 'warn' is still recognized as
deprecated synonym of 'warn-verbatim'.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"Documentation/CodingGuidelines" says that there should be whitespaces
around operators like 'if', 'switch', 'for', etc.
Let's fix this in "builtin/fast-export.c".
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `null_oid()` function returns the object ID that only consists of
zeroes. Naturally, this ID also depends on the hash algorithm used, as
the number of zeroes is different between SHA1 and SHA256. Consequently,
the function returns the hash-algorithm-specific null object ID.
This is currently done by depending on `the_hash_algo`, which implicitly
makes us depend on `the_repository`. Refactor the function to instead
pass in the hash algorithm for which we want to retrieve the null object
ID. Adapt callsites accordingly by passing in `the_repository`, thus
bubbling up the dependency on that global variable by one layer.
There are a couple of trivial exceptions for subsystems that already got
rid of `the_repository`. These subsystems instead use the repository
that is available via the calling context:
- "builtin/grep.c"
- "grep.c"
- "refs/debug.c"
There are also two non-trivial exceptions:
- "diff-no-index.c": Here we know that we may not have a repository
initialized at all, so we cannot rely on `the_repository`. Instead,
we adapt `diff_no_index()` to get a `struct git_hash_algo` as
parameter. The only caller is located in "builtin/diff.c", where we
know to call `repo_set_hash_algo()` in case we're running outside of
a Git repository. Consequently, it is fine to continue passing
`the_repository->hash_algo` even in this case.
- "builtin/ls-files.c": There is an in-flight patch series that drops
`USE_THE_REPOSITORY_VARIABLE` in this file, which causes a semantic
conflict because we use `null_oid()` in `show_submodule()`. The
value is passed to `repo_submodule_init()`, which may use the object
ID to resolve a tree-ish in the superproject from which we want to
read the submodule config. As such, the object ID should refer to an
object in the superproject, and consequently we need to use its hash
algorithm.
This means that we could in theory just not bother about this edge
case at all and just use `the_repository` in "diff-no-index.c". But
doing so would feel misdesigned.
Remove the `USE_THE_REPOSITORY_VARIABLE` preprocessor define in
"hash.c".
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|