2024-02-28  upload-pack: use existing config mechanism for advertisement  (Jeff King; 1 file changed, -16/+10)

When serving a v2 capabilities request, we call upload_pack_advertise() to tell us the set of features we can advertise to the client. That involves looking at various config options, all of which need to be kept in sync with the rules we use in upload_pack_config to set flags like allow_filter, allow_sideband_all, and so on. If these two pieces of code get out of sync, then we may refuse to respect a capability we advertised, or vice versa, accept one that we should not.

Instead, let's call the same config helper that we'll use for processing the actual client request, and then just pick the values out of the resulting struct. This is only a little bit shorter than the current code, but we don't repeat any policy logic (e.g., we don't have to worry about the magic sideband-all environment variable here anymore).

And this reveals a gap in the existing code: there is no struct flag for the packfile-uris capability (we accept it even if it is not advertised, which we should not). We'll leave the advertisement code for now and deal with it in the next patch.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
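A minimal sketch of the pattern described above, with the advertisement path reusing the request path's config helper and reading flags off the resulting struct. The helper name and function shape are assumptions based on the message, not the literal patch:

    /*
     * Sketch only: get_upload_pack_config() is assumed from the
     * description above; allow_filter is one of the flags the message
     * names. Check upload-pack.c for what actually landed.
     */
    static int advertise_allows_filter(struct repository *r)
    {
            struct upload_pack_data data;

            upload_pack_data_init(&data);     /* zero all flags */
            get_upload_pack_config(r, &data); /* same helper as the request path */
            return data.allow_filter;         /* no duplicated policy logic */
    }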
2024-02-28  upload-pack: centralize setup of sideband-all config  (Jeff King; 1 file changed, -2/+3)

We read uploadpack.allowsidebandall to set a matching flag in our upload_pack_data struct. But for our tests, we also respect GIT_TEST_SIDEBAND_ALL from the environment, and anybody looking at the flag in the struct needs to remember to check both. There's only one such piece of code now, but we're about to add another. So let's have the config step actually fold the environment value into the struct, letting the rest of the code use the flag in the obvious way.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
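A sketch of the folding step, assuming the allow_sideband_all flag named in the previous entry; git_env_bool() is the helper Git uses elsewhere for boolean environment variables:

    /*
     * During config setup, fold the test-only environment knob into the
     * struct so later readers only consult data->allow_sideband_all.
     */
    if (git_env_bool("GIT_TEST_SIDEBAND_ALL", 0))
            data->allow_sideband_all = 1;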
2024-02-28  upload-pack: use repository struct to get config  (Jeff King; 1 file changed, -5/+6)

Our upload_pack_v2() function gets a repository struct, but we ignore it totally. In practice this doesn't cause any problems, as it will never differ from the_repository. But in the spirit of taking a small step towards getting rid of the_repository, let's at least start using it to grab config. There are probably other spots that could benefit, but it's a start.

Note that we don't need to pass the repo for protected_config(); the whole point there is that we are not looking at repo config, so there is no repo-specific version of the function.

For the v0 version of the protocol, we're not passed a repository struct, so we'll continue to use the_repository there.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
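The core of the change is swapping the implicit-repository config call for the repository-aware variant; both functions below are real APIs declared in config.h, though the fragment itself is illustrative:

    /* Before: implicitly reads config for the_repository. */
    git_config(upload_pack_config, &data);

    /* After: read config for the repository handed to upload_pack_v2(). */
    repo_config(r, upload_pack_config, &data);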
2024-02-28  upload-pack: free tree buffers after parsing  (Jeff King; 4 files changed, -4/+23)

When a client sends us a "want" or "have" line, we call parse_object() to get an object struct. If the object is a tree, then the parsed state means that tree->buffer points to the uncompressed contents of the tree. But we don't really care about it. We only really need to parse commits and tags; for trees and blobs, the important output is just a "struct object" with the correct type.

But much worse, we do not ever free that tree buffer. It's not leaked in the traditional sense, in that we still have a pointer to it from the global object hash. But if the client requests many trees, we'll hold all of their contents in memory at the same time.

Nobody really noticed because it's rare for clients to directly request a tree. It might happen for a lightweight tag pointing straight at a tree, or it might happen for a "tree:depth" partial clone filling in missing trees. But it's also possible for a malicious client to request a lot of trees, causing upload-pack's memory to balloon. For example, without this patch, requesting every tree in git.git like:

    pktline() {
            local msg="$*"
            printf "%04x%s\n" $((1+4+${#msg})) "$msg"
    }

    want_trees() {
            pktline command=fetch
            printf 0001
            git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype)' |
            while read oid type; do
                    test "$type" = "tree" || continue
                    pktline want $oid
            done
            pktline done
            printf 0000
    }

    want_trees | GIT_PROTOCOL=version=2 valgrind --tool=massif ./git upload-pack . >/dev/null

shows a peak heap usage of ~3.7GB. Which is just about the sum of the sizes of all of the uncompressed trees. For linux.git, it's closer to 17GB.

So the obvious thing to do is to call free_tree_buffer() after we realize that we've parsed a tree. We know that upload-pack won't need it later. But let's push the logic into parse_object_with_flags(), telling it to discard the tree buffer immediately. There are two reasons for this. One, all of the relevant call-sites already call the with_options variant to pass the SKIP_HASH flag. So it actually ends up as less code than manually free-ing in each spot. And two, it enables an extra optimization that I'll discuss below.

I've touched all of the sites that currently use SKIP_HASH in upload-pack. That drops the peak heap of the upload-pack invocation above from 3.7GB to ~24MB.

I've also modified the caller in get_reference(); a partial clone benefits from its use in pack-objects for the reasons given in 0bc2557951 (upload-pack: skip parse-object re-hashing of "want" objects, 2022-09-06), where we were measuring blob requests. But note that the results of get_reference() are used for traversing, as well; so we really would _eventually_ use the tree contents. That makes this at first glance a space/time tradeoff: we won't hold all of the trees in memory at once, but we'll have to reload them each when it comes time to traverse.

And here's where our extra optimization comes in. If the caller is not going to immediately look at the tree contents, and it doesn't care about checking the hash, then parse_object() can simply skip loading the tree entirely, just like we do for blobs! And now it's not a space/time tradeoff in get_reference() anymore. It's just a lazy-load: we're delaying reading the tree contents until it's time to actually traverse them one by one. And of course for upload-pack, this optimization means we never load the trees at all, saving lots of CPU time.

Timing the "every tree from git.git" request above shows upload-pack dropping from 32 seconds of CPU to 19 (the remainder is mostly due to pack-objects actually sending the pack; timing just the upload-pack portion shows we go from 13s to ~0.28s).

These are all highly gamed numbers, of course. For real-world partial-clone requests we're saving only a small bit of time in practice. But it does help harden upload-pack against malicious denial-of-service attacks.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
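A sketch of the two-part idea in C. parse_object(), free_tree_buffer() and PARSE_OBJECT_SKIP_HASH_CHECK are real; the name of the new discard flag is an assumption here:

    /*
     * Manual version: parse, then immediately drop the tree contents we
     * will never look at.
     */
    struct object *o = parse_object(r, &oid);
    if (o && o->type == OBJ_TREE)
            free_tree_buffer((struct tree *)o);

    /*
     * Pushed-down version, as the patch describes: a flag (name assumed
     * here, e.g. PARSE_OBJECT_DISCARD_TREE) lets parse_object_with_flags()
     * discard the tree buffer, or skip loading it at all when the hash
     * check is also being skipped.
     */
    o = parse_object_with_flags(r, &oid,
                                PARSE_OBJECT_SKIP_HASH_CHECK |
                                PARSE_OBJECT_DISCARD_TREE);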
2024-02-28  upload-pack: use PARSE_OBJECT_SKIP_HASH_CHECK in more places  (Jeff King; 1 file changed, -2/+4)

In commit 0bc2557951 (upload-pack: skip parse-object re-hashing of "want" objects, 2022-09-06), we optimized the parse_object() calls for v2 "want" lines from the client so that they avoided parsing blobs, and so that they used the commit-graph rather than parsing commit objects from scratch.

We should extend that to two other spots:

  1. We parse "have" objects in the got_oid() function. These won't generally be non-commits (unlike "want" lines from a partial clone). But we still benefit from the use of the commit-graph.

  2. For v0, the "want" lines are parsed in receive_needs(). These are also less likely to be non-commits because by default they have to be ref tips. There are config options you might set to allow non-tip objects, but you'd mostly do so to support partial clones, and clients recent enough to support partial clone will generally speak v2 anyway.

So I don't expect this change to improve performance much for day-to-day operations. But both are possible denial-of-service vectors, where an attacker can waste our time by sending over a large number of objects to parse (of course we may waste even more time serving a pack to them, but we try as much as possible to optimize that in pack-objects; we should do what we can here in upload-pack, too).

With this patch, running p5600 with GIT_TEST_PROTOCOL_VERSION=0 shows similar results to what we saw in 0bc2557951 (which ran with the v2 protocol by default). Here are the numbers for linux.git:

    Test                          HEAD^               HEAD
    -------------------------------------------------------------------------
    5600.3: checkout of result    50.91(87.95+2.93)   41.75(79.00+3.18) -18.0%

Or for a more extreme (and malicious) case, we can claim to "have" every blob in git.git over the v0 protocol:

    $ {
        echo "0032want $(git rev-parse HEAD)"
        printf 0000
        git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype)' |
        perl -alne 'print "0032have $F[0]" if $F[1] eq "blob"'
      } >input

    $ time ./git.old upload-pack . <input >/dev/null
    real    0m52.951s
    user    0m51.633s
    sys     0m1.304s

    $ time ./git.new upload-pack . <input >/dev/null
    real    0m0.261s
    user    0m0.156s
    sys     0m0.105s

(Note that these don't actually compute a pack because of the hacky protocol usage, so those numbers are representing the raw blob-parsing effort done by upload-pack).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
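Illustratively, the extension is the same one-line change at both call sites: parse with the flag the v2 "want" path already uses, so the commit-graph is consulted and blob contents are never loaded. A hedged fragment:

    /* In got_oid() and receive_needs(), instead of plain parse_object(): */
    struct object *o = parse_object_with_flags(the_repository, &oid,
                                               PARSE_OBJECT_SKIP_HASH_CHECK);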
2024-02-28  upload-pack: always turn off save_commit_buffer  (Jeff King; 2 files changed, -2/+2)

When the client sends us "want $oid" lines, we call parse_object($oid) to get an object struct. It's important to parse the commits because we need to traverse them in the negotiation phase. But of course we don't need to hold on to the commit messages for each one.

We've turned off the save_commit_buffer flag in get_common_commits() for a long time, since f0243f26f6 (git-upload-pack: More efficient usage of the has_sha1 array, 2005-10-28). That helps with the commits we see while actually traversing. But:

  1. That function is only used by the v0 protocol. I think the v2 protocol's code path leaves the flag on (and thus pays the extra memory penalty), though I didn't measure it specifically.

  2. If the client sends us a bunch of "want" lines, that happens before the negotiation phase. So we'll hold on to all of those commit messages. Generally the number of "want" lines scales with the refs, not with the number of objects in the repo. But a malicious client could send a lot in order to waste memory.

As an example of (2), if I generate a request to fetch all commits in git.git like this:

    pktline() {
            local msg="$*"
            printf "%04x%s\n" $((1+4+${#msg})) "$msg"
    }

    want_commits() {
            pktline command=fetch
            printf 0001
            git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype)' |
            while read oid type; do
                    test "$type" = "commit" || continue
                    pktline want $oid
            done
            pktline done
            printf 0000
    }

    want_commits | GIT_PROTOCOL=version=2 valgrind --tool=massif git-upload-pack . >/dev/null

Before this patch upload-pack peaks at ~125MB, and after at ~35MB. The difference is not coincidentally about the same as the sum of all commit object sizes as computed by:

    git cat-file --batch-all-objects --batch-check='%(objecttype) %(objectsize)' |
    perl -alne '$v += $F[1] if $F[0] eq "commit"; END { print $v }'

In a larger repository like linux.git, that number is ~1GB.

In a repository with a full commit-graph file this will have no impact (and the commit graph would save us from parsing at all, so is a much better solution!). But it's easy to do, might help a little in real-world cases (where even if you have a commit graph it might not be fully up to date), and helps a lot for a worst-case malicious request.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
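The mechanism is a single global that already exists in commit.h; a sketch of the idea (the placement within upload-pack.c is illustrative):

    /*
     * Declared in commit.h; when 0, parsed commits do not retain their
     * message buffers in memory.
     */
    extern int save_commit_buffer;

    /* Early in upload-pack startup, before any "want" lines are parsed: */
    save_commit_buffer = 0;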
2024-02-28  upload-pack: disallow object-info capability by default  (Taylor Blau; 4 files changed, -3/+40)

We added an "object-info" capability to the v2 upload-pack protocol in a2ba162cda (object-info: support for retrieving object info, 2021-04-20). In the almost 3 years since, we have not added any client-side support, and it does not appear to exist in other implementations either (JGit understands the verb on the server side, but not on the client side).

Since this largely unused code is accessible over the network by default, it increases the attack surface of upload-pack. I don't know of any particularly severe problem, but one issue is that because of the request/response nature of the v2 protocol, it will happily read an unbounded number of packets, adding each one to a string list (without regard to whether they are objects we know about, duplicates, etc).

This may be something we want to improve in the long run, but in the short term it makes sense to disable the feature entirely. We'll add a config option as an escape hatch for anybody who wants to develop the feature further.

A more gentle option would be to add the config option to let people disable it manually, but leave it enabled by default. But given that there's no client side support, that seems like the wrong balance with security. Disabling by default will slow adoption a bit once client-side support does become available (there were some patches[1] in 2022, but nothing got merged and there's been nothing since). But clients have to deal with older servers that do not understand the option anyway (and the capability system handles that), so it will just be a matter of servers flipping their config at that point (and hopefully once any unbounded allocations have been addressed).

[jk: this is a patch that GitHub has been running for several years, but rebased forward and with a new commit message for upstream]

[1] https://lore.kernel.org/git/20220208231911.725273-1-calvinwan@google.com/

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-28  upload-pack: accept only a single packfile-uri line  (Jeff King; 2 files changed, -1/+5)

When we see a packfile-uri line from the client, we use string_list_split() to split it on commas and store the result in a string_list. A single packfile-uri line is therefore limited to storing ~64kb, the size of a pkt-line. But we'll happily accept multiple such lines, and each line appends to the string list, growing without bound.

In theory this could be useful, making:

    0017packfile-uris http
    0018packfile-uris https

equivalent to:

    001dpackfile-uris http,https

But the protocol documentation doesn't indicate that this should work (and indeed, refers to this in the singular as "the following argument can be included in the client's request"). And the client-side implementation in fetch-pack has always sent a single line (JGit appears to understand the line on the server side but has no client-side implementation, and libgit2 understands neither).

If we were worried about compatibility, we could instead just put a limit on the maximum number of values we'd accept. The current client implementation limits itself to only two values: "http" and "https", so something like "256" would be more than enough. But accepting only a single line seems more in line with the protocol documentation, and matches other parts of the protocol (e.g., we will not accept a second "filter" line).

We'll also make this more explicit in the protocol documentation; as above, I think this was always the intent, but there's no harm in making it clear.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
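For illustration, here is where lengths like "0017" come from: a pkt-line is framed as four hex digits covering the whole line, then the payload. A small self-contained C version of the shell pktline() helper used elsewhere in this series:

    #include <stdio.h>
    #include <string.h>

    /* Frame a message as a pkt-line: 4 hex length digits + payload + LF. */
    static void pktline(const char *msg)
    {
            printf("%04x%s\n", (unsigned)(4 + strlen(msg) + 1), msg);
    }

    int main(void)
    {
            pktline("packfile-uris http");       /* 4+18+1 = 23 -> "0017..." */
            pktline("packfile-uris http,https"); /* 4+24+1 = 29 -> "001d..." */
            return 0;
    }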
2024-02-28  upload-pack: use a strmap for want-ref lines  (Jeff King; 2 files changed, -14/+19)

When the "ref-in-want" capability is advertised (which it is not by default), then upload-pack processes a "want-ref" line from the client by checking that the name is a valid ref and recording it in a string-list. In theory this list should grow no larger than the number of refs in the server-side repository. But since we don't do any de-duplication, a client which sends "want-ref refs/heads/foo" over and over will cause the array to grow without bound.

We can fix this by switching to strmap, which efficiently detects duplicates. There are two client-visible changes here:

  1. The "wanted-refs" response will now be in an apparently-random order (based on iterating the hashmap) rather than the order given by the client. The protocol documentation is quiet on ordering here. The current fetch-pack implementation is happy with any order, as it looks up each returned ref using a binary search in its local sorted list. JGit seems to implement want-ref on the server side, but has no client-side support. libgit2 doesn't support either side. It would obviously be possible to record the original order or to use the strmap as an auxiliary data structure. But if the client doesn't care, we may as well do the simplest thing.

  2. We'll now reject duplicates explicitly as a protocol error. The client should never send them (and our current implementation, even when asked to "git fetch master:one master:two" will de-dup on the client side). If we wanted to be more forgiving, we could perhaps just throw away the duplicates. But then our "wanted-refs" response back to the client would omit the duplicates, and it's hard to say what a client that accidentally sent a duplicate would do with that. So I think we're better off to complain loudly before anybody accidentally writes such a client.

Let's also add a note to the protocol documentation clarifying that duplicates are forbidden. As discussed above, this was already the intent, but it's not very explicit.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
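A sketch of the duplicate detection using the real strmap API from strmap.h; the error handling and the value stored per key are illustrative:

    struct strmap wanted_refs = STRMAP_INIT;

    /* ... for each "want-ref <refname>" line from the client: */
    if (strmap_contains(&wanted_refs, refname))
            die("duplicate want-ref: %s", refname); /* protocol error */
    strmap_put(&wanted_refs, refname, oiddup(&oid)); /* remember resolved oid */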
2024-02-28  upload-pack: use oidset for deepen_not list  (Jeff King; 1 file changed, -10/+11)

We record the oid of every deepen-not line the client sends to us. For a well-behaved client, the resulting array should be bounded by the number of unique refs we have. But because there's no de-duplication, a malicious client can cause the array to grow unbounded by just sending the same "refs/heads/foo" over and over (assuming such a ref exists).

Since the deepen-not list is just being fed to a "rev-list --not" traversal, the order of items doesn't matter. So we can replace the oid_array with an oidset which notices and skips duplicates. That bounds the memory in malicious cases to be linear in the number of unique refs. And even in non-malicious cases, there may be a slight improvement in memory usage if multiple refs point to the same oid (though in practice this list is probably pretty tiny anyway, as it comes from the user specifying "--shallow-exclude" on the client fetch).

Note that in the trace2 output we'll now output the number of de-duplicated objects, rather than the total number of "deepen-not" lines we received. This is arguably a more useful value for tracing / debugging anyway.

Reported-by: Benjamin Flesch <benjaminflesch@icloud.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
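A sketch with the real oidset API from oidset.h; oidset_insert() returns 1 when the oid was already in the set, so detection and insertion are a single call:

    struct oidset deepen_not = OIDSET_INIT;

    /* ... for each "deepen-not" line, after resolving the ref to an oid: */
    if (oidset_insert(&deepen_not, &oid)) {
            /*
             * Duplicate: already recorded, nothing to do. Memory stays
             * linear in the number of unique refs.
             */
    }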
2024-02-28  upload-pack: switch deepen-not list to an oid_array  (Jeff King; 1 file changed, -7/+7)

When we see a "deepen-not" line from the client, we verify that the given name can be resolved as a ref, and then add it to a string list to be passed later to an internal "rev-list --not" traversal. We record the actual refname in the string list (so the traversal resolves it again later), but we'd be better off recording the resolved oid:

  1. There's a tiny bit of wasted work in resolving it twice.

  2. There's a small race condition with simultaneous updates; the later traversal may resolve to a different value (or not at all). This shouldn't cause any bad behavior (we do not care about the value in this first resolution, so whatever value rev-list gets is OK), but it could mean a confusing error message (if upload-pack fails to resolve the ref it produces a useful message, but a failing traversal later results in just "revision walk setup failed").

  3. It makes it simpler to de-duplicate the results. We don't de-dup at all right now, but we will in the next patch.

From the client's perspective the behavior should be the same.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-28  upload-pack: drop separate v2 "haves" array  (Jeff King; 1 file changed, -38/+10)

When upload-pack sees a "have" line in the v0 protocol, it immediately calls got_oid() with its argument and potentially produces an ACK response. In the v2 protocol, we simply record the argument in an oid_array, and only later process all of the "have" objects by calling the equivalent of got_oid() on the contents of the array.

This makes some sense, as v2 is a pure request/response protocol, as opposed to v0's asynchronous negotiation phase. But there's a downside: a client can send us an infinite number of garbage "have" lines, which we'll happily slurp into the array, consuming memory. Whereas in v0, they are limited by the number of objects in the repository (because got_oid() only records objects we have ourselves, and we avoid duplicates by setting a flag on the object struct).

We can make v2 behave more like v0 by also calling got_oid() directly when v2 parses a "have" line. Calling it early like this is OK because got_oid() itself does not interact with the client; it only confirms that we have the object and sets a few flags. Note that unlike v0, v2 does not ever (before or after this patch) check the return code of got_oid(), which lets the caller know whether we have the object. But again, that makes sense; v0 is using it to asynchronously tell the client to stop sending. In v2's synchronous protocol, we just discard those entries (and decide how to ACK at the end of each round).

There is one slight tweak we need, though. In v2's state machine, we reach the SEND_ACKS state if the other side sent us any "have" lines, whether they were useful or not. Right now we do that by checking whether the "have" array had any entries, but if we record only the useful ones, that doesn't work. Instead, we can add a simple boolean that tells us whether we saw any have line (even if it was useless).

This lets us drop the "haves" array entirely, as we're now placing objects directly into the "have_obj" object array (which is where got_oid() put them in the long run anyway). And as a bonus, we can drop the secondary "common" array used in process_haves_and_send_acks(). It was essentially a copy of "haves" minus the objects we do not have. But now that we are using "have_obj" directly, we know everything in it is useful. So in addition to protecting ourselves against malicious input, we should slightly lower our memory usage for normal inputs.

Note that there is one user-visible effect. The trace2 output records the number of "haves". Previously this was the total number of "have" lines we saw, but now is the number of useful ones. We could retain the original meaning by keeping a separate counter, but it doesn't seem worth the effort; this trace info is for debugging and metrics, and arguably the count of common oids is at least as useful as the total count.

Reported-by: Benjamin Flesch <benjaminflesch@icloud.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
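A hedged sketch of the v2 parsing change; skip_prefix() and get_oid_hex() are real helpers, while the seen_haves field name and got_oid()'s exact signature are assumptions for illustration:

    if (skip_prefix(arg, "have ", &p)) {
            struct object_id oid;

            data->seen_haves = 1;   /* assumed field: any "have" line at all
                                       drives the SEND_ACKS state */
            if (!get_oid_hex(p, &oid))
                    got_oid(data, p, &oid); /* records into have_obj only if
                                               we have it; return value is
                                               deliberately ignored in v2 */
    }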
2024-02-28  revision: implement `git log --merge` also for rebase/cherry-pick/revert  (Michael Lohmann; 2 files changed, -10/+28)

'git log' learned in ae3e5e1ef2 (git log -p --merge [[--] paths...], 2006-07-03) to show commits touching conflicted files in the range HEAD...MERGE_HEAD, an addition documented in d249b45547 (Document rev-list's option --merge, 2006-08-04).

It can be useful to look at the commit history to understand what led to merge conflicts also for other mergy operations besides merges, like cherry-pick, revert and rebase. For rebases and cherry-picks, an interesting range to look at is HEAD...{REBASE_HEAD,CHERRY_PICK_HEAD}, since even if all the commits included in that range are not directly part of the 3-way merge, conflicts encountered during these operations can indeed be caused by changes introduced in preceding commits on both sides of the history. For revert, as we are (most likely) reversing changes from a previous commit, an appropriate range is REVERT_HEAD..HEAD, which is equivalent to REVERT_HEAD...HEAD and to HEAD...REVERT_HEAD, if we keep HEAD and its parents on the left side of the range.

As such, adjust the code in prepare_show_merge so it constructs the range HEAD...$OTHER for OTHER={MERGE_HEAD, CHERRY_PICK_HEAD, REVERT_HEAD or REBASE_HEAD}. Note that we try these pseudorefs in order, so keep REBASE_HEAD last since the three other operations can be performed during a rebase. Note also that in the uncommon case where $OTHER and HEAD do not share a common ancestor, this will show the complete histories of both sides since their root commits, which is the same behaviour as currently happens in that case for HEAD and MERGE_HEAD.

Adjust the documentation of this option accordingly.

Co-authored-by: Johannes Sixt <j6t@kdbg.org>
Co-authored-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Michael Lohmann <mi.al.lohmann@gmail.com>
[jc: tweaked in j6t's precedence fix that tries REBASE_HEAD last]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
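A sketch of the probing order described in the entry above; read_ref() and ARRAY_SIZE() are real APIs, while the surrounding shape of prepare_show_merge() is condensed:

    static const char *const other_pseudorefs[] = {
            "MERGE_HEAD", "CHERRY_PICK_HEAD", "REVERT_HEAD",
            "REBASE_HEAD", /* last: the other three can occur mid-rebase */
    };
    struct object_id oid;
    size_t i;

    for (i = 0; i < ARRAY_SIZE(other_pseudorefs); i++)
            if (!read_ref(other_pseudorefs[i], &oid))
                    break; /* found the other side of HEAD...$OTHER */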
2024-02-28  revision: ensure MERGE_HEAD is a ref in prepare_show_merge  (Michael Lohmann; 1 file changed, -1/+5)

This is done to:

  (1) ensure MERGE_HEAD is a ref,
  (2) obtain the oid without any prefixing by refs.c:repo_dwim_ref(), and
  (3) error out when MERGE_HEAD is a symref.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Lohmann <mi.al.lohmann@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-28  commit-reach(repo_in_merge_bases_many): report missing commits  (Johannes Schindelin; 12 files changed, -36/+173)

Some functions in Git's source code follow the convention that returning a negative value indicates a fatal error, e.g. repository corruption. Let's use this convention in `repo_in_merge_bases()` to report when one of the specified commits is missing (i.e. when `repo_parse_commit()` reports an error). Also adjust the callers of `repo_in_merge_bases()` to handle such negative return values.

Note: As of this patch, errors are returned only if any of the specified merge heads is missing. Over the course of the next patches, missing commits will also be reported by the `paint_down_to_common()` function, which is called by `repo_in_merge_bases_many()`, and those errors will be properly propagated back to the caller at that stage.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
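The adjusted calling convention, as a hedged sketch; repo_in_merge_bases() is the real function, and callers now have to distinguish three outcomes instead of two:

    int ret = repo_in_merge_bases(r, commit, reference);

    if (ret < 0)
            die("could not determine merge bases"); /* new: missing commit,
                                                       i.e. corruption */
    if (ret) {
            /* positive: commit is an ancestor of reference */
    } else {
            /* zero: not an ancestor (or unrelated histories) */
    }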
2024-02-28  commit-reach(repo_in_merge_bases_many): optionally expect missing commits  (Johannes Schindelin; 5 files changed, -9/+12)

Currently this function treats unrelated commit histories the same way as commit histories with missing commit objects. Typically, missing commit objects constitute a corrupt repository, though, and should be reported as such. The next commits will make it so, but there is one exception: in `git fetch --update-shallow` we _expect_ commit objects to be missing, and we do want to treat the now-incomplete commit histories as unrelated.

To allow for that, let's introduce an additional parameter that is passed to `repo_in_merge_bases_many()` to trigger this behavior, and use it in the two callers in `shallow.c`.

This commit changes behavior slightly: unless called from the `shallow.c` functions that set the `ignore_missing_commits` bit, any non-existing tip commit that is passed to `repo_in_merge_bases_many()` will now result in an error.

Note: When encountering missing commits while traversing the commit history in search of merge bases, with this commit there won't be a change in behavior just yet; their children will still be interpreted as root commits. This bug will get fixed by follow-up commits.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-28  commit-reach(paint_down_to_common): plug two memory leaks  (Johannes Schindelin; 1 file changed, -1/+4)

When a commit is missing, we return early (currently pretending that no merge bases could be found in that case). At that stage, it is possible that a merge base could have been found already and added to the `result`, which is now leaked. The priority queue has a similar issue: there might still be a commit in that queue.

Let's release both, to address the potential memory leaks.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
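A sketch of the early-return cleanup; repo_parse_commit(), clear_prio_queue() and free_commit_list() are the real APIs from commit.h and prio-queue.h, while the surrounding control flow is condensed from the description:

    if (repo_parse_commit(r, commit) < 0) {
            /* release a partially-built answer and any queued commits */
            clear_prio_queue(&queue);
            free_commit_list(result);
            return NULL;    /* still "no merge bases found" at this stage */
    }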
2024-02-28  revision: fix --missing=[print|allow*] for annotated tags  (Christian Couder; 2 files changed, -7/+25)

In 9830926c7d (rev-list: add commit object support in `--missing` option, 2023-10-27) we fixed the `--missing` option in `git rev-list` so that it works with missing commits, not just blobs/trees.

Unfortunately, such a command was still failing with a "fatal: bad object <oid>" if it was passed a missing commit, blob or tree as an argument (before the rev walking even begins). This was fixed in a recent commit.

That fix still doesn't work when an argument passed to the command is an annotated tag pointing to a missing commit, though. In that case `git rev-list --missing=...` still errors out with a "fatal: bad object <oid>" error, where <oid> is the object ID of the missing commit.

Let's fix this issue, and also, while at it, let's add tests not just for annotated tags but also for regular tags and branches.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-27  The second batch  (Junio C Hamano; 1 file changed, -0/+21)

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-27  Merge branch 'jb/doc-interactive-singlekey-do-not-need-perl'  (Junio C Hamano; 1 file changed, -3/+1)

Doc clean-up.

* jb/doc-interactive-singlekey-do-not-need-perl:
  doc: remove outdated information about interactive.singleKey

2024-02-27  Merge branch 'jk/t0303-clean'  (Junio C Hamano; 1 file changed, -2/+24)

Test clean-up.

* jk/t0303-clean:
  t0303: check that helper_test_clean removes all credentials

2024-02-27  Merge branch 'mh/libsecret-empty-password-fix'  (Junio C Hamano; 1 file changed, -0/+3)

Credential helper based on libsecret (in contrib/) has been updated to handle an empty password correctly.

* mh/libsecret-empty-password-fix:
  libsecret: retrieve empty password

2024-02-27  Merge branch 'bb/completion-no-grep-into-awk'  (Junio C Hamano; 1 file changed, -2/+4)

Some parts of the command line completion script (in contrib/) have been micro-optimized.

* bb/completion-no-grep-into-awk:
  completion: use awk for filtering the config entries

2024-02-27  Merge branch 'km/mergetool-vimdiff-layout-fallback'  (Junio C Hamano; 3 files changed, -9/+25)

Variants of vimdiff learned to honor mergetool.<variant>.layout settings.

* km/mergetool-vimdiff-layout-fallback:
  mergetools: vimdiff: use correct tool's name when reading mergetool config

2024-02-27  Merge branch 'ba/credential-test-clean-fix'  (Junio C Hamano; 1 file changed, -0/+1)

Test clean-up.

* ba/credential-test-clean-fix:
  t/lib-credential: clean additional credential

2024-02-27  Merge branch 'rj/tag-column-fix'  (Junio C Hamano; 1 file changed, -1/+2)

"git tag --column" failed to check the exit status of its "git column" invocation, which has been corrected.

* rj/tag-column-fix:
  tag: error when git-column fails

2024-02-27  Merge branch 'jc/am-whitespace-doc'  (Junio C Hamano; 2 files changed, -1/+5)

"git am --help" now tells readers what actions are available in "git am --whitespace=<action>", in addition to saying that the option is passed through to the underlying "git apply".

* jc/am-whitespace-doc:
  doc: add shortcut to "am --whitespace=<action>"
2024-02-27  refs/reftable: don't fail empty transactions in repo without HEAD  (Patrick Steinhardt; 2 files changed, -0/+14)

Under normal circumstances, it shouldn't ever happen that a repository has no HEAD reference. In fact, git-update-ref(1) would fail any request to delete the HEAD reference, and a newly initialized repository always pre-creates it, too.

We have however changed git-clone(1) to partially initialize the refdb just up to the point where remote helpers can find the repository. With that change, we are going to run into a situation where repositories have no refs at all.

Now there is a very particular edge case in this situation: when preparing an empty ref transaction, we end up returning whatever value `read_ref_without_reload()` returned to the caller. Under normal conditions this would be fine: "HEAD" should usually exist, and thus the function would return `0`. But if "HEAD" doesn't exist, the function returns a positive value, which we end up returning to the caller.

Fix this bug by resetting the return code to `0`, and add a test.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-27  Merge branch 'ps/remote-helper-repo-initialization-fix' into ps/reftable-repo-init-fix  (Junio C Hamano; 3 files changed, -1/+59)

* ps/remote-helper-repo-initialization-fix:
  builtin/clone: allow remote helpers to detect repo
2024-02-27  completion: fix __git_complete_worktree_paths  (Rubén Justo; 2 files changed, -1/+24)

Use __git to invoke "worktree list" in __git_complete_worktree_paths, to respect any "-C" and "--git-dir" options present on the command line.

Signed-off-by: Rubén Justo <rjusto@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-27  builtin/clone: allow remote helpers to detect repo  (Patrick Steinhardt; 3 files changed, -1/+59)

In 18c9cb7524 (builtin/clone: create the refdb with the correct object format, 2023-12-12), we have changed git-clone(1) so that it delays creation of the refdb until after it has learned about the remote's object format. This change was required for the reftable backend, which encodes the object format into the tables. So if we pre-initialized the refdb with the default object format, but the remote uses a different object format than that, then the resulting tables would have encoded the wrong object format.

This change unfortunately breaks remote helpers which try to access the repository that is about to be created. Because the refdb has not yet been initialized at the point where we spawn the remote helper, we also don't yet have "HEAD" or "refs/". Consequently, any Git commands run by the remote helper which try to access the repository would fail because it cannot be discovered.

This is essentially a chicken-and-egg problem: we cannot initialize the refdb because we don't know about the object format, but we cannot learn about the object format because the remote helper may be unable to access the partially-initialized repository.

Ideally, we would address this issue via capabilities. But the remote helper protocol is not structured in a way that guarantees that the capability announcement happens before the remote helper tries to access the repository.

Instead, fix this issue by partially initializing the refdb up to the point where it becomes discoverable by Git commands.

Reported-by: Mike Hommey <mh@glandium.org>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-27  rebase -i: stop setting GIT_CHERRY_PICK_HELP  (Phillip Wood; 3 files changed, -12/+18)

Setting this environment variable causes the sequencer to display a custom message when it stops for the user to resolve conflicts and remove CHERRY_PICK_HEAD. Setting it in "git rebase" is a vestige of the scripted implementation; now that it is a builtin command, we do not need to communicate with the sequencer machinery via environment variables. Move the conflicts advice to use when rebasing into sequencer.c so we do not need to pass it via the environment.

Note that we retain the changes in e4301f73fff (sequencer: unset GIT_CHERRY_PICK_HELP for 'exec' commands, 2024-02-02) just in case GIT_CHERRY_PICK_HELP is set in the environment when "git rebase" is run.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-27  git: extend --no-lazy-fetch to work across subprocesses  (Junio C Hamano; 5 files changed, -0/+29)

Modeling after how the `--no-replace-objects` option is made usable across subprocess spawning (e.g., cURL based remote helpers are spawned as a separate process while running "git fetch"), allow the `--no-lazy-fetch` option to be passed across process boundaries.

Do not model how the value of the GIT_NO_REPLACE_OBJECTS environment variable is ignored, though. Just use the usual git_env_bool() to allow "export GIT_NO_LAZY_FETCH=0" and "unset GIT_NO_LAZY_FETCH" to be equivalents.

Also do not model how the request is not propagated to subprocesses we spawn (e.g. "git clone --local" that spawns a new process to work in the origin repository, while the original one works in the newly created one) by the "--no-replace-objects" option, as this "do not lazily fetch from the promisor" is more about a per-request debugging aid, not a "this repository's promisor should not be relied upon" property specific to a repository.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
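A sketch of the boolean-environment treatment described above; git_env_bool() and the fetch_if_missing global are real, though their pairing here is illustrative:

    /*
     * Unlike GIT_NO_REPLACE_OBJECTS (presence-only), the value is
     * interpreted, so GIT_NO_LAZY_FETCH=0 re-enables lazy fetching.
     */
    if (git_env_bool("GIT_NO_LAZY_FETCH", 0))
            fetch_if_missing = 0;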
2024-02-27  commit: unify logic to avoid multiple scissors lines when merging  (Josh Triplett; 1 file changed, -5/+3)

prepare_to_commit has some logic to figure out whether merge already added a scissors line, and therefore it shouldn't add another. Now that wt_status_add_cut_line has built-in state for whether it has already added a previous line, just set that state instead, and then remove that condition from subsequent calls to wt_status_add_cut_line.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-27commit: avoid redundant scissor line with --cleanup=scissors -vJosh Triplett4-7/+17
`git commit --cleanup=scissors -v` prints two scissors lines: one at the start of the comment lines, and the other right before the diff. This is redundant, and pushes the diff further down in the user's editor than it needs to be. Make wt_status_add_cut_line() remember if it has added a cut line before, and avoid adding a redundant one. Add a test for this. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
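A minimal sketch of the remembered state; where the flag actually lives and the function's real signature in wt-status.c are assumptions here, and add_cut_line_once() is a hypothetical name:

    static int cut_line_added;  /* illustrative placement of the state */

    static void add_cut_line_once(FILE *fp, const char *cut_line)
    {
            if (cut_line_added)
                    return;     /* a scissors line is already in place */
            cut_line_added = 1;
            fprintf(fp, "# %s\n", cut_line);
    }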
2024-02-27  doc: clarify the wording on <git-compat-util.h> requirement  (Junio C Hamano; 1 file changed, -6/+35)

The reason why we require the <git-compat-util.h> file to be the first header file to be included is that it insulates other header files and source files from platform differences, like which system header files must be included in what order, and what C preprocessor feature macros must be defined to trigger certain features we want out of the system.

We tried to clarify the rule in the coding guidelines document, but the wording was a bit fuzzy and could lead to misinterpretations, like the idea that you can include <xdiff/xinclude.h> only to avoid having to include <git-compat-util.h>, even if you have nothing to do with the xdiff implementation, for example. "You do not have to include more than one of these" was also misleading, and would have been puzzling if you _needed_ to depend on more than one of these approved headers (answer: you are allowed to include them all if you need the declarations in them for reasons other than that you want to avoid including compat-util yourself).

Instead of using the phrase "approved headers", enumerate them as exceptions, each labeled with its intended audiences, to avoid such misinterpretations. The structure also makes it easier to add new exceptions, so add the description of "t/unit-tests/test-lib.h" being an exception only for the unit tests implementation as an example.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Kyle Lippincott <spectral@google.com>
Acked-by: Elijah Newren <newren@gmail.com>
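An example of the rule being clarified: each .c file starts with exactly one of the allowed first headers, and everything else comes after it:

    /* First include: insulates the rest from platform differences. */
    #include "git-compat-util.h"

    /* Everything else follows, in whatever order it needs. */
    #include "config.h"
    #include "strbuf.h"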
2024-02-27  rebase: fix typo in autosquash documentation  (Richard Macklin; 1 file changed, -1/+1)

This is a minor follow-up to cb00f524df (rebase: rewrite --(no-)autosquash documentation, 2023-11-14) to fix a typo introduced in that commit.

Signed-off-by: Richard Macklin <code@rmacklin.dev>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-26  git: document GIT_NO_REPLACE_OBJECTS environment variable  (Junio C Hamano; 1 file changed, -2/+8)

This variable is used as the primary way to disable the object replacement mechanism, with the "--no-replace-objects" command line option as an end-user visible way to set it, but it has not been documented.

The original reason why it was left undocumented might be because it was meant as an internal implementation detail. But the thing is that our tests use the environment variable directly without the command line option, and there certainly are folks who learned its use from there, making it impossible to deprecate or change its behaviour by now.

Add documentation and note that for this variable, unlike many boolean-looking environment variables, only the presence matters, not what value it is set to.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
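A sketch of the presence-only semantics being documented; read_replace_refs is the real global that the option controls, though the exact code path is condensed:

    /*
     * Only getenv()'s non-NULL result is consulted, so even
     * GIT_NO_REPLACE_OBJECTS=0 disables object replacement.
     */
    if (getenv("GIT_NO_REPLACE_OBJECTS"))
            read_replace_refs = 0;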
2024-02-26  Start the 2.45 cycle  (Junio C Hamano; 2 files changed, -1/+41)

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-26  Merge branch 'ps/ref-tests-update-even-more'  (Junio C Hamano; 10 files changed, -140/+125)

More tests that are marked as "ref-files only" have been updated to improve test coverage of the reftable backend.

* ps/ref-tests-update-even-more:
  t7003: ensure filter-branch prunes reflogs with the reftable backend
  t2011: exercise D/F conflicts with HEAD with the reftable backend
  t1405: remove unneeded cleanup step
  t1404: make D/F conflict tests compatible with reftable backend
  t1400: exercise reflog with gaps with reftable backend
  t0410: convert tests to use DEFAULT_REPO_FORMAT prereq
  t: move tests exercising the "files" backend

2024-02-26  Merge branch 'gt/at-is-synonym-for-head-in-add-patch'  (Junio C Hamano; 11 files changed, -82/+92)

Teach "git checkout -p" and friends that "@" is a synonym for "HEAD".

* gt/at-is-synonym-for-head-in-add-patch:
  add -p tests: remove PERL prerequisites
  add-patch: classify '@' as a synonym for 'HEAD'

2024-02-26  Merge branch 'kh/column-reject-negative-padding'  (Junio C Hamano; 3 files changed, -0/+17)

"git column" has been taught to reject a negative padding value, as it would lead to nonsense behaviour including division by zero.

* kh/column-reject-negative-padding:
  column: guard against negative padding
  column: disallow negative padding

2024-02-26  Merge branch 'jc/t9210-lazy-fix'  (Junio C Hamano; 1 file changed, -1/+8)

Adjust the use of "rev-list --missing" in an existing test so that it does not depend on a buggy failure mode.

* jc/t9210-lazy-fix:
  t9210: do not rely on lazy fetching to fail

2024-02-26  Merge branch 'ps/reftable-iteration-perf'  (Junio C Hamano; 7 files changed, -37/+100)

The code to iterate over refs with the reftable backend has seen some optimization.

* ps/reftable-iteration-perf:
  reftable/reader: add comments to `table_iter_next()`
  reftable/record: don't try to reallocate ref record name
  reftable/block: swap buffers instead of copying
  reftable/pq: allocation-less comparison of entry keys
  reftable/merged: skip comparison for records of the same subiter
  reftable/merged: allocation-less dropping of shadowed records
  reftable/record: introduce function to compare records by key

2024-02-26  Merge branch 'rs/use-xstrncmpz'  (Junio C Hamano; 10 files changed, -16/+38)

Code clean-up.

* rs/use-xstrncmpz:
  use xstrncmpz()

2024-02-26  Merge branch 'cp/apply-core-filemode'  (Junio C Hamano; 2 files changed, -3/+40)

"git apply" on a filesystem without filemode support has learned to take a hint from what is in the index for the path, even when not working with the "--index" or "--cached" option, when checking whether the executable bit matches what is required by the preimage in the patch.

* cp/apply-core-filemode:
  apply: code simplification
  apply: correctly reverse patch's pre- and post-image mode bits
  apply: ignore working tree filemode when !core.filemode

2024-02-26  Merge branch 'ps/reftable-backend'  (Junio C Hamano; 17 files changed, -7/+3248)

Integrate the reftable code into the refs framework as a backend.

* ps/reftable-backend:
  refs/reftable: fix leak when copying reflog fails
  ci: add jobs to test with the reftable backend
  refs: introduce reftable backend
2024-02-26  fsmonitor: remove custom loop from non-directory path handler  (Jeff Hostetler; 1 file changed, -24/+31)

Refactor the code that handles refresh events for pathnames that do not contain a trailing slash. Instead of using a custom loop to scan the index and detect whether the FSEvent named a file or might be a directory prefix, use the recently created helper function to do that. Also update the comments to describe what we are doing and why.

On platforms that DO NOT annotate FS events with a trailing slash, if we fail to find an exact match for the pathname in the index, we do not know if the pathname represents a directory or simply an untracked file. Pretend that the pathname is a directory and try again before assuming it is an untracked file.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-26  fsmonitor: return invalidated cache-entry count on directory event  (Jeff Hostetler; 1 file changed, -1/+13)

Teach the refresh callback helper function for directory FSEvents to return the number of cache-entries that were invalidated in response to a directory event.

This will be used in a later commit to help determine if the observed pathname in the FSEvent was a (possibly) case-incorrect directory prefix (on a case-insensitive filesystem) of one or more actual cache-entries. If there exists at least one case-insensitive prefix match, then we can assume that the directory is a (case-incorrect) prefix of at least one tracked item rather than a completely unknown/untracked file or directory.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-26  fsmonitor: move untracked-cache invalidation into helper functions  (Jeff Hostetler; 1 file changed, -7/+19)

Move the call to invalidate the untracked-cache for the FSEvent pathname into the two helper functions. In a later commit in this series, we will call these helpers from other contexts, and it is safer to include the UC invalidation in the helpers than to remember to also add it to each helper call-site.

This has the side-effect of invalidating the UC *before* we invalidate the ce_flags in the cache-entry. These activities are independent and do not affect each other. Also, by doing the UC work first, we can avoid worrying about "early returns" or the need for the usual "goto the end" in each of the handler functions.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>