git - Mirror of https://git.kernel.org/pub/scm/git/git.git/

Age	Commit message	Author	Files	Lines
2023-09-14	The ninth batch	Junio C Hamano	1	-0/+26
	Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-13	The eighth batch	Junio C Hamano	1	-0/+2
	Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-11	diff --no-index: fix -R with stdin	René Scharfe	2	-0/+20
	When -R is given, queue_diff() swaps the mode and name variables of the two files to produce a reverse diff. 1e3f26542a (diff --no-index: support reading from named pipes, 2023-07-05) added variables that indicate whether files are special, i.e., named pipes or - for stdin. These new variables were not swapped, though, which broke the handling of stdin with -R. Swap them like the other metadata variables. Reported-by: Martin Storsjö <martin@martin.st> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
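The failure mode described above (some per-file attributes swapped, others forgotten) can be illustrated with a minimal sketch. It assumes the per-side state is grouped into one struct; `struct side` and `swap_sides()` are hypothetical names for illustration, not git's actual variables in queue_diff().

```c
#include <stdbool.h>

/* Hypothetical per-side state; git keeps these in separate variables. */
struct side {
	const char *name;
	unsigned mode;
	bool special; /* named pipe, or "-" for stdin */
};

/* Swap every attribute at once so that none is left behind for -R. */
void swap_sides(struct side *a, struct side *b)
{
	struct side tmp = *a;

	*a = *b;
	*b = tmp;
}
```

Grouping the attributes is only one way to avoid the bug; the commit itself simply swaps the new variables alongside the existing ones.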
2023-09-08	completion(switch/checkout): treat --track and -t the same	Johannes Schindelin	2	-4/+12
	When `git switch --track ` is to be completed, only remote refs are eligible because that is what the `--track` option targets. And when the short-hand `-t` is used instead, the same _should_ happen. Let's make it so. Note that the bug exists both in the completions of `switch` and `checkout`, even if it manifests in slightly different ways: While the completion of `git switch -t ` will not even look at remote refs, the completion of `git checkout -t ` will look at both remote _and_ local refs. Both should look only at remote refs. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-07	The seventh batch	Junio C Hamano	1	-0/+23
	Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-07	grep: reject --no-or	René Scharfe	1	-1/+1
	Since 3e230fa1b2 (grep: use parseopt, 2009-05-07) git grep has been accepting the option --no-or. It does the same as --or: nothing. That's confusing and unintended. Forbid negating --or. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-06	rebase -i: fix adding failed command to the todo list	Phillip Wood	3	-23/+40
	When rebasing, commands are moved from the todo list in "git-rebase-todo" to the "done" file (which is used by "git status" to show the recently executed commands) just before they are executed. This means that if a command fails because it would overwrite an untracked file, it has to be added back into the todo list before the rebase stops for the user to fix the problem. Unfortunately, when a failed command is added back into the todo list, the command preceding it is erroneously appended to the "done" file. This means that when rebase stops after "pick B" fails, the "done" file contains "pick A", "pick B", "pick A" instead of "pick A", "pick B". This happens because save_todo() updates the "done" file with the previous command whenever "git-rebase-todo" is updated. When we add the failed pick back into "git-rebase-todo" we do not want to update "done". Fix this by adding a "reschedule" parameter to save_todo() which prevents the "done" file from being updated when adding a failed command back into the "git-rebase-todo" file. A couple of the existing tests are modified to improve their coverage as none of them trigger this bug or check the "done" file. Reported-by: Stefan Haller <lists@haller-berlin.de> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
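A simplified in-memory model of the behavior described above may help. Here `struct done_log` and `save_todo_sketch()` are hypothetical stand-ins (the real save_todo() writes files and has a different signature); `next` is assumed to be the index of the first command that remains in "git-rebase-todo".

```c
#include <stddef.h>

#define MAX_DONE 16

/* Hypothetical in-memory stand-in for the "done" file. */
struct done_log {
	const char *items[MAX_DONE];
	size_t nr;
};

/*
 * Model of saving the todo list so that todo[next] is the first remaining
 * command. Normally the command just before it is appended to "done".
 * With `reschedule` set, the caller is only putting a failed command back,
 * so "done" is left untouched.
 */
void save_todo_sketch(const char **todo, size_t next,
		      struct done_log *done, int reschedule)
{
	if (reschedule || next == 0 || done->nr >= MAX_DONE)
		return;
	done->items[done->nr++] = todo[next - 1];
}
```

In this model, saving before "pick A" and "pick B" run records both in "done"; putting the failed "pick B" back with `reschedule` set leaves "done" at two entries, whereas the same call without the flag would append "pick A" a second time.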
2023-09-06	rebase --continue: refuse to commit after failed command	Phillip Wood	3	-1/+26
	If a commit cannot be picked because it would overwrite an untracked file then "git rebase --continue" should refuse to commit any staged changes as the commit was not picked. This is implemented by refusing to commit if the message file is missing. The message file is chosen for this check because it is only written when "git rebase" stops for the user to resolve merge conflicts. Existing commands that refuse to commit staged changes when continuing such as a failed "exec" rely on checking for the absence of the author script in run_git_commit(). This prevents the staged changes from being committed but prints error: could not open '.git/rebase-merge/author-script' for reading before the message about not being able to commit. This is confusing to users and so checking for the message file instead improves the user experience. The existing test for refusing to commit after a failed exec is updated to check that we do not print the error message about a missing author script anymore. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-06	rebase: fix rewritten list for failed pick	Phillip Wood	4	-14/+63
	git rebase keeps a list that maps the OID of each commit before it was rebased to the OID of the equivalent commit after the rebase. This list is used to drive the "post-rewrite" hook that is called at the end of a successful rebase. When a rebase stops for the user to resolve merge conflicts the OID of the commit being picked is written to ".git/rebase-merge/stopped-sha". Then when the rebase is continued that OID is added to the list of rewritten commits. Unfortunately if a commit cannot be picked because it would overwrite an untracked file we still write the "stopped-sha" file. This means that when the rebase is continued the commit is added into the list of rewritten commits even though it has not been picked yet. Fix this by not calling error_with_patch() for failed commands. The pick has failed so there is nothing to commit and therefore we do not want to set up the state files for committing staged changes when the rebase continues. This change means we no longer write a patch for the failed command or display the error message printed by error_with_patch(). As the command has failed the patch isn't really useful and in any case the user can inspect the commit associated with the failed command by inspecting REBASE_HEAD. Unless the user has disabled it we already print an advice message that is more helpful than the message from error_with_patch() which the user will still see. Even if the advice is disabled the user will see the messages from the merge machinery detailing the problem. The code to add a failed command back into the todo list is duplicated between pick_one_commit() and the loop in pick_commits(). Both sites print advice about the command being rescheduled, decrement the current item and save the todo list. To avoid duplicating this code pick_one_commit() is modified to set a flag to indicate that the command should be rescheduled in the main loop. This simplifies things as only the remaining copy of the code needs to be modified to set REBASE_HEAD rather than calling error_with_patch(). Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-06	sequencer: factor out part of pick_commits()	Phillip Wood	1	-61/+71
	This simplifies the next commit. If a pick fails we now return the error at the end of the loop body rather than returning early; a successful "edit" command continues to return early. There are three things to check to ensure that removing the early return for an error does not change the behavior of the code: (1) We could enter the block guarded by "if (reschedule)". This block is not entered because "reschedule" is always zero when picking a commit. (2) We could enter the block guarded by "else if (is_rebase_i(opts) && check_todo && !res)". This block is not entered when returning an error because "res" is non-zero in that case. (3) todo_list->current could be incremented before returning. That is avoided by moving the increment which is of course a potential change in behavior itself. The move is safe because none of the callers look at todo_list after this function returns. Moving the increment makes it clear we only want to advance the current item if the command was successful. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-06	sequencer: use rebase_path_message()	Phillip Wood	1	-5/+2
	Rather than constructing the path in a struct strbuf use the ready made function to get the path name instead. This was the last remaining use of the strbuf so remove it as well. As with the previous patch we now use a hard coded string rather than get_dir() when constructing the path. This is safe for the same reason (make_patch() is only called when rebasing) and is protected by the assertion added in the previous patch. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-06	rebase -i: remove patch file after conflict resolution	Phillip Wood	2	-4/+30
	When a rebase stops for the user to resolve conflicts it writes a patch for the conflicting commit to .git/rebase-merge/patch. This file has been written since the introduction of "git-rebase-interactive.sh" in 1b1dce4bae7 (Teach rebase an interactive mode, 2007-06-25). I assume the idea was to enable the user to inspect the conflicting commit in the same way as they could for the patch based rebase. This file should be deleted when the rebase continues, because if the rebase later stops for a failed "exec" command or a "break" command, a stale patch lying around from an unrelated command is confusing to the user. As the path is now used in two different places rebase_path_patch() is added and used to obtain the path for the patch. To construct the path write_patch() previously used get_dir() which returns different paths depending on whether we're rebasing or cherry-picking/reverting. As this function is only called when rebasing it is safe to use a hard coded string for the directory instead. An assertion is added to make sure we don't start calling this function when cherry-picking in the future. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-06	rebase -i: move unlink() calls	Phillip Wood	1	-3/+4
	At the start of each iteration the loop that picks commits removes the state files from the previous pick. However some of these files are only written if there are conflicts in which case we exit the loop before the end of the loop body. Therefore they only need to be removed when the rebase continues, not at the start of each iteration. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-06	doc/diff-options: fix link to generating patch section	Sergey Organov	1	-7/+1
	When formatted as a man page, the section title is rendered "GENERATING PATCH TEXT WITH -P" whereas the reference still reads "Generating patch text with -p", which is inconsistent and makes searching harder than it needs to be. Fix this by getting rid of the custom reference text. Also, documentation for every command that describes the `-p` option by including the "diff-options.txt" file does include the "diff-generate-patch.txt" file as well (as it should), so the internal link is in fact useful for any of them. Fix this by getting rid of the conditionals around the reference. Fixes: ebdc46c242 (docs: link generating patch sections) Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-05	var: avoid a segmentation fault when `HOME` is unset	Johannes Schindelin	2	-1/+10
	The code introduced in 576a37fccbf (var: add attributes files locations, 2023-06-27) paid careful attention to use `xstrdup()` for pointers known never to be `NULL`, and `xstrdup_or_null()` otherwise. One spot was missed, though: `git_attr_global_file()` can return `NULL` when the `HOME` variable is not set (and neither is `XDG_CONFIG_HOME`), a situation not too uncommon in certain server setups. Fix this, and add a test case to avoid future regressions. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Acked-by: brian m. carlson <bk2204@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-05	sequencer: fix error message on failure to copy SQUASH_MSG	Oswald Buddenhagen	1	-1/+1
	The message talked about renaming, while the actual action is copying. This was introduced by 6e98de72c ("sequencer (rebase -i): add support for the 'fixup' and 'squash' commands", 2017-01-02). Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de> Acked-by: Phillip Wood <phillip.wood123@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-05	grep: use OPT_INTEGER_F for --max-depth	René Scharfe	1	-3/+2
	a91f453f64 (grep: Add --max-depth option., 2009-07-22) added the option --max-depth, defining it using a positional struct option initializer of type OPTION_INTEGER. It also sets defval to 1 for some reason, but that value would only be used if the flag PARSE_OPT_OPTARG was given. Use the macro OPT_INTEGER_F instead to standardize the definition and specify only the necessary values. This also normalizes argh to N_("n") as a side-effect, which is OK. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-05	name-rev: use OPT_HIDDEN_BOOL for --peel-tag	René Scharfe	1	-6/+2
	adfc1857bd (describe: fix --contains when a tag is given as input, 2013-07-18) added the option --peel-tag, defining it using a positional struct option initializer and a comment indicating that it's intended to be a hidden OPT_BOOL. 4741edd549 (Remove deprecated OPTION_BOOLEAN for parsing arguments, 2013-08-03) added the macro OPT_HIDDEN_BOOL, which allows to express this more succinctly. Use it. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-05	ref-filter: sort numerically when ":size" is used	Kousik Sanagavarapu	2	-10/+26
	Atoms like "raw" and "contents" have a ":size" option which can be used to know the size of the data. Since these atoms have the cmp_type FIELD_STR, they are sorted alphabetically from 'a' to 'z' and '0' to '9'. This means that even when the ":size" option is used and what we ultimately have is numbers, we still sort alphabetically. For example, consider the following refs in a repo, listed as (refname, contents:size, raw:size): (refs/heads/branch1, 1130, 1210); (refs/heads/master, 300, 410); (refs/tags/v1.0, 140, 260). Sorting with --format="%(refname) %(contents:size)" --sort=contents:size would give "refs/heads/branch1 1130", "refs/tags/v1.0 140", "refs/heads/master 300", which is an alphabetic sort, while what one might really expect is "refs/tags/v1.0 140", "refs/heads/master 300", "refs/heads/branch1 1130", which is a numeric sort (that is, a "$ sort -n file" as opposed to a "$ sort file", where "file" contains only the "contents:size" or "raw:size" info, one value per line). The same is the case with "--sort=raw:size". So, sort numerically whenever the sort is done with "contents:size" or "raw:size" and do it the normal alphabetic way when "contents" or "raw" are used with some other option (they are FIELD_STR anyway). Helped-by: Jeff King <peff@peff.net> Signed-off-by: Kousik Sanagavarapu <five231003@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
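The difference between the two orderings can be reproduced with two qsort() comparators. This is a standalone sketch, not ref-filter's code; `cmp_alpha()` and `cmp_numeric()` are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Alphabetic: "1130" sorts before "140" because its second byte '1' < '4'. */
int cmp_alpha(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Numeric: parse the same strings as unsigned integers before comparing. */
int cmp_numeric(const void *a, const void *b)
{
	unsigned long x = strtoul(*(const char *const *)a, NULL, 10);
	unsigned long y = strtoul(*(const char *const *)b, NULL, 10);

	return (x > y) - (x < y);
}
```

Sorting the sizes "1130", "300" and "140" with `cmp_alpha` yields 1130, 140, 300; with `cmp_numeric` it yields 140, 300, 1130, matching the two listings in the commit message.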
2023-09-05	parse-options: mark unused parameters in noop callback	Jeff King	1	-1/+3
	Unsurprisingly, the noop options callback doesn't bother to look at any of its parameters. Let's mark them so that -Wunused-parameter does not complain. Another option would be to drop the callback and have parse-options itself recognize OPT_NOOP_NOARG. But that seems like extra work for no real benefit. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-05	interpret-trailers: mark unused "unset" parameters in option callbacks	Jeff King	1	-3/+6
	There are a few parse-option callbacks that do not look at their "unset" parameters, but also do not set PARSE_OPT_NONEG. At first glance this seems like a bug, as we'd ignore "--no-if-exists", etc. But they do work fine, because when "unset" is true, then "arg" is NULL. And all three functions pass "arg" on to helper functions which do the right thing with the NULL. Note that this shortcut would not be correct if any callback used PARSE_OPT_NOARG (in which case "arg" would be NULL but "unset" would be false). But none of these do. So the code is fine as-is. But we'll want to mark the unused "unset" parameters to quiet -Wunused-parameter. I've also added a comment to make this rather subtle situation more explicit. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-05	parse-options: add more BUG_ON() annotations	Jeff King	2	-0/+4
	These callbacks are similar to the ones touched by 517fe807d6 (assert NOARG/NONEG behavior of parse-options callbacks, 2018-11-05), but were either missed in that commit (the one in add.c) or were added later (the one in log.c). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-05	merge: do not pass unused opt->value parameter	Jeff King	1	-1/+1
	The option_parse_strategy() callback does not look at opt->value; instead it calls append_strategy(), which manipulates the global use_strategies array directly. But the OPT_CALLBACK declaration assigns "&use_strategies" to opt->value. One could argue this is good, as it tells the reader what we generally expect the callback to do. But it is also bad, because it can mislead you into thinking that swapping out "&use_strategies" there might have any effect. Let's switch it to pass NULL (which is what every other "does not bother to look at opt->value" callback does). If you want to know what the callback does, it's easy to read the function itself. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-05	parse-options: mark unused "opt" parameter in callbacks	Jeff King	6	-12/+14
	The previous commit argued that parse-options callbacks should try to use opt->value rather than touching globals directly. In some cases, however, that's awkward to do. Some callbacks touch multiple variables, or may even just call into an abstracted function that does so. In some of these cases we _could_ convert them by stuffing the multiple variables into a single struct and passing the struct pointer through opt->value. But that may make other parts of the code less readable, as the struct relationship has to be mentioned everywhere. Let's just accept that these cases are special and leave them as-is. But we do need to mark their "opt" parameters to satisfy -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-05	parse-options: prefer opt->value to globals in callbacks	Jeff King	6	-37/+50
	We have several parse-options callbacks that ignore their "opt" parameters entirely. This is a little unusual, as we'd normally put the result of the parsing into opt->value. In the case of these callbacks, though, they directly manipulate global variables instead (and in most cases the caller sets opt->value to NULL in the OPT_CALLBACK declaration). The immediate symptom we'd like to deal with is that the unused "opt" variables trigger -Wunused-parameter. But how to fix that is debatable. One option is to annotate them with UNUSED. But another is to have the caller pass in the appropriate variable via opt->value, and use it. That has the benefit of making the callbacks reusable (in theory at least), and makes it clear from the OPT_CALLBACK declaration which variables will be affected (doubly so for the cases in builtin/fast-export.c, where we do set opt->value, but it is completely ignored!). The slight downside is that we lose type safety, since they're now passing through void pointers. I went with the "just use them" approach here. The loss of type safety is unfortunate, but that is already an issue with most of the other callbacks. If we want to try to address that, we should do so more consistently (and this patch would prepare these callbacks for whatever we choose to do there). Note that in the cases in builtin/fast-export.c, we are passing anonymous enums. We'll have to give them names so that we can declare the appropriate pointer type within the callbacks. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
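The "pass the variable through opt->value" approach can be sketched as follows. `struct mini_option`, `enum mode` and `parse_mode_cb()` are hypothetical, much-reduced stand-ins and not git's parse-options API; the point is only that the callback writes through the pointer it was handed instead of naming a global.

```c
#include <string.h>

/* Hypothetical, much-reduced stand-in for an option declaration. */
struct mini_option {
	const char *long_name;
	void *value; /* where the callback should store its result */
};

/* A named enum, so the callback can declare a matching pointer type. */
enum mode { MODE_ABORT, MODE_WARN, MODE_STRIP };

/* Returns 0 on success, -1 on an unknown argument (target left unchanged). */
int parse_mode_cb(const struct mini_option *opt, const char *arg)
{
	enum mode *val = opt->value; /* void * converts silently: no type check */

	if (!strcmp(arg, "abort"))
		*val = MODE_ABORT;
	else if (!strcmp(arg, "warn"))
		*val = MODE_WARN;
	else if (!strcmp(arg, "strip"))
		*val = MODE_STRIP;
	else
		return -1;
	return 0;
}
```

The implicit conversion from `void *` is where the loss of type safety mentioned above shows up: nothing stops a caller from pointing `value` at a variable of the wrong type.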
2023-09-05	checkout-index: delay automatic setting of to_tempfile	Jeff King	2	-2/+27
	Using --stage=all requires writing to tempfiles, since we cannot put multiple stages into a single file. So --stage=all implies --temp. But we do so by setting to_tempfile in the options callback for --stage, rather than after all options have been parsed. This leads to two bugs: 1. If you run "checkout-index --stage=all --stage=2", this should not imply --temp, but it currently does. The callback cannot just unset to_tempfile when it sees the "2" value, because it no longer knows if its value was from the earlier --stage call, or if the user specified --temp explicitly. 2. If you run "checkout-index --stage=all --no-temp", the --no-temp will overwrite the earlier implied --temp. But this mode of operation cannot work, and the command will fail with "<path> already exists" when trying to write the higher stages. We can fix both by lazily setting to_tempfile. We'll make it a tristate, with -1 as "not yet given", and have --stage=all enable it only after all options are parsed. Likewise, after all options are parsed we can detect and reject the bogus "--no-temp" case. Note that this does technically change the behavior for "--stage=all --no-temp" for paths which have only one stage present (which accidentally worked before, but is now forbidden). But this behavior was never intended, and you'd have to go out of your way to try to trigger it. The new tests cover both cases, as well as the general "--stage=all implies --temp", as most of the other tests explicitly say "--temp". Ironically, the test "checkout --temp within subdir" is the only one that _doesn't_ use "--temp", and so was implicitly covering this case. But it seems reasonable to have a more explicit test alongside the other related ones. Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
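The tristate idea can be sketched in a few lines. `struct checkout_opts`, `handle_stage()` and `finalize_opts()` are hypothetical names chosen for illustration; the real code lives in builtin/checkout-index.c and is structured differently.

```c
#include <string.h>

struct checkout_opts {
	int to_tempfile; /* -1: not given, 0: --no-temp, 1: --temp */
	int stage_all;   /* reflects only the last --stage=<value> seen */
};

/* Option callback: record the value, but do not touch to_tempfile yet. */
void handle_stage(struct checkout_opts *o, const char *arg)
{
	o->stage_all = !strcmp(arg, "all");
}

/*
 * Resolve the tristate once every option has been parsed.
 * Returns 0 on success, -1 for the contradictory "--stage=all --no-temp".
 */
int finalize_opts(struct checkout_opts *o)
{
	if (o->stage_all) {
		if (o->to_tempfile == 0)
			return -1;
		o->to_tempfile = 1;
	} else if (o->to_tempfile < 0) {
		o->to_tempfile = 0;
	}
	return 0;
}
```

Because the decision is deferred, "--stage=all --stage=2" ends with `to_tempfile` at 0, and an explicit "--no-temp" can be told apart from "not given".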
2023-09-05	The sixth batch	Junio C Hamano	1	-0/+16
	Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-02	git-revert.txt: add discussion	Oswald Buddenhagen	1	-0/+10
	The section is inspired by git-commit.txt. Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-02	sequencer: beautify subject of reverts of reverts	Oswald Buddenhagen	2	-0/+34
	Instead of generating a silly-looking `Revert "Revert "foo""`, make it a more humane `Reapply "foo"`. This is done for two reasons: - To cover the actually common case of just a double revert. - To encourage people to rewrite summaries of recursive reverts by setting an example (a subsequent commit will also do this explicitly in the documentation). To achieve these goals, the mechanism does not need to be particularly sophisticated. Therefore, more complicated alternatives which would "compress more efficiently" have not been implemented. Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
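The subject rewrite described above can be approximated with plain string handling. `revert_subject()` is a hypothetical helper, not the code in sequencer.c; it only checks for a leading `Revert "` and a trailing quote, in keeping with the deliberately unsophisticated mechanism the message describes.

```c
#include <stdio.h>
#include <string.h>

/* Write the subject for a revert of a commit whose subject is `subject`. */
void revert_subject(const char *subject, char *out, size_t outlen)
{
	static const char prefix[] = "Revert \"";
	size_t plen = sizeof(prefix) - 1;
	size_t len = strlen(subject);

	if (len > plen && !strncmp(subject, prefix, plen) &&
	    subject[len - 1] == '"')
		/* Reverting a revert: reuse the inner subject and its closing quote. */
		snprintf(out, outlen, "Reapply \"%s", subject + plen);
	else
		snprintf(out, outlen, "Revert \"%s\"", subject);
}
```

With this sketch, reverting `foo` gives `Revert "foo"`, and reverting that commit in turn gives `Reapply "foo"`.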
2023-09-01	The fifth batch	Junio C Hamano	1	-0/+14
	Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-31	treewide: fix various bugs w/ OpenSSL 3+ EVP API	Eric Wong	5	-3/+11
	The OpenSSL 3+ EVP API for SHA-* cannot support our prior use cases supported by other SHA-* implementations. It has the following differences: 1. ->init_fn is required before all use 2. struct assignments don't work; ->clone_fn is required instead 3. can't support ->update_fn after ->final_fn While fixing cases 1 and 2 is merely the matter of calling ->init_fn and ->clone_fn as appropriate, fixing case 3 requires calling ->final_fn on a temporary context that's cloned from the primary context. Reported-by: Bagas Sanjaya <bagasdotme@gmail.com> Link: https://lore.kernel.org/ZPCL11k38PXTkFga@debian.me/ Helped-by: brian m. carlson <sandals@crustytoothpaste.net> Fixes: 3e440ea0aba0 ("sha256: avoid functions deprecated in OpenSSL 3+") Fixes: bda9c12073e7 ("avoid SHA-1 functions deprecated in OpenSSL 3+") Signed-off-by: Eric Wong <e@80x24.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-31	lower core.maxTreeDepth default to 2048	Jeff King	1	-1/+1
	On my Linux system, all of our recursive tree walking algorithms can run up to the 4096 default limit without segfaulting. But not all platforms will have stack sizes as generous (nor might even Linux if we kick off a recursive walk within a thread). In particular, several of the tests added in the previous few commits fail in our Windows CI environment. Through some guess-and-check pushing, I found that 3072 is still too much, but 2048 is OK. These are obviously vague heuristics, and there is nothing to promise that another system might not have trouble at even lower values. But it seems unlikely anybody will be too angry about a 2048-depth limit (this is close to the default max-pathname limit on Linux even for a pathological path like "a/a/a/..."). So let's just lower it. Some alternatives are: - configure separate defaults for Windows versus other platforms. - just skip the tests on Windows. This leaves Windows users with the annoying case that they can be crashed by running out of stack space, but there shouldn't be any security implications (they can't go deep enough to hit integer overflow problems). Since the original default was arbitrary, it seems less confusing to just lower it, keeping behavior consistent across platforms. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-31	tree-diff: respect max_allowed_tree_depth	Jeff King	2	-8/+24
	When diffing trees, we recurse to handle subtrees. That means we may run out of stack space and segfault. Let's teach this code path about core.maxTreeDepth in order to fail more gracefully. As with the previous patch, we have no way to return an error (and other tree-loading problems would just cause us to die()). So we'll likewise call die() if we exceed the maximum depth. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-31	list-objects: respect max_allowed_tree_depth	Jeff King	2	-0/+17
	The tree traversal in list-objects.c, which is used by "rev-list --objects", etc, uses recursion and may run out of stack space. Let's teach it about the new core.maxTreeDepth config option. We unfortunately can't return an error here, as this code doesn't produce an error return at all. We'll die() instead, which matches the behavior when we see an otherwise broken tree. Note that this will also generally reject such deep trees from entering the repository from a fetch or push, due to the use of rev-list in the connectivity check. But it's not foolproof! We stop traversing when we see an UNINTERESTING object, and the connectivity check marks existing ref tips as UNINTERESTING. So imagine commit X has a tree with maximum depth N. If you then create a new commit Y with a tree entry "Y:subdir" that points to "X^{tree}", then the depth of Y will be N+1. But a connectivity check running "git rev-list --objects Y --not X" won't realize that; it will stop traversing at X^{tree}, since that was already reachable. So this will stop naive pushes of too-deep trees, but not carefully crafted malicious ones. Doing it robustly and efficiently would require caching the maximum depth of each tree (i.e., the longest path to any leaf entry). That's much more complex and not strictly needed. If each recursive algorithm limits itself already, then that's sufficient. Blocking the objects from entering the repo would be a nice belt-and-suspenders addition, but it's not worth the extra cost. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-31	read_tree(): respect max_allowed_tree_depth	Jeff King	5	-4/+19
	The read_tree() function reads trees recursively (via its read_tree_at() helper). This can cause it to run out of stack space on very deep trees. Let's teach it about the new core.maxTreeDepth option. The easiest way to demonstrate this is via "ls-tree -r", which the test covers. Note that I needed a tree depth of ~30k to trigger a segfault on my Linux system, not the 4100 used by our "big" test in t6700. However, that test still tells us what we want: that the default 4096 limit is enough to prevent segfaults on all platforms. We could bump it, but that increases the cost of the test setup for little gain. As an interesting side-note: when I originally wrote this patch about 4 years ago, I needed a depth of ~50k to segfault. But porting it forward, the number is much lower. Seemingly little things like cf0983213c (hash: add an algo member to struct object_id, 2021-04-26) take it from 32,722 to 29,080. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
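The general shape of a depth-limited recursive walk is sketched below. `struct node` and `walk()` are hypothetical; git's read_tree_at() reads trees from the object store and reports the problem differently, whereas this model simply returns -1.

```c
#include <stddef.h>

/* Hypothetical in-memory tree; one subtree per node is enough to model depth. */
struct node {
	struct node *child;
};

/* Returns 0 on success, -1 if the tree is nested deeper than max_depth. */
int walk(const struct node *n, int depth, int max_depth)
{
	if (!n)
		return 0;
	if (depth > max_depth)
		return -1; /* fail gracefully instead of exhausting the stack */
	return walk(n->child, depth + 1, max_depth);
}
```

Called as `walk(root, 1, limit)`, the recursion never goes more than one frame past the configured limit, regardless of how deep the input actually is.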
2023-08-31	traverse_trees(): respect max_allowed_tree_depth	Jeff King	2	-0/+70
	The tree-walk.c code walks trees recursively, and may run out of stack space. The easiest way to see this is with git-archive; on my 64-bit Linux system it runs out of stack trying to generate a tarfile with a tree depth of 13,772. I've picked 4100 as the depth for our "big" test. I ran it with a much higher value to confirm that we do get a segfault without this patch. But really anything over 4096 is sufficient for its stated purpose, which is to find out if our default limit of 4096 is low enough to prevent segfaults on all platforms. Keeping it small saves us time on the test setup. The tree-walk code that's touched here underlies unpack_trees(), so this protects any programs which use it, not just git-archive (but archive is easy to test, and was what alerted me to this issue in a real-world case). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-31	add core.maxTreeDepth config	Jeff King	4	-0/+13
	Most of our tree traversal algorithms use recursion to visit sub-trees. For pathologically large trees, this can cause us to run out of stack space and abort in an uncontrolled way. Let's put our own limit here so that we can fail gracefully rather than segfaulting. In similar cases where we recursed along the commit graph, we rewrote the algorithms to avoid recursion and keep any stack data on the heap. But the commit graph is meant to grow without bound, whereas it's not an imposition to put a limit on the maximum size of tree we'll handle. And this has a bonus side effect: coupled with a limit on individual tree entry names, this limits the total size of a path we may encounter. This gives us an extra protection against code handling long path names which may suffer from integer overflows in the size (which could then be exploited by malicious trees). The default of 4096 is set to be much longer than anybody would care about in the real world. Even with single-letter interior tree names (like "a/b/c"), such a path is at least 8191 bytes. While most operating systems will let you create such a path incrementally, trying to reference the whole thing in a system call (as Git would do when actually trying to access it) will result in ENAMETOOLONG. Coupled with the recent fsck.largePathname warning, the maximum total pathname Git will handle is (by default) 16MB. This config option doesn't do anything yet; future patches will convert various algorithms to respect the limit. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
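The two figures quoted above can be checked with a small helper. `path_bytes()` is a hypothetical function, and the reading of the 16MB figure as 4096 components of up to 4096 bytes each is an assumption based on the two defaults mentioned in the message.

```c
#include <stddef.h>

/*
 * Bytes in a path made of `depth` components of `component_len` bytes each,
 * joined by '/' separators.
 */
size_t path_bytes(size_t depth, size_t component_len)
{
	if (!depth)
		return 0;
	return depth * component_len + (depth - 1);
}
```

`path_bytes(4096, 1)` is 4096 + 4095 = 8191, the single-letter case from the message, and `path_bytes(4096, 4096)` is 16,781,311, i.e. roughly 16MB once the separators are counted.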
2023-08-31	fsck: detect very large tree pathnames	Jeff King	4	-1/+41
	In general, Git tries not to arbitrarily limit what it will store, and there are currently no limits at all on the size of the path we find in a tree. In theory you could have one that is gigabytes long. But in practice this freedom is not really helping anybody, and is potentially harmful: 1. Most operating systems have much lower limits for the size of a single pathname component (e.g., on Linux you'll generally get ENAMETOOLONG for anything over 255 bytes). And while you _can_ use Git in a way that never touches the filesystem (manipulating the index and trees directly), it's still probably not a good idea to have gigantic tree names. Many operations load and traverse them, so any clever Git-as-a-database scheme is likely to perform poorly in that case. 2. We still have a lot of code which assumes strings are reasonably sized, and I won't be at all surprised if you can trigger some interesting integer overflows with gigantic pathnames. Stopping malicious trees from entering the repository provides an extra line of defense, protecting downstream code. This patch implements an fsck check so that such trees can be rejected by transfer.fsckObjects. I've picked a reasonably high maximum length here (4096) that hopefully should not bother anybody in practice. I've also made it configurable, as an escape hatch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-31	tree-walk: rename "error" variable	Jeff King	1	-3/+3
	The "error" variable in traverse_trees() shadows the global error() function (meaning we can't call error() from here). Let's call the local variable "ret" instead, which matches the idiom in other functions. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-31	tree-walk: drop MAX_TRAVERSE_TREES macro	Jeff King	2	-3/+1
	Since the previous commit dropped the hard-coded limit in traverse_trees(), we don't need this macro there anymore (the code can handle any number of trees in parallel). We do define MAX_UNPACK_TREES using MAX_TRAVERSE_TREES, due to 5290d45134 (tree-walk.c: break circular dependency with unpack-trees, 2020-02-01). So we can just directly define that as "8" now; we know traverse_trees() can handle whatever we throw at it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-31	tree-walk: reduce stack size for recursive functions	Jeff King	2	-6/+13
	The traverse_trees() and traverse_trees_recursive() functions call each other recursively. In a deep tree, this can result in running out of stack space and crashing. There's obviously going to be some limit here based on available stack, but the problem is exacerbated by a few large structs, many of which we over-allocate. For example, in traverse_trees() we store a name_entry and tree_desc_x per tree, both of which contain an object_id (which is now 32 bytes). And we allocate 8 of them (from MAX_TRAVERSE_TREES), even though many traversals will only look at 1 or 2. Interestingly, we used to allocate these on the heap, prior to 8dd40c0472 (traverse_trees(): use stack array for name entries, 2020-01-30). That commit was trying to simplify away allocation size computations, and naively assumed that the sizes were small enough not to matter. And they don't in normal cases, but on my stock Debian system I see a crash running "git archive" on a tree nested ~3600 levels deep. That's deep enough we wouldn't see it in practice, but probably shallow enough that we'd prefer not to make it a hard limit. Especially because other systems may have even smaller stacks. We can replace these stack variables with a few malloc invocations. This reduces the stack sizes for the two functions from 1128 and 752 bytes, respectively, down to 40 and 92 bytes. That allows a depth of ~13000 on my machine (the improvement isn't in linear proportion because my numbers don't count the size of parameters and other function overhead). The possible downsides are: 1. We now have to remember to free(). But both functions have an easy single exit (and already had to clean up other bits anyway). 2. The extra malloc()/free() overhead might be measurable. I tested this by setting up a 3000-depth tree with a single blob and running "git archive" on it. After switching to the heap, it consistently runs 2-3% faster! Presumably this is because the 1K+ of wasted stack space penalized memory caches. On a more real-world case like linux.git, the speed difference isn't measurable at all, simply because most trees aren't that deep and there's so much other work going on (like accessing the objects themselves). So the improvement I saw should be taken as evidence that we're not making anything worse, but isn't really that interesting on its own. The main motivation here is that we're now less likely to run out of stack space and crash. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-31	format-patch: use OPT_STRING_LIST for to/cc options	Jeff King	1	-20/+2
	The to_callback() and cc_callback() functions are identical to the generic parse_opt_string_list() function (except that they don't handle optional arguments, but that's OK because their callers do not use the OPTARG flag). Let's simplify the code by using OPT_STRING_LIST. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-31	merge: simplify parsing of "-n" option	Jeff King	1	-11/+2
	The "-n" option is implemented by an option callback, as it is really a "reverse bool". If given, it sets show_diffstat to 0. In theory, when negated, it would set the same flag to 1. But it's not possible to trigger that, since short options cannot be negated. So in practice this is really just a SET_INT to 0. Let's use that instead, which shortens the code. Note that negation here would do the wrong thing (as with any SET_INT with a value of "0"). We could specify PARSE_OPT_NONEG to future-proof ourselves against somebody adding a long option name (which would make it possible to negate). But there's not much point: 1. Nobody is going to do that, because the negated form already exists, and is called "--stat" (which is defined separately so that "--no-stat" works). 2. If they did, the BUG() check added by 3284b93862 (parse-options: disallow negating OPTION_SET_INT 0, 2023-08-08) will catch it (and that check is smart enough to realize that our short-only option is OK). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-31	merge: make xopts a strvec	Jeff King	1	-19/+7
	The "xopts" variable uses a custom array with ALLOC_GROW(). Using a strvec simplifies things a bit. We need fewer variables, and we can also ditch our custom parseopt callback in favor of OPT_STRVEC(). As a bonus, this means that "--no-strategy-option", which was previously a silent noop, now does something useful: like other list-like options, it will clear the list of -X options seen so far. This matches the behavior of revert/cherry-pick, which made the same change in fb60b9f37f (sequencer: use struct strvec to store merge strategy options, 2023-04-10). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
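The list-clearing behavior described above can be modeled in a few lines. This is a toy model in Python, not Git's parse-options code: the function name is hypothetical, and only the stuck forms `-X<opt>` and `--strategy-option=<opt>` are handled.

```python
def collect_xopts(argv):
    """Collect -X values; --no-strategy-option discards those seen so far."""
    xopts = []
    for arg in argv:
        if arg == "--no-strategy-option":
            xopts.clear()  # negated form resets the list
        elif arg.startswith("--strategy-option="):
            xopts.append(arg.split("=", 1)[1])
        elif arg.startswith("-X") and len(arg) > 2:
            xopts.append(arg[2:])
    return xopts


print(collect_xopts(["-Xours", "--no-strategy-option", "-Xpatience"]))
# ['patience']
```

Only the options given after the last `--no-strategy-option` survive, which is the same convention other list-like options follow.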
2023-08-31	format-patch: --rfc honors what --subject-prefix sets	Drew DeVault	3	-23/+48
	Rather than replacing the configured subject prefix (either through the git config or command line) entirely with "RFC PATCH", this change prepends RFC to whatever subject prefix was already in use. This is useful, for example, when a user is working on a repository that uses a subject prefix to disambiguate patches: git config format.subjectPrefix 'PATCH my-project' Prior to this change, formatting patches with --rfc would lose the 'my-project' information. Previously, rev.subject_prefix was kept as the authoritative version of the subject prefix even while parsing command line options, and the sprefix variable was used as a temporary area to futz with it. Now, the parsing code has been refactored to build the subject prefix in the sprefix variable and assign its value to rev.subject_prefix at the end, which makes the flow easier to grasp. Signed-off-by: Drew DeVault <sir@cmpwn.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
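The effect of the change on the resulting prefix can be illustrated with a small sketch. The function name is hypothetical and this is not the format-patch code; it only models "prepend RFC to whatever prefix is in use", and edge cases such as an empty prefix are not considered.

```python
def subject_prefix(configured="PATCH", rfc=False):
    """Return the subject prefix, with "RFC " prepended when rfc is set."""
    return f"RFC {configured}" if rfc else configured


# With format.subjectPrefix set to 'PATCH my-project':
print(subject_prefix("PATCH my-project", rfc=True))   # RFC PATCH my-project
print(subject_prefix("PATCH my-project", rfc=False))  # PATCH my-project
```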
2023-08-30	git-svn: drop FakeTerm hack	Wesley Schwengle	1	-18/+2
	Drop the FakeTerm hack, just like dfd46bae (send-email: drop FakeTerm hack, 2023-08-08) did, for exactly the same reason. It has been obsolete in git-svn since 30d45f798d (git-svn: delay term initialization, 2014-09-14). Note that unlike send-email, we already make sure to load Term::ReadLine only once. So this is just a cleanup, and not fixing any bug. Signed-off-by: Wesley Schwengle <wesleys@opperschaap.net> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-30	ci: deprecate ci/config/allow-ref script	Jeff King	2	-30/+6
	Now that we have the CI_BRANCHES mechanism, there is no need for anybody to use the ci/config/allow-ref mechanism. In the long run, we can hopefully remove it and the whole "config" job, as it consumes CPU and adds to the end-to-end latency of the whole workflow. But we don't want to do that immediately, as people need time to migrate until the CI_BRANCHES change has made it into the workflow file of every branch. So let's issue a warning, which will appear in the "annotations" section below the workflow result in GitHub's web interface. And let's remove the sample allow-ref script, as we don't want to encourage anybody to use it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-30	ci: allow branch selection through "vars"	Jeff King	2	-0/+15
	When we added config to skip CI for certain branches in e76eec3554 (ci: allow per-branch config for GitHub Actions, 2020-05-07), there wasn't any way to avoid spinning up a VM just to check the config. From the developer's perspective this isn't too bad, as the "skipped" branches complete successfully after running the config job (the workflow result is "success" instead of "skipped", but that is a minor lie). But we are still wasting time and GitHub's CPU to spin up a VM just to check the result of a short shell script. At the time there wasn't any way to avoid this. But they've since introduced repo-level variables that should let us do the same thing: https://github.blog/2023-01-10-introducing-required-workflows-and-configuration-variables-to-github-actions/#configuration-variables This is more efficient, and as a bonus is probably less confusing to configure (the existing system requires sticking your config on a magic ref). See the included docs for how to configure it. The code itself is pretty simple: it checks the variable and skips the config job if appropriate (and everything else depends on the config job already). There are two slight inaccuracies here: - we don't insist on branches, so this likewise applies to tag names or other refs. I think in practice this is OK, and keeping the code (and docs) short is more important than trying to be more exact. We are targeting developers of git.git and their limited workflows. - the match is done as a substring (so if you want to run CI for "foobar", then branch "foo" will accidentally match). Again, this should be OK in practice, as anybody who uses this is likely to only specify a handful of well-known names. If we want to be more exact, we can have the code check for adjoining spaces. Or even move to a more general CI_CONFIG variable formatted as JSON. I went with this scheme for the sake of simplicity. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
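The substring pitfall mentioned above, and the "adjoining spaces" alternative, can be illustrated as follows. This is a Python model, not the workflow expression itself; both function names are hypothetical, and CI_BRANCHES is assumed to hold a space-separated list of ref names.

```python
def substring_match(ci_branches: str, ref_name: str) -> bool:
    """Model of the simple scheme: the ref name matches as a substring."""
    return ref_name in ci_branches


def exact_match(ci_branches: str, ref_name: str) -> bool:
    """Stricter variant: require a space (or string edge) on both sides."""
    return f" {ref_name} " in f" {ci_branches} "


# CI is wanted for "foobar" only, but branch "foo" matches as a substring.
print(substring_match("foobar", "foo"))  # True
print(exact_match("foobar", "foo"))      # False
print(exact_match("main foobar", "foobar"))  # True
```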
2023-08-30	The fourth batch	Junio C Hamano	1	-0/+12
	Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-29	update-ref: mark unused parameter in parser callbacks	Jeff King	1	-7/+7
	The parsing of stdin is driven by a table of function pointers; mark unused parameters in concrete functions to avoid -Wunused-parameter warnings. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>