aboutsummaryrefslogtreecommitdiffstats
path: root/builtin (follow)
AgeCommit message (Collapse)AuthorFilesLines
2025-06-13merge/pull: add the "--compact-summary" optionJunio C Hamano2-4/+38
"git merge" and "git pull" shows "git diff --stat --summary @{1}" when they finish to indicate the extent of the changes brought into the history by default. While it gives a good overview, it becomes annoying when there are very many created or deleted paths. Introduce "--compact-summary" option to these two commands that tells it to instead show "git diff --compact-summary @{1}", which gives the same information in a lot more compact form in such a situation. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-12builtin/stash: provide a way to import stashes from a refbrian m. carlson1-0/+167
Now that we have a way to export stashes to a ref, let's provide a way to import them from such a ref back to the stash. This works much the way the export code does, except that we strip off the first parent chain commit and then store each resulting commit back to the stash. We don't clear the stash first and instead add the specified stashes to the top of the stash. This is because users may want to export just a few stashes, such as to share a small amount of work in progress with a colleague, and it would be undesirable for the receiving user to lose all of their data. For users who do want to replace the stash, it's easy to do to: simply run "git stash clear" first. We specifically rely on the fact that we'll produce identical stash commits on both sides in our tests. This provides a cheap, straightforward check for our tests and also makes it easy for users to see if they already have the same data in both repositories. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-12builtin/stash: provide a way to export stashes to a refbrian m. carlson1-0/+260
A common user problem is how to sync in-progress work to another machine. Users currently must use some sort of transfer of the working tree, which poses security risks and also necessarily causes the index to become dirty. The experience is suboptimal and frustrating for users. A reasonable idea is to use the stash for this purpose, but the stash is stored in the reflog, not in a ref, and as such it cannot be pushed or pulled. This also means that it cannot be saved into a bundle or preserved elsewhere, which is a problem when using throwaway development environments. In addition, users often want to replicate stashes across machines, such as when they must use multiple machines or when they use throwaway dev environments, such as those based on the Devcontainer spec, where they might otherwise lose various in-progress work. Let's solve this problem by allowing the user to export the stash to a ref (or, to just write it into the repository and print the hash, à la git commit-tree). Introduce git stash export, which writes a chain of commits where the first parent is always a chain to the previous stash, or to a single, empty commit (for the final item) and the second is the stash commit normally written to the reflog. Iterate over each stash from top to bottom, looking up the data for each one, and then create the chain from the single empty commit back up in reverse order. Generate a predictable empty commit so our behavior is reproducible. Create a useful commit message, preserving the author and committer information, to help users identify stash commits when viewing them as normal commits. If the user has specified specific stashes they'd like to export instead, use those instead of iterating over all of the stashes. As part of this, specifically request quiet behavior when looking up the OID for a revision because we will eventually hit a revision that doesn't exist and we don't want to die when that occurs. When exporting stashes, be sure to verify that they look like valid stashes and don't contain invalid data. This will help avoid failures on import or problems due to attempting to export invalid refs that are not stashes. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-12builtin/stash: factor out revision parsing into a functionbrian m. carlson1-11/+22
We allow several special forms of stash names in this code. In the future, we'll want to allow these same forms without parsing a stash commit, so let's refactor this code out into a function for reuse. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-11stash: fix incorrect branch name in stash messageK Jayatheerth1-2/+8
When creating a stash, Git uses the current branch name of the superproject to construct the stash commit message. However, in repositories with submodules, the message may mistakenly display the submodule branch name instead. This is because `refs_resolve_ref_unsafe()` returns a pointer to a static buffer. Subsequent calls to the same function overwrite the buffer, corrupting the originally fetched `branch_name` used for the stash message. Use `xstrdup()` to duplicate the branch name immediately after resolving it, so that later buffer overwrites do not affect the stash message. Signed-off-by: K Jayatheerth <jayatheerthkulkarni2005@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-09rebase: write script before initializing stateØystein Walle1-21/+21
If rebase.instructionFormat is invalid the repository is left in a strange state when the interactive rebase fails. `git status` outputs boths the same as it would in the normal case *and* something related to interactive rebase: $ git -c rebase.instructionFormat=blah rebase -i fatal: invalid --pretty format: blah $ git status On branch master Your branch is ahead of 'upstream/master' by 1 commit. (use "git push" to publish your local commits) git-rebase-todo is missing. No commands done. No commands remaining. You are currently editing a commit while rebasing branch 'master' on '8db3019401'. (use "git commit --amend" to amend the current commit) (use "git rebase --continue" once you are satisfied with your changes) By attempting to write the rebase script before initializing the state this potential scenario is avoided. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-08builtin/submodule--helper: fix leak when remote_submodule_branch() failedLidong Yan1-1/+3
In builtin/submodule--helper.c:update_submodule(), the variable remote_name is allocated in get_default_remote_submodule() but may be leaked if remote_submodule_branch() fails. Although it is unlikely that remote_submodule_branch() would fail after successfully obtaining a remote ref name from get_default_remote_submodule(), it is still possible. To prevent a potential memory leak, add a call to free(remote_name) at the early exit point. Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-07stash: allow "git stash [<options>] --patch <pathspec>" to assume pushPhillip Wood1-3/+7
The support for assuming "push" when "-p" is given introduced in 9e140909f61 (stash: allow pathspecs in the no verb form, 2017-02-28) is very narrow, neither "git stash -m <message> -p <pathspec>" nor "git stash --patch <pathspec>" imply "push" and die instead. Relax this by passing PARSE_OPT_STOP_AT_NON_OPTION when push is being assumed and then setting "force_assume" if "--patch" was present. This means "git stash <pathspec> -p" still dies so that it does not assume the user meant "push" if they mistype a subcommand name but "git stash -m <message> -p <pathspec>" will now succeed. The test added in the last commit is adjusted to check that push is still assumed when "--patch" comes after other options on the command-line. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-07stash: allow "git stash -p <pathspec>" to assume push againPhillip Wood1-1/+1
Historically "git stash [<options>]" was assumed to mean "git stash save [<options>]". Since 1ada5020b38 (stash: use stash_push for no verb form, 2017-02-28) it is assumed to mean "git stash push [<options>]". As the push subcommand supports pathspecs, 9e140909f61 (stash: allow pathspecs in the no verb form, 2017-02-28) allowed "git stash -p <pathspec>" to mean "git stash push -p <pathspec>". This was broken in 8c3713cede7 (stash: eliminate crude option parsing, 2020-02-17) which failed to account for "push" being added to the start of argv in cmd_stash() before it calls push_stash() and kept looking in argv[0] for "-p" after moving the code to push_stash(). Fix this by regression by checking argv[1] instead of argv[0] and add a couple of tests to prevent future regressions. Helped-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-05repo_logmsg_reencode: fix memory leak when use repo_logmsg_reencode ()Lidong Yan2-1/+3
pretty.c:repo_logmsg_reencode() allocated memory should be freed with repo_unuse_commit_buffer(). Callers sometimes forgot free it at exit point. Add `repo_unuse_commit_buffer()` in insert_records_from_trailers at builtin/shortlog.c and create_commit at builtin/replay.c Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-04commit-graph: fix start_delayed_progress() leakLidong Yan1-0/+1
In commit-graph.c:graph_write(), if read_one_commit() failed, progress allocated in start_delayed_progress() will leak. Add stop_progress() before goto cleanup. Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-04builtin/fetch-pack: cleanup before return errorLidong Yan1-2/+5
In builtin/fetch-pack.c:cmd_fetch_pack(), if finish_connect() failed, it returns error code without cleanup which cause memory leak. Add cleanup label before frees in the end of cmd_fetch_pack(), and add `goto cleanup` if finish_connect() failed. Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-03cat-file.c: add batch handling for submodulesVictoria Dye1-1/+4
When an object specification is passed to 'cat-file --batch[-check]' referring to a submodule (e.g. 'HEAD:path/to/my/submodule'), the current behavior of the command is to print the "missing" error message. However, it is often valuable for callers to distinguish between paths that are actually missing and "the submodule tree entry exists, but the object does not exist in the repository". To disambiguate without needing to invoke a separate Git process (e.g. 'ls-tree'), print the message "<oid> submodule" for such objects instead of "<object> missing". In addition to the change from "missing" to "submodule", the new message differs from the old in that it always prints the resolved tree entry's OID, rather than the input object specification. Note that this implementation maintains a distinction between submodules where the commit OID is not present in the repo, and submodules where the commit OID *is* present; the former will now print "<object> submodule", but the latter will still print the full object content. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-03cat-file: add %(objectmode) atomVictoria Dye1-2/+7
Add a formatting atom, used with the --batch-check/--batch-command options, that prints the octal representation of the object mode if a given revision includes that information, e.g. one that follows the format <tree-ish>:<path>. If the mode information does not exist, an empty string is printed instead. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-03Merge branch 'bs/total-ram-bsd'Junio C Hamano1-1/+3
Update total_ram() functrion on BSD variants. * bs/total-ram-bsd: builtin/gc: correct physical memory detection for OpenBSD / NetBSD
2025-06-03BUG(): remove leading underscore of the format stringLidong Yan2-2/+2
BUG() is not end-user facing but programmer facing, and we do not use _("...") in them. Replace all `BUG(_("..."))` with `BUG("...")` Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-03builtin/maintenance: fix locking race when handling "gc" taskPatrick Steinhardt1-14/+27
The "gc" task has a similar locking race as the one that we have fixed for the "pack-refs" and "reflog-expire" tasks in preceding commits. Fix this by splitting up the logic of the "gc" task: - We execute `gc_before_repack()` in the foreground, which contains the logic that git-gc(1) itself would execute in the foreground, as well. - We spawn git-gc(1) after detaching, but with a new hidden flag that suppresses calling `gc_before_repack()`. Like this we have roughly the same logic as git-gc(1) itself and know to repack refs and reflogs before detaching, thus fixing the race. Note that `gc_before_repack()` is renamed to `gc_foreground_tasks()` to better reflect what this function does. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-03builtin/gc: avoid global state in `gc_before_repack()`Patrick Steinhardt1-15/+9
The `gc_before_repack()` should only ever run once in git-gc(1), but we may end up calling it twice when the "--detach" flag is passed. The duplicated call is avoided though via a static flag in this function. This pattern is somewhat unintuitive though. Refactor it to drop the static flag and instead guard the second call of `gc_before_repack()` via `opts.detach`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-03usage: allow dying without writing an error messagePatrick Steinhardt4-11/+11
Sometimes code wants to die in a situation where it already has written an error message. To use the same error code as `die()` we have to use `exit(128)`, which is easy to get wrong and leaves magic numbers all over our codebase. Teach `die_message_builtin()` to not print any error when passed a `NULL` pointer as error string. Like this, such users can now call `die(NULL)` to achieve the same result without any hardcoded error codes. Adapt a couple of builtins to use this new pattern to demonstrate that there is a need for such a helper. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-03builtin/maintenance: fix locking race with refs and reflogs tasksPatrick Steinhardt1-2/+2
As explained in the preceding commit, git-gc(1) knows to detach only after it has already packed references and expired reflogs. This is done to avoid racing around their respective lockfiles. Adapt git-maintenance(1) accordingly and run the "pack-refs" and "reflog-expire" tasks in the foreground. Note that the "gc" task has the same issue, but the fix is a bit more involved there and will thus be done in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-03builtin/maintenance: split into foreground and background tasksPatrick Steinhardt1-21/+49
Both git-gc(1) and git-maintenance(1) have logic to daemonize so that the maintenance tasks are performed in the background. git-gc(1) has some special logic though to not perform _all_ housekeeping tasks in the background: both references and reflogs are still handled synchronously in the foreground. This split exists because otherwise it may easily happen that git-gc(1) keeps the "packed-refs" file locked for an extended amount of time, where the next Git command that wants to modify any reference could now fail. This was especially important in the past, where git-gc(1) was still executed directly as part of our automatic maintenance: git-gc(1) was invoked via `git gc --auto --detach`, so we knew to handle most of the maintenance tasks in the background while doing those parts that may cause locking issues in the foreground. We have since moved to git-maintenance(1), which is a more flexible replacement for git-gc(1). By default this command runs git-gc(1), only, but it can be configured to run different tasks, as well. This command does not know about the split between maintenance tasks that should run before and after detach though, and this has led to several bug reports about spurious locking errors for the "packed-refs" file. Prepare for a fix by introducing this split for maintenance tasks. Note that this commit does not yet change any of the tasks, so there should not (yet) be a change in behaviour. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-03builtin/maintenance: fix typedef for function pointersPatrick Steinhardt1-5/+5
The typedefs for `maintenance_task_fn` and `maintenance_auto_fn` are somewhat confusingly not true function pointers. As such, any user of those typedefs needs to manually add the pointer to make use of them. Fix this by making these true function pointers. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-03builtin/maintenance: extract function to run tasksPatrick Steinhardt1-12/+23
Extract the function to run maintenance tasks. This function will be reused in a subsequent commit where we introduce a split between maintenance tasks that run before and after daemonizing the process. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-03builtin/maintenance: stop modifying global array of tasksPatrick Steinhardt1-94/+112
When configuring maintenance tasks run by git-maintenance(1) we do so by modifying the global array of tasks directly. This is already quite bad on its own, as global state makes for logic that is hard to follow. Even more importantly though we use multiple different fields to track whether or not a task should be run: - "enabled" tracks the "maintenance.*.enabled" config key. This field disables execution of a task, unless the user has explicitly asked for the task. - "selected_order" tracks the order in which jobs have been asked for by the user via the "--task=" command line option. It overrides everything else, but only has an effect if at least one job has been selected. - "schedule" tracks the schedule priority for a job, that is how often it should run. This field only plays a role when the user has passed the "--schedule=" command line option. All of this makes it non-trivial to figure out which job really should be running right now. The logic to configure these fields and the logic that interprets them is distributed across multiple functions, making it even harder to follow it. Refactor the logic so that we stop modifying global state. Instead, we now compute which jobs should be run in `initialize_task_config()`, represented as an array of jobs to run that is stored in the options structure. Like this, all logic becomes self-contained and any users of this array only need to iterate through the tasks and execute them one by one. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-03builtin/maintenance: mark "--task=" and "--schedule=" as incompatiblePatrick Steinhardt1-2/+4
The "--task=" option explicitly allows the user to say which maintenance tasks should be run, whereas "--schedule=" only respects the maintenance strategy configured for a specific repository. As such, it is not sensible to accept both options at the same time. Mark them as incompatible with one another. While at it, also convert the existing logic that marks "--auto" and "--schedule=" as incompatible to use `die_for_incompatible_opt2()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-03builtin/maintenance: centralize configuration of explicit tasksPatrick Steinhardt1-23/+24
Users of git-maintenance(1) can explicitly ask it to run specific tasks by passing the `--task=` command line option. This option can be passed multiple times, which causes us to execute tasks in the same order as the tasks have been provided by the user. The order in which tasks are run is computed in `task_option_parse()`: every time we parse such a command line argument, we modify the global array of tasks by seting the selected index for that specific task. This has two downsides: - We modify global state, which makes it hard to follow the logic. - The configuration of tasks is split across multiple different functions, so it is not easy to figure out the different factors that play a role in selecting tasks. Refactor the logic so that `task_option_parse()` does not modify global state anymore. Instead, this function now only collects the list of configured tasks. The logic to configure ordering of the respective tasks is then deferred to `initialize_task_config()`. This refactoring solves the second problem, that the configuration of tasks is spread across multiple different locations. The first problem, that we modify global state, will be fixed in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-03builtin/gc: drop redundant local variablePatrick Steinhardt1-6/+5
We have two different variables that track the quietness for git-gc(1): - The local variable `quiet`, which we wire up. - The `quiet` field of `struct maintenance_run_opts`. This leads to confusion which of these variables should be used and what the respective effect is. Simplify this logic by dropping the local variable in favor of the options field. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-03builtin/gc: use designated field initializers for maintenance tasksPatrick Steinhardt1-27/+27
Convert the array of maintenance tasks to use designated field initializers. This makes it easier to add more fields to the struct without having to modify all tasks. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-01builtin/gc: correct physical memory detection for OpenBSD / NetBSDBrad Smith1-1/+3
OpenBSD / NetBSD use HW_PHYSMEM64 to detect the amount of physical memory in a system. HW_PHYSMEM will not provide the correct amount on a system with >=4GB of memory. Signed-off-by: Brad Smith <brad@comstyle.com> Reviewed-by: Collin Funk <collin.funk1@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-28fast-export: --signed-commits is experimentalJunio C Hamano1-6/+1
As the design of signature handling is still being discussed, it is likely that the data stream produced by the code in Git 2.50 would have to be changed in such a way that is not backward compatible. Mark the feature as experimental and discourge its use for now. Also flip the default on the generation side to "strip"; users of existing versions would not have passed --signed-commits=strip and will be broken by this change if the default is made to abort, and will be encouraged by the error message to produce data stream with future breakage guarantees by passing --signed-commits option. As we tone down the default behaviour, we no longer need the FAST_EXPORT_SIGNED_COMMITS_NOABORT environment variable, which was not discoverable enough. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-28Merge branch 'jt/receive-pack-skip-connectivity-check'Junio C Hamano1-18/+22
"git receive-pack" optionally learns not to care about connectivity check, which can be useful when the repository arranges to ensure connectivity by some other means. * jt/receive-pack-skip-connectivity-check: builtin/receive-pack: add option to skip connectivity check t5410: test receive-pack connectivity check
2025-05-27Merge branch 'js/misc-fixes'Junio C Hamano2-2/+3
Assorted fixes for issues found with CodeQL. * js/misc-fixes: sequencer: stop pretending that an assignment is a condition bundle-uri: avoid using undefined output of `sscanf()` commit-graph: avoid using stale stack addresses trace2: avoid "futile conditional" Avoid redundant conditions fetch: avoid unnecessary work when there is no current branch has_dir_name(): make code more obvious upload-pack: rename `enum` to reflect the operation commit-graph: avoid malloc'ing a local variable fetch: carefully clear local variable's address after use commit: simplify code
2025-05-27Merge branch 'ds/sparse-apply-add-p'Junio C Hamano3-7/+13
"git apply" and "git add -i/-p" code paths no longer unnecessarily expand sparse-index while working. * ds/sparse-apply-add-p: p2000: add performance test for patch-mode commands reset: integrate sparse index with --patch git add: make -p/-i aware of sparse index apply: integrate with the sparse index
2025-05-27Merge branch 'en/merge-tree-check'Junio C Hamano1-0/+18
"git merge-tree" learned an option to see if it resolves cleanly without actually creating a result. * en/merge-tree-check: merge-tree: add a new --quiet flag merge-ort: add a new mergeability_only option
2025-05-27Merge branch 'jk/no-funny-object-types'Junio C Hamano3-85/+28
Support to create a loose object file with unknown object type has been dropped. * jk/no-funny-object-types: object-file: drop support for writing objects with unknown types hash-object: handle --literally with OPT_NEGBIT hash-object: merge HASH_* and INDEX_* flags hash-object: stop allowing unknown types t: add lib-loose.sh t/helper: add zlib test-tool oid_object_info(): drop type_name strbuf fsck: stop using object_info->type_name strbuf oid_object_info_convert(): stop using string for object type cat-file: use type enum instead of buffer for -t option object-file: drop OBJECT_INFO_ALLOW_UNKNOWN_TYPE flag cat-file: make --allow-unknown-type a noop object-file.h: fix typo in variable declaration
2025-05-23Merge branch 'en/replay-wo-the-repository'Junio C Hamano1-30/+35
The dependency on the_repository variable has been reduced from the code paths in "git replay". * en/replay-wo-the-repository: replay: replace the_repository with repo parameter passed to cmd_replay ()
2025-05-22diff --no-index: support limiting by pathspecJacob Keller1-1/+1
The --no-index option of git-diff enables using the diff machinery from git while operating outside of a repository. This mode of git diff is able to compare directories and produce a diff of their contents. When operating git diff in a repository, git has the notion of "pathspecs" which can specify which files to compare. In particular, when using git to diff two trees, you might invoke: $ git diff-tree -r <treeish1> <treeish2>. where the treeish could point to a subdirectory of the repository. When invoked this way, users can limit the selected paths of the tree by using a pathspec. Either by providing some list of paths to accept, or by removing paths via a negative refspec. The git diff --no-index mode does not support pathspecs, and cannot limit the diff output in this way. Other diff programs such as GNU difftools have options for excluding paths based on a pattern match. However, using git diff as a diff replacement has several advantages over many popular diff tools, including coloring moved lines, rename detections, and similar. Teach git diff --no-index how to handle pathspecs to limit the comparisons. This will only be supported if both provided paths are directories. For comparisons where one path isn't a directory, the --no-index mode already has some DWIM shortcuts implemented in the fixup_paths() function. Modify the fixup_paths function to return 1 if both paths are directories. If this is the case, interpret any extra arguments to git diff as pathspecs via parse_pathspec. Use parse_pathspec to load the remaining arguments (if any) to git diff --no-index as pathspec items. Disable PATHSPEC_ATTR support since we do not have a repository to do attribute lookup. Disable PATHSPEC_FROMTOP since we do not have a repository root. All pathspecs are treated as rooted at the provided comparison paths. After loading the pathspec data, calculate skip offsets for skipping past the root portion of the paths. This is required to ensure that pathspecs start matching from the provided path, rather than matching from the absolute path. We could instead pass the paths as prefix values to parse_pathspec. This is slightly problematic because the paths come from the command line and don't necessarily have the proper trailing slash. Additionally, that would require parsing pathspecs multiple times. Pass the pathspec object and the skip offsets into queue_diff, which in-turn must pass them along to read_directory_contents. Modify read_directory_contents to check against the pathspecs when scanning the directory. Use the skip offset to skip past the initial root of the path, and only match against portions that are below the intended directory structure being compared. The search algorithm for finding paths is recursive with read_dir. To make pathspec matching work properly, we must set both DO_MATCH_DIRECTORY and DO_MATCH_LEADING_PATHSPEC. Without DO_MATCH_DIRECTORY, paths like "a/b/c/d" will not match against pathspecs like "a/b/c". This is usually achieved by setting the is_dir parameter of match_pathspec. Without DO_MATCH_LEADING_PATHSPEC, paths like "a/b/c" would not match against pathspecs like "a/b/c/d". This is crucial because we recursively iterate down the directories. We could simply avoid checking pathspecs at subdirectories, but this would force recursion down directories which would simply be skipped. If we always passed DO_MATCH_LEADING_PATHSPEC, then we will incorrectly match in certain cases such as matching 'a/c' against ':(glob)**/d'. The match logic will see that a matches the leading part of the **/ and accept this even tho c doesn't match. To avoid this, use the match_leading_pathspec() variant recently introduced. This sets both flags when is_dir is set, but leaves them both cleared when is_dir is 0. Add test cases and documentation covering the new functionality. Note for the documentation I opted not to move the placement of '--' which is sometimes used to disambiguate arguments. The diff --no-index mode requires exactly 2 arguments determining what to compare. Any additional arguments are interpreted as pathspecs and must come afterwards. Use of '--' would not actually disambiguate anything, since there will never be ambiguity over which arguments represent paths or pathspecs. Signed-off-by: Jacob Keller <jacob.keller@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-20builtin/receive-pack: add option to skip connectivity checkJustin Tobler1-18/+22
During git-receive-pack(1), connectivity of the object graph is validated to ensure that the received packfile does not leave the repository in a broken state. This is done via git-rev-list(1) and walking the objects, which can be expensive for large repositories. Generally, this check is critical to avoid an incomplete received packfile from corrupting a repository. Server operators may have additional knowledge though around exactly how Git is being used on the server-side which can be used to facilitate more efficient connectivity computation of incoming objects. For example, if it can be ensured that all objects in a repository are connected and do not depend on any missing objects, the connectivity of newly written objects can be checked by walking the object graph containing only the new objects from the updated tips and identifying the missing objects which represent the boundary between the new objects and the repository. These boundary objects can be checked in the canonical repository to ensure the new objects connect as expected and thus avoid walking the rest of the object graph. Git itself cannot make the guarantees required for such an optimization as it is possible for a repository to contain an unreachable object that references a missing object without the repository being considered corrupt. Introduce the --skip-connectivity-check option for git-receive-pack(1) which bypasses this connectivity check to give more control to the server-side. Note that without proper server-side validation of newly received objects handled outside of Git, usage of this option risks corrupting a repository. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-19Merge branch 'jk/oidmap-cleanup'Junio C Hamano1-1/+1
Code cleanup. * jk/oidmap-cleanup: raw_object_store: drop extra pointer to replace_map oidmap: add size function oidmap: rename oidmap_free() to oidmap_clear()
2025-05-19Merge branch 'ly/am-split-stgit-leakfix'Junio C Hamano1-1/+3
Leakfix. * ly/am-split-stgit-leakfix: builtin/am: fix memory leak in `split_mail_stgit_series`
2025-05-19receive-pack: use batched reference updatesKarthik Nayak1-16/+48
The reference updates performed as a part of 'git-receive-pack(1)', take place one at a time. For each reference update, a new transaction is created and committed. This is necessary to ensure we can allow individual updates to fail without failing the entire command. The command also supports an 'atomic' mode, which uses a single transaction to update all of the references. But this mode has an all-or-nothing approach, where if a single update fails, all updates would fail. In 23fc8e4f61 (refs: implement batch reference update support, 2025-04-08), we introduced a new mechanism to batch reference updates. Under the hood, this uses a single transaction to perform a batch of reference updates, while allowing only individual updates to fail. Utilize this newly introduced batch update mechanism in 'git-receive-pack(1)'. This provides a significant bump in performance, especially when dealing with repositories with large number of references. With the reftable backend there is a 18x performance improvement, when performing receive-pack with 10000 refs: Benchmark 1: receive: many refs (refformat = reftable, refcount = 10000, revision = master) Time (mean ± σ): 4.276 s ± 0.078 s [User: 0.796 s, System: 3.318 s] Range (min … max): 4.185 s … 4.430 s 10 runs Benchmark 2: receive: many refs (refformat = reftable, refcount = 10000, revision = HEAD) Time (mean ± σ): 235.4 ms ± 6.9 ms [User: 75.4 ms, System: 157.3 ms] Range (min … max): 228.5 ms … 254.2 ms 11 runs Summary receive: many refs (refformat = reftable, refcount = 10000, revision = HEAD) ran 18.16 ± 0.63 times faster than receive: many refs (refformat = reftable, refcount = 10000, revision = master) In similar conditions, the files backend sees a 1.21x performance improvement: Benchmark 1: receive: many refs (refformat = files, refcount = 10000, revision = master) Time (mean ± σ): 1.121 s ± 0.021 s [User: 0.128 s, System: 0.975 s] Range (min … max): 1.097 s … 1.156 s 10 runs Benchmark 2: receive: many refs (refformat = files, refcount = 10000, revision = HEAD) Time (mean ± σ): 927.9 ms ± 22.6 ms [User: 99.0 ms, System: 815.2 ms] Range (min … max): 903.1 ms … 978.0 ms 10 runs Summary receive: many refs (refformat = files, refcount = 10000, revision = HEAD) ran 1.21 ± 0.04 times faster than receive: many refs (refformat = files, refcount = 10000, revision = master) As using batched updates requires the error handling to be moved to the end of the flow, create and use a 'struct strset' to track the failed refs and attribute the correct errors to them. This change also uncovers an issue when a client provides multiple updates to the same reference. For example: $ git send-pack remote.git A:foo B:foo Enumerating objects: 3, done. Counting objects: 100% (3/3), done. Delta compression using up to 20 threads Compressing objects: 100% (2/2), done. Writing objects: 100% (3/3), 226 bytes | 226.00 KiB/s, done. Total 3 (delta 1), reused 0 (delta 0), pack-reused 0 (from 0) remote: error: cannot lock ref 'refs/heads/foo': reference already exists To remote.git ! [remote rejected] A -> foo (failed to update ref) ! [remote failure] B -> foo (remote failed to report status) As you can see, the remote runs into an error because it cannot lock the target reference for the second update. Furthermore, the remote complains that the first update has been rejected whereas the second update didn't receive any status update because we failed to lock it. Reading this status message alone a user would probably expect that `foo` has not been updated at all. But that's not the case: while we claim that the ref wasn't updated, it surprisingly points to `A` now. One could argue that this is merely an error in how we report the result of this push. But ultimately, the user's request itself is already broken and doesn't make any sense in the first place and cannot ever lead to a sensible outcome that honors the full request. The conversion to batched transactions fixes the issue because we now try to queue both updates in the same transaction. As such, the transaction itself will notice this conflict and refuse the update altogether before we commit any of the values. Note that this requires changes to a couple of tests in t5408 that happened to exercise this behaviour. Given that the generated output is misleading and given that the user request cannot ever be fully honored this really feels more like a bug than properly designed behaviour. As such, changing the behaviour feels like the right thing to do. Since now reference updates are batched, the 'reference-transaction' hook will be invoked with all updates together. Currently git will 'die' when the hook returns with a non-zero exit status in the 'prepared' stage. For 'git-receive-pack(1)', this allowed users to reject an individual reference update, git would have applied previous updates but immediately abort further execution. This is definitely an incorrect usage of this hook, since the right place to do this would be the 'update' hook. This patch retains the latter behavior, but 'reference-transaction' hook now changes to a all-or-nothing behavior when a non-zero exit status is returned in the 'prepared' stage, since batch updates use a transaction under the hood. This explains the change in 't1416'. Helped-by: Jeff King <peff@peff.net> Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-19fetch: use batched reference updatesKarthik Nayak1-54/+73
The reference updates performed as a part of 'git-fetch(1)', take place one at a time. For each reference update, a new transaction is created and committed. This is necessary to ensure we can allow individual updates to fail without failing the entire command. The command also supports an '--atomic' mode, which uses a single transaction to update all of the references. But this mode has an all-or-nothing approach, where if a single update fails, all updates would fail. In 23fc8e4f61 (refs: implement batch reference update support, 2025-04-08), we introduced a new mechanism to batch reference updates. Under the hood, this uses a single transaction to perform a batch of reference updates, while allowing only individual updates to fail. Utilize this newly introduced batch update mechanism in 'git-fetch(1)'. This provides a significant bump in performance, especially when dealing with repositories with large number of references. Adding support for batched updates is simply modifying the flow to also create a batch update transaction in the non-atomic flow. With the reftable backend there is a 22x performance improvement, when performing 'git-fetch(1)' with 10000 refs: Benchmark 1: fetch: many refs (refformat = reftable, refcount = 10000, revision = master) Time (mean ± σ): 3.403 s ± 0.775 s [User: 1.875 s, System: 1.417 s] Range (min … max): 2.454 s … 4.529 s 10 runs Benchmark 2: fetch: many refs (refformat = reftable, refcount = 10000, revision = HEAD) Time (mean ± σ): 154.3 ms ± 17.6 ms [User: 102.5 ms, System: 56.1 ms] Range (min … max): 145.2 ms … 220.5 ms 18 runs Summary fetch: many refs (refformat = reftable, refcount = 10000, revision = HEAD) ran 22.06 ± 5.62 times faster than fetch: many refs (refformat = reftable, refcount = 10000, revision = master) In similar conditions, the files backend sees a 1.25x performance improvement: Benchmark 1: fetch: many refs (refformat = files, refcount = 10000, revision = master) Time (mean ± σ): 605.5 ms ± 9.4 ms [User: 117.8 ms, System: 483.3 ms] Range (min … max): 595.6 ms … 621.5 ms 10 runs Benchmark 2: fetch: many refs (refformat = files, refcount = 10000, revision = HEAD) Time (mean ± σ): 485.8 ms ± 4.3 ms [User: 91.1 ms, System: 396.7 ms] Range (min … max): 477.6 ms … 494.3 ms 10 runs Summary fetch: many refs (refformat = files, refcount = 10000, revision = HEAD) ran 1.25 ± 0.02 times faster than fetch: many refs (refformat = files, refcount = 10000, revision = master) With this we'll either be using a regular transaction or a batch update transaction. This helps cleanup some code which is no longer needed as we'll now always have some type of 'ref_transaction' object being propagated. One big change is that earlier, each individual update would propagate a failure. Whereas now, the `ref_transaction_for_each_rejected_update` function is called at the end of the flow to capture the exit status for 'git-fetch(1)' and also to print F/D conflict errors. This does change the order of the errors being printed, but the behavior stays the same. Since transaction errors are now explicitly defined as part of 76e760b999 (refs: introduce enum-based transaction error types, 2025-04-08), utilize them and get rid of custom errors defined within 'builtin/fetch.c'. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-19refs: add function to translate errors to stringsKarthik Nayak1-24/+1
The commit 76e760b999 (refs: introduce enum-based transaction error types, 2025-04-08) introduced enum-based transaction error types. The refs transaction logic was also modified to propagate these errors. For clients of the ref transaction system, it would be beneficial to provide human readable messages for these errors. There is already an existing mapping in 'builtin/update-ref.c', move it to 'refs.c' as `ref_transaction_error_msg()` and use the same within the 'builtin/update-ref.c'. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16merge-tree: add a new --quiet flagElijah Newren1-0/+18
Git Forges may be interested in whether two branches can be merged while not being interested in what the resulting merge tree is nor which files conflicted. For such cases, add a new --quiet flag which will make use of the new mergeability_only flag added to merge-ort in the previous commit. This option allows the merge machinery to, in the outer layer of the merge: * exit early when a conflict is detected * avoid writing (most) merged blobs/trees to the object store Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16pack-objects: allow --shallow and --path-walkDerrick Stolee1-3/+2
There does not appear to be anything particularly incompatible about the --shallow and --path-walk options of 'git pack-objects'. If shallow commits are to be handled differently, then it is by the revision walk that defines the commit set and which are interesting or uninteresting. However, before the previous change, a trivial removal of the warning would cause a failure in t5500-fetch-pack.sh when GIT_TEST_PACK_PATH_WALK is enabled. The shallow fetch would provide more objects than we desired, due to some incorrect behavior of the path-walk API, especially around walking uninteresting objects. The recently-added tests in t5538-push-shallow.sh help to confirm this behavior is working with the --path-walk option if GIT_TEST_PACK_PATH_WALK is enabled. These tests passed previously due to the --path-walk feature being disabled in the presence of a shallow clone. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16pack-objects: thread the path-based compressionDerrick Stolee1-2/+164
Adapting the implementation of ll_find_deltas(), create a threaded version of the --path-walk compression step in 'git pack-objects'. This involves adding a 'regions' member to the thread_params struct, allowing each thread to own a section of paths. We can simplify the way jobs are split because there is no value in extending the batch based on name-hash the way sections of the object entry array are attempted to be grouped. We re-use the 'list_size' and 'remaining' items for the purpose of borrowing work in progress from other "victim" threads when a thread has finished its batch of work more quickly. Using the Git repository as a test repo, the p5313 performance test shows that the resulting size of the repo is the same, but the threaded implementation gives gains of varying degrees depending on the number of objects being packed. (This was tested on a 16-core machine.) Test HEAD~1 HEAD --------------------------------------------------- 5313.20: big pack 2.38 1.99 -16.4% 5313.21: big pack size 16.1M 16.0M -0.2% 5313.24: repack 107.32 45.41 -57.7% 5313.25: repack size 213.3M 213.2M -0.0% (Test output is formatted to better fit in message.) This ~60% reduction in 'git repack --path-walk' time is typical across all repos I used for testing. What is interesting is to compare when the overall time improves enough to outperform the --name-hash-version=1 case. These time improvements correlate with repositories with data shapes that significantly improve their data size as well. The --path-walk feature frequently takes longer than --name-hash-version=2, trading some extra computation for some additional compression. The natural place where this additional computation comes from is the two compression passes that --path-walk takes, though the first pass is naturally faster due to the path boundaries avoiding a number of delta compression attempts. For example, the microsoft/fluentui repo has significant size reduction from --name-hash-version=1 to --name-hash-version=2 followed by further improvements with --path-walk. The threaded computation makes --path-walk more competitive in time compared to --name-hash-version=2, though still ~31% more expensive in that metric. Repack Method Pack Size Time ------------------------------------------ Hash v1 439.4M 87.24s Hash v2 161.7M 21.51s Path Walk (Before) 142.5M 81.29s Path Walk (After) 142.5M 28.16s Similar results hold for the Git repository: Repack Method Pack Size Time ------------------------------------------ Hash v1 248.8M 30.44s Hash v2 249.0M 30.15s Path Walk (Before) 213.2M 142.50s Path Walk (After) 213.3M 45.41s ...as well as the nodejs/node repository: Repack Method Pack Size Time ------------------------------------------ Hash v1 739.9M 71.18s Hash v2 764.6M 67.82s Path Walk (Before) 698.1M 208.10s Path Walk (After) 698.0M 75.10s Finally, the Linux kernel repository is a good test for this repacking time change, even though the space savings is more subtle: Repack Method Pack Size Time ------------------------------------------ Hash v1 2.5G 554.41s Hash v2 2.5G 549.62s Path Walk (before) 2.2G 1562.36s Path Walk (before) 2.2G 559.00s Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16pack-objects: refactor path-walk delta phaseDerrick Stolee1-26/+57
Previously, the --path-walk option to 'git pack-objects' would compute deltas inline with the path-walk logic. This would make the progress indicator look like it is taking a long time to enumerate objects, and then very quickly computed deltas. Instead of computing deltas on each region of objects organized by tree, store a list of regions corresponding to these groups. These can later be pulled from the list for delta compression before doing the "global" delta search. This presents a new progress indicator that can be used in tests to verify that this stage is happening. The current implementation is not integrated with threads, but we are setting it up to arrive in the next change. Since we do not attempt to sort objects by size until after exploring all trees, we can remove the previous change to t5530 due to a different error message appearing first. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16pack-objects: enable --path-walk via configDerrick Stolee1-1/+10
Users may want to enable the --path-walk option for 'git pack-objects' by default, especially underneath commands like 'git push' or 'git repack'. This should be limited to client repositories, since the --path-walk option disables bitmap walks, so would be bad to include in Git servers when serving fetches and clones. There is potential that it may be helpful to consider when repacking the repository, to take advantage of improved deltas across historical versions of the same files. Much like how "pack.useSparse" was introduced and included in "feature.experimental" before being enabled by default, use the repository settings infrastructure to make the new "pack.usePathWalk" config enabled by "feature.experimental" and "feature.manyFiles". In order to test that this config works, add a new trace2 region around the path walk code that can be checked by a 'git push' command. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16repack: add --path-walk optionDerrick Stolee1-1/+6
Since 'git pack-objects' supports a --path-walk option, allow passing it through in 'git repack'. This presents interesting testing opportunities for comparing the different repacking strategies against each other. Add the --path-walk option to the performance tests in p5313. For the microsoft/fluentui repo [1] checked out at a specific commit [2], the --path-walk tests in p5313 look like this: Test this tree ------------------------------------------------------------------------- 5313.18: thin pack with --path-walk 0.08(0.06+0.02) 5313.19: thin pack size with --path-walk 18.4K 5313.20: big pack with --path-walk 2.10(7.80+0.26) 5313.21: big pack size with --path-walk 19.8M 5313.22: shallow fetch pack with --path-walk 1.62(3.38+0.17) 5313.23: shallow pack size with --path-walk 33.6M 5313.24: repack with --path-walk 81.29(96.08+0.71) 5313.25: repack size with --path-walk 142.5M [1] https://github.com/microsoft/fluentui [2] e70848ebac1cd720875bccaa3026f4a9ed700e08 Along with the earlier tests in p5313, I'll instead reformat the comparison as follows: Repack Method Pack Size Time --------------------------------------- Hash v1 439.4M 87.24s Hash v2 161.7M 21.51s Path Walk 142.5M 81.29s There are a few things to notice here: 1. The benefits of --name-hash-version=2 over --name-hash-version=1 are significant, but --path-walk still compresses better than that option. 2. The --path-walk command is still using --name-hash-version=1 for the second pass of delta computation, using the increased name hash collisions as a potential method for opportunistic compression on top of the path-focused compression. 3. The --path-walk algorithm is currently sequential and does not use multiple threads for delta compression. Threading will be implemented in a future change so the computation time will improve to better compete in this metric. There are small benefits in size for my copy of the Git repository: Repack Method Pack Size Time --------------------------------------- Hash v1 248.8M 30.44s Hash v2 249.0M 30.15s Path Walk 213.2M 142.50s As well as in the nodejs/node repository [3]: Repack Method Pack Size Time --------------------------------------- Hash v1 739.9M 71.18s Hash v2 764.6M 67.82s Path Walk 698.1M 208.10s [3] https://github.com/nodejs/node This benefit also repeats in my copy of the Linux kernel repository: Repack Method Pack Size Time --------------------------------------- Hash v1 2.5G 554.41s Hash v2 2.5G 549.62s Path Walk 2.2G 1562.36s It is important to see that even when the repository shape does not have many name-hash collisions, there is a slight space boost to be found using this method. As this repacking strategy was released in Git for Windows 2.47.0, some users have reported cases where the --path-walk compression is slightly worse than the --name-hash-version=2 option. In those cases, it may be beneficial to combine the two options. However, there has not been a released version of Git that has both options and I don't have access to these repos for testing. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16pack-objects: introduce GIT_TEST_PACK_PATH_WALKDerrick Stolee1-2/+10
There are many tests that validate whether 'git pack-objects' works as expected. Instead of duplicating these tests, add a new test environment variable, GIT_TEST_PACK_PATH_WALK, that implies --path-walk by default when specified. This was useful in testing the implementation of the --path-walk implementation, helping to find tests that are overly specific to the default object walk. These include: - t0411-clone-from-partial.sh : One test fetches from a repo that does not have the boundary objects. This causes the path-based walk to fail. Disable the variable for this test. - t5306-pack-nobase.sh : Similar to t0411, one test fetches from a repo without a boundary object. - t5310-pack-bitmaps.sh : One test compares the case when packing with bitmaps to the case when packing without them. Since we disable the test variable when writing bitmaps, this causes a difference in the object list (the --path-walk option adds an extra object). Specify --no-path-walk in both processes for the comparison. Another test checks for a specific delta base, but when computing dynamically without using bitmaps, the base object it too small to be considered in the delta calculations so no base is used. - t5316-pack-delta-depth.sh : This script cares about certain delta choices and their chain lengths. The --path-walk option changes how these chains are selected, and thus changes the results of this test. - t5322-pack-objects-sparse.sh : This demonstrates the effectiveness of the --sparse option and how it combines with --path-walk. - t5332-multi-pack-reuse.sh : This test verifies that the preferred pack is used for delta reuse when possible. The --path-walk option is not currently aware of the preferred pack at all, so finds a different delta base. - t7406-submodule-update.sh : When using the variable, the --depth option collides with the --path-walk feature, resulting in a warning message. Disable the variable so this warning does not appear. I want to call out one specific test change that is only temporary: - t5530-upload-pack-error.sh : One test cares specifically about an "unable to read" error message. Since the current implementation performs delta calculations within the path-walk API callback, a different "unable to get size" error message appears. When this is changed in a future refactoring, this test change can be reverted. Similar to GIT_TEST_NAME_HASH_VERSION, we do not add this option to the linux-TEST-vars CI build as that's already an overloaded build. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>