aboutsummaryrefslogtreecommitdiffstats
path: root/builtin/cat-file.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2023-06-21cache.h: remove this no-longer-used headerElijah Newren1-2/+1
Since this header showed up in some places besides just #include statements, update/clean-up/remove those other places as well. Note that compat/fsmonitor/fsm-path-utils-darwin.c previously got away with violating the rule that all files must start with an include of git-compat-util.h (or a short-list of alternate headers that happen to include it first). This change exposed the violation and caused it to stop building correctly; fix it by having it include git-compat-util.h first, as per policy. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-18replace strbuf_expand() with strbuf_expand_step()René Scharfe1-18/+17
Avoid the overhead of passing context to a callback function of strbuf_expand() by using strbuf_expand_step() in a loop instead. It requires explicit handling of %% and unrecognized placeholders, but is simpler, more direct and avoids void pointers. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-12repository: create disable_replace_refs()Derrick Stolee1-1/+1
Several builtins depend on being able to disable the replace references so we actually operate on each object individually. These currently do so by directly mutating the 'read_replace_refs' global. A future change will move this global into a different place, so it will be necessary to change all of these lines. However, we can simplify that transition by abstracting the purpose of these global assignments with a method call. We will need to keep this read_replace_refs global forever, as we want to make sure that we never use replace refs throughout the life of the process if this method is called. Future changes may present a repository-scoped version of the variable to represent that repository's core.useReplaceRefs config value, but a zero-valued read_replace_refs will always override such a setting. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-12cat-file: add option '-Z' that delimits input and output with NULPatrick Steinhardt1-21/+36
In db9d67f2e9 (builtin/cat-file.c: support NUL-delimited input with `-z`, 2022-07-22), we have introduced a new mode to read the input via NUL-delimited records instead of newline-delimited records. This allows the user to query for revisions that have newlines in their path component. While unusual, such queries are perfectly valid and thus it is clear that we should be able to support them properly. Unfortunately, the commit only changed the input to be NUL-delimited, but didn't change the output at the same time. While this is fine for queries that are processed successfully, it is less so for queries that aren't. In the case of missing commits for example the result can become entirely unparsable: ``` $ printf "7ce4f05bae8120d9fa258e854a8669f6ea9cb7b1 blob 10\n1234567890\n\n\commit000" | git cat-file --batch -z 7ce4f05bae8120d9fa258e854a8669f6ea9cb7b1 blob 10 1234567890 commit missing ``` This is of course a crafted query that is intentionally gaming the deficiency, but more benign queries that contain newlines would have similar problems. Ideally, we should have also changed the output to be NUL-delimited when `-z` is specified to avoid this problem. As the input is NUL-delimited, it is clear that the output in this case cannot ever contain NUL characters by itself. Furthermore, Git does not allow NUL characters in revisions anyway, further stressing the point that using NUL-delimited output is safe. The only exception is of course the object data itself, but as git-cat-file(1) prints the size of the object data clients should read until that specified size has been consumed. But even though `-z` has only been introduced a few releases ago in Git v2.38.0, changing the output format retroactively to also NUL-delimit output would be a backwards incompatible change. And while one could make the argument that the output is inherently broken already, we need to assume that there are existing users out there that use it just fine given that revisions containing newlines are quite exotic. Instead, introduce a new option `-Z` that switches to NUL-delimited input and output. While this new option could arguably only switch the output format to be NUL-delimited, the consequence would be that users have to always specify both `-z` and `-Z` when the input may contain newlines. On the other hand, if the user knows that there never will be newlines in the input, they don't have to use either of those options. There is thus no usecase that would warrant treating input and output format separately, which is why we instead opt to "do the right thing" and have `-Z` mean to NUL-terminate both formats. The old `-z` option is marked as deprecated with a hint that its output may become unparsable. It is thus hidden both from the synopsis as well as the command's help output. Co-authored-by: Toon Claes <toon@iotcl.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-12cat-file: simplify reading from standard inputPatrick Steinhardt1-23/+9
The batch modes of git-cat-file(1) read queries from stantard input that are either newline- or NUL-delimited. This code was introduced via db9d67f2e9 (builtin/cat-file.c: support NUL-delimited input with `-z`, 2022-07-22), which notes that: """ The refactoring here is slightly unfortunate, since we turn loops like: while (strbuf_getline(&buf, stdin) != EOF) into: while (1) { int ret; if (opt->nul_terminated) ret = strbuf_getline_nul(&input, stdin); else ret = strbuf_getline(&input, stdin); if (ret == EOF) break; } """ The commit proposed introducing a helper function that is easier to use, which is just what we have done in the preceding commit. Refactor the code to use this new helper to simplify the loop. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11object-file.h: move declarations for object-file.c functions from cache.hElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11object-name.h: move declarations for object-name.c functions from cache.hElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11treewide: be explicit about dependence on convert.hElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-04Merge branch 'ab/remove-implicit-use-of-the-repository' into ↵Junio C Hamano1-10/+15
en/header-split-cache-h * ab/remove-implicit-use-of-the-repository: libs: use "struct repository *" argument, not "the_repository" post-cocci: adjust comments for recent repo_* migration cocci: apply the "revision.h" part of "the_repository.pending" cocci: apply the "rerere.h" part of "the_repository.pending" cocci: apply the "refs.h" part of "the_repository.pending" cocci: apply the "promisor-remote.h" part of "the_repository.pending" cocci: apply the "packfile.h" part of "the_repository.pending" cocci: apply the "pretty.h" part of "the_repository.pending" cocci: apply the "object-store.h" part of "the_repository.pending" cocci: apply the "diff.h" part of "the_repository.pending" cocci: apply the "commit.h" part of "the_repository.pending" cocci: apply the "commit-reach.h" part of "the_repository.pending" cocci: apply the "cache.h" part of "the_repository.pending" cocci: add missing "the_repository" macros to "pending" cocci: sort "the_repository" rules by header cocci: fix incorrect & verbose "the_repository" rules cocci: remove dead rule from "the_repository.pending.cocci"
2023-03-28cocci: apply the "promisor-remote.h" part of "the_repository.pending"Ævar Arnfjörð Bjarmason1-1/+1
Apply the part of "the_repository.pending.cocci" pertaining to "promisor-remote.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28cocci: apply the "object-store.h" part of "the_repository.pending"Ævar Arnfjörð Bjarmason1-9/+14
Apply the part of "the_repository.pending.cocci" pertaining to "object-store.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21write-or-die.h: move declarations for write-or-die.c functions from cache.hElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21environment.h: move declarations for environment.c functions from cache.hElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21treewide: be explicit about dependence on gettext.hElijah Newren1-0/+1
Dozens of files made use of gettext functions, without explicitly including gettext.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include gettext.h if they are using it. However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an include of gettext.h, it was left out to avoid conflicting with an in-flight topic. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-17Merge branch 'jk/unused-post-2.39-part2'Junio C Hamano1-4/+4
More work towards -Wunused. * jk/unused-post-2.39-part2: (21 commits) help: mark unused parameter in git_unknown_cmd_config() run_processes_parallel: mark unused callback parameters userformat_want_item(): mark unused parameter for_each_commit_graft(): mark unused callback parameter rewrite_parents(): mark unused callback parameter fetch-pack: mark unused parameter in callback function notes: mark unused callback parameters prio-queue: mark unused parameters in comparison functions for_each_object: mark unused callback parameters list-objects: mark unused callback parameters mark unused parameters in signal handlers run-command: mark error routine parameters as unused mark "pointless" data pointers in callbacks ref-filter: mark unused callback parameters http-backend: mark unused parameters in virtual functions http-backend: mark argc/argv unused object-name: mark unused parameters in disambiguate callbacks serve: mark unused parameters in virtual functions serve: use repository pointer to get config ls-refs: drop config caching ...
2023-02-24for_each_object: mark unused callback parametersJeff King1-4/+4
The for_each_{loose,packed}_object interface uses callback functions, but not every callback needs all of the parameters. Mark the unused ones to satisfy -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23replace-object.h: move read_replace_refs declaration from cache.h to hereElijah Newren1-0/+1
Adjust several files to be more explicit about their dependency on replace-objects to accommodate this change. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23ident.h: move ident-related declarations out of cache.hElijah Newren1-0/+1
These functions were all defined in a separate ident.c already, so create ident.h and move the declarations into that file. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23cache.h: remove dependence on hex.h; make other files include it explicitlyElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23alloc.h: move ALLOC_GROW() functions from cache.hElijah Newren1-0/+1
This allows us to replace includes of cache.h with includes of the much smaller alloc.h in many places. It does mean that we also need to add includes of alloc.h in a number of C files. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-05Merge branch 'sa/cat-file-mailmap--batch-check'Junio C Hamano1-0/+28
'cat-file' gains mailmap support for its '--batch-check' and '-s' options. * sa/cat-file-mailmap--batch-check: cat-file: add mailmap support to --batch-check option cat-file: add mailmap support to -s option
2022-12-20cat-file: add mailmap support to --batch-check optionSiddharth Asthana1-0/+15
Even though the cat-file command with `--batch-check` option does not complain when `--use-mailmap` option is given, the latter option is ignored. Compute the size of the object after replacing the idents and report it instead. In order to make `--batch-check` option honour the mailmap mechanism we have to read the contents of the commit/tag object. There were two ways to do it: 1. Make two calls to `oid_object_info_extended()`. If `--use-mailmap` option is given, the first call will get us the type of the object and second call will only be made if the object type is either a commit or tag to get the contents of the object. 2. Make one call to `oid_object_info_extended()` to get the type of the object. Then, if the object type is either of commit or tag, make a call to `repo_read_object_file()` to read the contents of the object. I benchmarked the following command with both the above approaches and compared against the current implementation where `--use-mailmap` option is ignored: `git cat-file --use-mailmap --batch-all-objects --batch-check --buffer --unordered` The results can be summarized as follows: Time (mean ± σ) default 827.7 ms ± 104.8 ms first approach 6.197 s ± 0.093 s second approach 1.975 s ± 0.217 s Since, the second approach is faster than the first one, I implemented it in this patch. The command git cat-file can now use the mailmap mechanism to replace idents with canonical versions for commit and tag objects. There are several options like `--batch`, `--batch-check` and `--batch-command` that can be combined with `--use-mailmap`. But the documentation for `--batch`, `--batch-check` and `--batch-command` doesn't say so. This patch fixes that documentation. Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: John Cai <johncai86@gmail.com> Helped-by: Taylor Blau <me@ttaylorr.com> Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-20cat-file: add mailmap support to -s optionSiddharth Asthana1-0/+13
Even though the cat-file command with `-s` option does not complain when `--use-mailmap` option is given, the latter option is ignored. Compute the size of the object after replacing the idents and report it instead. In order to make `-s` option honour the mailmap mechanism we have to read the contents of the commit/tag object. Make use of the call to `oid_object_info_extended()` to get the contents of the object and store in `buf`. `buf` is later freed in the function. Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: John Cai <johncai86@gmail.com> Helped-by: Taylor Blau <me@ttaylorr.com> Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-11-21{builtin/*,repository}.c: add & use "USE_THE_INDEX_VARIABLE"Ævar Arnfjörð Bjarmason1-1/+1
Split up the "USE_THE_INDEX_COMPATIBILITY_MACROS" into that setting and a more narrow "USE_THE_INDEX_VARIABLE". In the case of these built-ins we only need "the_index" variable, but not the compatibility wrapper for functions we're not using. Let's then have some users of "USE_THE_INDEX_COMPATIBILITY_MACROS" use this more narrow and descriptive define. For context: The USE_THE_INDEX_COMPATIBILITY_MACROS macro was added to test-tool.h in f8adbec9fea (cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switch, 2019-01-24). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-13doc txt & -h consistency: add "-z" to cat-file "-h"Ævar Arnfjörð Bjarmason1-1/+1
Fix a bug in db9d67f2e9c (builtin/cat-file.c: support NUL-delimited input with `-z`, 2022-07-22), before that change the SYNOPSIS and "-h" output were the same, but not afterwards. That change followed a similar earlier divergence in 473fa2df08d (Documentation: add --batch-command to cat-file synopsis, 2022-04-07). Subsequent commits will fix this sort of thing more systematically, but let's fix this one as a one-off. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-05Merge branch 'tb/cat-file-z'Junio C Hamano1-3/+25
Operating modes like "--batch" of "git cat-file" command learned to take NUL-terminated input, instead of one-item-per-line. * tb/cat-file-z: builtin/cat-file.c: support NUL-delimited input with `-z` t1006: extract --batch-command inputs to variables
2022-08-03Merge branch 'sa/cat-file-mailmap'Junio C Hamano1-1/+42
"git cat-file" learned an option to use the mailmap when showing commit and tag objects. * sa/cat-file-mailmap: cat-file: add mailmap support ident: rename commit_rewrite_person() to apply_mailmap_to_header() ident: move commit_rewrite_person() to ident.c revision: improve commit_rewrite_person()
2022-07-22builtin/cat-file.c: support NUL-delimited input with `-z`Taylor Blau1-3/+25
When callers are using `cat-file` via one of the stdin-driven `--batch` modes, all input is newline-delimited. This presents a problem when callers wish to ask about, e.g. tree-entries that have a newline character present in their filename. To support this niche scenario, introduce a new `-z` mode to the `--batch`, `--batch-check`, and `--batch-command` suite of options that instructs `cat-file` to treat its input as NUL-delimited, allowing the individual commands themselves to have newlines present. The refactoring here is slightly unfortunate, since we turn loops like: while (strbuf_getline(&buf, stdin) != EOF) into: while (1) { int ret; if (opt->nul_terminated) ret = strbuf_getline_nul(&input, stdin); else ret = strbuf_getline(&input, stdin); if (ret == EOF) break; } It's tempting to think that we could use `strbuf_getwholeline()` and specify either `\n` or `\0` as the terminating character. But for input on platforms that include a CR character preceeding the LF, this wouldn't quite be the same, since `strbuf_getline(...)` will trim any trailing CR, while `strbuf_getwholeline(&buf, stdin, '\n')` will not. In the future, we could clean this up further by introducing a variant of `strbuf_getwholeline()` that addresses the aforementioned gap, but that approach felt too heavy-handed for this pair of uses. Some tests are added in t1006 to ensure that `cat-file` produces the same output in `--batch`, `--batch-check`, and `--batch-command` modes with and without the new `-z` option. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18cat-file: add mailmap supportSiddharth Asthana1-1/+42
git-cat-file is used by tools like GitLab to get commit tag contents that are then displayed to users. This content which has author, committer or tagger information, could benefit from passing through the mailmap mechanism before being sent or displayed. This patch adds --[no-]use-mailmap command line option to the git cat-file command. It also adds --[no-]mailmap option as an alias to --[no-]use-mailmap. This patch also introduces new test cases to test the mailmap mechanism in git cat-file command. Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: John Cai <johncai86@gmail.com> Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-01cat-file: fix a common "struct object_context" memory leakÆvar Arnfjörð Bjarmason1-10/+22
Fix a memory leak where "cat-file" will leak the "path" member. See e5fba602e59 (textconv: support for cat_file, 2010-06-15) for the code that introduced the offending get_oid_with_context() call (called get_sha1_with_context() at the time). As a result we can mark several tests as passing with SANITIZE=leak using "TEST_PASSES_SANITIZE_LEAK=true". As noted in dc944b65f1d (get_sha1_with_context: dynamically allocate oc->path, 2017-05-19) callers must free the "path" member. That same commit added the relevant free() to this function, but we weren't catching cases where we'd return early. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-01cat-file: fix a memory leak in --batch-command modeÆvar Arnfjörð Bjarmason1-0/+1
Fix a memory leak introduced in 440c705ea63 (cat-file: add --batch-command mode, 2022-02-18). The free_cmds() function was only called on "queued_nr" if we had a "flush" command. As the "without flush for blob info" test added in the same commit shows we can't rely on that, so let's call free_cmds() again at the end. Since "nr" follows the usual pattern of being set to 0 if we've free()'d the memory already it's OK to call it twice, even in cases where we are doing a "flush". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-23Merge branch 'jc/cat-file-batch-default-format-optim'Junio C Hamano1-6/+23
Optimize away strbuf_expand() call with a hardcoded formatting logic specific for the default format in the --batch and --batch-check options of "git cat-file". * jc/cat-file-batch-default-format-optim: cat-file: skip expanding default format
2022-03-16Merge branch 'ab/object-file-api-updates'Junio C Hamano1-4/+7
Object-file API shuffling. * ab/object-file-api-updates: object-file API: pass an enum to read_object_with_reference() object-file.c: add a literal version of write_object_file_prepare() object-file API: have hash_object_file() take "enum object_type" object API: rename hash_object_file_literally() to write_*() object-file API: split up and simplify check_object_signature() object API users + docs: check <0, not !0 with check_object_signature() object API docs: move check_object_signature() docs to cache.h object API: correct "buf" v.s. "map" mismatch in *.c and *.h object-file API: have write_object_file() take "enum object_type" object-file API: add a format_object_header() function object-file API: return "void", not "int" from hash_object_file() object-file.c: split up declaration of unrelated variables
2022-03-15cat-file: skip expanding default formatJohn Cai1-6/+23
When format is passed into --batch, --batch-check, --batch-command, the format gets expanded. When nothing is passed in, the default format is set and the expand_format() gets called. We can save on these cycles by hardcoding how to print the information when nothing is passed as the format, or when the default format is passed. There is no need for the fully expanded format with the default. Since batch_object_write() happens on every object provided in batch mode, we get a nice performance improvement. git rev-list --all > /tmp/all-obj.txt git cat-file --batch-check </tmp/all-obj.txt with HEAD^: Time (mean ± σ): 57.6 ms ± 1.7 ms [User: 51.5 ms, System: 6.2 ms] Range (min … max): 54.6 ms … 64.7 ms 50 runs with HEAD: Time (mean ± σ): 49.8 ms ± 1.7 ms [User: 42.6 ms, System: 7.3 ms] Range (min … max): 46.9 ms … 55.9 ms 56 runs If nothing is provided as a format argument, or if the default format is passed, skip expanding of the format and print the object info with a default format. See https://lore.kernel.org/git/87eecf8ork.fsf@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-25object-file API: pass an enum to read_object_with_reference()Ævar Arnfjörð Bjarmason1-4/+7
Change the read_object_with_reference() function to take an "enum object_type". It was not prepared to handle an arbitrary "const char *type", as it was itself calling type_from_string(). Let's change the only caller that passes in user data to use type_from_string(), and convert the rest to use e.g. "OBJ_TREE" instead of "tree_type". The "cat-file" caller is not on the codepath that handles"--allow-unknown", so the type_from_string() there is safe. Its use of type_from_string() doesn't functionally differ from that of the pre-image. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-18cat-file: add --batch-command modeJohn Cai1-1/+143
Add a new flag --batch-command that accepts commands and arguments from stdin, similar to git-update-ref --stdin. At GitLab, we use a pair of long running cat-file processes when accessing object content. One for iterating over object metadata with --batch-check, and the other to grab object contents with --batch. However, if we had --batch-command, we wouldn't need to keep both processes around, and instead just have one --batch-command process where we can flip between getting object info, and getting object contents. Since we have a pair of cat-file processes per repository, this means we can get rid of roughly half of long lived git cat-file processes. Given there are many repositories being accessed at any given time, this can lead to huge savings. git cat-file --batch-command will enter an interactive command mode whereby the user can enter in commands and their arguments that get queued in memory: <command1> [arg1] [arg2] LF <command2> [arg1] [arg2] LF When --buffer mode is used, commands will be queued in memory until a flush command is issued that execute them: flush LF The reason for a flush command is that when a consumer process (A) talks to a git cat-file process (B) and interactively writes to and reads from it in --buffer mode, (A) needs to be able to control when the buffer is flushed to stdout. Currently, from (A)'s perspective, the only way is to either 1. kill (B)'s process 2. send an invalid object to stdin. 1. is not ideal from a performance perspective as it will require spawning a new cat-file process each time, and 2. is hacky and not a good long term solution. With this mechanism of queueing up commands and letting (A) issue a flush command, process (A) can control when the buffer is flushed and can guarantee it will receive all of the output when in --buffer mode. --batch-command also will not allow (B) to flush to stdout until a flush is received. This patch adds the basic structure for adding command which can be extended in the future to add more commands. It also adds the following two commands (on top of the flush command): contents <object> LF info <object> LF The contents command takes an <object> argument and prints out the object contents. The info command takes an <object> argument and prints out the object metadata. These can be used in the following way with --buffer: info <object> LF contents <object> LF contents <object> LF info <object> LF flush LF info <object> LF flush LF When used without --buffer: info <object> LF contents <object> LF contents <object> LF info <object> LF info <object> LF Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-18cat-file: introduce batch_mode enum to replace print_contentsJohn Cai1-4/+16
A future patch introduces a new --batch-command flag. Including --batch and --batch-check, we will have a total of three batch modes. print_contents is the only boolean on the batch_options sturct used to distinguish between the different modes. This makes the code harder to read. To reduce potential confusion, replace print_contents with an enum to help readability and clarity. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-18cat-file: rename cmdmode to transform_modeJohn Cai1-7/+7
In the next patch, we will add an enum on the batch_options struct that indicates which type of batch operation will be used: --batch, --batch-check and the soon to be --batch-command that will read commands from stdin. --batch-command mode might get confused with the cmdmode flag. There is value in renaming cmdmode in any case. cmdmode refers to how the result output of the blob will be transformed, either according to --filter or --textconv. So transform_mode is a more descriptive name for the flag. Rename cmdmode to transform_mode in cat-file.c Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-12cat-file: s/_/-/ in typo'd usage_msg_optf() messageÆvar Arnfjörð Bjarmason1-1/+1
Fix a typo in my recent 03dc51fe849 (cat-file: fix remaining usage bugs, 2021-10-09). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-12cat-file: don't whitespace-pad "(...)" in SYNOPSIS and usage outputÆvar Arnfjörð Bjarmason1-3/+3
Fix up whitespace issues around "(... | ...)" in the SYNOPSIS and usage. These were introduced in ab/cat-file series. See e145efa6059 (Merge branch 'ab/cat-file' into next, 2022-01-05). In particular 57d6a1cf96, 5a40417876 and 97fe7250753 in that series. We'll now correctly emit this usage output: $ git cat-file -h usage: git cat-file <type> <object> or: git cat-file (-e | -p) <object> or: git cat-file (-t | -s) [--allow-unknown-type] <object> [...] Before this the last line of that would be inconsistent with the preceding "(-e | -p)": or: git cat-file ( -t | -s ) [--allow-unknown-type] <object> Reported-by: Jiang Xin <worldhello.net@gmail.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-05i18n: turn even more messages into "cannot be used together" onesJean-Noël Avila1-1/+1
Even if some of these messages are not subject to gettext i18n, this helps bring a single style of message for a given error type. Signed-off-by: Jean-Noël Avila <jn.avila@free.fr> Reviewed-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-30cat-file: use GET_OID_ONLY_TO_DIE in --(textconv|filters)Ævar Arnfjörð Bjarmason1-10/+6
Change the cat_one_file() logic that calls get_oid_with_context() under --textconv and --filters to use the GET_OID_ONLY_TO_DIE flag, thus improving the error messaging emitted when e.g. <path> is missing but <rev> is not. To service the "cat-file" use-case we need to introduce a new "GET_OID_REQUIRE_PATH" flag, otherwise it would exit early as soon as a valid "HEAD" was resolved, but in the "cat-file" case being changed we always need a valid revision and path. This arguably makes the "<bad rev>:<bad path>" and "<bad rev>:<good (in HEAD) path>" use cases worse, as we won't quote the <path> component at the user anymore, but let's just use the existing logic "git log" et al use for now. We can improve the messaging for those cases as a follow-up for all callers. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-30cat-file: correct and improve usage informationÆvar Arnfjörð Bjarmason1-20/+29
Change the usage output emitted on "git cat-file -h" to group related options, making it clear to users which options go with which other ones. The new output is: Check object existence or emit object contents -e check if <object> exists -p pretty-print <object> content Emit [broken] object attributes -t show object type (one of 'blob', 'tree', 'commit', 'tag', ...) -s show object size --allow-unknown-type allow -s and -t to work with broken/corrupt objects Batch objects requested on stdin (or --batch-all-objects) --batch[=<format>] show full <object> or <rev> contents --batch-check[=<format>] like --batch, but don't emit <contents> --batch-all-objects with --batch[-check]: ignores stdin, batches all known objects Change or optimize batch output --buffer buffer --batch output --follow-symlinks follow in-tree symlinks --unordered do not order objects before emitting them Emit object (blob or tree) with conversion or filter (stand-alone, or with batch) --textconv run textconv on object's content --filters run filters on object's content --path blob|tree use a <path> for (--textconv | --filters ); Not with 'batch' The old usage was: <type> can be one of: blob, tree, commit, tag -t show object type -s show object size -e exit with zero when there's no error -p pretty-print object's content --textconv for blob objects, run textconv on object's content --filters for blob objects, run filters on object's content --batch-all-objects show all objects with --batch or --batch-check --path <blob> use a specific path for --textconv/--filters --allow-unknown-type allow -s and -t to work with broken/corrupt objects --buffer buffer --batch output --batch[=<format>] show info and content of objects fed from the standard input --batch-check[=<format>] show info about objects fed from the standard input --follow-symlinks follow in-tree symlinks (used with --batch or --batch-check) --unordered do not order --batch-all-objects output While shorter, I think the new one is easier to understand, as e.g. "--allow-unknown-type" is grouped with "-t" and "-s", as it can only be combined with those options. The same goes for "--buffer", "--unordered" etc. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-30cat-file: fix remaining usage bugsÆvar Arnfjörð Bjarmason1-32/+63
With the migration of --batch-all-objects to OPT_CMDMODE() in the preceding commit one bug with combining it and other OPT_CMDMODE() options was solved, but we were still left with e.g. --buffer silently being discarded when not in batch mode. Fix all those bugs, and in addition emit errors telling the user specifically what options can't be combined with what other options, before this we'd usually just emit the cryptic usage text and leave the users to work it out by themselves. This change is rather large, because to do so we need to untangle the options processing so that we can not only error out, but emit sensible errors, and e.g. emit errors about options before errors about stray argc elements (as they might become valid if the option were removed). Some of the output changes ("error:" to "fatal:" with usage_msg_opt[f]()), but none of the exit codes change, except in those cases where we silently accepted bad option combinations before, now we'll error out. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-30cat-file: make --batch-all-objects a CMDMODEÆvar Arnfjörð Bjarmason1-14/+11
The usage of OPT_CMDMODE() in "cat-file"[1] was added in parallel with the development of[3] the --batch-all-objects option[4], so we've since grown[5] checks that it can't be combined with other command modes, when it should just be made a top-level command-mode instead. It doesn't combine with --filters, --textconv etc. By giving parse_options() information about what options are mutually exclusive with one another we can get the die() message being removed here for free, we didn't even use that removed message in some cases, e.g. for both of: --batch-all-objects --textconv --batch-all-objects --filters We'd take the "goto usage" in the "if (opt)" branch, and never reach the previous message. Now we'll emit e.g.: $ git cat-file --batch-all-objects --filters error: option `filters' is incompatible with --batch-all-objects 1. b48158ac94c (cat-file: make the options mutually exclusive, 2015-05-03) 2. https://lore.kernel.org/git/xmqqtwspgusf.fsf@gitster.dls.corp.google.com/ 3. https://lore.kernel.org/git/20150622104559.GG14475@peff.net/ 4. 6a951937ae1 (cat-file: add --batch-all-objects option, 2015-06-22) 5. 321459439e1 (cat-file: support --textconv/--filters in batch mode, 2016-09-09) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-30cat-file: move "usage" variable to cmd_cat_file()Ævar Arnfjörð Bjarmason1-19/+18
There's no benefit to defining this at a distance, and it makes the code harder to read as you've got to scroll up to see the usage that corresponds to the options. In subsequent commits I'll make use of usage_msg_opt(), which will be quite noisy if I have to use the long "cat_file_usage" variable, there's no other command being defined in this file, so let's rename it to just "usage". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-30cat-file docs: fix SYNOPSIS and "-h" outputÆvar Arnfjörð Bjarmason1-2/+8
There were various inaccuracies in the previous SYNOPSIS output, e.g. "--path" is not something that can optionally go with any options except --textconv or --filters, as the output implied. The opening line of the DESCRIPTION section is also "In its first form[...]", which refers to "git cat-file <type> <object>", but the SYNOPSIS section wasn't showing that as the first form! That part of the documentation made sense in d83a42f34a6 (Documentation: minor grammatical fixes in git-cat-file.txt, 2009-03-22) when it was introduced, but since then various options that were added have made that intro make no sense in the context it was in. Now the two will match again. The usage output here is not properly aligned on "master" currently, but will be with my in-flight 4631cfc20bd (parse-options: properly align continued usage output, 2021-09-21), so let's indent things correctly in the C code in anticipation of that. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08cat-file: use packed_object_info() for --batch-all-objectsJeff King1-14/+36
When "cat-file --batch-all-objects" iterates over each object, it knows where to find each one. But when we look up details of the object, we don't use that information at all. This patch teaches it to use the pack/offset pair when we're iterating over objects in a pack. This yields a measurable speed improvement (timings on a fully packed clone of linux.git): Benchmark #1: ./git.old cat-file --batch-all-objects --unordered --batch-check="%(objecttype) %(objectname)" Time (mean ± σ): 8.128 s ± 0.118 s [User: 7.968 s, System: 0.156 s] Range (min … max): 8.007 s … 8.301 s 10 runs Benchmark #2: ./git.new cat-file --batch-all-objects --unordered --batch-check="%(objecttype) %(objectname)" Time (mean ± σ): 4.294 s ± 0.064 s [User: 4.167 s, System: 0.125 s] Range (min … max): 4.227 s … 4.457 s 10 runs Summary './git.new cat-file --batch-all-objects --unordered --batch-check="%(objecttype) %(objectname)"' ran 1.89 ± 0.04 times faster than './git.old cat-file --batch-all-objects --unordered --batch-check="%(objecttype) %(objectname)" The implementation is pretty simple: we just call packed_object_info() instead of oid_object_info_extended() when we can. Most of the changes are just plumbing the pack/offset pair through the callstack. There is one subtlety: replace lookups are not handled by packed_object_info(). But since those are disabled for --batch-all-objects, and since we'll only have pack info when that option is in effect, we don't have to worry about that. There are a few limitations to this optimization which we could address with further work: - I didn't bother recording when we found an object loose. Technically this could save us doing a fruitless lookup in the pack index. But opening and mmap-ing a loose object is so expensive in the first place that this doesn't matter much. And if your repository is large enough to care about per-object performance, most objects are going to be packed anyway. - This works only in --unordered mode. For the sorted mode, we'd have to record the pack/offset pair as part of our oid-collection. That's more code, plus at least 16 extra bytes of heap per object. It would probably still be a net win in runtime, but we'd need to measure. - For --batch, this still helps us with getting the object metadata, but we still do a from-scratch lookup for the object contents. This probably doesn't matter that much, because the lookup cost will be much smaller relative to the cost of actually unpacking and printing the objects. For small objects, we could probably swap out read_object_file() for using packed_object_info() with a "object_info.contentp" to get the contents. But we'd still need to deal with streaming for larger objects. A better path forward here is to teach the initial oid_object_info_extended() / packed_object_info() calls to retrieve the contents of smaller objects while they are already being accessed. That would save the extra lookup entirely. But it's a non-trivial feature to add to the object_info code, so I left it for now. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08cat-file: split ordered/unordered batch-all-objects callbacksJeff King1-1/+3
When we originally added --batch-all-objects, it stuffed everything into an oid_array(), and then iterated over that array with a callback to write the actual output. When we later added --unordered, that code path writes immediately as we discover each object, but just calls the same batch_object_cb() as our entry point to the writing code. That callback has a narrow interface; it only receives the oid, but we know much more about each object in the unordered write (which we'll make use of in the next patch). So let's just call batch_object_write() directly. The callback wasn't saving us much effort. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08cat-file: disable refs/replace with --batch-all-objectsJeff King1-0/+2
When we're enumerating all objects in the object database, it doesn't make sense to respect refs/replace. The point of this option is to enumerate all of the objects in the database at a low level. By definition we'd already show the replacement object's contents (under its real oid), and showing those contents under another oid is almost certainly working against what the user is trying to do. Note that you could make the same argument for something like: git show-index <foo.idx | awk '{print $2}' | git cat-file --batch but there we can't know in cat-file exactly what the user intended, because we don't know the source of the input. They could be trying to do low-level debugging, or they could be doing something more high-level (e.g., imagine a porcelain built around cat-file for its object accesses). So in those cases, we'll have to rely on the user specifying "git --no-replace-objects" to tell us what to do. One _could_ make an argument that "cat-file --batch" is sufficiently low-level plumbing that it should not respect replace-objects at all (and the caller should do any replacement if they want it). But we have been doing so for some time. The history is a little tangled: - looking back as far as v1.6.6, we would not respect replace refs for --batch-check, but would for --batch (because the former used sha1_object_info(), and the replace mechanism only affected actual object reads) - this discrepancy was made even weirder by 98e2092b50 (cat-file: teach --batch to stream blob objects, 2013-07-10), where we always output the header using the --batch-check code, and then printed the object separately. This could lead to "cat-file --batch" dying (when it notices the size or type changed for a non-blob object) or even producing bogus output (in streaming mode, we didn't notice that we wrote the wrong number of bytes). - that persisted until 1f7117ef7a (sha1_file: perform object replacement in sha1_object_info_extended(), 2013-12-11), which then respected replace refs for both forms. So it has worked reliably this way for over 7 years, and we should make sure it continues to do so. That could also be an argument that --batch-all-objects should not change behavior (which this patch is doing), but I really consider the current behavior to be an unintended bug. It's a side effect of how the code is implemented (feeding the oids back into oid_object_info() rather than looking at what we found while reading the loose and packed object storage). The implementation is straight-forward: we just disable the global read_replace_refs flag when we're in --batch-all-objects mode. It would perhaps be a little cleaner to change the flag we pass to oid_object_info_extended(), but that's not enough. We also read objects via read_object_file() and stream_blob_to_fd(). The former could switch to its _extended() form, but the streaming code has no mechanism for disabling replace refs. Setting the global flag works, and as a bonus, it's impossible to have any "oops, we're sometimes replacing the object and sometimes not" bugs in the output (like the ones caused by 98e2092b50 above). The tests here cover the regular-input and --batch-all-objects cases, for both --batch-check and --batch. There is a test in t6050 that covers the regular-input case with --batch already, but this new one goes much further in actually verifying the output (plus covering --batch-check explicitly). This is perhaps a little overkill and the tests would be simpler just covering --batch-check, but I wanted to make sure we're checking that --batch output is consistent between the header and the content. The global-flag technique used here makes that easy to get right, but this is future-proofing us against regressions. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>