path: root/diff.c
Age | Commit message | Author | Files | Lines
2025-10-24 | diff: simplify run_external_diff() quiet logic | Jeff King | 1 | -3/+2
We'd sometimes end up in run_external_diff() to do a dry-run diff (e.g., to find content-level changes for --quiet). We recognize this quiet mode by seeing the lack of DIFF_FORMAT_PATCH in the output format.

But since introducing an explicit dry-run check via 3ed5d8bd73 (diff: stop output garbled message in dry run mode, 2025-10-20), this logic can never trigger. We can only get to this function by calling diff_flush_patch(), and that comes from only two places:

1. A dry-run flush comes from diff_flush_patch_quietly(), which is always in dry-run mode (so the other half of our "||" is true anyway).

2. A regular flush comes from diff_flush_patch_all_file_pairs(), which is only called when output_format has DIFF_FORMAT_PATCH in it.

So we can simplify our "quiet" condition to just checking dry-run mode (which used to be a specific flag, but recently became just a NULL "file" pointer). And since it's so simple, we can just do that inline.

This makes the logic about o->file more obvious, since we handle the NULL and non-stdout cases next to each other.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-24 | diff: drop dry-run redirection to /dev/null | Jeff King | 1 | -28/+3
As an added protection against dry-run diffs accidentally producing output, we redirect diff_options.file to /dev/null. But as of the previous patch, this now does nothing, since dry-run diffs are implemented by setting "file" to NULL. So we can drop this extra code with no change in behavior.

This is effectively a revert of 623f7af284 (diff: restore redirection to /dev/null for diff_from_contents, 2025-10-17) and 3da4413dbc (diff: make sure the other caller of diff_flush_patch_quietly() is silent, 2025-10-22), but:

1. We get a conflict because we already dropped the color_moved handling in an earlier patch. But we just resolve the conflicts to "theirs" (removing all of the code).

2. We retain the test from 623f7af284.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-24 | diff: replace diff_options.dry_run flag with NULL file | Jeff King | 1 | -8/+8
We introduced a dry_run flag to diff_options in b55e6d36eb (diff: ensure consistent diff behavior with ignore options, 2025-08-08), with the idea that the lower-level diff code could skip output when it is set. As we saw with the bugs fixed by 3ed5d8bd73 (diff: stop output garbled message in dry run mode, 2025-10-20), it is easy to miss spots. In the end, we located all of them by checking where diff_options.file is used. That suggests another possible approach: we can replace the dry_run boolean with a NULL pointer for "file", as we know that using "file" in dry_run mode would always be an error. This turns any missed spots from producing extra output[1] into a segfault. Which is less forgiving, but that is the point: this is indicative of a programming error, and complaining loudly and immediately is good. [1] We protect ourselves against garbled output as a separate step, courtesy of 623f7af284 (diff: restore redirection to /dev/null for diff_from_contents, 2025-10-17). So in that sense this patch can only introduce user-visible errors (since any "bugs" were going to /dev/null before), but the idea is to catch them rather than quietly send garbage to /dev/null. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
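The NULL-sentinel idea described above can be illustrated with a minimal sketch. This is not Git's code: `struct sketch_diff_options`, `sketch_is_dry_run()`, and `sketch_emit_line()` are hypothetical names chosen for illustration. An output helper that forgets the NULL check would dereference a NULL `FILE *` and crash instead of silently writing somewhere.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the relevant part of diff_options. */
struct sketch_diff_options {
	FILE *file;        /* NULL means "dry run": nothing may be written */
	int found_changes; /* set as soon as a change is detected */
};

/* Dry-run mode is encoded purely by the absence of an output stream. */
int sketch_is_dry_run(const struct sketch_diff_options *o)
{
	return o->file == NULL;
}

/*
 * Records that a change was seen and writes the line only when not in
 * dry-run mode. Returns the number of bytes written (0 in dry-run mode).
 */
int sketch_emit_line(struct sketch_diff_options *o, const char *line)
{
	o->found_changes = 1;
	if (sketch_is_dry_run(o))
		return 0;
	return fprintf(o->file, "%s\n", line);
}
```

With `file` set to NULL the caller still learns that something changed through `found_changes`, which is all a `--quiet` style check needs.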
2025-10-24 | diff: drop save/restore of color_moved in dry-run mode | Jeff King | 1 | -4/+0
When running a dry-run content-level diff to check whether a "--quiet" diff has any changes, we have always unset the color_moved variable since the feature was added in 2e2d5ac184 (diff.c: color moved lines differently, 2017-06-30). The reasoning is not given explicitly there, but presumably the idea is that since color_moved requires a lot of extra computation to match lines but does not actually affect the found_changes flag, we want to skip it. Later, in 3da4413dbc (diff: make sure the other caller of diff_flush_patch_quietly() is silent, 2025-10-22) we copied the same idea for other dry-run diffs. But neither spot actually needs to reset this flag at all, because diff_flush_patch() will not ever compute color_moved. Nor could it, as it is only looking at a single file-pair, and we detect moves across files. So color_moved is checked only when we are actually doing real DIFF_FORMAT_PATCH output, and call diff_flush_patch_all_file_pairs(). So we can get rid of these extra lines to save and restore the color_moved flag without changing the behavior at all. (Note that there is no "restore" to drop for the second caller, as we know at that point we are not generating any output and can just leave the feature disabled). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-24 | diff: send external diff output to diff_options.file | Jeff King | 1 | -1/+4
Diff output usually goes to the process stdout, but it can be redirected with the "--output" option. We store this in the "file" pointer of diff_options, and all of the diff code should write there instead of to stdout.

But there's one spot we missed: running an external diff cmd. We don't redirect its output at all, so it just defaults to the stdout of the parent process. We should instead point its stdout at our output file.

There are a few caveats to watch out for when doing so:

- The stdout field takes a descriptor, not a FILE pointer. We can pull out the descriptor with fileno().

- The run-command API always closes the stdout descriptor we pass to it. So we must duplicate it (otherwise we break the FILE pointer, since it now points to a closed descriptor).

- We don't need to worry about closing our dup'd descriptor, since the point is that run-command will do it for us (even in the case of an error). But we do need to make sure we skip the dup() if we set no_stdout (because then run-command will not look at it at all).

- When the output is going to stdout, it would not be wrong to dup() the descriptor, but we don't need to. We can skip that extra work with a simple pointer comparison.

- It seems like you'd need to fflush() the descriptor before handing off a copy to the child process to prevent out-of-order writes. But that was true even before this patch! It works because run-command always calls fflush(NULL) before running the child.

The new test shows the breakage (and fix). The need for duplicating the descriptor doesn't need a new test; that is covered by the later test "GIT_EXTERNAL_DIFF with more than one changed files".

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
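The descriptor handling in the caveats above can be sketched with plain POSIX calls. This is only an illustration under stated assumptions: `sketch_pick_child_stdout()` is a hypothetical helper, not part of Git or its run-command API, and flushing before the child starts is left to whoever launches it.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Decides which descriptor a child process should get as its stdout.
 * Stores -1 in *fd_out when no redirection is needed (output is
 * suppressed, or it already goes to the parent's stdout). Otherwise
 * stores a duplicate of the stream's descriptor, so that whoever
 * closes it later does not invalidate `out`.
 * Returns 0 on success, -1 if dup() fails.
 */
int sketch_pick_child_stdout(FILE *out, int no_stdout, int *fd_out)
{
	*fd_out = -1;
	if (no_stdout || out == stdout)
		return 0; /* skip the dup(): nobody would use or close it */
	*fd_out = dup(fileno(out));
	return *fd_out < 0 ? -1 : 0;
}
```

Closing the duplicate leaves the original stream usable, which is why the duplicate (and never the stream's own descriptor) is the one handed to code that closes what it receives.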
2025-10-23 | diff: stop output garbled message in dry run mode | Lidong Yan | 1 | -2/+6
Earlier, b55e6d36 (diff: ensure consistent diff behavior with ignore options, 2025-08-08) introduced "dry-run" mode to the diff machinery so that content-based diff filtering (like ignoring space changes or those that match -I<regex>) can first try to produce a patch without emitting any output to see if under the given diff filtering condition we would get any output lines, and a new helper function diff_flush_patch_quietly() was introduced to use the mode to see if an individual filepair needs to be shown. However, the solution was not complete. When files are deleted, file modes change, or there are unmerged entries in the index, dry-run mode still produces output because we overlooked these conditions, and as a result, dry-run mode was not quiet. To fix this, return early in emit_diff_symbol_from_struct() if we are in dry-run mode. This function will be called by all the emit functions to output the results. Returning early can avoid diff output when files are deleted or file modes are changed. Stop printing messages in dry-run mode if we have unmerged entries in the index. Discard the output of the external diff tool in dry-run mode. Signed-off-by: Lidong Yan <yldhome2d2@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-23 | Merge branch 'jc/diff-from-contents-fix' into ly/diff-name-only-with-diff-from-content | Junio C Hamano | 1 | -3/+23
* jc/diff-from-contents-fix: diff: make sure the other caller of diff_flush_patch_quietly() is silent
2025-10-23 | diff: make sure the other caller of diff_flush_patch_quietly() is silent | Junio C Hamano | 1 | -3/+23
Earlier, we added a protection for the loop that computes "git diff --quiet -w" to ensure calls to the diff_flush_patch_quietly() helper stay quiet. Do the same for another loop that deals with options like "--name-status" and makes calls to the same helper. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-22 | Merge branch 'jk/diff-from-contents-fix' into ly/diff-name-only-with-diff-from-content | Junio C Hamano | 1 | -0/+9
* jk/diff-from-contents-fix: diff: restore redirection to /dev/null for diff_from_contents
2025-10-17 | diff: restore redirection to /dev/null for diff_from_contents | Jeff King | 1 | -0/+9
In --quiet mode, since we produce only an exit code for "something was changed" and no actual output, we can often get by with just a tree-level diff. However, certain options require us to actually look at the file contents (e.g., if we are ignoring whitespace changes). We have a flag "diff_from_contents" for that, and if it is set we call diff_flush() on each path. To avoid producing any output (since we were asked to be --quiet), we traditionally just redirected the output to /dev/null. That changed in b55e6d36eb (diff: ensure consistent diff behavior with ignore options, 2025-08-08), which replaced that with a "dry_run" flag. In theory, with dry_run set, we should produce no output. But it carries a risk of regression: if we forget to respect dry_run in any of the output paths, we'll accidentally produce output. And indeed, there is at least one such regression in that commit, as it covered only the case where we actually call into xdiff, and not creation or deletion diffs, where we manually generate the headers. We even test this case in t4035, but only with diff-tree, which does not show the bug by default because it does not require diff_from_contents. But git-diff does, because it allows external diff programs by default (so we must dig into each diff filepair to decide if it requires running an external diff that may declare two distinct blobs to actually be the same). We should fix all of those code paths to respect dry_run correctly, but in the meantime we can protect ourselves more fully by restoring the redirection to /dev/null. This gives us an extra layer of protection against regressions due to other code paths we've missed. Though the original issue was reported with "git diff" (and due to its default of --ext-diff), I've used "diff-tree -w" in the new test. It triggers the same issue, but I think the fact that "-w" implies diff_from_contents is a bit more obvious, and fits in with the rest of t4035.
Reported-by: Jake Zimmerman <jake@zimmerman.io> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-29 | Merge branch 'jk/color-variable-fixes' | Junio C Hamano | 1 | -26/+22
Some places in the code confused a variable that is *not* a boolean to enable color but is an enum that records what the user requested to do about color. A couple of bugs of this sort have been fixed, while the code has been cleaned up to prevent similar bugs in the future. * jk/color-variable-fixes: config: store want_color() result in a separate bool add-interactive: retain colorbool values longer color: return bool from want_color() color: use git_colorbool enum type to store colorbools pretty: use format_commit_context.auto_color as colorbool diff: stop passing ecbdata->use_color as boolean diff: pass o->use_color directly to fill_metainfo() diff: don't use diff_options.use_color as a strict bool diff: simplify color_moved check when flushing grep: don't treat grep_opt.color as a strict bool color: return enum from git_config_colorbool() color: use GIT_COLOR_* instead of numeric constants
2025-09-16 | color: use git_colorbool enum type to store colorbools | Jeff King | 1 | -3/+3
We traditionally used "int" to store and pass around the values defined by "enum git_colorbool" (which were originally just #define macros). Using an int doesn't produce incorrect results, but using the actual enum makes the intent of the code more clear.

It would be nice if the compiler could catch cases where we used the enum and an int interchangeably, since it's very easy to accidentally check the boolean true/false of a colorbool like:

    if (branch_use_color)

This is wrong because GIT_COLOR_UNKNOWN and GIT_COLOR_AUTO evaluate to true in C, even though we may ultimately decide not to use color. But C is pretty happy to convert between ints and enums (even with various -Wenum-* warnings). So this sadly doesn't protect us from such mistakes, but it hopefully does make the code easier to read.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-16 | diff: stop passing ecbdata->use_color as boolean | Jeff King | 1 | -3/+3
In emit_hunk_header(), we evaluate ecbdata->color_diff both as a git_colorbool, passing it to diff_get_color():

    const char *reset = diff_get_color(ecbdata->color_diff, DIFF_RESET);

and as a strict boolean:

    const char *reverse = ecbdata->color_diff ? GIT_COLOR_REVERSE : "";

At first glance this seems wrong. Usually we store the color decision as a git_colorbool, so the second line would get confused by GIT_COLOR_AUTO (which is boolean true, but may still mean we do not produce color).

However, the second line is correct because our caller sets color_diff using want_color(), which collapses the colorbool to a strict true/false boolean. The first line is _also_ correct because of the idempotence of want_color(). Even though diff_get_color() will pass our true/false value through want_color() again, the result will be left untouched.

But let's pass through the colorbool itself, which makes it more consistent with the rest of the diff code. We'll need to then call want_color() whenever we treat it as a boolean, but there is only one such spot (the one quoted above).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-16 | diff: pass o->use_color directly to fill_metainfo() | Jeff King | 1 | -1/+1
We pass the use_color parameter of fill_metainfo() as a strict boolean, using: want_color(o->use_color) && !pgm to derive its value. But then inside the function, we pass it to diff_get_color(), which expects one of the git_colorbool enum values, and so feeds it to want_color() again. Even though want_color() produces a strict 0/1 boolean, this doesn't produce wrong results because want_color() is idempotent. Since GIT_COLOR_ALWAYS and NEVER are defined as 1 and 0, and because want_color() passes through those values, evaluating "want_color(foo)" and "want_color(want_color(foo))" will return the same result. But as part of a longer strategy to align the types we use for storing these values, let's pass through the colorbool directly. To handle the "&&" case here, we'll convert the presence of "pgm" into "NEVER", which arguably makes the intent of the code more clear anyway. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
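The truthiness pitfall and the idempotence argument from these color commits can be sketched as follows. This is a simplified illustration, not Git's implementation: the `sketch_` names are hypothetical, the numeric value chosen for AUTO is an assumption, the `stdout_is_tty` parameter stands in for the real terminal check, and treating UNKNOWN as "no color" is a simplification made for this sketch.

```c
#include <assert.h>

/* Hypothetical mirror of the four color states; AUTO's value is assumed. */
enum sketch_colorbool {
	SKETCH_COLOR_UNKNOWN = -1,
	SKETCH_COLOR_NEVER = 0,
	SKETCH_COLOR_ALWAYS = 1,
	SKETCH_COLOR_AUTO = 2
};

/*
 * Collapses a colorbool into a strict 0/1 decision. Because NEVER is 0
 * and ALWAYS is 1, feeding the result back in returns the same result.
 */
int sketch_want_color(enum sketch_colorbool v, int stdout_is_tty)
{
	switch (v) {
	case SKETCH_COLOR_ALWAYS:
		return 1;
	case SKETCH_COLOR_AUTO:
		return stdout_is_tty ? 1 : 0;
	case SKETCH_COLOR_NEVER:
	case SKETCH_COLOR_UNKNOWN: /* simplification for this sketch */
	default:
		return 0;
	}
}
```

A bare `if (v)` would treat both UNKNOWN and AUTO as "use color", whereas `sketch_want_color(v, 0)` returns 0 for both; that gap is the class of bug the commits describe.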
2025-09-16 | diff: don't use diff_options.use_color as a strict bool | Jeff King | 1 | -3/+2
We disable --color-moved if color is not in use at all. This happens in diff_setup_done(), where we set options->color_moved to 0 if options->use_color is not true. But a strict boolean check here is not correct; use_color could be GIT_COLOR_UNKNOWN or GIT_COLOR_AUTO, both of which evaluate to true, even though we may later decide not to show colors. We should be using want_color() to convert that git_colorbool into a true boolean. As it turns out, this does not produce wrong output. Even though we go to the trouble to detect the moved lines, ultimately we get the color values from diff_get_color(), which does check want_color(). And so it returns the empty string for each color, and we "color" the result with nothing. So the output is correct, but there is a small but measurable performance cost to doing the line detection. E.g., in git.git before and after this patch (there are no colors shown because hyperfine redirects output to /dev/null): Benchmark 1: ./git.old log --no-merges -p --color-moved -1000 Time (mean ± σ): 1.019 s ± 0.013 s [User: 0.955 s, System: 0.064 s] Range (min … max): 1.005 s … 1.045 s 10 runs Benchmark 2: ./git.new log --no-merges -p --color-moved -1000 Time (mean ± σ): 982.9 ms ± 14.5 ms [User: 925.8 ms, System: 57.1 ms] Range (min … max): 965.1 ms … 1003.2 ms 10 runs Summary ./git.new log --no-merges -p --color-moved -1000 ran 1.04 ± 0.02 times faster than ./git.old log --no-merges -p --color-moved -1000 Note that the fix is not quite as simple as just calling want_color() from diff_setup_done(). There's a subtle timing issue that goes back to daa0c3d971 (color: delay auto-color decision until point of use, 2011-08-17), the commit that adds want_color() in the first place. As discussed there, we must delay evaluating the colorbool value until all pager setup is complete. So instead, we'll leave the "color_moved" field intact in diff_setup_done(), and modify the point where it is evaluated. 
Fortunately there is only one such spot that controls whether we run any of the color-moved code at all. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-16 | diff: simplify color_moved check when flushing | Jeff King | 1 | -14/+11
In diff_flush_patch_all_file_pairs(), we set o->emitted_symbols if and only if o->color_moved is true. That causes the lower-level routines to fill up o->emitted_symbols, which we then analyze in order to do the actual colorizing.

But in that final step, we do:

    if (o->emitted_symbols) {
            if (o->color_moved) {
                    ...actual coloring...
            }
            ...clean up of emitted_symbols...
    }

The inner "if" will always trigger, since we set emitted_symbols only when doing color_moved (it is a little confusing that it is set inside the diff_options struct, but that is for convenience of passing it to the lower-level routines; we always clear it at the end of flushing, since 48edf3a02a (diff: clear emitted_symbols flag after use, 2019-01-24)).

Let's simplify the code a bit by just dropping the inner "if" and running its block unconditionally.

In theory the current code might be useful if another feature besides color_moved set up and used emitted_symbols, but it would be easy to refactor later to handle that. And in the meantime, this makes further work in this area easier.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-16 | color: use GIT_COLOR_* instead of numeric constants | Jeff King | 1 | -3/+3
Long ago Git's decision to show color for a subsystem was stored in a tri-state variable: it could be true (1), false (0), or unknown (-1). But since daa0c3d971 (color: delay auto-color decision until point of use, 2011-08-17) we want to carry around a new state, "auto", which bases the decision on the tty-ness of stdout (rather than collapsing that "auto" state to a true/false immediately). That commit introduced a set of GIT_COLOR_* defines to represent each state: UNKNOWN, ALWAYS, NEVER, and AUTO. But it only used the AUTO value, and left alone code using bare 0/1/-1 values. And of course since then we've grown many new spots that use those bare values. Let's switch all of these to use the named constants. That should make the code a bit easier to read, as it is more obvious that we're representing a color decision. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-25 | Merge branch 'tc/diff-tree-max-depth' | Junio C Hamano | 1 | -0/+24
"git diff-tree" learned "--max-depth" option. * tc/diff-tree-max-depth: diff: teach tree-diff a max-depth parameter within_depth: fix return for empty path combine-diff: zero memory used for callback filepairs
2025-08-22 | Merge branch 'ly/diff-name-only-with-diff-from-content' | Junio C Hamano | 1 | -14/+50
Various options to "git diff" that make comparison ignore certain aspects of the differences (like "space changes are ignored", "differences in lines that match these regular expressions are ignored") did not work well with "--name-only" and friends. * ly/diff-name-only-with-diff-from-content: diff: ensure consistent diff behavior with ignore options
2025-08-08 | diff: ensure consistent diff behavior with ignore options | Lidong Yan | 1 | -14/+50
In git-diff, under options like `-w` and `-I<regex>`, two files are considered equivalent under the specified "ignore" rules, even when they are not bit-for-bit identical. For options like `--raw`, `--name-status`, and `--name-only`, git-diff deliberately compares only the SHA values to determine whether two files are equivalent, for performance reasons. As a result, a file shown in `git diff --name-status` may not appear in `git diff --patch`. To quickly determine whether two files are equivalent, add a helper function diff_flush_patch_quietly() in diff.c. Add a `.dry_run` field in `struct diff_options`. When `.dry_run` is true, builtin_diff() returns immediately upon finding any change. Call diff_flush_patch_quietly() to determine if we should flush `--raw`, `--name-only` or `--name-status` output. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Lidong Yan <yldhome2d2@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-07 | diff: teach tree-diff a max-depth parameter | Jeff King | 1 | -0/+24
When you are doing a tree-diff, there are basically two options: do not recurse into subtrees at all, or recurse indefinitely. While most callers would want to always recurse and see full pathnames, some may want the efficiency of looking only at a particular level of the tree. This is currently easy to do for the top-level (just turn off recursion), but you cannot say "show me what changed in subdir/, but do not recurse". This patch adds a max-depth parameter which is measured from the closest pathspec match, so that you can do: git log --raw --max-depth=1 -- a/b/c and see the raw output for a/b/c/, but not those of a/b/c/d/ (instead of the raw output you would see for a/b/c/d). Co-authored-by: Toon Claes <toon@iotcl.com> Signed-off-by: Toon Claes <toon@iotcl.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-02 | diff: simplify parsing of diff.colormovedws | Junio C Hamano | 1 | -13/+7
The code to parse this configuration variable, whose value is a comma-separated list of known tokens like "ignore-space-change" and "ignore-all-space", uses string_list_split() to split the value into pieces, and then places each piece of string in a strbuf to trim, before comparing the result with the list of known tokens. Thanks to the previous steps, now string_list_split() can trim the resulting pieces before it places them in the string list. Use it to simplify the code. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-02 | string-list: align string_list_split() with its _in_place() counterpart | Junio C Hamano | 1 | -1/+1
The string_list_split_in_place() function was updated by 52acddf3 (string-list: multi-delimiter `string_list_split_in_place()`, 2023-04-24) to take more than one delimiter character, hoping that we can later use it to replace our uses of strtok(). We however did not make a matching change to the string_list_split() function, which is very similar. Before giving both functions more features in future commits, allow string_list_split() to also take more than one delimiter character to make them closer to each other. Signed-off-by: Junio C Hamano <gitster@pobox.com>
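What "more than one delimiter character" means can be shown with a small standalone sketch built on the standard strcspn(). It does not use or reproduce Git's string-list API: `struct sketch_piece` and `sketch_split()` are hypothetical names, and the pieces are reported as (start, length) pairs rather than copied strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One piece of the input, described without copying it. */
struct sketch_piece {
	const char *start;
	size_t len;
};

/*
 * Splits `s` at every character that appears in `delims`, storing at
 * most `max` pieces in `out`. Returns the number of pieces stored.
 * Adjacent delimiters yield empty pieces.
 */
size_t sketch_split(const char *s, const char *delims,
		    struct sketch_piece *out, size_t max)
{
	size_t n = 0;

	while (n < max) {
		size_t len = strcspn(s, delims); /* length up to next delimiter */

		out[n].start = s;
		out[n].len = len;
		n++;
		if (!s[len])
			break;        /* reached the end of the input */
		s += len + 1;         /* skip the delimiter itself */
	}
	return n;
}
```

Unlike strtok(), this leaves the input untouched and keeps no hidden state between calls.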
2025-07-01 | odb: rename `oid_object_info()` | Patrick Steinhardt | 1 | -9/+9
Rename `oid_object_info()` to `odb_read_object_info()` as well as their `_extended()` variant to match other functions related to the object database and our modern coding guidelines. Introduce compatibility wrappers so that any in-flight topics will continue to compile. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01 | object-store: rename files to "odb.{c,h}" | Patrick Steinhardt | 1 | -1/+1
In the preceding commits we have renamed the structures contained in "object-store.h" to `struct object_database` and `struct odb_backend`. As such, the code files "object-store.{c,h}" are confusingly named now. Rename them to "odb.{c,h}" accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-08 | Merge branch 'js/diff-codeql-false-positive-workaround' | Junio C Hamano | 1 | -1/+1
Work around false positive given by CodeQL. * js/diff-codeql-false-positive-workaround: diff: check range before dereferencing an array element
2025-04-29 | diff: check range before dereferencing an array element | Johannes Schindelin | 1 | -1/+1
Before accessing an array element at a given index, it should be verified that the index is within the desired bounds, not afterwards, otherwise it may not make sense to even access the array element in the first place. This is the point of CodeQL's `cpp/offset-use-before-range-check` rule. This CodeQL rule unfortunately is also triggered by the `fill_es_indent_data()` code, even though the condition `off < len - 1` does not even need to guarantee that the offset is in bounds (`s` points to a NUL-terminated string, for which `s[off] == '\r'` would fail before running out of bounds). Working around this rare false positive so that we can keep using an otherwise mostly useful tool is a worthy thing to do. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
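The ordering the rule asks for can be shown in a short sketch. The helper `sketch_cr_before_end()` is hypothetical and is not the code in `fill_es_indent_data()`; it only places the two conditions quoted above in the check-first order, so that short-circuit evaluation never reads `s[off]` for an out-of-range offset.

```c
#include <assert.h>

/*
 * Reports whether s[off] is a carriage return that is not the last byte
 * of the len-byte buffer. The range check comes first, so the element
 * is only read when the offset has already been validated.
 */
int sketch_cr_before_end(const char *s, int off, int len)
{
	return off < len - 1 && s[off] == '\r';
}
```

Swapping the operands would give the same result for a NUL-terminated string, which is why the original was a false positive, but this order is what the static analyzer expects.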
2025-04-24 | Merge branch 'ps/parse-options-integers' | Junio C Hamano | 1 | -4/+9
Update parse-options API to catch mistakes to pass address of an integral variable of a wrong type/size. * ps/parse-options-integers: parse-options: detect mismatches in integer signedness parse-options: introduce precision handling for `OPTION_UNSIGNED` parse-options: introduce precision handling for `OPTION_INTEGER` parse-options: rename `OPT_MAGNITUDE()` to `OPT_UNSIGNED()` parse-options: support unit factors in `OPT_INTEGER()` global: use designated initializers for options parse: fix off-by-one for minimum signed values
2025-04-17 | global: use designated initializers for options | Patrick Steinhardt | 1 | -4/+9
While we expose macros for most of our different option types understood by the "parse-options" subsystem, not every combination of fields has one, as that would otherwise quickly lead to an explosion of macros. Instead, we just initialize structures manually for those variants of fields that don't have a macro.

Callsites that open-code these structure initializations don't use designated initializers though and instead just provide values for each of the fields that they want to initialize. This has three significant downsides:

- Callsites need to specify all values up to the last field that they care about. This often includes fields that should simply be left at their default zero-initialized state, which adds distraction.

- Any reader not deeply familiar with the layout of the structure has a hard time figuring out what the respective initializers mean.

- Reordering or introducing new fields in the middle of the structure is impossible without adapting all callsites.

Convert all sites to instead use designated initializers, which we have started using in our codebase quite a while ago. This allows us to skip any default-initialized fields, gives the reader context by specifying the field names and allows us to reorder or introduce new fields where we want to.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
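The difference between the two initialization styles can be seen in a small sketch. `struct sketch_option` is a hypothetical descriptor invented for this example; its fields and their order are assumptions and do not reproduce the real parse-options structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical option descriptor, for illustration only. */
struct sketch_option {
	int type;
	int short_name;
	const char *long_name;
	void *value;
	const char *argh;
	const char *help;
	int flags;
};

int sketch_verbose;

/*
 * Positional style: every field up to `help` must be spelled out in
 * declaration order, including the NULL for the unused `argh`.
 */
const struct sketch_option sketch_positional = {
	1, 'v', "verbose", &sketch_verbose, NULL, "be verbose"
};

/*
 * Designated style: fields are named, unused ones are omitted and are
 * zero-initialized, and the struct layout can change without edits here.
 */
const struct sketch_option sketch_designated = {
	.type = 1,
	.short_name = 'v',
	.long_name = "verbose",
	.value = &sketch_verbose,
	.help = "be verbose",
};
```

Both objects end up with identical contents; only the designated form keeps working unchanged if a field is inserted before `help`.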
2025-04-15 | object-store: merge "object-store-ll.h" and "object-store.h" | Patrick Steinhardt | 1 | -1/+1
The "object-store-ll.h" header has been introduced to keep transitive header dependencies and compile times at bay. Now that we have created a new "object-store.c" file though we can easily move the last remaining additional bit of "object-store.h", the `odb_path_map`, out of the header. Do so. As the "object-store.h" header is now equivalent to its low-level alternative we drop the latter and inline it into the former. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08 | Merge branch 'ps/object-wo-the-repository' into ps/object-file-cleanup | Junio C Hamano | 1 | -6/+8
* ps/object-wo-the-repository: hash: stop depending on `the_repository` in `null_oid()` hash: fix "-Wsign-compare" warnings object-file: split out logic regarding hash algorithms delta-islands: stop depending on `the_repository` object-file-convert: stop depending on `the_repository` pack-bitmap-write: stop depending on `the_repository` pack-revindex: stop depending on `the_repository` pack-check: stop depending on `the_repository` environment: move access to "core.bigFileThreshold" into repo settings pack-write: stop depending on `the_repository` and `the_hash_algo` object: stop depending on `the_repository` csum-file: stop depending on `the_repository`
2025-03-10 | hash: stop depending on `the_repository` in `null_oid()` | Patrick Steinhardt | 1 | -4/+4
The `null_oid()` function returns the object ID that only consists of zeroes. Naturally, this ID also depends on the hash algorithm used, as the number of zeroes is different between SHA1 and SHA256. Consequently, the function returns the hash-algorithm-specific null object ID. This is currently done by depending on `the_hash_algo`, which implicitly makes us depend on `the_repository`. Refactor the function to instead pass in the hash algorithm for which we want to retrieve the null object ID. Adapt callsites accordingly by passing in `the_repository`, thus bubbling up the dependency on that global variable by one layer. There are a couple of trivial exceptions for subsystems that already got rid of `the_repository`. These subsystems instead use the repository that is available via the calling context: - "builtin/grep.c" - "grep.c" - "refs/debug.c" There are also two non-trivial exceptions: - "diff-no-index.c": Here we know that we may not have a repository initialized at all, so we cannot rely on `the_repository`. Instead, we adapt `diff_no_index()` to get a `struct git_hash_algo` as parameter. The only caller is located in "builtin/diff.c", where we know to call `repo_set_hash_algo()` in case we're running outside of a Git repository. Consequently, it is fine to continue passing `the_repository->hash_algo` even in this case. - "builtin/ls-files.c": There is an in-flight patch series that drops `USE_THE_REPOSITORY_VARIABLE` in this file, which causes a semantic conflict because we use `null_oid()` in `show_submodule()`. The value is passed to `repo_submodule_init()`, which may use the object ID to resolve a tree-ish in the superproject from which we want to read the submodule config. As such, the object ID should refer to an object in the superproject, and consequently we need to use its hash algorithm. This means that we could in theory just not bother about this edge case at all and just use `the_repository` in "diff-no-index.c". 
But doing so would feel misdesigned. Remove the `USE_THE_REPOSITORY_VARIABLE` preprocessor define in "hash.c". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
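Why the null object ID must be looked up per hash algorithm can be sketched as below. The types and functions are hypothetical stand-ins with a `sketch_` prefix, not Git's own definitions; the only facts relied on are the raw digest sizes of 20 bytes for SHA-1 and 32 bytes for SHA-256.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_RAWSZ 32

struct sketch_oid {
	unsigned char hash[SKETCH_MAX_RAWSZ];
};

/* Hypothetical algorithm descriptor carrying its own all-zero ID. */
struct sketch_hash_algo {
	const char *name;
	size_t rawsz;               /* digest size in bytes */
	struct sketch_oid null_oid; /* zero-initialized */
};

const struct sketch_hash_algo sketch_sha1 = { .name = "sha1", .rawsz = 20 };
const struct sketch_hash_algo sketch_sha256 = { .name = "sha256", .rawsz = 32 };

/* The caller passes the algorithm explicitly; no global state is used. */
const struct sketch_oid *sketch_null_oid(const struct sketch_hash_algo *algo)
{
	return &algo->null_oid;
}

/* Checks only the bytes that are meaningful for the given algorithm. */
int sketch_is_null_oid(const struct sketch_oid *oid,
		       const struct sketch_hash_algo *algo)
{
	size_t i;

	for (i = 0; i < algo->rawsz; i++)
		if (oid->hash[i])
			return 0;
	return 1;
}
```

Passing the algorithm as a parameter moves the dependency on a global up to the caller, which mirrors the "bubbling up by one layer" described in the commit message.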
2025-03-10 | environment: move access to "core.bigFileThreshold" into repo settings | Patrick Steinhardt | 1 | -2/+4
The "core.bigFileThreshold" setting is stored in a global variable and populated via `git_default_core_config()`. This may cause issues in the case where one is handling multiple different repositories in a single process with different values for that config key, as we may or may not see the correct value in that case. Furthermore, global state blocks our path towards libification. Refactor the code so that we instead store the value in `struct repo_settings`, where the value is computed as-needed and cached. Note that this change requires us to adapt one test in t1050 that verifies that we die when parsing an invalid "core.bigFileThreshold" value. The exercised Git command doesn't use the value at all, and thus it won't hit the new code path that parses the value. This is addressed by using git-hash-object(1) instead, which does read the value. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
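The "computed as-needed and cached" pattern can be sketched like this. All names are hypothetical; in particular `configured_threshold` merely stands in for whatever a real config lookup would return, and `config_reads` exists only so the caching is observable.

```c
#include <assert.h>

/* Per-repository cache of a lazily computed setting. */
struct sketch_repo_settings {
	int big_file_threshold_valid;     /* has the value been computed? */
	unsigned long big_file_threshold; /* cached value */
};

struct sketch_repo {
	unsigned long configured_threshold; /* stand-in for the config file */
	int config_reads;                   /* counts simulated lookups */
	struct sketch_repo_settings settings;
};

/*
 * Reads the setting on first use and caches it in the repository's own
 * settings, so two repositories in one process never share a value.
 */
unsigned long sketch_get_big_file_threshold(struct sketch_repo *r)
{
	if (!r->settings.big_file_threshold_valid) {
		r->config_reads++;
		r->settings.big_file_threshold = r->configured_threshold;
		r->settings.big_file_threshold_valid = 1;
	}
	return r->settings.big_file_threshold;
}
```

A side effect of laziness, noted in the commit message, is that a command which never asks for the value never parses it, so an invalid setting goes unnoticed by that command.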
2025-03-03 | diff: add option to skip resolving diff statuses | Justin Tobler | 1 | -1/+1
By default, `diffcore_std()` resolves the statuses for queued diff file pairs by calling `diff_resolve_rename_copy()`. If status information is already manually set, invoking `diffcore_std()` may change the status value. Introduce the `skip_resolving_statuses` diff option that prevents `diffcore_std()` from resolving file pair statuses when enabled. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
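The effect of such an option can be sketched as follows. This is a toy model, not Git's implementation: the `sketch_` types, the three-way status rule, and the function bodies are assumptions; only the flag name `skip_resolving_statuses` comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_filepair {
	int src_exists;
	int dst_exists;
	char status;   /* may already be set by the caller */
};

struct sketch_diff_options {
	unsigned skip_resolving_statuses : 1;
};

/* Toy rule: derive the status purely from which sides exist. */
static void sketch_resolve_status(struct sketch_filepair *p)
{
	if (!p->src_exists)
		p->status = 'A';
	else if (!p->dst_exists)
		p->status = 'D';
	else
		p->status = 'M';
}

static void sketch_diffcore_std(const struct sketch_diff_options *opt,
				struct sketch_filepair *pairs, size_t nr)
{
	size_t i;

	assert(opt);
	if (opt->skip_resolving_statuses)
		return;   /* keep whatever status the caller assigned */
	for (i = 0; i < nr; i++)
		sketch_resolve_status(&pairs[i]);
}
```

With the flag set, a status the caller assigned by hand survives the call; with it cleared, the status is recomputed and the manual value is lost.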
2025-03-03 | diff: return diff_filepair from diff queue helpers | Justin Tobler | 1 | -20/+50
The `diff_addremove()` and `diff_change()` functions set up and queue diffs, but do not return the `diff_filepair` added to the queue. In a subsequent commit, modifications to `diff_filepair` need to occur in certain cases after being queued. Since the existing `diff_addremove()` and `diff_change()` are also used for callbacks in `diff_options` as types `add_remove_fn_t` and `change_fn_t`, modifying the existing function signatures requires further changes. The diff options for pruning use `file_add_remove()` and `file_change()` where file pairs do not even get queued. Thus, separate functions are implemented instead. Split out the queuing operations into `diff_queue_addremove()` and `diff_queue_change()` which also return a handle to the queued `diff_filepair`. Both `diff_addremove()` and `diff_change()` are reimplemented as thin wrappers around the new functions. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
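The split into a queuing function that returns a handle and a thin wrapper that keeps the old callback-compatible signature might look roughly like this. The `sketch_` names, the struct fields, and the use of '+' and '-' to select the status are assumptions for illustration, not Git's actual signatures.

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_filepair {
	const char *path;
	char status;
};

struct sketch_queue {
	struct sketch_filepair **queue;
	size_t nr, alloc;
};

/* Queue a pair and hand it back so the caller can adjust it afterwards. */
static struct sketch_filepair *
sketch_queue_addremove(struct sketch_queue *q, char addremove, const char *path)
{
	struct sketch_filepair *p;

	assert(q);
	p = calloc(1, sizeof(*p));
	if (!p)
		return NULL;
	if (q->nr == q->alloc) {
		size_t alloc = q->alloc ? 2 * q->alloc : 4;
		struct sketch_filepair **grown =
			realloc(q->queue, alloc * sizeof(*grown));
		if (!grown) {
			free(p);
			return NULL;
		}
		q->queue = grown;
		q->alloc = alloc;
	}
	p->path = path;
	p->status = (addremove == '+') ? 'A' : 'D';
	q->queue[q->nr++] = p;
	return p;
}

/* Thin wrapper: same work, but a void return that fits a callback type. */
static void sketch_addremove(struct sketch_queue *q, char addremove,
			     const char *path)
{
	sketch_queue_addremove(q, addremove, path);
}
```

Keeping the wrapper means existing callback users do not change, while new callers that need to modify the queued pair use the returning variant.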
2025-02-25 | Merge branch 'bc/diff-reject-empty-arg-to-pickaxe' | Junio C Hamano | 1 | -0/+4
The -G/-S options to the "diff" family of commands caused us to hit a BUG() when they get no values; they have been corrected.

* bc/diff-reject-empty-arg-to-pickaxe:
  diff: don't crash with empty argument to -G or -S
2025-02-18 | diff: don't crash with empty argument to -G or -S | brian m. carlson | 1 | -0/+4
The pickaxe options, -G and -S, need either a regex or a string to look through the history for. An empty value isn't very useful since it would either match everything or nothing, and what's worse, we presently crash with a BUG like so when the user provides one:

    BUG: diffcore-pickaxe.c:241: should have needle under -G or -S

Since it's not very nice of us to crash and this wouldn't do anything useful anyway, let's simply inform the user that they must provide a non-empty argument and exit with an error if they provide an empty one instead.

Reported-by: Jared Van Bortel <cebtenzzre@gmail.com>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
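The validation described above amounts to rejecting the empty string at option-parsing time, before it can reach code that assumes a needle exists. A minimal sketch, assuming a hypothetical parser hook; the function name and the error wording are made up and are not Git's actual message:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Returns 0 and stores the needle on success, or -1 after printing an
 * error if the argument is missing or empty.
 */
static int sketch_parse_pickaxe_arg(const char *opt, const char *arg,
				    const char **needle)
{
	assert(opt && needle);
	if (!arg || !*arg) {
		fprintf(stderr, "error: %s requires a non-empty argument\n", opt);
		return -1;
	}
	*needle = arg;
	return 0;
}
```

Failing early with a user-facing error keeps the later internal consistency check as a true "cannot happen" condition.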
2025-01-31 | global: adapt callers to use generic hash context helpers | Patrick Steinhardt | 1 | -12/+12
Adapt callers to use generic hash context helpers instead of using the hash algorithm to update them. This makes the callsites easier to reason about and removes the possibility that the wrong hash algorithm is used to update the hash context's state. And as a nice side effect this also gets rid of a bunch of users of `the_hash_algo`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
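The idea behind generic context helpers can be sketched with two toy "algorithms". Everything here is illustrative: the `sketch_` names are hypothetical, and a byte sum and a byte XOR stand in for real hash functions. What matters is that the context records its algorithm at init time, so later updates cannot be dispatched through the wrong one.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_hash_ctx;

struct sketch_hash_algo {
	const char *name;
	void (*update_fn)(struct sketch_hash_ctx *ctx, const void *data,
			  size_t len);
};

struct sketch_hash_ctx {
	const struct sketch_hash_algo *algop;   /* set once at init time */
	unsigned long state;
};

static void sketch_sum_update(struct sketch_hash_ctx *ctx, const void *data,
			      size_t len)
{
	const unsigned char *p = data;

	while (len--)
		ctx->state += *p++;
}

static void sketch_xor_update(struct sketch_hash_ctx *ctx, const void *data,
			      size_t len)
{
	const unsigned char *p = data;

	while (len--)
		ctx->state ^= *p++;
}

static const struct sketch_hash_algo sketch_sum_algo = { "sum", sketch_sum_update };
static const struct sketch_hash_algo sketch_xor_algo = { "xor", sketch_xor_update };

static void sketch_hash_init(struct sketch_hash_ctx *ctx,
			     const struct sketch_hash_algo *algop)
{
	assert(ctx && algop);
	ctx->algop = algop;
	ctx->state = 0;
}

/* Generic helper: callers no longer name the algorithm when updating. */
static void sketch_hash_update(struct sketch_hash_ctx *ctx, const void *data,
			       size_t len)
{
	ctx->algop->update_fn(ctx, data, len);
}
```

Callers only mention the algorithm once, in `sketch_hash_init()`, which is the property the commit message describes as removing the chance of a mismatch.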
2025-01-31 | hash: stop typedeffing the hash context | Patrick Steinhardt | 1 | -5/+5
We generally avoid using `typedef` in the Git codebase. One exception though is the `git_hash_ctx`, likely because it used to be a union rather than a struct until the preceding commit refactored it. But now that it is a normal `struct` there isn't really a need for a typedef anymore. Drop the typedef and adapt all callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-23 | Merge branch 'ps/build-sign-compare' | Junio C Hamano | 1 | -0/+1
Start working to make the codebase buildable with -Wsign-compare.

* ps/build-sign-compare:
  t/helper: don't depend on implicit wraparound
  scalar: address -Wsign-compare warnings
  builtin/patch-id: fix type of `get_one_patchid()`
  builtin/blame: fix type of `length` variable when emitting object ID
  gpg-interface: address -Wsign-comparison warnings
  daemon: fix type of `max_connections`
  daemon: fix loops that have mismatching integer types
  global: trivial conversions to fix `-Wsign-compare` warnings
  pkt-line: fix -Wsign-compare warning on 32 bit platform
  csum-file: fix -Wsign-compare warning on 32-bit platform
  diff.h: fix index used to loop through unsigned integer
  config.mak.dev: drop `-Wno-sign-compare`
  global: mark code units that generate warnings with `-Wsign-compare`
  compat/win32: fix -Wsign-compare warning in "wWinMain()"
  compat/regex: explicitly ignore "-Wsign-compare" warnings
  git-compat-util: introduce macros to disable "-Wsign-compare" warnings
2024-12-18 | pager: stop using `the_repository` | Patrick Steinhardt | 1 | -2/+2
Stop using `the_repository` in the "pager" subsystem by passing in a repository when setting up the pager and when configuring it. Adjust callers accordingly by using `the_repository`. While there may be some callers that have a repository available in their context, this trivial conversion allows for easier verification and bubbles up the use of `the_repository` by one level. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-18 | Merge branch 'ps/build-sign-compare' into ps/the-repository | Junio C Hamano | 1 | -0/+1
* ps/build-sign-compare:
  t/helper: don't depend on implicit wraparound
  scalar: address -Wsign-compare warnings
  builtin/patch-id: fix type of `get_one_patchid()`
  builtin/blame: fix type of `length` variable when emitting object ID
  gpg-interface: address -Wsign-comparison warnings
  daemon: fix type of `max_connections`
  daemon: fix loops that have mismatching integer types
  global: trivial conversions to fix `-Wsign-compare` warnings
  pkt-line: fix -Wsign-compare warning on 32 bit platform
  csum-file: fix -Wsign-compare warning on 32-bit platform
  diff.h: fix index used to loop through unsigned integer
  config.mak.dev: drop `-Wno-sign-compare`
  global: mark code units that generate warnings with `-Wsign-compare`
  compat/win32: fix -Wsign-compare warning in "wWinMain()"
  compat/regex: explicitly ignore "-Wsign-compare" warnings
  git-compat-util: introduce macros to disable "-Wsign-compare" warnings
2024-12-06 | global: mark code units that generate warnings with `-Wsign-compare` | Patrick Steinhardt | 1 | -0/+1
Mark code units that generate warnings with `-Wsign-compare`. This allows for a structured approach to get rid of all such warnings over time in a way that can be easily measured. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04 | packfile: pass down repository to `has_object[_kept]_pack` | Karthik Nayak | 1 | -1/+2
The functions `has_object[_kept]_pack` currently rely on the global variable `the_repository`. To eliminate global variable usage in `packfile.c`, we should progressively shift the dependency on the_repository to higher layers. Let's remove its usage from these functions and any related ones. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-10 | Merge branch 'jk/output-prefix-cleanup' | Junio C Hamano | 1 | -7/+3
Code clean-up.

* jk/output-prefix-cleanup:
  diff: store graph prefix buf in git_graph struct
  diff: return line_prefix directly when possible
  diff: return const char from output_prefix callback
  diff: drop line_prefix_length field
  line-log: use diff_line_prefix() instead of custom helper
2024-10-03 | diff: return const char from output_prefix callback | Jeff King | 1 | -6/+3
The diff_options structure has an output_prefix callback for returning a prefix string, but it does so by returning a pointer to a strbuf. This makes the interface awkward. There's no reason the callback should need to use a strbuf, and it creates questions about whether the ownership of the resulting buffer should be transferred to the caller (it should not be, but a recent attempt to clean up this code led to a double-free in some cases). The one advantage we get is that the strbuf contains a ptr/len pair, so we could in theory have a prefix with embedded NULs. But we can observe that none of the existing callbacks would ever produce such a NUL (they are usually just indentation or graph symbols, and even the "--line-prefix" option takes a NUL-terminated string). And anyway, only one caller (the one in log_tree_diff_flush) actually looks at the strbuf length. In every other case we use a helper function which discards the length and just returns the NUL-terminated string. So let's just have the callback return a "const char *" pointer. It's up to the callbacks themselves if they want to use a strbuf under the hood. And now the caller in log_tree_diff_flush() can just use the helper function along with everybody else. That lets us even simplify out the function pointer check, since the helper returns an empty string (technically this does mean we'll sometimes issue an empty fputs() call, but I don't think this code path is hot enough to care about that). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
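The interface shape argued for above can be sketched like this: the callback returns a NUL-terminated string, and a helper falls back to an empty string when no callback is installed, so callers need no function-pointer check. The `sketch_` names and struct fields are hypothetical and simplified, not Git's declarations.

```c
#include <assert.h>
#include <stdio.h>

struct sketch_diff_options;

/* The callback returns a NUL-terminated string that the caller must not free. */
typedef const char *(*sketch_output_prefix_fn)(struct sketch_diff_options *opt,
					       void *data);

struct sketch_diff_options {
	sketch_output_prefix_fn output_prefix;
	void *output_prefix_data;
};

/* Helper used by every caller; never returns NULL. */
static const char *sketch_line_prefix(struct sketch_diff_options *opt)
{
	assert(opt);
	if (!opt->output_prefix)
		return "";
	return opt->output_prefix(opt, opt->output_prefix_data);
}

/* Example callback: the data pointer is itself the prefix string. */
static const char *sketch_indent_prefix(struct sketch_diff_options *opt,
					void *data)
{
	(void)opt;
	return (const char *)data;
}

/* With no callback installed this issues an fputs() of an empty string. */
static void sketch_emit_line(FILE *out, struct sketch_diff_options *opt,
			     const char *line)
{
	fputs(sketch_line_prefix(opt), out);
	fputs(line, out);
	fputc('\n', out);
}
```

Ownership is unambiguous in this shape: the callback keeps whatever buffer backs the string, and callers only read it.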
2024-10-03 | diff: drop line_prefix_length field | Jeff King | 1 | -1/+0
The diff_options structure holds a line_prefix string and an associated length. But the length is always just the strlen() of the NUL-terminated string. Let's simplify the code by just storing the string pointer and assuming it is NUL-terminated when we use it. This will cause us to compute the string length in a few extra spots, but I don't think any of these are particularly hot code paths. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-30 | diff: improve lifecycle management of diff queues | Patrick Steinhardt | 1 | -10/+12
The lifecycle management of diff queues is somewhat confusing:

  - For the most part this can be attributed to `DIFF_QUEUE_CLEAR()`, which does not release any memory but rather initializes the queue, only. This is in contrast to our common naming schema, where "clearing" means that we release underlying memory and then re-initialize the data structure such that it is ready to use.

  - A second offender is `diff_free_queue()`, which does not free the queue structure itself. It is rather a release-style function.

Refactor the code to make things less confusing. `DIFF_QUEUE_CLEAR()` is replaced by `DIFF_QUEUE_INIT` and `diff_queue_init()`, while `diff_free_queue()` is replaced by `diff_queue_release()`.

While on it, adapt callsites where we call `DIFF_QUEUE_CLEAR()` with the intent to release underlying memory to instead call `diff_queue_clear()` to fix memory leaks.

This memory leak is exposed by t4211, but plugging it alone does not make the whole test suite pass.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
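The naming scheme this commit message relies on (init sets up an empty structure, release frees owned memory, clear does both) can be sketched on a toy queue. The `sketch_` names and the element type are assumptions; this is not the code from diff.c.

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_queue {
	char **items;   /* each item is heap-allocated and owned by the queue */
	size_t nr, alloc;
};

#define SKETCH_QUEUE_INIT { NULL, 0, 0 }

/* init: make the structure empty without touching any old contents. */
static void sketch_queue_init(struct sketch_queue *q)
{
	struct sketch_queue blank = SKETCH_QUEUE_INIT;

	assert(q);
	*q = blank;
}

static int sketch_queue_push(struct sketch_queue *q, char *item)
{
	if (q->nr == q->alloc) {
		size_t alloc = q->alloc ? 2 * q->alloc : 4;
		char **grown = realloc(q->items, alloc * sizeof(*grown));

		if (!grown)
			return -1;
		q->items = grown;
		q->alloc = alloc;
	}
	q->items[q->nr++] = item;
	return 0;
}

/* release: free owned memory; the struct is not usable until re-initialized. */
static void sketch_queue_release(struct sketch_queue *q)
{
	size_t i;

	for (i = 0; i < q->nr; i++)
		free(q->items[i]);
	free(q->items);
}

/* clear: release, then re-initialize so the queue is ready for reuse. */
static void sketch_queue_clear(struct sketch_queue *q)
{
	sketch_queue_release(q);
	sketch_queue_init(q);
}
```

Calling the init-style function on a populated queue would drop the pointers without freeing them, which is the kind of leak the commit fixes by switching those callsites to the clear-style function.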
2024-09-30 | Merge branch 'ps/leakfixes-part-7' into ps/leakfixes-part-8 | Junio C Hamano | 1 | -2/+5
* ps/leakfixes-part-7: (23 commits)
  diffcore-break: fix leaking filespecs when merging broken pairs
  revision: fix leaking parents when simplifying commits
  builtin/maintenance: fix leak in `get_schedule_cmd()`
  builtin/maintenance: fix leaking config string
  promisor-remote: fix leaking partial clone filter
  grep: fix leaking grep pattern
  submodule: fix leaking submodule ODB paths
  trace2: destroy context stored in thread-local storage
  builtin/difftool: plug several trivial memory leaks
  builtin/repack: fix leaking configuration
  diffcore-order: fix leaking buffer when parsing orderfiles
  parse-options: free previous value of `OPTION_FILENAME`
  diff: fix leaking orderfile option
  builtin/pull: fix leaking "ff" option
  dir: fix off by one errors for ignored and untracked entries
  builtin/submodule--helper: fix leaking remote ref on errors
  t/helper: fix leaking subrepo in nested submodule config helper
  builtin/submodule--helper: fix leaking error buffer
  builtin/submodule--helper: clear child process when not running it
  submodule: fix leaking update strategy
  ...
2024-09-27 | diff: fix leaking orderfile option | Patrick Steinhardt | 1 | -2/+5
The `orderfile` diff option is being assigned via `OPT_FILENAME()`, which assigns an allocated string to the variable. We never free it though, causing a memory leak. Change the type of the string to `char *` and free it to plug the leak. This also requires us to use `xstrdup()` to assign the global config to it in case it is set. This leak is being hit in t7621, but plugging it alone does not make the test suite pass. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
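The ownership rule behind this fix is that the options struct owns its string, so every assignment must store a private copy and every release must free it. A minimal sketch under that assumption; the `sketch_` names are hypothetical, and `sketch_xstrdup()` only imitates the "duplicate or die" behaviour of an `xstrdup()`-style helper.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct sketch_opts {
	char *orderfile;   /* owned by the struct, hence non-const */
};

/* Duplicate a string, aborting on allocation failure. */
static char *sketch_xstrdup(const char *s)
{
	size_t n = strlen(s) + 1;
	char *copy = malloc(n);

	if (!copy)
		abort();
	memcpy(copy, s, n);
	return copy;
}

/* Copy the config default so that freeing the field later is always valid. */
static void sketch_opts_init(struct sketch_opts *o, const char *config_default)
{
	assert(o);
	o->orderfile = config_default ? sketch_xstrdup(config_default) : NULL;
}

/* Replace the value, freeing the previous copy first. */
static void sketch_opts_set_orderfile(struct sketch_opts *o, const char *path)
{
	free(o->orderfile);
	o->orderfile = sketch_xstrdup(path);
}

static void sketch_opts_release(struct sketch_opts *o)
{
	free(o->orderfile);
	o->orderfile = NULL;
}
```

Without the copy in `sketch_opts_init()`, the release function would end up freeing a string that belongs to the global configuration, which is why the commit pairs the type change with a duplication of the config value.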