aboutsummaryrefslogtreecommitdiffstats
path: root/commit.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2025-08-05Merge branch 'ps/object-file-wo-the-repository'Junio C Hamano1-2/+2
Reduce implicit assumption and dependence on the_repository in the object-file subsystem. * ps/object-file-wo-the-repository: object-file: get rid of `the_repository` in index-related functions object-file: get rid of `the_repository` in `force_object_loose()` object-file: get rid of `the_repository` in `read_loose_object()` object-file: get rid of `the_repository` in loose object iterators object-file: remove declaration for `for_each_file_in_obj_subdir()` object-file: inline `for_each_loose_file_in_objdir_buf()` object-file: get rid of `the_repository` when writing objects odb: introduce `odb_write_object()` loose: write loose objects map via their source object-file: get rid of `the_repository` in `finalize_object_file()` object-file: get rid of `the_repository` in `loose_object_info()` object-file: get rid of `the_repository` when freshening objects object-file: inline `check_and_freshen()` functions object-file: get rid of `the_repository` in `has_loose_object()` object-file: stop using `the_hash_algo` object-file: fix -Wsign-compare warnings
2025-07-28Merge branch 'rs/pop-recent-commit-with-prio-queue'Junio C Hamano1-3/+11
The pop_most_recent_commit() function can have quite expensive worst case performance characteristics, which has been optimized by using prio-queue data structure. * rs/pop-recent-commit-with-prio-queue: commit: use prio_queue_replace() in pop_most_recent_commit() prio-queue: add prio_queue_replace() commit: convert pop_most_recent_commit() to prio_queue
2025-07-22commit: use prio_queue_replace() in pop_most_recent_commit()René Scharfe1-2/+9
Optimize pop_most_recent_commit() by adding the first parent using the more efficient prio_queue_peek() and prio_queue_replace() instead of prio_queue_get() and prio_queue_put(). On my machine this neutralizes the performance hit it took in Git's own repository when we converted it to prio_queue two patches ago (git_pq): $ hyperfine -w3 -L git ./git_2.50.1,./git_pq,./git '{git} rev-parse :/^Initial.revision' Benchmark 1: ./git_2.50.1 rev-parse :/^Initial.revision Time (mean ± σ): 1.073 s ± 0.003 s [User: 1.053 s, System: 0.019 s] Range (min … max): 1.069 s … 1.078 s 10 runs Benchmark 2: ./git_pq rev-parse :/^Initial.revision Time (mean ± σ): 1.077 s ± 0.002 s [User: 1.057 s, System: 0.018 s] Range (min … max): 1.072 s … 1.079 s 10 runs Benchmark 3: ./git rev-parse :/^Initial.revision Time (mean ± σ): 1.069 s ± 0.003 s [User: 1.049 s, System: 0.018 s] Range (min … max): 1.065 s … 1.074 s 10 runs Summary ./git rev-parse :/^Initial.revision ran 1.00 ± 0.00 times faster than ./git_2.50.1 rev-parse :/^Initial.revision 1.01 ± 0.00 times faster than ./git_pq rev-parse :/^Initial.revision Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-22commit: convert pop_most_recent_commit() to prio_queueRené Scharfe1-3/+4
pop_most_recent_commit() calls commit_list_insert_by_date() for parent commits, which is itself called in a loop. This can lead to quadratic complexity if there are many merges. Replace the commit_list with a prio_queue to ensure logarithmic worst case complexity and convert all three users. Add a performance test that exercises one of them using a pathological history that consists of 50% merges and 50% root commits to demonstrate the speedup: Test v2.50.1 HEAD ---------------------------------------------------------------------- 1501.2: rev-parse ':/65535' 2.48(2.47+0.00) 0.20(0.19+0.00) -91.9% Alas, sane histories don't benefit from the conversion much, and traversing Git's own history takes a 1% performance hit on my machine: $ hyperfine -w3 -L git ./git_2.50.1,./git '{git} rev-parse :/^Initial.revision' Benchmark 1: ./git_2.50.1 rev-parse :/^Initial.revision Time (mean ± σ): 1.071 s ± 0.004 s [User: 1.052 s, System: 0.017 s] Range (min … max): 1.067 s … 1.078 s 10 runs Benchmark 2: ./git rev-parse :/^Initial.revision Time (mean ± σ): 1.079 s ± 0.003 s [User: 1.060 s, System: 0.017 s] Range (min … max): 1.074 s … 1.083 s 10 runs Summary ./git_2.50.1 rev-parse :/^Initial.revision ran 1.01 ± 0.00 times faster than ./git rev-parse :/^Initial.revision Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-16odb: introduce `odb_write_object()`Patrick Steinhardt1-2/+2
We do not have a backend-agnostic way to write objects into an object database. While there is `write_object_file()`, this function is rather specific to the loose object format. Introduce `odb_write_object()` to plug this gap. For now, this function is a simple wrapper around `write_object_file()` and doesn't even use the passed-in object database yet. This will change in subsequent commits, where `write_object_file()` is converted so that it works on top of an `odb_source`. `odb_write_object()` will then become responsible for deciding which source an object shall be written to. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01odb: rename `has_object()`Patrick Steinhardt1-1/+1
Rename `has_object()` to `odb_has_object()` to match other functions related to the object database and our modern coding guidelines. Introduce a compatibility wrapper so that any in-flight topics will continue to compile. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01odb: rename `repo_read_object_file()`Patrick Steinhardt1-3/+3
Rename `repo_read_object_file()` to `odb_read_object()` to match other functions related to the object database and our modern coding guidelines. Introduce a compatibility wrapper so that any in-flight topics will continue to compile. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01odb: rename `oid_object_info()`Patrick Steinhardt1-1/+2
Rename `oid_object_info()` to `odb_read_object_info()` as well as their `_extended()` variant to match other functions related to the object database and our modern coding guidelines. Introduce compatibility wrappers so that any in-flight topics will continue to compile. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01odb: get rid of `the_repository` in `assert_oid_type()`Patrick Steinhardt1-1/+1
Get rid of our dependency on `the_repository` in `assert_oid_type()` by passing in the object database as a parameter and adjusting all callers. Rename the function to `odb_assert_oid_type()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01object-store: rename files to "odb.{c,h}"Patrick Steinhardt1-1/+1
In the preceding commits we have renamed the structures contained in "object-store.h" to `struct object_database` and `struct odb_backend`. As such, the code files "object-store.{c,h}" are confusingly named now. Rename them to "odb.{c,h}" accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-24Merge branch 'ps/object-file-cleanup'Junio C Hamano1-1/+2
Code clean-up. * ps/object-file-cleanup: object-store: merge "object-store-ll.h" and "object-store.h" object-store: remove global array of cached objects object: split out functions relating to object store subsystem object-file: drop `index_blob_stream()` object-file: split up concerns of `HASH_*` flags object-file: split out functions relating to object store subsystem object-file: move `xmmap()` into "wrapper.c" object-file: move `git_open_cloexec()` to "compat/open.c" object-file: move `safe_create_leading_directories()` into "path.c" object-file: move `mkdir_in_gitdir()` into "path.c"
2025-04-15Merge branch 'ps/object-wo-the-repository'Junio C Hamano1-1/+1
The object layer has been updated to take an explicit repository instance as a parameter in more code paths. * ps/object-wo-the-repository: hash: stop depending on `the_repository` in `null_oid()` hash: fix "-Wsign-compare" warnings object-file: split out logic regarding hash algorithms delta-islands: stop depending on `the_repository` object-file-convert: stop depending on `the_repository` pack-bitmap-write: stop depending on `the_repository` pack-revindex: stop depending on `the_repository` pack-check: stop depending on `the_repository` environment: move access to "core.bigFileThreshold" into repo settings pack-write: stop depending on `the_repository` and `the_hash_algo` object: stop depending on `the_repository` csum-file: stop depending on `the_repository`
2025-04-15object-store: merge "object-store-ll.h" and "object-store.h"Patrick Steinhardt1-1/+1
The "object-store-ll.h" header has been introduced to keep transitive header dependendcies and compile times at bay. Now that we have created a new "object-store.c" file though we can easily move the last remaining additional bit of "object-store.h", the `odb_path_map`, out of the header. Do so. As the "object-store.h" header is now equivalent to its low-level alternative we drop the latter and inline it into the former. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-15object-file: split out functions relating to object store subsystemPatrick Steinhardt1-0/+1
While we have the "object-store.h" header, most of the functionality for object stores is actually hosted in "object-file.c". This makes it hard to find relevant functions and causes us to mix up concerns. Split out functions relating to the object store subsystem into a new "object-store.c" file. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-24commit: move clear_commit_marks_many() loop body to clear_commit_marks()René Scharfe1-9/+7
clear_commit_marks_many() clears multiple commits one by one. Move the code for handling a single commit to clear_commit_marks() and call it instead of the other way around, to simplify the code. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-10object-file-convert: stop depending on `the_repository`Patrick Steinhardt1-1/+1
There are multiple sites in "object-file-convert.c" where we use the global `the_repository` variable, either explicitly or implicitly by using `the_hash_algo`. All of these callsites are transitively called from `convert_object_file()`, which indeed has no repo as input. Refactor the function so that it receives a repository as a parameter and pass it through to all internal functions to get rid of the dependency. Remove the `USE_THE_REPOSITORY_VARIABLE` define. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-24commit: avoid parent list buildup in clear_commit_marks_many()René Scharfe1-4/+4
clear_commit_marks_1() clears the marks of the first parent and its first parent and so on, and saves the higher numbered parents in a list for later. There is no benefit in keeping that list growing with each handled commit. Clear it after each run to reduce peak memory usage. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-27commit-reach: use `size_t` to track indices in `get_reachable_subset()`Patrick Steinhardt1-2/+2
Similar as with the preceding commit, adapt `get_reachable_subset()` so that it tracks array indices via `size_t` instead of using signed integers to fix a couple of -Wsign-compare warnings. Adapt callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-23Merge branch 'ps/build-sign-compare'Junio C Hamano1-2/+1
Start working to make the codebase buildable with -Wsign-compare. * ps/build-sign-compare: t/helper: don't depend on implicit wraparound scalar: address -Wsign-compare warnings builtin/patch-id: fix type of `get_one_patchid()` builtin/blame: fix type of `length` variable when emitting object ID gpg-interface: address -Wsign-comparison warnings daemon: fix type of `max_connections` daemon: fix loops that have mismatching integer types global: trivial conversions to fix `-Wsign-compare` warnings pkt-line: fix -Wsign-compare warning on 32 bit platform csum-file: fix -Wsign-compare warning on 32-bit platform diff.h: fix index used to loop through unsigned integer config.mak.dev: drop `-Wno-sign-compare` global: mark code units that generate warnings with `-Wsign-compare` compat/win32: fix -Wsign-compare warning in "wWinMain()" compat/regex: explicitly ignore "-Wsign-compare" warnings git-compat-util: introduce macros to disable "-Wsign-compare" warnings
2024-12-15Merge branch 'bf/explicit-config-set-in-advice-messages'Junio C Hamano1-1/+1
The advice messages now tell the newer 'git config set' command to set the advice.token configuration variable to squelch a message. * bf/explicit-config-set-in-advice-messages: advice: suggest using subcommand "git config set"
2024-12-06global: trivial conversions to fix `-Wsign-compare` warningsPatrick Steinhardt1-3/+1
We have a bunch of loops which iterate up to an unsigned boundary using a signed index, which generates warnigs because we compare a signed and unsigned value in the loop condition. Address these sites for trivial cases and enable `-Wsign-compare` warnings for these code units. This patch only adapts those code units where we can drop the `DISABLE_SIGN_COMPARE_WARNINGS` macro in the same step. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06global: mark code units that generate warnings with `-Wsign-compare`Patrick Steinhardt1-0/+1
Mark code units that generate warnings with `-Wsign-compare`. This allows for a structured approach to get rid of all such warnings over time in a way that can be easily measured. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06advice: suggest using subcommand "git config set"Bence Ferdinandy1-1/+1
The advice message currently suggests using "git config advice..." to disable advice messages, but since 00bbdde141 (builtin/config: introduce "set" subcommand, 2024-05-06) we have the "set" subcommand for config. Since using the subcommand is more in-line with the modern interface, any advice should be promoting its usage. Change the disable advice message to use the subcommand instead. Change all uses of "git config advice" in the tests to use the subcommand. Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-25commit: avoid leaking already-saved bufferJeff King1-1/+2
When we parse a commit via repo_parse_commit_internal(), if save_commit_buffer is set we'll stuff the buffer of the object contents into a cache, overwriting any previous value. This can result in a leak of that previously cached value, though it's rare in practice. If we have a value in the cache it would have come from a previous parse, and during that parse we'd set the object.parsed flag, causing any subsequent parse attempts to exit without doing any work. But it's possible to "unparse" a commit, which we do when registering a commit graft. And since shallow fetches are implemented using grafts, the leak is triggered in practice by t5539. There are a number of possible ways to address this: 1. the unparsing function could clear the cached commit buffer, too. I think this would work for the case I found, but I'm not sure if there are other ways to end up in the same state (an unparsed commit with an entry in the commit buffer cache). 2. when we parse, we could check the buffer cache and prefer it to reading the contents from the object database. In theory the contents of a particular sha1 are immutable, but the code in question is violating the immutability with grafts. So this approach makes me a bit nervous, although I think it would work in practice (the grafts are applied to what we parse, but we still retain the original contents). 3. We could realize the cache is already populated and discard its contents before overwriting. It's possible some other code could be holding on to a pointer to the old cache entry (and we'd introduce a use-after-free), but I think the risk of that is relatively low. 4. The reverse of (3): when the cache is populated, don't bother saving our new copy. This is perhaps a little weird, since we'll have just populated the commit struct based on a different buffer. But the two buffers should be the same, even in the presence of grafts (as in (2) above). I went with option 4. It addresses the leak directly and doesn't carry any risk of breaking other assumptions. And it's the same technique used by parse_object_buffer() for this situation, though I'm not sure when it would even come up there. The extra safety has been there since bd1e17e245 (Make "parse_object()" also fill in commit message buffer data., 2005-05-25). This lets us mark t5539 as leak-free. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-23Merge branch 'ps/environ-wo-the-repository'Junio C Hamano1-2/+2
Code clean-up. * ps/environ-wo-the-repository: (21 commits) environment: stop storing "core.notesRef" globally environment: stop storing "core.warnAmbiguousRefs" globally environment: stop storing "core.preferSymlinkRefs" globally environment: stop storing "core.logAllRefUpdates" globally refs: stop modifying global `log_all_ref_updates` variable branch: stop modifying `log_all_ref_updates` variable repo-settings: track defaults close to `struct repo_settings` repo-settings: split out declarations into a standalone header environment: guard state depending on a repository environment: reorder header to split out `the_repository`-free section environment: move `set_git_dir()` and related into setup layer environment: make `get_git_namespace()` self-contained environment: move object database functions into object layer config: make dependency on repo in `read_early_config()` explicit config: document `read_early_config()` and `read_very_early_config()` environment: make `get_git_work_tree()` accept a repository environment: make `get_graft_file()` accept a repository environment: make `get_index_file()` accept a repository environment: make `get_object_directory()` accept a repository environment: make `get_git_common_dir()` accept a repository ...
2024-09-20Merge branch 'ps/leakfixes-part-6'Junio C Hamano1-16/+7
More leakfixes. * ps/leakfixes-part-6: (22 commits) builtin/repack: fix leaking keep-pack list merge-ort: fix two leaks when handling directory rename modifications match-trees: fix leaking prefixes in `shift_tree()` builtin/fmt-merge-msg: fix leaking buffers builtin/grep: fix leaking object context builtin/pack-objects: plug leaking list of keep-packs builtin/repack: fix leaking line buffer when packing promisors negotiator/skipping: fix leaking commit entries shallow: fix leaking members of `struct shallow_info` shallow: free grafts when unregistering them object: clear grafts when clearing parsed object pool gpg-interface: fix misdesigned signing key interfaces send-pack: fix leaking push cert nonce remote: fix leak in reachability check of a remote-tracking ref remote: fix leaking tracking refs builtin/submodule--helper: fix leaking refs on push-check submodule: fix leaking fetch task data upload-pack: fix leaking child process data on reachability checks builtin/push: fix leaking refspec query result send-pack: fix leaking common object IDs ...
2024-09-12environment: make `get_graft_file()` accept a repositoryPatrick Steinhardt1-2/+2
The `get_graft_file()` function retrieves the path to the graft file of `the_repository`. Make it accept a `struct repository` such that it can work on arbitrary repositories and make it part of the repository subsystem. This reduces our reliance on `the_repository` and clarifies scope. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-05object: clear grafts when clearing parsed object poolPatrick Steinhardt1-13/+1
We do not clear grafts part of the parsed object pool when clearing the pool itself, which can lead to memory leaks when a repository is being cleared. Fix this by moving `reset_commit_grafts()` into "object.c" and making it part of the `struct parsed_object_pool` interface such that we can call it from `parsed_object_pool_clear()`. Adapt `parsed_object_pool_new()` to take and store a reference to its owning repository, which is needed by `unparse_commit()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-05gpg-interface: fix misdesigned signing key interfacesPatrick Steinhardt1-3/+6
The interfaces to retrieve signing keys and their IDs are misdesigned as they return string constants even though they indeed allocate memory, which leads to memory leaks. Refactor the code to instead always return allocated strings and let the callers free them accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-26Merge branch 'ds/for-each-ref-is-base'Junio C Hamano1-1/+7
'git for-each-ref' learned a new "--format" atom to find the branch that the history leading to a given commit "%(is-base:<commit>)" is likely based on. * ds/for-each-ref-is-base: p1500: add is-base performance tests for-each-ref: add 'is-base' token commit: add gentle reference lookup method commit-reach: add get_branch_base_for_tip
2024-08-14commit: add gentle reference lookup methodDerrick Stolee1-1/+7
The lookup_commit_reference_by_name() method uses lookup_commit_reference() without an option to use lookup_commit_reference_gently(). Create a gentle version of the method so it can be used in locations where non-commits may be found but error messages should be silenced. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-13hooks: remove implicit dependency on `the_repository`Patrick Steinhardt1-1/+1
We implicitly depend on `the_repository` in our hook subsystem because we use `strbuf_git_path()` to compute hook paths. Remove this dependency by accepting a `struct repository` as parameter instead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-08Merge branch 'ps/leakfixes-more'Junio C Hamano1-15/+13
More memory leaks have been plugged. * ps/leakfixes-more: (29 commits) builtin/blame: fix leaking ignore revs files builtin/blame: fix leaking prefixed paths blame: fix leaking data for blame scoreboards line-range: plug leaking find functions merge: fix leaking merge bases builtin/merge: fix leaking `struct cmdnames` in `get_strategy()` sequencer: fix memory leaks in `make_script_with_merges()` builtin/clone: plug leaking HEAD ref in `wanted_peer_refs()` apply: fix leaking string in `match_fragment()` sequencer: fix leaking string buffer in `commit_staged_changes()` commit: fix leaking parents when calling `commit_tree_extended()` config: fix leaking "core.notesref" variable rerere: fix various trivial leaks builtin/stash: fix leak in `show_stash()` revision: free diff options builtin/log: fix leaking commit list in git-cherry(1) merge-recursive: fix memory leak when finalizing merge builtin/merge-recursive: fix leaking object ID bases builtin/difftool: plug memory leaks in `run_dir_diff()` object-name: free leaking object contexts ...
2024-07-02Merge branch 'ps/use-the-repository'Junio C Hamano1-0/+2
A CPP macro USE_THE_REPOSITORY_VARIABLE is introduced to help transition the codebase to rely less on the availability of the singleton the_repository instance. * ps/use-the-repository: hex: guard declarations with `USE_THE_REPOSITORY_VARIABLE` t/helper: remove dependency on `the_repository` in "proc-receive" t/helper: fix segfault in "oid-array" command without repository t/helper: use correct object hash in partial-clone helper compat/fsmonitor: fix socket path in networked SHA256 repos replace-object: use hash algorithm from passed-in repository protocol-caps: use hash algorithm from passed-in repository oidset: pass hash algorithm when parsing file http-fetch: don't crash when parsing packfile without a repo hash-ll: merge with "hash.h" refs: avoid include cycle with "repository.h" global: introduce `USE_THE_REPOSITORY_VARIABLE` macro hash: require hash algorithm in `empty_tree_oid_hex()` hash: require hash algorithm in `is_empty_{blob,tree}_oid()` hash: make `is_null_oid()` independent of `the_repository` hash: convert `oidcmp()` and `oideq()` to compare whole hash global: ensure that object IDs are always padded hash: require hash algorithm in `oidread()` and `oidclr()` hash: require hash algorithm in `hasheq()`, `hashcmp()` and `hashclr()` hash: drop (mostly) unused `is_empty_{blob,tree}_sha1()` functions
2024-06-27Merge branch 'rs/remove-unused-find-header-mem'Junio C Hamano1-14/+2
Code clean-up. * rs/remove-unused-find-header-mem: commit: remove find_header_mem()
2024-06-20commit: remove find_header_mem()René Scharfe1-14/+2
cfc5cf428b (receive-pack.c: consolidate find header logic, 2022-01-06) introduced find_header_mem() and turned find_commit_header() into a thin wrapper. Since then, the latter has become the last remaining caller of the former. Remove it to restore find_commit_header() to the state before cfc5cf428b, get rid of a strlen(3) call and resolve a NEEDSWORK note in the process. Signed-off-by: René Scharfe <l.s.r@web.de> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14global: introduce `USE_THE_REPOSITORY_VARIABLE` macroPatrick Steinhardt1-0/+2
Use of the `the_repository` variable is deprecated nowadays, and we slowly but steadily convert the codebase to not use it anymore. Instead, callers should be passing down the repository to work on via parameters. It is hard though to prove that a given code unit does not use this variable anymore. The most trivial case, merely demonstrating that there is no direct use of `the_repository`, is already a bit of a pain during code reviews as the reviewer needs to manually verify claims made by the patch author. The bigger problem though is that we have many interfaces that implicitly rely on `the_repository`. Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code units to opt into usage of `the_repository`. The intent of this macro is to demonstrate that a certain code unit does not use this variable anymore, and to keep it from new dependencies on it in future changes, be it explicit or implicit For now, the macro only guards `the_repository` itself as well as `the_hash_algo`. There are many more known interfaces where we have an implicit dependency on `the_repository`, but those are not guarded at the current point in time. Over time though, we should start to add guards as required (or even better, just remove them). Define the macro as required in our code units. As expected, most of our code still relies on the global variable. Nearly all of our builtins rely on the variable as there is no way yet to pass `the_repository` to their entry point. For now, declare the macro in "biultin.h" to keep the required changes at least a little bit more contained. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11merge: fix leaking merge basesPatrick Steinhardt1-1/+1
When calling either the recursive or the ORT merge machineries we need to provide a list of merge bases. The ownership of that parameter is then implicitly transferred to the callee, which is somewhat fishy. Furthermore, that list may leak in some cases where the merge machinery runs into an error, thus causing a memory leak. Refactor the code such that we stop transferring ownership. Instead, the merge machinery will now create its own local copies of the passed in list as required if they need to modify the list. Free the list at the callsites as required. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11commit: fix leaking parents when calling `commit_tree_extended()`Patrick Steinhardt1-14/+12
When creating commits via `commit_tree_extended()`, the caller passes in a string list of parents. This call implicitly transfers ownership of that list to the function, which is quite surprising to begin with. But to make matters worse, `commit_tree_extended()` doesn't even bother to free the list of parents in error cases. The result is a memory leak, and one that the caller cannot fix by themselves because they do not know whether parts of the string list have already been released. Refactor the code such that callers can keep ownership of the list of parents, which is getting indicated by parameter being a constant pointer now. Free the lists at the calling site and add a common exit path to those sites as required. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-07cocci: apply rules to rewrite callers of "refs" interfacesPatrick Steinhardt1-1/+2
Apply the rules that rewrite callers of "refs" interfaces to explicitly pass `struct ref_store`. The resulting patch has been applied with the `--whitespace=fix` option. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-05Merge branch 'jk/core-comment-string'Junio C Hamano1-1/+2
core.commentChar used to be limited to a single byte, but has been updated to allow an arbitrary multi-byte sequence. * jk/core-comment-string: config: add core.commentString config: allow multi-byte core.commentChar environment: drop comment_line_char compatibility macro wt-status: drop custom comment-char stringification sequencer: handle multi-byte comment characters when writing todo list find multi-byte comment chars in unterminated buffers find multi-byte comment chars in NUL-terminated strings prefer comment_line_str to comment_line_char for printing strbuf: accept a comment string for strbuf_add_commented_lines() strbuf: accept a comment string for strbuf_commented_addf() strbuf: accept a comment string for strbuf_stripspace() environment: store comment_line_char as a string strbuf: avoid shadowing global comment_line_char name commit: refactor base-case of adjust_comment_line_char() strbuf: avoid static variables in strbuf_add_commented_lines() strbuf: simplify comment-handling in add_lines() helper config: forbid newline as core.commentChar
2024-03-28Merge branch 'eb/hash-transition'Junio C Hamano1-45/+176
Work to support a repository that work with both SHA-1 and SHA-256 hash algorithms has started. * eb/hash-transition: (30 commits) t1016-compatObjectFormat: add tests to verify the conversion between objects t1006: test oid compatibility with cat-file t1006: rename sha1 to oid test-lib: compute the compatibility hash so tests may use it builtin/ls-tree: let the oid determine the output algorithm object-file: handle compat objects in check_object_signature tree-walk: init_tree_desc take an oid to get the hash algorithm builtin/cat-file: let the oid determine the output algorithm rev-parse: add an --output-object-format parameter repository: implement extensions.compatObjectFormat object-file: update object_info_extended to reencode objects object-file-convert: convert commits that embed signed tags object-file-convert: convert commit objects when writing object-file-convert: don't leak when converting tag objects object-file-convert: convert tag objects when writing object-file-convert: add a function to convert trees between algorithms object: factor out parse_mode out of fast-import and tree-walk into in object.h cache: add a function to read an OID of a specific algorithm tag: sign both hashes commit: export add_header_signature to support handling signatures on tags ...
2024-03-12find multi-byte comment chars in unterminated buffersJeff King1-1/+2
As with the previous patch, we need to swap out single-byte matching for something like starts_with() to match all bytes of a multi-byte comment character. But for cases where the buffer is not NUL-terminated (and we instead have an explicit size or end pointer), it's not safe to use starts_with(), as it might walk off the end of the buffer. Let's introduce a new starts_with_mem() that does the same thing but also accepts the length of the "haystack" str and makes sure not to walk past it. Note that in most cases the existing code did not need a length check at all, since it was written in a way that knew we had at least one byte available (and that was all we checked). So I had to read each one to find the appropriate bounds. The one exception is sequencer.c's add_commented_lines(), where we can actually get rid of the length check. Just like starts_with(), our starts_with_mem() handles an empty haystack variable by not matching (assuming a non-empty prefix). A few notes on the implementation of starts_with_mem(): - it would be equally correct to take an "end" pointer (and indeed, many of the callers have this and have to subtract to come up with the length). I think taking a ptr/size combo is a more usual interface for our codebase, though, and has the added benefit that the function signature makes it harder to mix up the three parameters. - we could obviously build starts_with() on top of this by passing strlen(str) as the length. But it's possible that starts_with() is a relatively hot code path, and it should not pay that penalty (it can generally return an answer proportional to the size of the prefix, not the whole string). - it naively feels like xstrncmpz() should be able to do the same thing, but that's not quite true. If you pass the length of the haystack buffer, then strncmp() finds that a shorter prefix string is "less than" than the haystack, even if the haystack starts with the prefix. If you pass the length of the prefix, then you risk reading past the end of the haystack if it is shorter than the prefix. So I think we really do need a new function. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-29commit-reach(repo_get_merge_bases_many): pass on "missing commits" errorsJohannes Schindelin1-3/+4
The `merge_bases_many()` function was just taught to indicate parsing errors, and now the `repo_get_merge_bases_many()` function is aware of that, too. Naturally, there are a lot of callers that need to be adjusted now, too. Next stop: `repo_get_merge_bases_dirty()`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-08Merge branch 'en/header-cleanup' into maint-2.43Junio C Hamano1-2/+0
Remove unused header "#include". * en/header-cleanup: treewide: remove unnecessary includes in source files treewide: add direct includes currently only pulled in transitively trace2/tr2_tls.h: remove unnecessary include submodule-config.h: remove unnecessary include pkt-line.h: remove unnecessary include line-log.h: remove unnecessary include http.h: remove unnecessary include fsmonitor--daemon.h: remove unnecessary includes blame.h: remove unnecessary includes archive.h: remove unnecessary include treewide: remove unnecessary includes in source files treewide: remove unnecessary includes from header files
2024-02-08Merge branch 'la/trailer-cleanups' into maint-2.43Junio C Hamano1-1/+1
Code clean-up. * la/trailer-cleanups: trailer: use offsets for trailer_start/trailer_end trailer: find the end of the log message commit: ignore_non_trailer computes number of bytes to ignore
2024-01-08Merge branch 'en/header-cleanup'Junio C Hamano1-2/+0
Remove unused header "#include". * en/header-cleanup: treewide: remove unnecessary includes in source files treewide: add direct includes currently only pulled in transitively trace2/tr2_tls.h: remove unnecessary include submodule-config.h: remove unnecessary include pkt-line.h: remove unnecessary include line-log.h: remove unnecessary include http.h: remove unnecessary include fsmonitor--daemon.h: remove unnecessary includes blame.h: remove unnecessary includes archive.h: remove unnecessary include treewide: remove unnecessary includes in source files treewide: remove unnecessary includes from header files
2024-01-02Merge branch 'la/trailer-cleanups'Junio C Hamano1-1/+1
Code clean-up. * la/trailer-cleanups: trailer: use offsets for trailer_start/trailer_end trailer: find the end of the log message commit: ignore_non_trailer computes number of bytes to ignore
2023-12-26treewide: remove unnecessary includes in source filesElijah Newren1-2/+0
Each of these were checked with gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE} to ensure that removing the direct inclusion of the header actually resulted in that header no longer being included at all (i.e. that no other header pulled it in transitively). ...except for a few cases where we verified that although the header was brought in transitively, nothing from it was directly used in that source file. These cases were: * builtin/credential-cache.c * builtin/pull.c * builtin/send-pack.c Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-26commit-graph: disable GIT_COMMIT_GRAPH_PARANOIA by defaultPatrick Steinhardt1-1/+1
In 7a5d604443 (commit: detect commits that exist in commit-graph but not in the ODB, 2023-10-31), we have introduced a new object existence check into `repo_parse_commit_internal()` so that we do not parse commits via the commit-graph that don't have a corresponding object in the object database. This new check of course comes with a performance penalty, which the commit put at around 30% for `git rev-list --topo-order`. But there are in fact scenarios where the performance regression is even higher. The following benchmark against linux.git with a fully-build commit-graph: Benchmark 1: git.v2.42.1 rev-list --count HEAD Time (mean ± σ): 658.0 ms ± 5.2 ms [User: 613.5 ms, System: 44.4 ms] Range (min … max): 650.2 ms … 666.0 ms 10 runs Benchmark 2: git.v2.43.0-rc1 rev-list --count HEAD Time (mean ± σ): 1.333 s ± 0.019 s [User: 1.263 s, System: 0.069 s] Range (min … max): 1.302 s … 1.361 s 10 runs Summary git.v2.42.1 rev-list --count HEAD ran 2.03 ± 0.03 times faster than git.v2.43.0-rc1 rev-list --count HEAD While it's a noble goal to ensure that results are the same regardless of whether or not we have a potentially stale commit-graph, taking twice as much time is a tough sell. Furthermore, we can generally assume that the commit-graph will be updated by git-gc(1) or git-maintenance(1) as required so that the case where the commit-graph is stale should not at all be common. With that in mind, default-disable GIT_COMMIT_GRAPH_PARANOIA and restore the behaviour and thus performance previous to the mentioned commit. In order to not be inconsistent, also disable this behaviour by default in `lookup_commit_in_graph()`, where the object existence check has been introduced right at its inception via f559d6d45e (revision: avoid hitting packfiles when commits are in commit-graph, 2021-08-09). This results in another speedup in commands that end up calling this function, even though it's less pronounced compared to the above benchmark. The following has been executed in linux.git with ~1.2 million references: Benchmark 1: GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted Time (mean ± σ): 2.947 s ± 0.003 s [User: 2.412 s, System: 0.534 s] Range (min … max): 2.943 s … 2.949 s 3 runs Benchmark 2: GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted Time (mean ± σ): 2.724 s ± 0.030 s [User: 2.207 s, System: 0.514 s] Range (min … max): 2.704 s … 2.759 s 3 runs Summary GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted ran 1.08 ± 0.01 times faster than GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted So whereas 7a5d604443 initially introduced the logic to start doing an object existence check in `repo_parse_commit_internal()` by default, the updated logic will now instead cause `lookup_commit_in_graph()` to stop doing the check by default. This behaviour continues to be tweakable by the user via the GIT_COMMIT_GRAPH_PARANOIA environment variable. Note that this requires us to amend some tests to manually turn on the paranoid checks again. This is because we cause repository corruption by manually deleting objects which are part of the commit graph already. These circumstances shouldn't usually happen in repositories. Reported-by: Jeff King <peff@peff.net> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>