aboutsummaryrefslogtreecommitdiffstats
path: root/fetch-pack.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2025-09-24packfile: split up responsibilities of `reprepare_packed_git()`Patrick Steinhardt1-2/+2
In `reprepare_packed_git()` we perform a couple of operations: - We reload alternate object directories. - We clear the loose object cache. - We reprepare packfiles. While the logic is hosted in "packfile.c", it clearly reaches into other subsystems that aren't related to packfiles. Split up the responsibility and introduce `odb_reprepare()` which now becomes responsible for repreparing the whole object database. The existing `reprepare_packed_git()` function is refactored accordingly and only cares about reloading the packfile store now. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-28Merge branch 'jk/fetch-check-graph-objects-fix'Junio C Hamano1-1/+2
Under a race against another process that is repacking the repository, especially a partially cloned one, "git fetch" may mistakenly think some objects we do have are missing, which has been corrected. * jk/fetch-check-graph-objects-fix: fetch-pack: re-scan when double-checking graph objects
2025-08-25fetch-pack: re-scan when double-checking graph objectsJeff King1-1/+2
The fetch code tries to avoid asking the remote side for an object we already have. It does this by traversing recent commits reachable from our refs looking for matches. Commit 5d4cc78f72 (fetch-pack: die if in commit graph but not obj db, 2024-11-05) introduced an extra check there: if we think we have an object because it's in the commit graph, we double-check that we actually have it in our object database with a call to odb_has_object(). But that call does not pass any flags, and so the function won't call reprepared_packed_git() if it does not find the object. That opens us up to the usual race against some other process repacking the odb: 1. We scan the list of packs in objects/pack but haven't yet opened them. 2. Somebody else packs the object into a new pack (which we don't know about), and deletes the old pack it was in. 3. Our odb_has_object() calls tries to open that old pack, but finds it is gone. We declare that we don't have the object. And this causes us to erroneously complain and abort the fetch, thinking our commit-graph and object database are out of sync. Instead, we should pass HAS_OBJECT_RECHECK_PACKED, which will add a new step: 4. We re-scan the pack directory again, find the new pack, and locate the object. Often the fetch code tries to avoid these kinds of re-scans if it's likely that we won't have the object. If the other side has told us about object X and we want to know if we have it, we'll skip the re-scan (to avoid spending a lot of effort when there are many such objects). We can accept the racy false negative in that case because the worst case is that we ask the other side to send us the object. But this is not one of those cases. These are objects which are accessible from _our_ refs, and which we already found in the commit graph file. We should have them, and if we don't, we'll die() immediately. So the performance impact is negligible, and getting the right answer is important. There's no test here because it's inherently racy. In fact, I had trouble even developing a minimal test. The problem seen in the wild can be produced like this: # Any git.git mirror which supports partial clones; I think this # should work with any repo that contains submodules, but note that # $obj below is specific to this repo url=https://github.com/git/git.git # This is a commit that is not at the tip of any branches (so after # we have it, we'll still have some commits to fetch). obj=cf6f63ea6bf35173e02e18bdc6a4ba41288acff9 git init git fetch --filter=tree:0 $url $obj:refs/heads/foo git checkout foo git commit-graph write --reachable git fetch $url What happens here is that the initial fetch grabs that older commit (and its ancestors) but no trees or blobs, and the subsequent checkout grabs the necessary trees and blobs just for that commit. The final fetch spawns a long sequence of child fetches due to fetch_submodules(), which wants to check whether there have been any gitlink modifications which should trigger a fetch of the related submodule (we'll leave aside the irony that we did not even check out any submodules yet). That series of fetches causes us to accumulate packs, which eventually triggers background maintenance to run. That repacks all-into-one, and the pack containing $obj goes away in favor of a new pack. And then the fetch eventually fails with: fatal: You are attempting to fetch cf6f63ea6bf35173e02e18bdc6a4ba41288acff9, which is in the commit graph file but not in the object database. In the scenario above, the race becomes likely because of the long series of quick fetches. But I _think_ the bug is independent of partial clones entirely, and you could run into the same thing with a single fetch, some other process running "git repack" simultaneously, and a bit of bad luck. I haven't been able to reproduce, though. I'm not sure if that's because there's some mis-analysis above, or if the race window is just small enough that it's hard to trigger. At any rate, re-scanning here seems like an obviously correct thing to do with no downside, and it does fix the partial-clone case shown above. Reported-by: Дилян Палаузов <dilyan.palauzov@aegee.org> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-21Merge branch 'jc/string-list-split'Junio C Hamano1-1/+1
string_list_split*() family of functions have been extended to simplify common use cases. * jc/string-list-split: string-list: split-then-remove-empty can be done while splitting string-list: optionally omit empty string pieces in string_list_split*() diff: simplify parsing of diff.colormovedws string-list: optionally trim string pieces split by string_list_split*() string-list: unify string_list_split* functions string-list: align string_list_split() with its _in_place() counterpart string-list: report programming error with BUG
2025-08-04Merge branch 'ps/config-wo-the-repository'Junio C Hamano1-8/+8
The config API had a set of convenience wrapper functions that implicitly use the_repository instance; they have been removed and inlined at the calling sites. * ps/config-wo-the-repository: (21 commits) config: fix sign comparison warnings config: move Git config parsing into "environment.c" config: remove unused `the_repository` wrappers config: drop `git_config_set_multivar()` wrapper config: drop `git_config_get_multivar_gently()` wrapper config: drop `git_config_set_multivar_in_file_gently()` wrapper config: drop `git_config_set_in_file_gently()` wrapper config: drop `git_config_set()` wrapper config: drop `git_config_set_gently()` wrapper config: drop `git_config_set_in_file()` wrapper config: drop `git_config_get_bool()` wrapper config: drop `git_config_get_ulong()` wrapper config: drop `git_config_get_int()` wrapper config: drop `git_config_get_string()` wrapper config: drop `git_config_get_string()` wrapper config: drop `git_config_get_string_multi()` wrapper config: drop `git_config_get_value()` wrapper config: drop `git_config_get_value()` wrapper config: drop `git_config_get()` wrapper config: drop `git_config_clear()` wrapper ...
2025-08-02string-list: align string_list_split() with its _in_place() counterpartJunio C Hamano1-1/+1
The string_list_split_in_place() function was updated by 52acddf3 (string-list: multi-delimiter `string_list_split_in_place()`, 2023-04-24) to take more than one delimiter characters, hoping that we can later use it to replace our uses of strtok(). We however did not make a matching change to the string_list_split() function, which is very similar. Before giving both functions more features in future commits, allow string_list_split() to also take more than one delimiter characters to make them closer to each other. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-28Merge branch 'rs/pop-recent-commit-with-prio-queue'Junio C Hamano1-5/+8
The pop_most_recent_commit() function can have quite expensive worst case performance characteristics, which has been optimized by using prio-queue data structure. * rs/pop-recent-commit-with-prio-queue: commit: use prio_queue_replace() in pop_most_recent_commit() prio-queue: add prio_queue_replace() commit: convert pop_most_recent_commit() to prio_queue
2025-07-23config: drop `git_config_get_bool()` wrapperPatrick Steinhardt1-4/+4
In 036876a1067 (config: hide functions using `the_repository` by default, 2024-08-13) we have moved around a bunch of functions in the config subsystem that depend on `the_repository`. Those function have been converted into mere wrappers around their equivalent function that takes in a repository as parameter, and the intent was that we'll eventually remove those wrappers to make the dependency on the global repository variable explicit at the callsite. Follow through with that intent and remove `git_config_get_bool()`. All callsites are adjusted so that they use `repo_config_get_bool(the_repository, ...)` instead. While some callsites might already have a repository available, this mechanical conversion is the exact same as the current situation and thus cannot cause any regression. Those sites should eventually be cleaned up in a later patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-23config: drop `git_config_get_int()` wrapperPatrick Steinhardt1-2/+2
In 036876a1067 (config: hide functions using `the_repository` by default, 2024-08-13) we have moved around a bunch of functions in the config subsystem that depend on `the_repository`. Those function have been converted into mere wrappers around their equivalent function that takes in a repository as parameter, and the intent was that we'll eventually remove those wrappers to make the dependency on the global repository variable explicit at the callsite. Follow through with that intent and remove `git_config_get_int()`. All callsites are adjusted so that they use `repo_config_get_int(the_repository, ...)` instead. While some callsites might already have a repository available, this mechanical conversion is the exact same as the current situation and thus cannot cause any regression. Those sites should eventually be cleaned up in a later patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-23config: drop `git_config_get_string()` wrapperPatrick Steinhardt1-1/+1
In 036876a1067 (config: hide functions using `the_repository` by default, 2024-08-13) we have moved around a bunch of functions in the config subsystem that depend on `the_repository`. Those function have been converted into mere wrappers around their equivalent function that takes in a repository as parameter, and the intent was that we'll eventually remove those wrappers to make the dependency on the global repository variable explicit at the callsite. Follow through with that intent and remove `git_config_get_string()`. All callsites are adjusted so that they use `repo_config_get_string(the_repository, ...)` instead. While some callsites might already have a repository available, this mechanical conversion is the exact same as the current situation and thus cannot cause any regression. Those sites should eventually be cleaned up in a later patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-23config: drop `git_config()` wrapperPatrick Steinhardt1-1/+1
In 036876a1067 (config: hide functions using `the_repository` by default, 2024-08-13) we have moved around a bunch of functions in the config subsystem that depend on `the_repository`. Those function have been converted into mere wrappers around their equivalent function that takes in a repository as parameter, and the intent was that we'll eventually remove those wrappers to make the dependency on the global repository variable explicit at the callsite. Follow through with that intent and remove `git_config()`. All callsites are adjusted so that they use `repo_config(the_repository, ...)` instead. While some callsites might already have a repository available, this mechanical conversion is the exact same as the current situation and thus cannot cause any regression. Those sites should eventually be cleaned up in a later patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-22commit: convert pop_most_recent_commit() to prio_queueRené Scharfe1-5/+8
pop_most_recent_commit() calls commit_list_insert_by_date() for parent commits, which is itself called in a loop. This can lead to quadratic complexity if there are many merges. Replace the commit_list with a prio_queue to ensure logarithmic worst case complexity and convert all three users. Add a performance test that exercises one of them using a pathological history that consists of 50% merges and 50% root commits to demonstrate the speedup: Test v2.50.1 HEAD ---------------------------------------------------------------------- 1501.2: rev-parse ':/65535' 2.48(2.47+0.00) 0.20(0.19+0.00) -91.9% Alas, sane histories don't benefit from the conversion much, and traversing Git's own history takes a 1% performance hit on my machine: $ hyperfine -w3 -L git ./git_2.50.1,./git '{git} rev-parse :/^Initial.revision' Benchmark 1: ./git_2.50.1 rev-parse :/^Initial.revision Time (mean ± σ): 1.071 s ± 0.004 s [User: 1.052 s, System: 0.017 s] Range (min … max): 1.067 s … 1.078 s 10 runs Benchmark 2: ./git rev-parse :/^Initial.revision Time (mean ± σ): 1.079 s ± 0.003 s [User: 1.060 s, System: 0.017 s] Range (min … max): 1.074 s … 1.083 s 10 runs Summary ./git_2.50.1 rev-parse :/^Initial.revision ran 1.01 ± 0.00 times faster than ./git rev-parse :/^Initial.revision Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-21Merge branch 'bc/use-sha256-by-default-in-3.0'Junio C Hamano1-1/+1
Prepare to flip the default hash function to SHA-256. * bc/use-sha256-by-default-in-3.0: Enable SHA-256 by default in breaking changes mode help: add a build option for default hash t5300: choose the built-in hash outside of a repo t4042: choose the built-in hash outside of a repo t1007: choose the built-in hash outside of a repo t: default to compile-time default hash if not set setup: use the default algorithm to initialize repo format Use legacy hash for legacy formats builtin: use default hash when outside a repository hash: add a constant for the legacy hash algorithm hash: add a constant for the default hash algorithm
2025-07-17Merge branch 'bc/use-sha256-by-default-in-3.0' into ps/config-wo-the-repositoryJunio C Hamano1-1/+1
* bc/use-sha256-by-default-in-3.0: Enable SHA-256 by default in breaking changes mode help: add a build option for default hash t5300: choose the built-in hash outside of a repo t4042: choose the built-in hash outside of a repo t1007: choose the built-in hash outside of a repo t: default to compile-time default hash if not set setup: use the default algorithm to initialize repo format Use legacy hash for legacy formats builtin: use default hash when outside a repository hash: add a constant for the legacy hash algorithm hash: add a constant for the default hash algorithm
2025-07-01Use legacy hash for legacy formatsbrian m. carlson1-1/+1
We have a large variety of data formats and protocols where no hash algorithm was defined and the default was assumed to always be SHA-1. Instead of explicitly stating SHA-1, let's use the constant to represent the legacy hash algorithm (which is still SHA-1) so that it's clear for documentary purposes that it's a legacy fallback option and not an intentional choice to use SHA-1. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01odb: rename `has_object()`Patrick Steinhardt1-4/+4
Rename `has_object()` to `odb_has_object()` to match other functions related to the object database and our modern coding guidelines. Introduce a compatibility wrapper so that any in-flight topics will continue to compile. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01odb: rename `oid_object_info()`Patrick Steinhardt1-2/+2
Rename `oid_object_info()` to `odb_read_object_info()` as well as their `_extended()` variant to match other functions related to the object database and our modern coding guidelines. Introduce compatibility wrappers so that any in-flight topics will continue to compile. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01odb: get rid of `the_repository` in `for_each()` functionsPatrick Steinhardt1-1/+2
There are a couple of iterator-style functions that execute a callback for each instance of a given set, all of which currently depend on `the_repository`. Refactor them to instead take an object database as parameter so that we can get rid of this dependency. Rename the functions accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01object-store: rename files to "odb.{c,h}"Patrick Steinhardt1-1/+1
In the preceding commits we have renamed the structures contained in "object-store.h" to `struct object_database` and `struct odb_backend`. As such, the code files "object-store.{c,h}" are confusingly named now. Rename them to "odb.{c,h}" accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-29treewide: convert users of `repo_has_object_file()` to `has_object()`Patrick Steinhardt1-4/+3
As the comment of `repo_has_object_file()` and its `_with_flags()` variant tells us, these functions are considered to be deprecated in favor of `has_object()`. There are a couple of slight benefits in favor of the replacement: - The new function has a short-and-sweet name. - More explicit defaults: `has_object()` doesn't fetch missing objects via promisor remotes, and neither does it reload packfiles if an object wasn't found by default. This ensures that it becomes immediately obvious when a simple object existence check may result in expensive actions. Most importantly though, it is confusing that we have two sets of functions that ultimately do the same thing, but with different defaults. Start sunsetting `repo_has_object_file()` and its `_with_flags()` sibling by replacing all callsites with `has_object()`: - `repo_has_object_file(...)` is equivalent to `has_object(..., HAS_OBJECT_RECHECK_PACKED | HAS_OBJECT_FETCH_PROMISOR)`. - `repo_has_object_file_with_flags(..., OBJECT_INFO_QUICK | OBJECT_INFO_SKIP_FETCH_OBJECT)` is equivalent to `has_object(..., 0)`. - `repo_has_object_file_with_flags(..., OBJECT_INFO_SKIP_FETCH_OBJECT)` is equivalent to `has_object(..., HAS_OBJECT_RECHECK_PACKED)`. - `repo_has_object_file_with_flags(..., OBJECT_INFO_QUICK)` is equivalent to `has_object(..., HAS_OBJECT_FETCH_PROMISOR)`. The replacements should be functionally equivalent. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-15object-store: merge "object-store-ll.h" and "object-store.h"Patrick Steinhardt1-1/+1
The "object-store-ll.h" header has been introduced to keep transitive header dependendcies and compile times at bay. Now that we have created a new "object-store.c" file though we can easily move the last remaining additional bit of "object-store.h", the `odb_path_map`, out of the header. Do so. As the "object-store.h" header is now equivalent to its low-level alternative we drop the latter and inline it into the former. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03Merge branch 'kn/pack-write-with-reduced-globals'Junio C Hamano1-1/+3
Code clean-up. * kn/pack-write-with-reduced-globals: pack-write: pass hash_algo to internal functions pack-write: pass hash_algo to `write_rev_file()` pack-write: pass hash_algo to `write_idx_file()` pack-write: pass repository to `index_pack_lockfile()` pack-write: pass hash_algo to `fixup_pack_header_footer()`
2025-01-21pack-write: pass repository to `index_pack_lockfile()`Karthik Nayak1-1/+3
The `index_pack_lockfile()` function uses the global `the_repository` variable to access the repository. To avoid global variable usage, pass the repository from the layers above. Altough the layers above could have access to the repository internally, simply pass in `the_repository`. This avoids any compatibility issues and bubbles up global variable usage to upper layers which can be eventually resolved. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-07fsck: reject misconfigured fsck.skipListJustin Tobler1-1/+1
In Git, fsck operations can ignore known broken objects via the `fsck.skipList` configuration. This option expects a path to a file with the list of object names. When the configuration is specified without a path, an error message is printed, but the command continues as if the configuration was not set. Configuring `fsck.skipList` without a value is a misconfiguration so config parsing should be more strict and reject it. Update `git_fsck_config()` to no longer ignore misconfiguration of `fsck.skipList`. The same behavior is also present for `fetch.fsck.skipList` and `receive.fsck.skipList` so the configuration parsers for these are updated to ensure the related operations remain consistent. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-23Merge branch 'ps/build-sign-compare'Junio C Hamano1-0/+1
Start working to make the codebase buildable with -Wsign-compare. * ps/build-sign-compare: t/helper: don't depend on implicit wraparound scalar: address -Wsign-compare warnings builtin/patch-id: fix type of `get_one_patchid()` builtin/blame: fix type of `length` variable when emitting object ID gpg-interface: address -Wsign-comparison warnings daemon: fix type of `max_connections` daemon: fix loops that have mismatching integer types global: trivial conversions to fix `-Wsign-compare` warnings pkt-line: fix -Wsign-compare warning on 32 bit platform csum-file: fix -Wsign-compare warning on 32-bit platform diff.h: fix index used to loop through unsigned integer config.mak.dev: drop `-Wno-sign-compare` global: mark code units that generate warnings with `-Wsign-compare` compat/win32: fix -Wsign-compare warning in "wWinMain()" compat/regex: explicitly ignore "-Wsign-compare" warnings git-compat-util: introduce macros to disable "-Wsign-compare" warnings
2024-12-06global: mark code units that generate warnings with `-Wsign-compare`Patrick Steinhardt1-0/+1
Mark code units that generate warnings with `-Wsign-compare`. This allows for a structured approach to get rid of all such warnings over time in a way that can be easily measured. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-28fetch-pack: split out fsck config parsingJustin Tobler1-8/+18
When `fetch_pack_config()` is invoked, fetch-pack configuration is parsed from the config. As part of this operation, fsck message severity configuration is assigned to the `fsck_msg_types` global variable. This is optionally used to configure the downstream git-index-pack(1) when the `--strict` option is specified. The same parsed fsck message severity configuration is also needed outside of fetch-pack. Instead of exposing/relying on the existing global state, split out the fsck config parsing logic into `fetch_pack_fsck_config()` and expose it. In a subsequent commit, this is used to provide fsck configuration when invoking `unbundle()`. For `fetch_pack_fsck_config()` to discern between errors and unhandled config variables, the return code when `git_config_path()` errors is changed to a different value also indicating success. This frees up the previous return code to now indicate the provided config variable was unhandled. The behavior remains functionally the same. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-05fetch-pack: die if in commit graph but not obj dbJonathan Tan1-3/+16
When fetching, there is a step in which sought objects are first checked against the local repository; only objects that are not in the local repository are then fetched. This check first looks up the commit graph file, and returns "present" if the object is in there. However, the action of first looking up the commit graph file is not done everywhere in Git, especially if the type of the object at the time of lookup is not known. This means that in a repo corruption situation, a user may encounter an "object missing" error, attempt to fetch it, and still encounter the same error later when they reattempt their original action, because the object is present in the commit graph file but not in the object DB. Therefore, make it a fatal error when this occurs. (Note that we cannot proceed to include this object in the list of objects to be fetched without changing at least the fetch negotiation code: what would happen is that the client will send "want X" and "have X" and when I tested at $DAYJOB with a work server that uses JGit, the server reasonably returned an empty packfile. And changing the fetch negotiation code to only use the object DB when deciding what to report as "have" would be an unnecessary slowdown, I think.) This was discovered when a lazy fetch of a missing commit completed with nothing actually fetched, and the writing of the commit graph file after every fetch then attempted to read said missing commit, triggering a lazy fetch of said missing commit, resulting in an infinite loop with no user-visible indication (until they check the list of processes running on their computer). With this fix, there is no infinite loop. Note that although the repo corruption we discovered was caused by a bug in GC in a partial clone, the behavior that this patch teaches Git to warn about applies to any repo with commit graph enabled and with a missing commit, whether it is a partial clone or not. t5330, introduced in 3a1ea94a49 (commit-graph.c: no lazy fetch in lookup_commit_in_graph(), 2022-07-01), tests that an interaction between fetch and the commit graph does not cause an infinite loop. This patch changes the exit code in that situation, so that test had to be changed. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-05Revert "fetch-pack: add a deref_without_lazy_fetch_extended()"Jonathan Tan1-18/+7
This reverts commit a6e65fb39caf18259c660c1c7910d5bf80bc15cb. This revert simplifies the next patch in this patch set. The commit message of that commit mentions that the new function "will be used for the bundle-uri client in a subsequent commit", but it seems that eventually it wasn't used. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-23Merge branch 'ps/environ-wo-the-repository'Junio C Hamano1-1/+1
Code clean-up. * ps/environ-wo-the-repository: (21 commits) environment: stop storing "core.notesRef" globally environment: stop storing "core.warnAmbiguousRefs" globally environment: stop storing "core.preferSymlinkRefs" globally environment: stop storing "core.logAllRefUpdates" globally refs: stop modifying global `log_all_ref_updates` variable branch: stop modifying `log_all_ref_updates` variable repo-settings: track defaults close to `struct repo_settings` repo-settings: split out declarations into a standalone header environment: guard state depending on a repository environment: reorder header to split out `the_repository`-free section environment: move `set_git_dir()` and related into setup layer environment: make `get_git_namespace()` self-contained environment: move object database functions into object layer config: make dependency on repo in `read_early_config()` explicit config: document `read_early_config()` and `read_very_early_config()` environment: make `get_git_work_tree()` accept a repository environment: make `get_graft_file()` accept a repository environment: make `get_index_file()` accept a repository environment: make `get_object_directory()` accept a repository environment: make `get_git_common_dir()` accept a repository ...
2024-09-20Merge branch 'ps/leakfixes-part-6'Junio C Hamano1-0/+3
More leakfixes. * ps/leakfixes-part-6: (22 commits) builtin/repack: fix leaking keep-pack list merge-ort: fix two leaks when handling directory rename modifications match-trees: fix leaking prefixes in `shift_tree()` builtin/fmt-merge-msg: fix leaking buffers builtin/grep: fix leaking object context builtin/pack-objects: plug leaking list of keep-packs builtin/repack: fix leaking line buffer when packing promisors negotiator/skipping: fix leaking commit entries shallow: fix leaking members of `struct shallow_info` shallow: free grafts when unregistering them object: clear grafts when clearing parsed object pool gpg-interface: fix misdesigned signing key interfaces send-pack: fix leaking push cert nonce remote: fix leak in reachability check of a remote-tracking ref remote: fix leaking tracking refs builtin/submodule--helper: fix leaking refs on push-check submodule: fix leaking fetch task data upload-pack: fix leaking child process data on reachability checks builtin/push: fix leaking refspec query result send-pack: fix leaking common object IDs ...
2024-09-12environment: make `get_object_directory()` accept a repositoryPatrick Steinhardt1-1/+1
The `get_object_directory()` function retrieves the path to the object directory for `the_repository`. Make it accept a `struct repository` such that it can work on arbitrary repositories and make it part of the repository subsystem. This reduces our reliance on `the_repository` and clarifies scope. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-05drop trailing newline from warning/error/die messagesJeff King1-1/+1
Our error reporting routines append a trailing newline, and the strings we pass to them should not include them (otherwise we get an extra blank line after the message). These cases were all found by looking at the results of: git grep -P '[^_](error|error_errno|warning|die|die_errno)\(.*\\n"[,)]' '*.c' Note that we _do_ sometimes include a newline in the middle of such messages, to create multiline output (hence our grep matching "," or ")" after we see the newline, so we know we're at the end of the string). It's possible that one or more of these cases could intentionally be including a blank line at the end, but having looked at them all manually, I think these are all just mistakes. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-05fetch-pack: fix memory leaks on fetch negotiationPatrick Steinhardt1-0/+3
We leak both the `nt_object_array` and `negotiator` structures in `negotiate_using_fetch()`. Plug both of these leaks. These leaks were exposed by t5516, but fixing them is not sufficient to make the whole test suite leak free. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-09refs: add referent to each_ref_fnJohn Cai1-0/+2
Add a parameter to each_ref_fn so that callers to the ref APIs that use this function as a callback can have acess to the unresolved value of a symbolic ref. Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-08Merge branch 'xx/bundie-uri-fixes'Junio C Hamano1-6/+11
When bundleURI interface fetches multiple bundles, Git failed to take full advantage of all bundles and ended up slurping duplicated objects. * xx/bundie-uri-fixes: unbundle: extend object verification for fetches fetch-pack: expose fsckObjects configuration logic bundle-uri: verify oid before writing refs
2024-07-02Merge branch 'ps/use-the-repository'Junio C Hamano1-0/+2
A CPP macro USE_THE_REPOSITORY_VARIABLE is introduced to help transition the codebase to rely less on the availability of the singleton the_repository instance. * ps/use-the-repository: hex: guard declarations with `USE_THE_REPOSITORY_VARIABLE` t/helper: remove dependency on `the_repository` in "proc-receive" t/helper: fix segfault in "oid-array" command without repository t/helper: use correct object hash in partial-clone helper compat/fsmonitor: fix socket path in networked SHA256 repos replace-object: use hash algorithm from passed-in repository protocol-caps: use hash algorithm from passed-in repository oidset: pass hash algorithm when parsing file http-fetch: don't crash when parsing packfile without a repo hash-ll: merge with "hash.h" refs: avoid include cycle with "repository.h" global: introduce `USE_THE_REPOSITORY_VARIABLE` macro hash: require hash algorithm in `empty_tree_oid_hex()` hash: require hash algorithm in `is_empty_{blob,tree}_oid()` hash: make `is_null_oid()` independent of `the_repository` hash: convert `oidcmp()` and `oideq()` to compare whole hash global: ensure that object IDs are always padded hash: require hash algorithm in `oidread()` and `oidclr()` hash: require hash algorithm in `hasheq()`, `hashcmp()` and `hashclr()` hash: drop (mostly) unused `is_empty_{blob,tree}_sha1()` functions
2024-06-27Merge branch 'jk/fetch-pack-fsck-wo-lock-pack'Junio C Hamano1-1/+3
"git fetch-pack -k -k" without passing "--lock-pack" (which we never do ourselves) did not work at all, which has been corrected. * jk/fetch-pack-fsck-wo-lock-pack: fetch-pack: fix segfault when fscking without --lock-pack
2024-06-20fetch-pack: fix segfault when fscking without --lock-packJeff King1-1/+3
The fetch-pack internals have multiple options related to creating ".keep" lock-files for the received pack: - if args.lock_pack is set, then we tell index-pack to create a .keep file. In the fetch-pack plumbing command, this is triggered by passing "-k" twice. - if the caller passes in a pack_lockfiles string list, then we use it to record the path of the keep-file created by index-pack. We get that name by reading the stdout of index-pack. In the fetch-pack command, this is triggered by passing the (undocumented) --lock-pack option; without it, we pass in a NULL string list. So it's possible to ask index-pack to create the lock-file (using "-k -k") but not ask to record it (by avoiding "--lock-pack"). This worked fine until 5476e1efde (fetch-pack: print and use dangling .gitmodules, 2021-02-22), but now it causes a segfault. Before that commit, if pack_lockfiles was NULL, we wouldn't bother reading the output from index-pack at all. But since that commit, index-pack may produce extra output if we asked it to fsck. So even if nobody cares about the lockfile path, we still need to read it to skip to the output we do care about. We correctly check that we didn't get a NULL lockfile path (which can happen if we did not ask it to create a .keep file at all), but we missed the case where the lockfile path is not NULL (due to "-k -k") but the pack_lockfiles string_list is NULL (because nobody passed "--lock-pack"), and segfault trying to add to the NULL string-list. We can fix this by skipping the append to the string list when either the value or the list is NULL. In that case we must also free the lockfile path to avoid leaking it when it's non-NULL. Nobody noticed the bug for so long because the transport code used by "git fetch" always passes in a pack_lockfiles pointer, and remote-curl (the main user of the fetch-pack plumbing command) always passes --lock-pack. Reported-by: Kirill Smelkov <kirr@nexedi.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-20fetch-pack: expose fsckObjects configuration logicXing Xin1-6/+11
Currently, we can use "transfer.fsckObjects" and the more specific "fetch.fsckObjects" to control checks for broken objects in received packs during fetches. However, these configurations were only acknowledged by `fetch-pack.c:get_pack` and did not take effect in direct bundle fetches or fetches with _bundle-uri_ enabled. This commit exposes the fetch-then-transfer configuration logic by adding a new function `fetch_pack_fsck_objects` in fetch-pack.h. This new function is used to replace the assignment for `fsck_objects` in `fetch-pack.c:get_pack`. In the next commit, this function will also be used to extend fsck support for bundle-involved fetches. Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Xing Xin <xingxin.xx@bytedance.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14global: introduce `USE_THE_REPOSITORY_VARIABLE` macroPatrick Steinhardt1-0/+2
Use of the `the_repository` variable is deprecated nowadays, and we slowly but steadily convert the codebase to not use it anymore. Instead, callers should be passing down the repository to work on via parameters. It is hard though to prove that a given code unit does not use this variable anymore. The most trivial case, merely demonstrating that there is no direct use of `the_repository`, is already a bit of a pain during code reviews as the reviewer needs to manually verify claims made by the patch author. The bigger problem though is that we have many interfaces that implicitly rely on `the_repository`. Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code units to opt into usage of `the_repository`. The intent of this macro is to demonstrate that a certain code unit does not use this variable anymore, and to keep it from new dependencies on it in future changes, be it explicit or implicit For now, the macro only guards `the_repository` itself as well as `the_hash_algo`. There are many more known interfaces where we have an implicit dependency on `the_repository`, but those are not guarded at the current point in time. Over time though, we should start to add guards as required (or even better, just remove them). Define the macro as required in our code units. As expected, most of our code still relies on the global variable. Nearly all of our builtins rely on the variable as there is no way yet to pass `the_repository` to their entry point. For now, declare the macro in "biultin.h" to keep the required changes at least a little bit more contained. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-06Merge branch 'ps/leakfixes'Junio C Hamano1-2/+2
Leakfixes. * ps/leakfixes: builtin/mv: fix leaks for submodule gitfile paths builtin/mv: refactor to use `struct strvec` builtin/mv duplicate string list memory builtin/mv: refactor `add_slash()` to always return allocated strings strvec: add functions to replace and remove strings submodule: fix leaking memory for submodule entries commit-reach: fix memory leak in `ahead_behind()` builtin/credential: clear credential before exit config: plug various memory leaks config: clarify memory ownership in `git_config_string()` builtin/log: stop using globals for format config builtin/log: stop using globals for log config convert: refactor code to clarify ownership of check_roundtrip_encoding diff: refactor code to clarify memory ownership of prefixes config: clarify memory ownership in `git_config_pathname()` http: refactor code to clarify memory ownership checkout: clarify memory ownership in `unique_tracking_name()` strbuf: fix leak when `appendwholeline()` fails with EOF transport-helper: fix leaking helper name
2024-05-27config: clarify memory ownership in `git_config_pathname()`Patrick Steinhardt1-2/+2
The out parameter of `git_config_pathname()` is a `const char **` even though we transfer ownership of memory to the caller. This is quite misleading and has led to many memory leaks all over the place. Adapt the parameter to instead be `char **`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-23Merge branch 'dg/fetch-pack-code-cleanup'Junio C Hamano1-5/+0
Code clean-up to remove an unused struct definition. * dg/fetch-pack-code-cleanup: fetch-pack: remove unused 'struct loose_object_iter'
2024-05-13fetch-pack: remove unused 'struct loose_object_iter'Dr. David Alan Gilbert1-5/+0
'struct loose_object_iter' in fetch-pack.c is unused since commit 97b2fa08 (fetch-pack: drop custom loose object cache, 2018-11-12). Remove it. Signed-off-by: Dr. David Alan Gilbert <dave@treblig.org> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-07cocci: apply rules to rewrite callers of "refs" interfacesPatrick Steinhardt1-2/+4
Apply the rules that rewrite callers of "refs" interfaces to explicitly pass `struct ref_store`. The resulting patch has been applied with the `--whitespace=fix` option. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-30Merge branch 'sd/negotiate-trace-fix'Junio C Hamano1-1/+1
Tracing fix. * sd/negotiate-trace-fix: push: region_leave trace for negotiate_using_fetch
2024-01-08Merge branch 'en/header-cleanup'Junio C Hamano1-2/+0
Remove unused header "#include". * en/header-cleanup: treewide: remove unnecessary includes in source files treewide: add direct includes currently only pulled in transitively trace2/tr2_tls.h: remove unnecessary include submodule-config.h: remove unnecessary include pkt-line.h: remove unnecessary include line-log.h: remove unnecessary include http.h: remove unnecessary include fsmonitor--daemon.h: remove unnecessary includes blame.h: remove unnecessary includes archive.h: remove unnecessary include treewide: remove unnecessary includes in source files treewide: remove unnecessary includes from header files
2024-01-03push: region_leave trace for negotiate_using_fetchSam Delmerico1-1/+1
There were two region_enter events for negotiate_using_fetch instead of one enter and one leave. This commit replaces the second region_enter event with a region_leave. Signed-off-by: Sam Delmerico <delmerico@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-26treewide: remove unnecessary includes in source filesElijah Newren1-2/+0
Each of these were checked with gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE} to ensure that removing the direct inclusion of the header actually resulted in that header no longer being included at all (i.e. that no other header pulled it in transitively). ...except for a few cases where we verified that although the header was brought in transitively, nothing from it was directly used in that source file. These cases were: * builtin/credential-cache.c * builtin/pull.c * builtin/send-pack.c Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>