aboutsummaryrefslogtreecommitdiffstats
path: root/object-file.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2025-07-16object-file: get rid of `the_repository` in index-related functionsPatrick Steinhardt1-3/+3
Both `index_fd()` and `index_path()` still use `the_repository` even though they have a repository available via `struct index_state`. Adapt them so that they use the index' repository instead to get rid of this global dependency. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-16object-file: get rid of `the_repository` in `force_object_loose()`Patrick Steinhardt1-9/+9
The function `force_object_loose()` forces an object to become a loose object in case it only exists in its packed form. To do so it implicitly relies on `the_repository`. Refactor the function by passing a `struct odb_source` as parameter. While the check whether any such loose object exists already acts on the whole object database, writing the loose object happens in one specific source. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-16object-file: get rid of `the_repository` in `read_loose_object()`Patrick Steinhardt1-4/+5
The function `read_loose_object()` takes a path to an object file and tries to parse it. As such, the function does not depend on any specific object database but instead acts as an ODB-independent way to read a specific file. As such, all it needs as input is a repository so that we can derive repo settings and the hash algorithm. That repository isn't passed in as a parameter though, as we implicitly depend on the global `the_repository`. Refactor the function so that we pass in the repository as a parameter. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-16object-file: get rid of `the_repository` in loose object iteratorsPatrick Steinhardt1-9/+9
The iterators for loose objects still rely on `the_repository`. Refactor them: - `for_each_loose_file_in_objdir()` is refactored so that the caller is now expected to pass an `odb_source` as parameter instead of the path to that source. Furthermore, it is renamed accordingly to `for_each_loose_file_in_source()`. - `for_each_loose_object()` is refactored to take in an object database now and calls the above function in a loop. This allows us to get rid of the global dependency. Adjust callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-16object-file: remove declaration for `for_each_file_in_obj_subdir()`Patrick Steinhardt1-7/+7
The function `for_each_file_in_obj_subdir()` is declared in our headers, but it is not used anywhere else than in the corresponding code file itself. Drop the declaration and mark the function as file-local. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-16object-file: inline `for_each_loose_file_in_objdir_buf()`Patrick Steinhardt1-23/+8
The function `for_each_loose_file_in_objdir_buf()` is declared in our headers, but it is not used anywhere else than in the corresponding code file itself. Drop the declaration and inline the function into its only caller. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-16object-file: get rid of `the_repository` when writing objectsPatrick Steinhardt1-46/+50
The logic that writes loose objects still relies on `the_repository` to decide where exactly the object shall be written to. Refactor it so that the logic instead operates on a `struct odb_source` so that we can get rid of this global dependency. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-16odb: introduce `odb_write_object()`Patrick Steinhardt1-9/+9
We do not have a backend-agnostic way to write objects into an object database. While there is `write_object_file()`, this function is rather specific to the loose object format. Introduce `odb_write_object()` to plug this gap. For now, this function is a simple wrapper around `write_object_file()` and doesn't even use the passed-in object database yet. This will change in subsequent commits, where `write_object_file()` is converted so that it works on top of an `odb_source`. `odb_write_object()` will then become responsible for deciding which source an object shall be written to. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-16loose: write loose objects map via their sourcePatrick Steinhardt1-3/+3
When a repository is configured to have a compatibility hash algorithm we keep track of object ID mappings for loose objects via the loose object map. This map simply maps an object ID of the actual hash to the object ID of the compatibility hash. This loose object map is an inherent property of the loose files backend and thus of one specific object source. Refactor the interfaces to reflect this by requiring a `struct odb_source` as input instead of a repository. This prepares for subsequent commits where we will refactor writing of loose objects to work on a `struct odb_source`, as well. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-16object-file: get rid of `the_repository` in `finalize_object_file()`Patrick Steinhardt1-6/+8
We implicitly depend on `the_repository` when moving an object file into place in `finalize_object_file()`. Get rid of this global dependency by passing in a repository. Note that one might be pressed to inject an object database instead of a repository. But the function doesn't really care about the ODB at all. All it does is to move a file into place while checking whether there is any collision. As such, the functionality it provides is independent of the object database and only needs the repository as parameter so that it can adjust permissions of the file we are about to finalize. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-16object-file: get rid of `the_repository` in `loose_object_info()`Patrick Steinhardt1-1/+1
While `loose_object_info()` already accepts a repository as parameter we still have one callsite in there where we use `the_repository` to figure out the hash algorithm. Use the passed-in repository instead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-16object-file: get rid of `the_repository` when freshening objectsPatrick Steinhardt1-11/+11
We implicitly depend on `the_repository` when freshening either loose or packed objects. Refactor these functions to instead accept an object database as input so that we can get rid of the global dependency. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-16object-file: inline `check_and_freshen()` functionsPatrick Steinhardt1-28/+13
The `check_and_freshen()` functions are only used by a single caller now. Inline them into `freshen_loose_object()`. While at it, rename `check_and_freshen_odb()` to `_source()` to reflect that it works on a single object source instead of on the whole database. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-16object-file: get rid of `the_repository` in `has_loose_object()`Patrick Steinhardt1-9/+7
We implicitly depend on `the_repository` in `has_loose_object()`. Refactor the function to accept an `odb_source` as input that should be checked for such a loose object. This refactoring changes semantics of the function to not check the whole object database for such a loose object anymore, but instead we now only check that single source. Existing callers thus need to loop through all sources manually now. While this change may seem illogical at first, whether or not an object exists in a specific format should be answered by the source using that format. As such, we can eventually convert this into a generic function `odb_source_has_object()` that simply checks whether a given object exists in an object source. And as we will know about the format that any given source uses it allows us to derive whether the object exists in a given format. This change also makes `has_loose_object_nonlocal()` obsolete. The only caller of this function is adapted so that it skips the primary object source. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-16object-file: stop using `the_hash_algo`Patrick Steinhardt1-16/+24
There are a couple of users of the `the_hash_algo` macro, which implicitly depends on `the_repository`. Adapt these callers to not do so anymore, either by deriving it from already-available context or by using `the_repository->hash_algo`. The latter variant doesn't yet help to remove the global dependency, but such users will be adapted in the following commits to not use `the_repository` anymore. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-16object-file: fix -Wsign-compare warningsPatrick Steinhardt1-9/+6
There are some trivial -Wsign-compare warnings in "object-file.c". Fix them and drop the preprocessor define that disables those warnings. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01odb: rename `oid_object_info()`Patrick Steinhardt1-1/+1
Rename `oid_object_info()` to `odb_read_object_info()` as well as their `_extended()` variant to match other functions related to the object database and our modern coding guidelines. Introduce compatibility wrappers so that any in-flight topics will continue to compile. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01odb: get rid of `the_repository` when handling alternatesPatrick Steinhardt1-5/+5
The functions to manage alternates all depend on `the_repository`. Refactor them to accept an object database as a parameter and adjust all callers. The functions are renamed accordingly. Note that right now the situation is still somewhat weird because we end up using the object store path provided by the object store's repository anyway. Consequently, we could have instead passed in a pointer to the repository instead of passing in the pointer to the object store. This will be addressed in subsequent commits though, where we will start to use the path owned by the object store itself. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01object-store: rename files to "odb.{c,h}"Patrick Steinhardt1-1/+1
In the preceding commits we have renamed the structures contained in "object-store.h" to `struct object_database` and `struct odb_backend`. As such, the code files "object-store.{c,h}" are confusingly named now. Rename them to "odb.{c,h}" accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01object-store: rename `object_directory` to `odb_source`Patrick Steinhardt1-40/+40
The `object_directory` structure is used as an access point for a single object directory like ".git/objects". While the structure isn't yet fully self-contained, the intent is for it to eventually contain all information required to access objects in one specific location. While the name "object directory" is a good fit for now, this will change over time as we continue with the agenda to make pluggable object databases a thing. Eventually, objects may not be accessed via any kind of directory at all anymore, but they could instead be backed by any kind of durable storage mechanism. While it seems quite far-fetched for now, it is thinkable that eventually this might even be some form of a database, for example. As such, the current name of this structure will become worse over time as we evolve into the direction of pluggable ODBs. Immediate next steps will start to carve out proper self-contained object directories, which requires us to pass in these object directories as parameters. Based on our modern naming schema this means that those functions should then be named after their subsystem, which means that we would start to bake the current name into the codebase more and more. Let's preempt this by renaming the structure. There have been a couple alternatives that were discussed: - `odb_backend` was discarded because it led to the association that one object database has a single backend, but the model is that one alternate has one backend. Furthermore, "backend" is more about the actual backing implementation and less about the high-level concept. - `odb_alternate` was discarded because it is a bit of a stretch to also call the main object directory an "alternate". Instead, pick `odb_source` as the new name. It makes it sufficiently clear that there can be multiple sources and does not cause confusion when mixed with the already-existing "alternate" terminology. In the future, this change allows us to easily introduce for example a `odb_files_source` and other format-specific implementations. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16object-file: drop support for writing objects with unknown typesJeff King1-76/+5
Since "hash-object --literally" no longer supports objects with unknown types, there are now no callers of write_object_file_literally() and its helpers. Let's drop them to simplify the code. In particular, this gets rid of some ugly copy-and-paste code from write_object_file_literally(), which is a parallel implementation of write_object_file(). When the split was originally made, the two weren't that long, but commits like 63a6745a07 (object-file: update the loose object map when writing loose objects, 2023-10-01) ended up having to duplicate some tricky code. This patch drops all of that duplication and should make things less error-prone going forward. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16oid_object_info(): drop type_name strbufJeff King1-3/+1
We provide a mechanism for callers to get the object type as a raw string, rather than an object_type enum. This was in theory useful for returning types that are not representable in the enum, but we consider any such type to be an error, and there are no callers that use the strbuf anymore. Let's drop support to simplify the code a bit. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16fsck: stop using object_info->type_name strbufJeff King1-3/+9
When fsck-ing a loose object, we use object_info's type_name strbuf to record the parsed object type as a string. For most objects this is redundant with the object_type enum, but it does let us report the string when we encounter an object with an unknown type (for which there is no matching enum value). There are a few downsides, though: 1. The code to report these cases is not actually robust. Since we did not pass a strbuf to unpack_loose_header(), we only retrieved types from headers up to 32 bytes. In longer cases, we'd simply say "object corrupt or missing". 2. This is the last caller that uses object_info's type_name strbuf support. It would be nice to refactor it so that we can simplify that code. 3. Likewise, we'll check the hash of the object using its unknown type (again, as long as that type is short enough). That depends on the hash_object_file_literally() code, which we'd eventually like to get rid of. So we can simplify things by bailing immediately in read_loose_object() when we encounter an unknown type. This has a few user-visible effects: a. Instead of producing a single line of error output like this: error: 26ed13ce3564fbbb44e35bde42c7da717ea004a6: object is of unknown type 'bogus': .git/objects/26/ed13ce3564fbbb44e35bde42c7da717ea004a6 we'll now issue two lines (the first from read_loose_object() when we see the unparsable header, and the second from the fsck code, since we couldn't read the object): error: unable to parse type from header 'bogus 4' of .git/objects/26/ed13ce3564fbbb44e35bde42c7da717ea004a6 error: 26ed13ce3564fbbb44e35bde42c7da717ea004a6: object corrupt or missing: .git/objects/26/ed13ce3564fbbb44e35bde42c7da717ea004a6 This is a little more verbose, but this sort of error should be rare (such objects are almost impossible to work with, and cannot be transferred between repositories as they are not representable in packfiles). And as a bonus, reporting the broken header in full could help with debugging other cases (e.g., a header like "blob xyzzy\0" would fail in parsing the size, but previously we'd not have showed the offending bytes). b. An object with an unknown type will be reported as corrupt, without actually doing a hash check. Again, I think this is unlikely to matter in practice since such objects are totally unusable. We'll update one fsck test to match the new error strings. And we can remove another test that covered the case of an object with an unknown type _and_ a hash corruption. Since we'll skip the hash check now in this case, the test is no longer interesting. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16object-file: drop OBJECT_INFO_ALLOW_UNKNOWN_TYPE flagJeff King1-38/+7
Since cat-file dropped its "--allow-unknown-type" option in the previous commit, there are no more uses of the internal flag that implemented it. Let's drop it. That in turn lets us drop the strbuf parameter of unpack_loose_header(), which now is always NULL. And without that, we can drop all of the additional code to inflate larger headers into the strbuf. Arguably we could drop ULHR_TOO_LONG, as no callers really care about the distinction from ULHR_BAD. But it's easy enough to retain, and it does let us produce a slightly more specific message in one instance. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-29object-store: drop `loose_object_path()`Patrick Steinhardt1-2/+2
The function `loose_object_path()` is a trivial wrapper around `odb_loose_path()`, with the only exception that it always uses the primary object database of the given repository. This doesn't really add a ton of value though, so let's drop the function and inline it at every callsite. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-24Merge branch 'ps/object-file-cleanup' into ps/object-store-cleanupJunio C Hamano1-1193/+27
* ps/object-file-cleanup: object-store: merge "object-store-ll.h" and "object-store.h" object-store: remove global array of cached objects object: split out functions relating to object store subsystem object-file: drop `index_blob_stream()` object-file: split up concerns of `HASH_*` flags object-file: split out functions relating to object store subsystem object-file: move `xmmap()` into "wrapper.c" object-file: move `git_open_cloexec()` to "compat/open.c" object-file: move `safe_create_leading_directories()` into "path.c" object-file: move `mkdir_in_gitdir()` into "path.c"
2025-04-15Merge branch 'ps/object-wo-the-repository'Junio C Hamano1-283/+9
The object layer has been updated to take an explicit repository instance as a parameter in more code paths. * ps/object-wo-the-repository: hash: stop depending on `the_repository` in `null_oid()` hash: fix "-Wsign-compare" warnings object-file: split out logic regarding hash algorithms delta-islands: stop depending on `the_repository` object-file-convert: stop depending on `the_repository` pack-bitmap-write: stop depending on `the_repository` pack-revindex: stop depending on `the_repository` pack-check: stop depending on `the_repository` environment: move access to "core.bigFileThreshold" into repo settings pack-write: stop depending on `the_repository` and `the_hash_algo` object: stop depending on `the_repository` csum-file: stop depending on `the_repository`
2025-04-15Merge branch 'jk/zlib-inflate-fixes'Junio C Hamano1-25/+23
Fix our use of zlib corner cases. * jk/zlib-inflate-fixes: unpack_loose_rest(): rewrite return handling for clarity unpack_loose_rest(): simplify error handling unpack_loose_rest(): never clean up zstream unpack_loose_rest(): avoid numeric comparison of zlib status unpack_loose_header(): avoid numeric comparison of zlib status git_inflate(): skip zlib_post_call() sanity check on Z_NEED_DICT unpack_loose_header(): fix infinite loop on broken zlib input unpack_loose_header(): report headers without NUL as "bad" unpack_loose_header(): simplify next_out assignment loose_object_info(): BUG() on inflating content with unknown type
2025-04-15object-store: merge "object-store-ll.h" and "object-store.h"Patrick Steinhardt1-0/+1
The "object-store-ll.h" header has been introduced to keep transitive header dependendcies and compile times at bay. Now that we have created a new "object-store.c" file though we can easily move the last remaining additional bit of "object-store.h", the `odb_path_map`, out of the header. Do so. As the "object-store.h" header is now equivalent to its low-level alternative we drop the latter and inline it into the former. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-15object-file: drop `index_blob_stream()`Patrick Steinhardt1-24/+2
The `index_blob_stream()` function is a mere wrapper around `index_blob_bulk_checkin()`. This has been the case since 568508e7657 (bulk-checkin: replace fast-import based implementation, 2011-10-28), which has moved the implementation from `index_blob_stream()` (which was still called `index_stream()`) into `index_bulk_checkin()` (which has since been renamed to `index_blob_bulk_checkin()`). Remove the redirection by dropping the wrapper. Move the comment to `index_blob_bulk_checkin()` to retain its context. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-15object-file: split up concerns of `HASH_*` flagsPatrick Steinhardt1-9/+9
The functions `hash_object_file()`, `write_object_file()` and `index_fd()` reuse the same set of flags to alter their behaviour. This not only adds confusion, but given that every function only supports a subset of the flags it becomes very hard to see which flags can be passed to what function. Last but not least, this entangles the implementation of all three function families. Split up concerns by creating separate flags for each of the function families. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-15object-file: split out functions relating to object store subsystemPatrick Steinhardt1-977/+13
While we have the "object-store.h" header, most of the functionality for object stores is actually hosted in "object-file.c". This makes it hard to find relevant functions and causes us to mix up concerns. Split out functions relating to the object store subsystem into a new "object-store.c" file. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-15object-file: move `xmmap()` into "wrapper.c"Patrick Steinhardt1-48/+0
The `xmmap()` function is provided by "object-file.c" even though its functionality has nothing to do with the object file subsystem. Move it into "wrapper.c", whose header already declares those functions. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-15object-file: move `git_open_cloexec()` to "compat/open.c"Patrick Steinhardt1-27/+0
The `git_open_cloexec()` wrapper function provides the ability to open a file with `O_CLOEXEC` in a platform-agnostic way. This function is provided by "object-file.c" even though it is not specific to the object subsystem at all. Move the file into "compat/open.c". This file already exists before this commit, but has only been compiled conditionally depending on whether or not open(3p) may return EINTR. With this change we now unconditionally compile the object, but wrap `git_open_with_retry()` in an ifdef. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-15object-file: move `safe_create_leading_directories()` into "path.c"Patrick Steinhardt1-79/+2
The `safe_create_leading_directories()` function and its relatives are located in "object-file.c", which is not a good fit as they provide generic functionality not related to objects at all. Move them into "path.c", which already hosts `safe_create_dir()` and its relative `safe_create_dir_in_gitdir()`. "path.c" is free of `the_repository`, but the moved functions depend on `the_repository` to read the "core.sharedRepository" config. Adapt the function signature to accept a repository as argument to fix the issue and adjust callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-15object-file: move `mkdir_in_gitdir()` into "path.c"Patrick Steinhardt1-31/+2
The `mkdir_in_gitdir()` function is similar to `safe_create_dir()`, but the former is hosted in "object-file.c" whereas the latter is hosted in "path.c". The latter code unit makes way more sense though as the logic has nothing to do with object files in particular. Move the file into "path.c". While at it, we: - Rename the function to `safe_create_dir_in_gitdir()` so that the function names are similar to one another. - Remove the dependency on `the_repository` by making the callers pass the repository instead. Adjust callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-21treewide: replace assert() with ASSERT() in special casesElijah Newren1-1/+1
When the compiler/linker cannot verify that an assert() invocation is free of side effects for us (e.g. because the assertion includes some kind of function call), replace the use of assert() with ASSERT(). Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-10hash: stop depending on `the_repository` in `null_oid()`Patrick Steinhardt1-1/+1
The `null_oid()` function returns the object ID that only consists of zeroes. Naturally, this ID also depends on the hash algorithm used, as the number of zeroes is different between SHA1 and SHA256. Consequently, the function returns the hash-algorithm-specific null object ID. This is currently done by depending on `the_hash_algo`, which implicitly makes us depend on `the_repository`. Refactor the function to instead pass in the hash algorithm for which we want to retrieve the null object ID. Adapt callsites accordingly by passing in `the_repository`, thus bubbling up the dependency on that global variable by one layer. There are a couple of trivial exceptions for subsystems that already got rid of `the_repository`. These subsystems instead use the repository that is available via the calling context: - "builtin/grep.c" - "grep.c" - "refs/debug.c" There are also two non-trivial exceptions: - "diff-no-index.c": Here we know that we may not have a repository initialized at all, so we cannot rely on `the_repository`. Instead, we adapt `diff_no_index()` to get a `struct git_hash_algo` as parameter. The only caller is located in "builtin/diff.c", where we know to call `repo_set_hash_algo()` in case we're running outside of a Git repository. Consequently, it is fine to continue passing `the_repository->hash_algo` even in this case. - "builtin/ls-files.c": There is an in-flight patch series that drops `USE_THE_REPOSITORY_VARIABLE` in this file, which causes a semantic conflict because we use `null_oid()` in `show_submodule()`. The value is passed to `repo_submodule_init()`, which may use the object ID to resolve a tree-ish in the superproject from which we want to read the submodule config. As such, the object ID should refer to an object in the superproject, and consequently we need to use its hash algorithm. This means that we could in theory just not bother about this edge case at all and just use `the_repository` in "diff-no-index.c". But doing so would feel misdesigned. Remove the `USE_THE_REPOSITORY_VARIABLE` preprocessor define in "hash.c". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-10object-file: split out logic regarding hash algorithmsPatrick Steinhardt1-277/+0
While we have a "hash.h" header, the actual implementation of the subsystem is hosted by "object-file.c". This makes it harder than necessary to find the actual implementation of the hash subsystem and intermingles the different concerns with one another. Split out the implementation of hash algorithms into a new, separate "hash.c" file. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-10object-file-convert: stop depending on `the_repository`Patrick Steinhardt1-3/+4
There are multiple sites in "object-file-convert.c" where we use the global `the_repository` variable, either explicitly or implicitly by using `the_hash_algo`. All of these callsites are transitively called from `convert_object_file()`, which indeed has no repo as input. Refactor the function so that it receives a repository as a parameter and pass it through to all internal functions to get rid of the dependency. Remove the `USE_THE_REPOSITORY_VARIABLE` define. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-10environment: move access to "core.bigFileThreshold" into repo settingsPatrick Steinhardt1-2/+4
The "core.bigFileThreshold" setting is stored in a global variable and populated via `git_default_core_config()`. This may cause issues in the case where one is handling multiple different repositories in a single process with different values for that config key, as we may or may not see the correct value in that case. Furthermore, global state blocks our path towards libification. Refactor the code so that we instead store the value in `struct repo_settings`, where the value is computed as-needed and cached. Note that this change requires us to adapt one test in t1050 that verifies that we die when parsing an invalid "core.bigFileThreshold" value. The exercised Git command doesn't use the value at all, and thus it won't hit the new code path that parses the value. This is addressed by using git-hash-object(1) instead, which does read the value. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-05Merge branch 'ps/path-sans-the-repository'Junio C Hamano1-7/+7
The path.[ch] API takes an explicit repository parameter passed throughout the callchain, instead of relying on the_repository singleton instance. * ps/path-sans-the-repository: path: adjust last remaining users of `the_repository` environment: move access to "core.sharedRepository" into repo settings environment: move access to "core.hooksPath" into repo settings repo-settings: introduce function to clear struct path: drop `git_path()` in favor of `repo_git_path()` rerere: let `rerere_path()` write paths into a caller-provided buffer path: drop `git_common_path()` in favor of `repo_common_path()` worktree: return allocated string from `get_worktree_git_dir()` path: drop `git_path_buf()` in favor of `repo_git_path_replace()` path: drop `git_pathdup()` in favor of `repo_git_path()` path: drop unused `strbuf_git_path()` function path: refactor `repo_submodule_path()` family of functions submodule: refactor `submodule_to_gitdir()` to accept a repo path: refactor `repo_worktree_path()` family of functions path: refactor `repo_git_path()` family of functions path: refactor `repo_common_path()` family of functions
2025-02-28path: adjust last remaining users of `the_repository`Patrick Steinhardt1-4/+4
With the preceding refactorings we now only have a couple of implicit users of `the_repository` left in the "path" subsystem, all of which depend on global state via `calc_shared_perm()`. Make the dependency on `the_repository` explicit by passing the repo as a parameter instead and adjust callers accordingly. Note that this change bubbles up into a couple of subsystems that were previously declared as free from `the_repository`. Instead of marking all of them as `the_repository`-dependent again, we instead use the repository that is available in the calling context. There are three exceptions though with "copy.c", "pack-write.c" and "tempfile.c". Adjusting these would require us to adapt callsites all over the place, so this is left for a future iteration. Mark "path.c" as free from `the_repository`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-25unpack_loose_rest(): rewrite return handling for clarityJeff King1-6/+6
We have a pattern like: if (error1) ...handle error 1... else if (error2) ...handle error 2... else ...return buf... ...free buf and return NULL... This is a little subtle because it is the return in the success block that lets us skip the common error handling. Rewrite this instead to free the buffer in each error path, marking it as NULL, and then all code paths can use the common return. This should make the logic a bit easier to follow. It does mean duplicating the buf cleanup for errors, but it's a single line. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-25unpack_loose_rest(): simplify error handlingJeff King1-3/+3
Inflating a loose object is considered successful only if we got Z_STREAM_END and there were no more bytes. We check both of those conditions and return success, but then have to check them a second time to decide which error message to produce. I.e., we do something like this: if (!error_1 && !error_2) ...return success... if (error_1) ...handle error1... else if (error_2) ...handle error2... ...common error handling... This repetition was the source of a small bug fixed in an earlier commit (our Z_STREAM_END check was not the same in the two conditionals). Instead we can chain them all into a single if/else cascade, which avoids repeating ourselves: if (error_1) ...handle error1... else if (error_2) ...handle error2.... else ...return success... ...common error handling... Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-25unpack_loose_rest(): never clean up zstreamJeff King1-10/+8
The unpack_loose_rest() function has funny ownership semantics: we pass in a z_stream opened by the caller, but then only _sometimes_ close it. This oddity has developed over time. When the function was originally split out in 5180cacc20 (Split up unpack_sha1_file() some more, 2005-06-02), it always called inflateEnd() to clean up the stream (though nowadays it is a git_zstream and we call git_inflate_end()). But in 7efbff7531 (unpack_sha1_file(): detect corrupt loose object files., 2007-03-05) we added error code paths which don't close the stream. This makes some sense, as we'd still look at parts of the stream struct to decide which error to show (though I am not sure in practice if inflateEnd() even touches those fields). This subtlety makes it hard to know when the caller has to clean up the stream and when it does not. That led to the leak fixed by aa9ef614dc (object-file: fix memory leak when reading corrupted headers, 2024-08-14). Let's instead always leave the stream intact, forcing the caller to clean it up. You might think that would create more work for the callers, but it actually ends up simplifying them, since they can put the call to git_inflate_end() in the common cleanup code path. Two things to note, though: - The check_stream_oid() function is used as a replacement for unpack_loose_rest() in read_loose_object() to read blobs. It inherited the same funny semantics, and we should fix it here, too (to keep the cleanup in read_loose_object() consistent). - In read_loose_object() we need a second "out" label, as we can jump to the existing label before opening the stream at all (and since the struct is opaque, there is no way to if it was initialized or not, so we must not call git_inflate_end() in that case). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-25unpack_loose_rest(): avoid numeric comparison of zlib statusJeff King1-1/+1
When unpacking the actual content of a loose object file, we insist both that the status code we got is Z_STREAM_END, and that we consumed all bytes. If we didn't, we'll return an error, but the specific error message we produce depends on which of the two error conditions we saw. So we'll check both a second time to decide which error to produce. But this second time, our status code check is loose: it checks for a negative status value. This can get confused by zlib codes which are not negative, such as Z_NEED_DICT. In this case we'd erroneously print nothing at all, when we should say "corrupt loose object". Instead, this second check should check explicitly against Z_STREAM_END. Note that Z_OK is "0", so the existing code also produced no message for Z_OK. But it's impossible to see that status, since we only break out of the inflate loop when we stop seeing Z_OK (so a stream which has more bytes than its object header claims would eventually yield Z_BUF_ERROR). There's no test here, as it would require a loose object whose zlib stream returns Z_NEED_DICT in the middle of the object content. I think that is probably possible, but even our Z_NEED_DICT test in t1006 does not trigger this, because we hit that error while reading the header. I found this bug while reviewing all callers of git_inflate() for bugs similar to the one we saw in unpack_loose_header(). This was the only other case that did a numeric comparison rather than explicitly checking for Z_STREAM_END. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-25unpack_loose_header(): avoid numeric comparison of zlib statusJeff King1-1/+1
When unpacking a loose header, we try to inflate the first 32 bytes. We'd expect either Z_OK (we filled up the output buffer, but there are more bytes in the object) or Z_STREAM_END (this is a tiny object whose header and content fit in the buffer). We check for that with "if (status < Z_OK)", making the assumption that all of the errors we'd see have negative values (as Z_OK itself is "0", and Z_STREAM_END is "1"). But there's at least one case this misses: Z_NEED_DICT is "2". This isn't something we'd ever expect to see, but if we do see it, we should consider it an error (since we have no dictionary to load). Instead, the current code interprets Z_NEED_DICT as success and looks for the object header's terminating NUL in the bytes we've read. This will generaly be zero bytes if the dictionary is mentioned at the start of the stream. So we'll fail to find it and complain "the header is too long" (ULHR_LONG). But really, the problem is that the object is malformed, and we should return ULHR_BAD. This is a minor bug, as we consider both cases to be an error. But it does mean we print the wrong error message. The test case added in the previous patch triggers this code, so we can just confirm the error message we see here. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-25unpack_loose_header(): fix infinite loop on broken zlib inputJeff King1-1/+1
When reading a loose object, we first try to expand the first 32 bytes to read the type+size header. This is enough for any of the normal Git types. But since 46f034483e (sha1_file: support reading from a loose object of unknown type, 2015-05-03), the caller can also ask us to parse any unknown names, which can be much longer. In this case we keep inflating until we find the NUL at the end of the header, or hit Z_STREAM_END. But what if zlib can't make forward progress? For example, if the loose object file is truncated, we'll have no more data to feed it. It will return Z_BUF_ERROR, and we'll just loop infinitely, calling git_inflate() over and over but never seeing new bytes nor an end-of-stream marker. We can fix this by only looping when we think we can make forward progress. This will always be Z_OK in this case. In other code we might also be able to continue on Z_BUF_ERROR, but: - We will never see Z_BUF_ERROR because the output buffer is full; we always feed a fresh 32-byte buffer on each call to git_inflate(). - We may see Z_BUF_ERROR if we run out of input. But since we've fed the whole mmap'd buffer to zlib, if it runs out of input there is nothing more we can do. So if we don't see Z_OK (and didn't see the end-of-header NUL, otherwise we'd have broken out of the loop), then we should stop looping and return an error. The test case shows an example where the input is truncated (which gives us the input Z_BUF_ERROR case above). Although we do operate on objects we might get from an untrusted remote, I don't think the security implications of this bug are too great. It can only trigger if both of these are true: - You're reading a loose object whose on-disk representation was written by an attacker. So fetching an object (or receiving a push) are mostly OK, because even with unpack-objects it is our local, trusted code that writes out the object file. The exception may be fetching from an untrusted local repo, or using dumb-http, which copies objects verbatim. But... - The only code path which triggers the inflate loop is cat-file's --allow-unknown-type option. This is unlikely to be called at all outside of debugging. But I also suspect that objects with non-standard types (or that are truncated) would not survive the usual fetch/receive checks in the first place. So I think it would be quite hard to trick somebody into running the infinite loop, and we can just fix the bug. Co-authored-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-25unpack_loose_header(): report headers without NUL as "bad"Jeff King1-1/+1
If a caller asks us to read the whole loose object header value into a strbuf (e.g., via "cat-file --allow-unknown-type"), we'll keep reading until we see a NUL byte marking the end of the header. If we hit Z_STREAM_END before seeing the NUL, we obviously have to stop. But we return ULHR_TOO_LONG, which doesn't make any sense. The "too long" return code is used in the normal, 32-byte limited mode to indicate that we stopped looking. There is no such thing as "too long" here, as we'd keep reading forever until we see the end of stream or the NUL. Instead, we should return ULHR_BAD. The loose object has no NUL marking the end of header, so it is malformed. The behavior difference is slight; in either case we'd consider the object unreadable and refuse to go further. The only difference is the specific error message we produce. There's no test case here, as we'd need to generate a valid zlib stream without a NUL. That's not something Git will do without writing new custom code. And in the next patch we'll fix another bug in this area which will make this easier to do (and we will test it then). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>