aboutsummaryrefslogtreecommitdiffstats
path: root/midx-write.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2025-09-15Merge branch 'ds/midx-write-fixes'Junio C Hamano1-68/+63
Fixes multiple crashes around midx write-out codepaths. * ds/midx-write-fixes: midx-write: simplify error cases midx-write: reenable signed comparison errors midx-write: use uint32_t for preferred_pack_idx midx-write: use cleanup when incremental midx fails midx-write: put failing response value back midx-write: only load initialized packs
2025-09-12Merge branch 'ps/object-store-midx-dedup-info'Junio C Hamano1-63/+53
Further code clean-up for multi-pack-index code paths. * ps/object-store-midx-dedup-info: midx: compute paths via their source midx: stop duplicating info redundant with its owning source midx: write multi-pack indices via their source midx: load multi-pack indices via their source midx: drop redundant `struct repository` parameter odb: simplify calling `link_alt_odb_entry()` odb: return newly created in-memory sources odb: consistently use "dir" to refer to alternate's directory odb: allow `odb_find_source()` to fail odb: store locality in object database sources
2025-09-05midx-write: simplify error casesDerrick Stolee1-17/+9
The write_midx_internal() method uses gotos to jump to a cleanup section to clear memory before returning 'result'. Since these jumps are more common for error conditions, initialize 'result' to -1 and then only set it to 0 before returning with success. There are a couple places where we return with success via a jump. This has the added benefit that the method now returns -1 on error instead of an inconsistent 1 or -1. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-05midx-write: reenable signed comparison errorsDerrick Stolee1-17/+18
Remove the remaining signed comparison warnings in midx-write.c so that they can be enforced as errors in the future. After the previous change, the remaining errors are due to iterator variables named 'i'. The strategy here involves defining the variable within the for loop syntax to make sure we use the appropriate bitness for the loop sentinel. This matters in at least one method where the variable was compared to uint32_t in some loops and size_t in others. While adjusting these loops, there were some where the loop boundary was checking against a uint32_t value _plus one_. These were replaced with non-strict comparisons, but also the value is checked to not be UINT32_MAX. Since the value is the number of incremental multi-pack- indexes, this is not a meaningful restriction. The new die() is about defensive programming more than it being realistically possible. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-05midx-write: use uint32_t for preferred_pack_idxDerrick Stolee1-11/+15
midx-write.c has the DISABLE_SIGN_COMPARE_WARNINGS macro defined for a few reasons, but the biggest one is the use of a signed preferred_pack_idx member inside the write_midx_context struct. The code currently uses -1 to indicate an unset preferred pack but pack int ids are normally handled as uint32_t. There are also a few loops that search for the preferred pack by name and those iterators will need updates to uint32_t in the next change. For now, replace the use of -1 with a 'NO_PREFERRED_PACK' macro and an equality check. The macro stores the max value of a uint32_t, so we cannot store a preferred pack that appears last in a list of 2^32 total packs, but that's expected to be unreasonable already. Furthermore, with this change we end up extending the range from 2^31 possible packs to 2^32-1. There are some careful things to worry about with initializing the preferred pack in the struct and using that value when searching for a preferred pack that was already incorrect but accidentally working when the index was initialized to zero. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-05midx-write: use cleanup when incremental midx failsDerrick Stolee1-6/+12
The incremental mode of writing a multi-pack-index has a few extra conditions that could lead to failure, but these are currently short-ciruiting with 'return -1' instead of setting the method's 'result' variable and going to the cleanup tag. Replace these returns with gotos to avoid memory issues when exiting early due to error conditions. Unfortunately, these error conditions are difficult to reproduce with test cases, which is perhaps one reason why the memory loss was not caught by existing test cases in memory tracking modes. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-05midx-write: put failing response value backDerrick Stolee1-0/+1
This instance of setting the result to 1 before going to cleanup was accidentally removed in fcb2205b77 (midx: implement support for writing incremental MIDX chains, 2024-08-06). Build upon a test that already deletes a packfile to verify that this error propagates to full command failure. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-05midx-write: only load initialized packsDerrick Stolee1-27/+19
The fill_packs_from_midx() method was refactored in fcb2205b77 (midx: implement support for writing incremental MIDX chains, 2024-08-06) to allow for preferred packfiles and incremental multi-pack-indexes. However, this led to some conditions that can cause improperly initialized memory in the context's list of packfiles. The conditions caring about the preferred pack name or the incremental flag are currently necessary to load a packfile. But the context is still being populated with pack_info structs based on the packfile array for the existing multi-pack-index even if prepare_midx_pack() isn't called. Add a new test that breaks under --stress when compiled with SANITIZE=address. The chosen number of 100 packfiles was selected to get the --stress output to fail about 50% of the time, while 50 packfiles could not get a failure in most --stress runs. The test case is marked as EXPENSIVE not only because of the number of packfiles it creates, but because some CI environments were reporting errors during the test that I could not reproduce, specifically around being unable to open the packfiles or their pack-indexes. When it fails under SANITIZE=address, it provides the following error: AddressSanitizer:DEADLYSIGNAL ================================================================= ==3263517==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000027 ==3263517==The signal is caused by a READ memory access. ==3263517==Hint: address points to the zero page. #0 0x562d5d82d1fb in close_pack_windows packfile.c:299 #1 0x562d5d82d3ab in close_pack packfile.c:354 #2 0x562d5d7bfdb4 in write_midx_internal midx-write.c:1490 #3 0x562d5d7c7aec in midx_repack midx-write.c:1795 #4 0x562d5d46fff6 in cmd_multi_pack_index builtin/multi-pack-index.c:305 ... This failure stack trace is disconnected from the real fix because the bad pointers are accessed later when closing the packfiles from the context. There are a few different aspects to this fix that are worth noting: 1. We return to the previous behavior of fill_packs_from_midx to not rely on the incremental flag or existence of a preferred pack. 2. The behavior to scan all layers of an incremental midx is kept, so this is not a full revert of the change. 3. We skip allocating more room in the pack_info array if the pack fails prepare_midx_pack(). 4. The method has always returned 0 for success and 1 for failure, but the condition checking for error added a check for a negative result for failure, so that is now updated. 5. The call to open_pack_index() is removed, but this is needed later in the case of a preferred pack. That call is moved to immediately before its result is needed (checking for the object count). Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-11midx: compute paths via their sourcePatrick Steinhardt1-28/+24
With the preceding commits we started to always have the object database source available when we load, write or access multi-pack indices. With this in place we can change how MIDX paths are computed so that we don't have to pass in the combination of a hash algorithm and object directory anymore, but only the object database source. Refactor the code accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-11midx: stop duplicating info redundant with its owning sourcePatrick Steinhardt1-4/+5
Multi-pack indices store some information that is redundant with their owning source: - The locality bit that tracks whether the source is the primary object source or an alternate. - The object directory path the multi-pack index is located in. - The pointer to the owning parent directory. All of this information is already contained in `struct odb_source`. So now that we always have that struct available when loading a multi-pack index we have it readily accessible. Drop the redundant information and instead store a pointer to the object source. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-11midx: write multi-pack indices via their sourcePatrick Steinhardt1-36/+31
Similar to the preceding commit, refactor the writing side of multi-pack indices so that we pass in the object database source where the index should be written to. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-11midx: drop redundant `struct repository` parameterPatrick Steinhardt1-9/+7
There are a couple of functions that take both a `struct repository` and a `struct multi_pack_index`. This provides redundant information though without much benefit given that the multi-pack index already has a pointer to its owning repository. Drop the `struct repository` parameter from such functions. While at it, reorder the list of parameters of `fill_midx_entry()` so that the MIDX comes first to better align with our coding guidelines. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-11odb: allow `odb_find_source()` to failPatrick Steinhardt1-1/+1
When trying to locate a source for an unknown object directory we will die right away. In subsequent patches we will add new callsites though that want to handle this situation gracefully instead. Refactor the function to return a `NULL` pointer if the source could not be found and adapt the callsites to die instead. Introduce a new wrapper `odb_find_source_or_die()` that continues to die in case the source could not be found. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-05Merge branch 'ps/object-file-wo-the-repository'Junio C Hamano1-1/+1
Reduce implicit assumption and dependence on the_repository in the object-file subsystem. * ps/object-file-wo-the-repository: object-file: get rid of `the_repository` in index-related functions object-file: get rid of `the_repository` in `force_object_loose()` object-file: get rid of `the_repository` in `read_loose_object()` object-file: get rid of `the_repository` in loose object iterators object-file: remove declaration for `for_each_file_in_obj_subdir()` object-file: inline `for_each_loose_file_in_objdir_buf()` object-file: get rid of `the_repository` when writing objects odb: introduce `odb_write_object()` loose: write loose objects map via their source object-file: get rid of `the_repository` in `finalize_object_file()` object-file: get rid of `the_repository` in `loose_object_info()` object-file: get rid of `the_repository` when freshening objects object-file: inline `check_and_freshen()` functions object-file: get rid of `the_repository` in `has_loose_object()` object-file: stop using `the_hash_algo` object-file: fix -Wsign-compare warnings
2025-07-16object-file: get rid of `the_repository` in `finalize_object_file()`Patrick Steinhardt1-1/+1
We implicitly depend on `the_repository` when moving an object file into place in `finalize_object_file()`. Get rid of this global dependency by passing in a repository. Note that one might be pressed to inject an object database instead of a repository. But the function doesn't really care about the ODB at all. All it does is to move a file into place while checking whether there is any collision. As such, the functionality it provides is independent of the object database and only needs the repository as parameter so that it can adjust permissions of the file we are about to finalize. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-15packfile: refactor `get_multi_pack_index()` to work on sourcesPatrick Steinhardt1-20/+2
The function `get_multi_pack_index()` loads multi-pack indices via `prepare_packed_git()` and then returns the linked list of multi-pack indices that is stored in `struct object_database`. That list is in the process of being removed though in favor of storing the MIDX as part of the object database source it belongs to. Refactor `get_multi_pack_index()` so that it returns the multi-pack index for a single object source. Callers are now expected to call this function for each source they are interested in. This requires them to iterate through alternates, so we have to prepare alternate object sources before doing so. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01odb: get rid of `the_repository` in `find_odb()`Patrick Steinhardt1-1/+1
Get rid of our dependency on `the_repository` in `find_odb()` by passing in the object database in which we want to search for the source and adjusting all callers. Rename the function to `odb_find_source()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-22midx: avoid negative array indexPhillip Wood1-2/+2
nth_midxed_pack_int_id() returns the index of the pack file in the multi pack index's list of packfiles that the specified object. The index is returned as a uint32_t. Storing this in an int will make the index negative if the most significant bit is set. Fix this by using uint32_t as the rest of the code does. This is unlikely to be a practical problem as it requires the multipack index to reference 2^31 packfiles. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-22midx repack: avoid potential integer overflow on 64 bit systemsPhillip Wood1-2/+8
On a 64 bit system the calculation p->pack_size * pack_info[i].referenced_objects could overflow. If a pack file contains 2^28 objects with an average compressed size of 1KB then the pack size will be 2^38B. If all of the objects are referenced by the multi-pack index the sum above will overflow. Avoid this by using shifted integer arithmetic and changing the order of the calculation so that the pack size is divided by the total number of objects in the pack before multiplying by the number of objects referenced by the multi-pack index. Using a shift of 14 bits should give reasonable accuracy while avoiding overflow for pack sizes less that 1PB. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-22midx repack: avoid integer overflow on 32 bit systemsPhillip Wood1-4/+8
On a 32 bit system "git multi-pack-index --repack --batch-size=120M" failed with fatal: size_t overflow: 6038786 * 1289 The calculation to estimated size of the objects in the pack referenced by the multi-pack-index uses st_mult() to multiply the pack size by the number of referenced objects before dividing by the total number of objects in the pack. As size_t is 32 bits on 32 bit systems this calculation easily overflows. Fix this by using 64bit arithmetic instead. Also fix a potential overflow when caluculating the total size of the objects referenced by the multipack index with a batch size larger than SIZE_MAX / 2. In that case total_size += estimated_size can overflow as both total_size and estimated_size can be greater that SIZE_MAX / 2. This is addressed by using saturating arithmetic for the addition. Although estimated_size is of type uint64_t by the time we reach this sum it is bounded by the batch size which is of type size_t and so casting estimated_size to size_t does not truncate the value. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-24Merge branch 'ps/object-file-cleanup'Junio C Hamano1-1/+1
Code clean-up. * ps/object-file-cleanup: object-store: merge "object-store-ll.h" and "object-store.h" object-store: remove global array of cached objects object: split out functions relating to object store subsystem object-file: drop `index_blob_stream()` object-file: split up concerns of `HASH_*` flags object-file: split out functions relating to object store subsystem object-file: move `xmmap()` into "wrapper.c" object-file: move `git_open_cloexec()` to "compat/open.c" object-file: move `safe_create_leading_directories()` into "path.c" object-file: move `mkdir_in_gitdir()` into "path.c"
2025-04-15Merge branch 'ps/object-wo-the-repository'Junio C Hamano1-5/+7
The object layer has been updated to take an explicit repository instance as a parameter in more code paths. * ps/object-wo-the-repository: hash: stop depending on `the_repository` in `null_oid()` hash: fix "-Wsign-compare" warnings object-file: split out logic regarding hash algorithms delta-islands: stop depending on `the_repository` object-file-convert: stop depending on `the_repository` pack-bitmap-write: stop depending on `the_repository` pack-revindex: stop depending on `the_repository` pack-check: stop depending on `the_repository` environment: move access to "core.bigFileThreshold" into repo settings pack-write: stop depending on `the_repository` and `the_hash_algo` object: stop depending on `the_repository` csum-file: stop depending on `the_repository`
2025-04-15object-file: move `safe_create_leading_directories()` into "path.c"Patrick Steinhardt1-1/+1
The `safe_create_leading_directories()` function and its relatives are located in "object-file.c", which is not a good fit as they provide generic functionality not related to objects at all. Move them into "path.c", which already hosts `safe_create_dir()` and its relative `safe_create_dir_in_gitdir()`. "path.c" is free of `the_repository`, but the moved functions depend on `the_repository` to read the "core.sharedRepository" config. Adapt the function signature to accept a repository as argument to fix the issue and adjust callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-21midx: implement writing incremental MIDX bitmapsTaylor Blau1-19/+38
Now that the pack-bitmap machinery has learned how to read and interact with an incremental MIDX bitmap, teach the pack-bitmap-write.c machinery (and relevant callers from within the MIDX machinery) to write such bitmaps. The details for doing so are mostly straightforward. The main changes are as follows: - find_object_pos() now makes use of an extra MIDX parameter which is used to locate the bit positions of objects which are from previous layers (and thus do not exist in the current layer's pack_order field). (Note also that the pack_order field is moved into struct write_midx_context to further simplify the callers for write_midx_bitmap()). - bitmap_writer_build_type_index() first determines how many objects precede the current bitmap layer and offsets the bits it sets in each respective type-level bitmap by that amount so they can be OR'd together. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-10pack-write: stop depending on `the_repository` and `the_hash_algo`Patrick Steinhardt1-1/+1
There are a couple of functions in "pack-write.c" that implicitly depend on `the_repository` or `the_hash_algo`. Remove this dependency by injecting the repository via a parameter and adapt callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-10object: stop depending on `the_repository`Patrick Steinhardt1-2/+2
There are a couple of functions exposed by "object.c" that implicitly depend on `the_repository`. Remove this dependency by injecting the repository via a parameter. Adapt callers accordingly by simply using `the_repository`, except in cases where the subsystem is already free of the repository. In that case, we instead pass the repository provided by the caller's context. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-10csum-file: stop depending on `the_repository`Patrick Steinhardt1-2/+4
There are multiple sites in "csum-file.c" where we use the global `the_repository` variable, either explicitly or implicitly by using `the_hash_algo`. Refactor the code to stop using `the_repository` by adapting functions to receive required data as parameters. Adapt callsites accordingly by either using `the_repository->hash_algo`, or by using a context-provided hash algorithm in case the subsystem already got rid of its dependency on `the_repository`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-28path: adjust last remaining users of `the_repository`Patrick Steinhardt1-1/+1
With the preceding refactorings we now only have a couple of implicit users of `the_repository` left in the "path" subsystem, all of which depend on global state via `calc_shared_perm()`. Make the dependency on `the_repository` explicit by passing the repo as a parameter instead and adjust callers accordingly. Note that this change bubbles up into a couple of subsystems that were previously declared as free from `the_repository`. Instead of marking all of them as `the_repository`-dependent again, we instead use the repository that is available in the calling context. There are three exceptions though with "copy.c", "pack-write.c" and "tempfile.c". Adjusting these would require us to adapt callsites all over the place, so this is left for a future iteration. Mark "path.c" as free from `the_repository`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03Merge branch 'kn/pack-write-with-reduced-globals'Junio C Hamano1-2/+2
Code clean-up. * kn/pack-write-with-reduced-globals: pack-write: pass hash_algo to internal functions pack-write: pass hash_algo to `write_rev_file()` pack-write: pass hash_algo to `write_idx_file()` pack-write: pass repository to `index_pack_lockfile()` pack-write: pass hash_algo to `fixup_pack_header_footer()`
2025-01-21pack-write: pass hash_algo to `write_rev_file()`Karthik Nayak1-2/+2
The `write_rev_file()` function uses the global `the_hash_algo` variable to access the repository's hash_algo. To avoid global variable usage, pass a hash_algo from the layers above. Also modify children functions `write_rev_file_order()` and `write_rev_header()` to accept 'the_hash_algo'. Altough the layers above could have access to the hash_algo internally, simply pass in `the_hash_algo`. This avoids any compatibility issues and bubbles up global variable usage to upper layers which can be eventually resolved. However, in `midx-write.c`, since all usage of global variables is removed, don't reintroduce them and instead use the `repo` available in the context. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-23Merge branch 'ps/build-sign-compare'Junio C Hamano1-0/+2
Start working to make the codebase buildable with -Wsign-compare. * ps/build-sign-compare: t/helper: don't depend on implicit wraparound scalar: address -Wsign-compare warnings builtin/patch-id: fix type of `get_one_patchid()` builtin/blame: fix type of `length` variable when emitting object ID gpg-interface: address -Wsign-comparison warnings daemon: fix type of `max_connections` daemon: fix loops that have mismatching integer types global: trivial conversions to fix `-Wsign-compare` warnings pkt-line: fix -Wsign-compare warning on 32 bit platform csum-file: fix -Wsign-compare warning on 32-bit platform diff.h: fix index used to loop through unsigned integer config.mak.dev: drop `-Wno-sign-compare` global: mark code units that generate warnings with `-Wsign-compare` compat/win32: fix -Wsign-compare warning in "wWinMain()" compat/regex: explicitly ignore "-Wsign-compare" warnings git-compat-util: introduce macros to disable "-Wsign-compare" warnings
2024-12-18progress: stop using `the_repository`Patrick Steinhardt1-3/+8
Stop using `the_repository` in the "progress" subsystem by passing in a repository when initializing `struct progress`. Furthermore, store a pointer to the repository in that struct so that we can pass it to the trace2 API when logging information. Adjust callers accordingly by using `the_repository`. While there may be some callers that have a repository available in their context, this trivial conversion allows for easier verification and bubbles up the use of `the_repository` by one level. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-18Merge branch 'ps/build-sign-compare' into ps/the-repositoryJunio C Hamano1-0/+2
* ps/build-sign-compare: t/helper: don't depend on implicit wraparound scalar: address -Wsign-compare warnings builtin/patch-id: fix type of `get_one_patchid()` builtin/blame: fix type of `length` variable when emitting object ID gpg-interface: address -Wsign-comparison warnings daemon: fix type of `max_connections` daemon: fix loops that have mismatching integer types global: trivial conversions to fix `-Wsign-compare` warnings pkt-line: fix -Wsign-compare warning on 32 bit platform csum-file: fix -Wsign-compare warning on 32-bit platform diff.h: fix index used to loop through unsigned integer config.mak.dev: drop `-Wno-sign-compare` global: mark code units that generate warnings with `-Wsign-compare` compat/win32: fix -Wsign-compare warning in "wWinMain()" compat/regex: explicitly ignore "-Wsign-compare" warnings git-compat-util: introduce macros to disable "-Wsign-compare" warnings
2024-12-06global: mark code units that generate warnings with `-Wsign-compare`Patrick Steinhardt1-0/+1
Mark code units that generate warnings with `-Wsign-compare`. This allows for a structured approach to get rid of all such warnings over time in a way that can be easily measured. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04midx: pass down `hash_algo` to functions using global variablesKarthik Nayak1-11/+11
The functions `get_split_midx_filename_ext()`, `get_midx_filename()` and `get_midx_filename_ext()` use `hash_to_hex()` which internally uses the `the_hash_algo` global variable. Remove this dependency on global variables by passing down the `hash_algo` through to the functions mentioned and instead calling `hash_to_hex_algop()` along with the obtained `hash_algo`. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04midx-write: pass down repository to `write_midx_file[_only]`Karthik Nayak1-13/+9
In a previous commit, we passed the repository field to all subcommands in the `builtin/` directory. Utilize this to pass the repository field down to the `write_midx_file[_only]` functions to remove the usage of `the_repository` global variables. With this, all usage of global variables in `midx-write.c` is removed, hence, remove the `USE_THE_REPOSITORY_VARIABLE` guard from the file. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04write-midx: add repository field to `write_midx_context`Karthik Nayak1-18/+20
The struct `write_midx_context` is used to pass context for creating MIDX files. Add the repository field here to ensure that most functions within `midx-write.c` have access to the field and can use that instead of the global `the_repository` variable. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04midx-write: use `revs->repo` inside `read_refs_snapshot`Karthik Nayak1-1/+2
The function `read_refs_snapshot()` uses `parse_oid_hex()`, which relies on the global `the_hash_algo` variable. Let's instead use `parse_oid_hex_algop()` and provide the hash algo via `revs->repo`. Also, while here, fix a missing newline after the function's definition. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04midx-write: pass down repository to static functionsKarthik Nayak1-26/+31
In 'midx-write.c' there are a lot of static functions which use global variables `the_repository` or `the_hash_algo`. In a follow up commit, the repository variable will be added to `write_midx_context`, which some of the functions can use. But for functions which do not have access to this struct, pass down the required information from non-static functions `write_midx_file` and `write_midx_file_only`. This requires that the function `hash_to_hex` is also replaced with `hash_to_hex_algop` since the former internally accesses the `the_hash_algo` global variable. This ensures that the usage of global variables is limited to these non-static functions, which will be cleaned up in a follow up commit. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04packfile: add repository to struct `packed_git`Karthik Nayak1-1/+1
The struct `packed_git` holds information regarding a packed object file. Let's add the repository variable to this object, to represent the repository that this packfile belongs to. This helps remove dependency on the global `the_repository` object in `packfile.c` by simply using repository information now readily available in the struct. We do need to consider that a packfile could be part of the alternates of a repository, but considering that we only have one repository struct and also that we currently anyways use 'the_repository', we should be OK with this change. We also modify `alloc_packed_git` to ensure that the repository is added to newly created `packed_git` structs. This requires modifying the function and all its callee to pass the repository object down the levels. Helped-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-30pack-write: fix return parameter of `write_rev_file_order()`Patrick Steinhardt1-1/+2
While the return parameter of `write_rev_file_order()` is a string constant, the function may indeed return an allocated string when its first parameter is a `NULL` pointer. This makes for a confusing calling convention, where callers need to be aware of these intricate ownership rules and cast away the constness to free the string in some cases. Adapt the function and its caller `write_rev_file()` to always return an allocated string and adapt callers to always free the return value. Note that this requires us to also adapt `rename_tmp_packfile()`, which compares the pointers to packfile data with each other. Now that the path of the reverse index file gets allocated unconditionally the check will always fail. This is fixed by using strcmp(3P) instead, which also feels way less fragile. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-30midx-write: fix leaking bufferPatrick Steinhardt1-0/+2
The buffer used to compute the final MIDX name is never released. Plug this memory leak. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-03Merge branch 'ps/leakfixes-part-5'Junio C Hamano1-12/+12
Even more leak fixes. * ps/leakfixes-part-5: transport: fix leaking negotiation tips transport: fix leaking arguments when fetching from bundle builtin/fetch: fix leaking transaction with `--atomic` remote: fix leaking peer ref when expanding refmap remote: fix leaks when matching refspecs remote: fix leaking config strings builtin/fetch-pack: fix leaking refs sideband: fix leaks when configuring sideband colors builtin/send-pack: fix leaking refspecs transport: fix leaking OID arrays in git:// transport data t/helper: fix leaking multi-pack-indices in "read-midx" builtin/repack: fix leaks when computing packs to repack midx-write: fix leaking hashfile on error cases builtin/archive: fix leaking `OPT_FILENAME()` value builtin/upload-archive: fix leaking args passed to `write_archive()` builtin/merge-tree: fix leaking `-X` strategy options pretty: fix leaking key/value separator buffer pretty: fix memory leaks when parsing pretty formats convert: fix leaks when resetting attributes mailinfo: fix leaking header data
2024-08-26Merge branch 'tb/pseudo-merge-bitmap-fixes'Junio C Hamano1-6/+4
We created a useless pseudo-merge reachability bitmap that is about 0 commits, and attempted to include commits that are not in packs, which made no sense. These bugs have been corrected. * tb/pseudo-merge-bitmap-fixes: pseudo-merge.c: ensure pseudo-merge groups are closed pseudo-merge.c: do not generate empty pseudo-merge commits t/t5333-pseudo-merge-bitmaps.sh: demonstrate empty pseudo-merge groups pack-bitmap-write.c: select pseudo-merges even for small bitmaps pack-bitmap: drop redundant args from `bitmap_writer_finish()` pack-bitmap: drop redundant args from `bitmap_writer_build()` pack-bitmap: drop redundant args from `bitmap_writer_build_type_index()` pack-bitmap: initialize `bitmap_writer_init()` with packing_data
2024-08-22midx-write: fix leaking hashfile on error casesPatrick Steinhardt1-12/+12
When writing the MIDX file we first create the `struct hashfile` used to write the trailer hash, and then afterwards we verify whether we can actually write the MIDX in the first place. When we decide that we can't, this leads to a memory leak because we never free the hash file contents. We could fix this by freeing the hashfile on the exit path. There is a better option though: we can simply move the checks for the error condition earlier. As there is no early exit between creating the hashfile and finalizing it anymore this is sufficient to fix the memory leak. While at it, also move around the block checking for `ctx.entries_nr`. This change is not required to fix the memory leak, but it feels natural to move together all massaging of parameters before we go with them and execute the actual logic. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-19Merge branch 'tb/incremental-midx-part-1'Junio C Hamano1-49/+275
Incremental updates of multi-pack index files. * tb/incremental-midx-part-1: midx: implement support for writing incremental MIDX chains t/t5313-pack-bounds-checks.sh: prepare for sub-directories t: retire 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' midx: implement verification support for incremental MIDXs midx: support reading incremental MIDX chains midx: teach `midx_fanout_add_midx_fanout()` about incremental MIDXs midx: teach `midx_preferred_pack()` about incremental MIDXs midx: teach `midx_contains_pack()` about incremental MIDXs midx: remove unused `midx_locate_pack()` midx: teach `fill_midx_entry()` about incremental MIDXs midx: teach `nth_midxed_offset()` about incremental MIDXs midx: teach `bsearch_midx()` about incremental MIDXs midx: introduce `bsearch_one_midx()` midx: teach `nth_bitmapped_pack()` about incremental MIDXs midx: teach `nth_midxed_object_oid()` about incremental MIDXs midx: teach `prepare_midx_pack()` about incremental MIDXs midx: teach `nth_midxed_pack_int_id()` about incremental MIDXs midx: add new fields for incremental MIDX chains Documentation: describe incremental MIDX format
2024-08-15pack-bitmap: drop redundant args from `bitmap_writer_finish()`Taylor Blau1-2/+1
In a similar fashion as the previous commit, drop a redundant argument from the `bitmap_writer_finish()` function. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-15pack-bitmap: drop redundant args from `bitmap_writer_build()`Taylor Blau1-1/+1
In a similar fashion as the previous commit, drop a redundant argument from the `bitmap_writer_build()` function. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-15pack-bitmap: drop redundant args from `bitmap_writer_build_type_index()`Taylor Blau1-2/+1
The previous commit ensures that the bitmap_writer's "to_pack" field is initialized early on, so the "to_pack" and "index_nr" arguments to `bitmap_writer_build_type_index()` are redundant. Drop them and adjust the callers accordingly. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-15pack-bitmap: initialize `bitmap_writer_init()` with packing_dataTaylor Blau1-1/+1
In order to determine its object order, the pack-bitmap machinery keeps a 'struct packing_data' corresponding to the pack or pseudo-pack (when writing a MIDX bitmap) being written. The to_pack field is provided to the bitmap machinery by callers of bitmap_writer_build() and assigned to the bitmap_writer struct at that point. But a subsequent commit will want to have access to that data earlier on during commit selection. Prepare for that by adding a 'to_pack' argument to 'bitmap_writer_init()', and initializing the field during that function. Subsequent commits will clean up other functions which take now-redundant arguments (like nr_objects, which is equivalent to pdata->objects_nr, or pdata itself). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>