aboutsummaryrefslogtreecommitdiffstats
path: root/builtin/repack.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2023-04-06Merge branch 'en/header-split-cleanup'Junio C Hamano1-0/+2
Split key function and data structure definitions out of cache.h to new header files and adjust the users. * en/header-split-cleanup: csum-file.h: remove unnecessary inclusion of cache.h write-or-die.h: move declarations for write-or-die.c functions from cache.h treewide: remove cache.h inclusion due to setup.h changes setup.h: move declarations for setup.c functions from cache.h treewide: remove cache.h inclusion due to environment.h changes environment.h: move declarations for environment.c functions from cache.h treewide: remove unnecessary includes of cache.h wrapper.h: move declarations for wrapper.c functions from cache.h path.h: move function declarations for path.c functions from cache.h cache.h: remove expand_user_path() abspath.h: move absolute path functions from cache.h environment: move comment_line_char from cache.h treewide: remove unnecessary cache.h inclusion from several sources treewide: remove unnecessary inclusion of gettext.h treewide: be explicit about dependence on gettext.h treewide: remove unnecessary cache.h inclusion from a few headers
2023-04-06Merge branch 'ab/remove-implicit-use-of-the-repository'Junio C Hamano1-1/+1
Code clean-up around the use of the_repository. * ab/remove-implicit-use-of-the-repository: libs: use "struct repository *" argument, not "the_repository" post-cocci: adjust comments for recent repo_* migration cocci: apply the "revision.h" part of "the_repository.pending" cocci: apply the "rerere.h" part of "the_repository.pending" cocci: apply the "refs.h" part of "the_repository.pending" cocci: apply the "promisor-remote.h" part of "the_repository.pending" cocci: apply the "packfile.h" part of "the_repository.pending" cocci: apply the "pretty.h" part of "the_repository.pending" cocci: apply the "object-store.h" part of "the_repository.pending" cocci: apply the "diff.h" part of "the_repository.pending" cocci: apply the "commit.h" part of "the_repository.pending" cocci: apply the "commit-reach.h" part of "the_repository.pending" cocci: apply the "cache.h" part of "the_repository.pending" cocci: add missing "the_repository" macros to "pending" cocci: sort "the_repository" rules by header cocci: fix incorrect & verbose "the_repository" rules cocci: remove dead rule from "the_repository.pending.cocci"
2023-04-04Merge branch 'ab/remove-implicit-use-of-the-repository' into ↵Junio C Hamano1-1/+1
en/header-split-cache-h * ab/remove-implicit-use-of-the-repository: libs: use "struct repository *" argument, not "the_repository" post-cocci: adjust comments for recent repo_* migration cocci: apply the "revision.h" part of "the_repository.pending" cocci: apply the "rerere.h" part of "the_repository.pending" cocci: apply the "refs.h" part of "the_repository.pending" cocci: apply the "promisor-remote.h" part of "the_repository.pending" cocci: apply the "packfile.h" part of "the_repository.pending" cocci: apply the "pretty.h" part of "the_repository.pending" cocci: apply the "object-store.h" part of "the_repository.pending" cocci: apply the "diff.h" part of "the_repository.pending" cocci: apply the "commit.h" part of "the_repository.pending" cocci: apply the "commit-reach.h" part of "the_repository.pending" cocci: apply the "cache.h" part of "the_repository.pending" cocci: add missing "the_repository" macros to "pending" cocci: sort "the_repository" rules by header cocci: fix incorrect & verbose "the_repository" rules cocci: remove dead rule from "the_repository.pending.cocci"
2023-03-28cocci: apply the "promisor-remote.h" part of "the_repository.pending"Ævar Arnfjörð Bjarmason1-1/+1
Apply the part of "the_repository.pending.cocci" pertaining to "promisor-remote.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21environment.h: move declarations for environment.c functions from cache.hElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21treewide: be explicit about dependence on gettext.hElijah Newren1-0/+1
Dozens of files made use of gettext functions, without explicitly including gettext.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include gettext.h if they are using it. However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an include of gettext.h, it was left out to avoid conflicting with an in-flight topic. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-17Merge branch 'jk/unused-post-2.39-part2'Junio C Hamano1-2/+3
More work towards -Wunused. * jk/unused-post-2.39-part2: (21 commits) help: mark unused parameter in git_unknown_cmd_config() run_processes_parallel: mark unused callback parameters userformat_want_item(): mark unused parameter for_each_commit_graft(): mark unused callback parameter rewrite_parents(): mark unused callback parameter fetch-pack: mark unused parameter in callback function notes: mark unused callback parameters prio-queue: mark unused parameters in comparison functions for_each_object: mark unused callback parameters list-objects: mark unused callback parameters mark unused parameters in signal handlers run-command: mark error routine parameters as unused mark "pointless" data pointers in callbacks ref-filter: mark unused callback parameters http-backend: mark unused parameters in virtual functions http-backend: mark argc/argv unused object-name: mark unused parameters in disambiguate callbacks serve: mark unused parameters in virtual functions serve: use repository pointer to get config ls-refs: drop config caching ...
2023-02-24for_each_object: mark unused callback parametersJeff King1-2/+3
The for_each_{loose,packed}_object interface uses callback functions, but not every callback needs all of the parameters. Mark the unused ones to satisfy -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23cache.h: remove dependence on hex.h; make other files include it explicitlyElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23alloc.h: move ALLOC_GROW() functions from cache.hElijah Newren1-1/+1
This allows us to replace includes of cache.h with includes of the much smaller alloc.h in many places. It does mean that we also need to add includes of alloc.h in a number of C files. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-06repack: fix leaks on error with "goto cleanup"Ævar Arnfjörð Bjarmason1-6/+7
In cmd_repack() when we hit an error, replace "return ret" with "goto cleanup" to ensure we free the necessary data structures. Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-14Merge branch 'ab/various-leak-fixes'Junio C Hamano1-1/+1
Various leak fixes. * ab/various-leak-fixes: built-ins: use free() not UNLEAK() if trivial, rm dead code revert: fix parse_options_concat() leak cherry-pick: free "struct replay_opts" members rebase: don't leak on "--abort" connected.c: free the "struct packed_git" sequencer.c: fix "opts->strategy" leak in read_strategy_opts() ls-files: fix a --with-tree memory leak revision API: call graph_clear() in release_revisions() unpack-file: fix ancient leak in create_temp_file() built-ins & libs & helpers: add/move destructors, fix leaks dir.c: free "ident" and "exclude_per_dir" in "struct untracked_cache" read-cache.c: clear and free "sparse_checkout_patterns" commit: discard partial cache before (re-)reading it {reset,merge}: call discard_index() before returning tests: mark tests as passing with SANITIZE=leak
2022-11-21built-ins & libs & helpers: add/move destructors, fix leaksÆvar Arnfjörð Bjarmason1-1/+1
Fix various leaks in built-ins, libraries and a test helper here we were missing a call to strbuf_release(), string_list_clear() etc, or were calling them after a potential "return". Comments on individual changes: - builtin/checkout.c: Fix a memory leak that was introduced in [1]. A sibling leak introduced in [2] was recently fixed in [3]. As with [3] we should be using the wt_status_state_free_buffers() API introduced in [4]. - builtin/repack.c: Fix a leak that's been here since this use of "strbuf_release()" was added in a1bbc6c0176 (repack: rewrite the shell script in C, 2013-09-15). We don't use the variable for anything except this loop, so we can instead free it right afterwards. - builtin/rev-parse: Fix a leak that's been here since this code was added in 21d47835386 (Add a parseopt mode to git-rev-parse to bring parse-options to shell scripts., 2007-11-04). - builtin/stash.c: Fix a couple of leaks that have been here since this code was added in d4788af875c (stash: convert create to builtin, 2019-02-25), we strbuf_release()'d only some of the "struct strbuf" we allocated earlier in the function, let's release all of them. - ref-filter.c: Fix a leak in 482c1191869 (gpg-interface: improve interface for parsing tags, 2021-02-11), we don't use the "payload" variable that we ask parse_signature() to populate for us, so let's free it. - t/helper/test-fake-ssh.c: Fix a leak that's been here since this code was added in 3064d5a38c7 (mingw: fix t5601-clone.sh, 2016-01-27). Let's free the "struct strbuf" as soon as we don't need it anymore. 1. c45f0f525de (switch: reject if some operation is in progress, 2019-03-29) 2. 2708ce62d21 (branch: sort detached HEAD based on a flag, 2021-01-07) 3. abcac2e19fa (ref-filter.c: fix a leak in get_head_description, 2022-09-25) 4. 962dd7ebc3e (wt-status: introduce wt_status_state_free_buffers(), 2020-09-27). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-18Merge branch 'tb/repack-expire-to'Taylor Blau1-10/+61
"git repack" learns to send cruft objects out of the way into packfiles outside the repository. * tb/repack-expire-to: builtin/repack.c: implement `--expire-to` for storing pruned objects builtin/repack.c: write cruft packs to arbitrary locations builtin/repack.c: pass "cruft_expiration" to `write_cruft_pack` builtin/repack.c: pass "out" to `prepare_pack_objects`
2022-10-30Merge branch 'jk/repack-tempfile-cleanup'Taylor Blau1-61/+29
The way "git repack" creared temporary files when it received a signal was prone to deadlocking, which has been corrected. * jk/repack-tempfile-cleanup: t7700: annotate cruft-pack failure with ok=sigpipe repack: drop remove_temporary_files() repack: use tempfiles for signal cleanup repack: expand error message for missing pack files repack: populate extension bits incrementally repack: convert "names" util bitfield to array
2022-10-28Merge branch 'tb/remove-unused-pack-bitmap'Junio C Hamano1-0/+32
When creating a multi-pack bitmap, remove per-pack bitmap files unconditionally as they will never be consulted. * tb/remove-unused-pack-bitmap: builtin/repack.c: remove redundant pack-based bitmaps
2022-10-24builtin/repack.c: implement `--expire-to` for storing pruned objectsTaylor Blau1-0/+40
When pruning objects with `--cruft`, `git repack` offers some flexibility when selecting the set of which objects are pruned via the `--cruft-expiration` option. This is useful for expiring objects which are older than the grace period, making races where to-be-pruned objects become reachable and then ancestors of freshly pushed objects, leaving the repository in a corrupt state after pruning substantially less likely [1]. But in practice, such races are impossible to avoid entirely, no matter how long the grace period is. To prevent this race, it is often advisable to temporarily put a repository into a read-only state. But in practice, this is not always practical, and so some middle ground would be nice. This patch introduces a new option, `--expire-to`, which teaches `git repack` to write an additional cruft pack containing just the objects which were pruned from the repository. The caller can specify a directory outside of the current repository as the destination for this second cruft pack. This makes it possible to prune objects from a repository, while still holding onto a supplemental copy of them outside of the original repository. Having this copy on-disk makes it substantially easier to recover objects when the aforementioned race is encountered. `--expire-to` is implemented in a somewhat convoluted manner, which is to take advantage of the fact that the first time `write_cruft_pack()` is called, it adds the name of the cruft pack to the `names` string list. That means the second time we call `write_cruft_pack()`, objects in the previously-written cruft pack will be excluded. As long as the caller ensures that no objects are expired during the second pass, this is sufficient to generate a cruft pack containing all objects which don't appear in any of the new packs written by `git repack`, including the cruft pack. In other words, all of the objects which are about to be pruned from the repository. It is important to note that the destination in `--expire-to` does not necessarily need to be a Git repository (though it can be) Notably, the expired packs do not contain all ancestors of expired objects. So if the source repository contains something like: <unreachable> / C1 --- C2 \ refs/heads/master where C2 is unreachable, but has a parent (C1) which is reachable, and C2 would be pruned, then the expiry pack will contain only C2, not C1. [1]: https://lore.kernel.org/git/20190319001829.GL29661@sigill.intra.peff.net/ Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-24builtin/repack.c: write cruft packs to arbitrary locationsTaylor Blau1-3/+11
In the following commit, a new write_cruft_pack() caller will be added which wants to write a cruft pack to an arbitrary location. Prepare for this by adding a parameter which controls the destination of the cruft pack. For now, provide "packtmp" so that this commit does not change any behavior. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-24builtin/repack.c: pass "cruft_expiration" to `write_cruft_pack`Taylor Blau1-2/+4
`builtin/repack.c`'s `write_cruft_pack()` is used to generate the cruft pack when `--cruft` is supplied. It uses a static variable "cruft_expiration" which is filled in by option parsing. A future patch will add an `--expire-to` option which allows `git repack` to write a cruft pack containing the pruned objects out to a separate repository. In order to implement this functionality, some callers will have to pass a value for `cruft_expiration` different than the one filled out by option parsing. Prepare for this by teaching `write_cruft_pack` to take a "cruft_expiration" parameter, instead of reading a single static variable. The (sole) existing caller of `write_cruft_pack()` will pass the value for "cruft_expiration" filled in by option parsing, retaining existing behavior. This means that we can make the variable local to `cmd_repack()`, and eliminate the static declaration. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-24builtin/repack.c: pass "out" to `prepare_pack_objects`Taylor Blau1-5/+6
`builtin/repack.c`'s `prepare_pack_objects()` is used to prepare a set of arguments to a `pack-objects` process which will generate a desired pack. A future patch will add an `--expire-to` option which allows `git repack` to write a cruft pack containing the pruned objects out to a separate repository. Prepare for this by teaching that function to write packs to an arbitrary location specified by the caller. All existing callers of `prepare_pack_objects()` will pass `packtmp` for `out`, retaining the existing behavior. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-21repack: drop remove_temporary_files()Jeff King1-32/+0
After we've successfully finished the repack, we call remove_temporary_files(), which looks for and removes any files matching ".tmp-$$-pack-*", where $$ is the pid of the current process. But this is pointless. If we make it this far in the process, we've already renamed these tempfiles into place, and there is nothing left to delete. Nor is there a point in trying to call it to clean up when we _aren't_ successful. It's not safe for using in a signal handler, and the previous commit already handed that job over to the tempfile API. It might seem like it would be useful to clean up stray .tmp files left by other invocations of git-repack. But it won't clean those files; it only matches ones with its pid, and leaves the rest. Fortunately, those are cleaned up naturally by successive calls to git-repack; we'll consider .tmp-*.pack the same as normal packfiles, so "repack -ad", etc, will roll up their contents and eventually delete them. The one case that could matter is if pack-objects generates an extension we don't know about, like ".tmp-pack-$$-$hash.some-new-ext". The current code will quietly delete such a file, while after this patch we'd leave it in place. In practice this doesn't happen, and would be indicative of a bug. Leaving the file as cruft is arguably a better behavior, as it means somebody is more likely to eventually notice and fix the bug. If we really wanted to be paranoid, we could scan for and warn about such files, but that seems like overkill. There's nothing to test with regard to the removal of this function. It was doing nothing, so the behavior should be the same. However, we can verify (and protect) our assumption that "repack -ad" will eventually remove stray files by adding a test for that. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-21repack: use tempfiles for signal cleanupJeff King1-18/+8
When git-repack exits due to a signal, it tries to clean up by calling its remove_temporary_files() function, which walks through the packs dir looking for ".tmp-$$-pack-*" files to delete (where "$$" is the pid of the current process). The biggest problem here is that remove_temporary_files() is not safe to call in a signal handler. It uses opendir(), which isn't on the POSIX async-signal-safe list. The details will be platform-specific, but a likely issue is that it needs to allocate memory; if we receive a signal while inside malloc(), etc, we'll conflict on the allocator lock and deadlock with ourselves. We can fix this by just cleaning up the files directly, without walking the directory. We already know the complete list of .tmp-* files that were generated, because we recorded them via populate_pack_exts(). When we find files there, we can use register_tempfile() to record the filenames. If we receive a signal, then the tempfile API will clean them up for us, and it's async-safe and pretty battle-tested. Note that this is slightly racier than the existing scheme. We don't record the filenames until pack-objects tells us the hash over stdout. So during the period between it generating the file and reporting the hash, we'd fail to clean up. However, that period is very small. During most of the pack generation process pack-objects is using its own internal tempfiles. It's only at the very end that it moves them into the names git-repack expects, and then it immediately reports the name to us. Given that cleanup like this is best effort (after all, we may get SIGKILL), this level of race is acceptable. When we register the tempfiles, we'll record them locally and use the result to call rename_tempfile(), rather than renaming by hand. This isn't strictly necessary, as once we've renamed the files they're gone, and the tempfile API's cleanup unlink() would simply become a pointless noop. But managing the lifetimes of the tempfile objects is the cleanest thing to do, and the tempfile pointers naturally fill the same role as the old booleans. This patch also fixes another small problem. We only hook signals, and don't set up an atexit handler. So if we see an error that causes us to die(), we'll leave the .tmp-* files in place. But since the tempfile API handles this for us, this is now fixed for free. The new test covers this by stimulating a failure of pack-objects when generating a cruft pack. Before this patch, the .tmp-* file for the main pack would have been left, but now we correctly clean it up. Two small subtleties on the implementation: - in the renaming loop, we can stop re-constructing fname_old; we only use it when we have a tempfile to rename, so we can just ask the tempfile for its path (which, barring bugs, should be identical) - when renaming fails, our error message mentions fname_old. But since a failed rename_tempfile() invalidates the tempfile struct, we'll lose access to that string. Instead, let's mention the destination filename, which is what most other callers do. Reported-by: Jan Pokorný <poki@fnusa.cz> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-21repack: expand error message for missing pack filesJeff King1-1/+2
If pack-objects tells us it generated pack $hash, we expect to find .tmp-$$-pack-$hash.pack, .idx, .rev, and so on. Some of these files are optional, but others are not. For the required ones, we'll bail with an error if any of them is missing. The error message is just "missing required file", which is a bit vague. We should be more clear that it is not the user's fault, but rather that the sub-pgoram we called is not operating as expected. In practice, nobody should ever see this message, as it would generally only be caused by a bug in Git. It probably doesn't make sense to convert this to a BUG(), though, as there are other (unlikely) possibilities, such as somebody else racily deleting the files, filesystem errors causing stat() to fail, and so on. A nice side effect here is that we stop relying on fname_old in this code path, which will let us deal with it only in the first part of the conditional. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-21repack: populate extension bits incrementallyJeff King1-6/+9
After generating the main pack and then any additional cruft packs, we iterate over the "names" list (which contains hashes of packs generated by pack-objects), and call populate_pack_exts() for each. There's one small problem with this. In repack_promisor_objects(), we may add entries to "names" and call populate_pack_exts() for them. Calling it again is mostly just wasteful, as we'll stat() the filename with each possible extension, get the same result, and just overwrite our bits. So we could drop the call there, and leave the final loop to populate all of the bits. But instead, this patch does the reverse: drops the final loop, and teaches the other two sites to populate the bits as they add entries. This makes the code easier to reason about, as you never have to worry about when the util field is valid; it is always valid for each entry. It also serves my ulterior purpose: recording the generated filenames as soon as possible will make it easier for a future patch to use them for cleaning up from a failed operation. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-21repack: convert "names" util bitfield to arrayJeff King1-8/+14
We keep a string_list "names" containing the hashes of packs generated on our behalf by pack-objects. The util field of each item is treated as a bitfield that tells us which extensions (.pack, .idx, .rev, etc) are present for each name. Let's switch this to allocating a real array. That will give us room in a future patch to store more data than just a single bit per extension. And it makes the code a little easier to read, as we avoid casting back and forth between uintptr_t and a void pointer. Since the only thing we're storing is an array, we could just allocate it directly. But instead I've put it into a named struct here. That further increases readability around the casts, and in particular helps differentiate us from other string_lists in the same file which use their util field differently. E.g., the existing_*_packs lists still do bit-twiddling, but their bits have different meaning than the ones in "names". This makes it hard to grep around the code to see how the util fields are used; now you can look for "generated_pack_data". Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-17repack: don't remove .keep packs with `--pack-kept-objects`Taylor Blau1-0/+5
`git repack` supports a `--pack-kept-objects` flag which more or less translates to whether or not we pass `--honor-pack-keep` down to `git pack-objects` when assembling a new pack. This behavior has existed since ee34a2bead (repack: add `repack.packKeptObjects` config var, 2014-03-03). In that commit, the documentation was extended to say: [...] Note that we still do not delete `.keep` packs after `pack-objects` finishes. Unfortunately, this is not the case when `--pack-kept-objects` is combined with a `--geometric` repack. When doing a geometric repack, we include `.keep` packs when enumerating available packs only when `pack_kept_objects` is set. So this all works fine when `--no-pack-kept-objects` (or similar) is given. Kept packs are excluded from the geometric roll-up, so when we go to delete redundant packs (with `-d`), no `.keep` packs appear "below the split" in our geometric progression. But when `--pack-kept-objects` is given, things can go awry. Namely, when a kept pack is included in the list of packs tracked by the `pack_geometry` struct *and* part of the pack roll-up, we will delete the `.keep` pack when we shouldn't. Note that this *doesn't* result in object corruption, since the `.keep` pack's objects are still present in the new pack. But the `.keep` pack itself is removed, which violates our promise from back in ee34a2bead. But there's more. Because `repack` computes the geometric roll-up independently from selecting which packs belong in a MIDX (with `--write-midx`), this can lead to odd behavior. Consider when a `.keep` pack appears below the geometric split (ie., its objects will be part of the new pack we generate). We'll write a MIDX containing the new pack along with the existing `.keep` pack. But because the `.keep` pack appears below the geometric split line, we'll (incorrectly) try to remove it. While this doesn't corrupt the repository, it does cause us to remove the MIDX we just wrote, since removing that pack would invalidate the new MIDX. Funny enough, this behavior became far less noticeable after e4d0c11c04 (repack: respect kept objects with '--write-midx -b', 2021-12-20), which made `pack_kept_objects` be enabled by default only when we were writing a non-MIDX bitmap. But e4d0c11c04 didn't resolve this bug, it just made it harder to notice unless callers explicitly passed `--pack-kept-objects`. The solution is to avoid trying to remove `.keep` packs during `--geometric` repacks, even when they appear below the geometric split line, which is the approach this patch implements. Co-authored-by: Victoria Dye <vdye@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-17builtin/repack.c: remove redundant pack-based bitmapsTaylor Blau1-0/+32
When we write a MIDX bitmap after repacking, it is possible that the repository would be left in a state with both pack- and multi-pack reachability bitmaps. This can occur, for instance, if a pack that was kept (either by having a .keep file, or during a geometric repack in which it is not rolled up) has a bitmap file, and the repack wrote a multi-pack index and bitmap. When loading a reachability bitmap for the repository, the multi-pack one is always preferred, so the pack-based one is redundant. Let's remove it unconditionally, even if '-d' isn't passed, since there is no practical reason to keep both around. The patch below does just that. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-01git-compat-util.h: use "UNUSED", not "UNUSED(var)"Ævar Arnfjörð Bjarmason1-2/+2
As reported in [1] the "UNUSED(var)" macro introduced in 2174b8c75de (Merge branch 'jk/unused-annotation' into next, 2022-08-24) breaks coccinelle's parsing of our sources in files where it occurs. Let's instead partially go with the approach suggested in [2] of making this not take an argument. As noted in [1] "coccinelle" will ignore such tokens in argument lists that it doesn't know about, and it's less of a surprise to syntax highlighters. This undoes the "help us notice when a parameter marked as unused is actually use" part of 9b240347543 (git-compat-util: add UNUSED macro, 2022-08-19), a subsequent commit will further tweak the macro to implement a replacement for that functionality. 1. https://lore.kernel.org/git/220825.86ilmg4mil.gmgdl@evledraar.gmail.com/ 2. https://lore.kernel.org/git/220819.868rnk54ju.gmgdl@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19refs: mark unused each_ref_fn parametersJeff King1-2/+2
Functions used with for_each_ref(), etc, need to conform to the each_ref_fn interface. But most of them don't need every parameter; let's annotate the unused ones to quiet -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-06cocci: generalize "unused" rule to cover more than "strbuf"Ævar Arnfjörð Bjarmason1-2/+0
Generalize the newly added "unused.cocci" rule to find more than just "struct strbuf", let's have it find the same unused patterns for "struct string_list", as well as other code that uses similar-looking *_{release,clear,free}() and {release,clear,free}_*() functions. We're intentionally loose in accepting e.g. a "strbuf_init(&sb)" followed by a "string_list_clear(&sb, 0)". It's assumed that the compiler will catch any such invalid code, i.e. that our constructors/destructors don't take a "void *". See [1] for example of code that would be covered by the "get_worktrees()" part of this rule. We'd still need work that the series is based on (we were passing "worktrees" to a function), but could now do the change in [1] automatically. 1. https://lore.kernel.org/git/Yq6eJFUPPTv%2Fzc0o@coredump.intra.peff.net/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-17i18n: fix mismatched camelCase config variablesJiang Xin1-1/+1
Some config variables are combinations of multiple words, and we typically write them in camelCase forms in manpage and translatable strings. It's not easy to find mismatches for these camelCase config variables during code reviews, but occasionally they are identified during localization translations. To check for mismatched config variables, I introduced a new feature in the helper program for localization[^1]. The following mismatched config variables have been identified by running the helper program, such as "git-po-helper check-pot". Lowercase in manpage should use camelCase: * Documentation/config/http.txt: http.pinnedpubkey Lowercase in translable strings should use camelCase: * builtin/fast-import.c: pack.indexversion * builtin/gc.c: gc.logexpiry * builtin/index-pack.c: pack.indexversion * builtin/pack-objects.c: pack.indexversion * builtin/repack.c: pack.writebitmaps * commit.c: i18n.commitencoding * gpg-interface.c: user.signingkey * http.c: http.postbuffer * submodule-config.c: submodule.fetchjobs Mismatched camelCases, choose the former: * Documentation/config/transfer.txt: transfer.credentialsInUrl remote.c: transfer.credentialsInURL [^1]: https://github.com/git-l10n/git-po-helper Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-03Merge branch 'tb/cruft-packs'Junio C Hamano1-24/+161
A mechanism to pack unreachable objects into a "cruft pack", instead of ejecting them into loose form to be reclaimed later, has been introduced. * tb/cruft-packs: sha1-file.c: don't freshen cruft packs builtin/gc.c: conditionally avoid pruning objects via loose builtin/repack.c: add cruft packs to MIDX during geometric repack builtin/repack.c: use named flags for existing_packs builtin/repack.c: allow configuring cruft pack generation builtin/repack.c: support generating a cruft pack builtin/pack-objects.c: --cruft with expiration reachable: report precise timestamps from objects in cruft packs reachable: add options to add_unseen_recent_objects_to_traversal builtin/pack-objects.c: --cruft without expiration builtin/pack-objects.c: return from create_object_entry() t/helper: add 'pack-mtimes' test-tool pack-mtimes: support writing pack .mtimes files chunk-format.h: extract oid_version() pack-write: pass 'struct packing_data' to 'stage_tmp_packfiles' pack-mtimes: support reading .mtimes files Documentation/technical: add cruft-packs.txt
2022-05-26builtin/repack.c: add cruft packs to MIDX during geometric repackTaylor Blau1-3/+20
When using cruft packs, the following race can occur when a geometric repack that writes a MIDX bitmap takes place afterwords: - First, create an unreachable object and do an all-into-one cruft repack which stores that object in the repository's cruft pack. - Then make that object reachable. - Finally, do a geometric repack and write a MIDX bitmap. Assuming that we are sufficiently unlucky as to select a commit from the MIDX which reaches that object for bitmapping, then the `git multi-pack-index` process will complain that that object is missing. The reason is because we don't include cruft packs in the MIDX when doing a geometric repack. Since the "make that object reachable" doesn't necessarily mean that we'll create a new copy of that object in one of the packs that will get rolled up as part of a geometric repack, it's possible that the MIDX won't see any copies of that now-reachable object. Of course, it's desirable to avoid including cruft packs in the MIDX because it causes the MIDX to store a bunch of objects which are likely to get thrown away. But excluding that pack does open us up to the above race. This patch demonstrates the bug, and resolves it by including cruft packs in the MIDX even when doing a geometric repack. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26builtin/repack.c: use named flags for existing_packsTaylor Blau1-3/+6
We use the `util` pointer for items in the `existing_packs` string list to indicate which packs are going to be deleted. Since that has so far been the only use of that `util` pointer, we just set it to 0 or 1. But we're going to add an additional state to this field in the next patch, so prepare for that by adding a #define for the first bit so we can more expressively inspect the flags state. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26builtin/repack.c: allow configuring cruft pack generationTaylor Blau1-14/+35
In servers which set the pack.window configuration to a large value, we can wind up spending quite a lot of time finding new bases when breaking delta chains between reachable and unreachable objects while generating a cruft pack. Introduce a handful of `repack.cruft*` configuration variables to control the parameters used by pack-objects when generating a cruft pack. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26builtin/repack.c: support generating a cruft packTaylor Blau1-5/+100
Expose a way to split the contents of a repository into a main and cruft pack when doing an all-into-one repack with `git repack --cruft -d`, and a complementary configuration variable. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26pack-mtimes: support reading .mtimes filesTaylor Blau1-0/+1
To store the individual mtimes of objects in a cruft pack, introduce a new `.mtimes` format that can optionally accompany a single pack in the repository. The format is defined in Documentation/technical/pack-format.txt, and stores a 4-byte network order timestamp for each object in name (index) order. This patch prepares for cruft packs by defining the `.mtimes` format, and introducing a basic API that callers can use to read out individual mtimes. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-20builtin/repack.c: ensure that `names` is sortedTaylor Blau1-1/+2
The previous patch demonstrates a scenario where the list of packs written by `pack-objects` (and stored in the `names` string_list) is out-of-order, and can thus cause us to delete packs we shouldn't. This patch resolves that bug by ensuring that `names` is sorted in all cases, not just when delete_redundant && pack_everything & ALL_INTO_ONE is true. Because we did sort `names` in that case (which, prior to `--geometric` repacks, was the only time we would actually delete packs, this is only a bug for `--geometric` repacks. It would be sufficient to only sort `names` when `delete_redundant` is set to a non-zero value. But sorting a small list of strings is cheap, and it is defensive against future calls to `string_list_has_string()` on this list. Co-discovered-by: Victoria Dye <vdye@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-20repack: respect --keep-pack with geometric repackVictoria Dye1-11/+35
Update 'repack' to ignore packs named on the command line with the '--keep-pack' option. Specifically, modify 'init_pack_geometry()' to treat command line-kept packs the same way it treats packs with an on-disk '.keep' file (that is, skip the pack and do not include it in the 'geometry' structure). Without this handling, a '--keep-pack' pack would be included in the 'geometry' structure. If the pack is *before* the geometry split line (with at least one other pack and/or loose objects present), 'repack' assumes the pack's contents are "rolled up" into another pack via 'pack-objects'. However, because the internally-invoked 'pack-objects' properly excludes '--keep-pack' objects, any new pack it creates will not contain the kept objects. Finally, 'repack' deletes the '--keep-pack' as "redundant" (since it assumes 'pack-objects' created a new pack with its contents), resulting in possible object loss and repository corruption. Add a test ensuring that '--keep-pack' packs are now appropriately handled. Co-authored-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-14repack: add config to skip updating server infoPatrick Steinhardt1-1/+5
By default, git-repack(1) will update server info that is required by the dumb HTTP transport. This can be skipped by passing the `-n` flag, but what we're noticably missing is a config option to permanently disable updating this information. Add a new option "repack.updateServerInfo" which can be used to disable the logic. Most hosting providers have turned off the dumb HTTP protocol anyway, and on the client-side it woudln't typically be useful either. Giving a persistent way to disable this feature thus makes quite some sense to avoid wasting compute cycles and storage. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-14repack: refactor to avoid double-negation of update-server-infoPatrick Steinhardt1-4/+4
By default, git-repack(1) runs `update_server_info()` to generate info required for the dumb HTTP protocol. This can be disabled via the `-n` flag, which then sets the `no_update_server_info` flag. Further down the code this leads to some double-negation logic, which is about to become more confusing as we're about to add a new config which allows the user to permanently disable generation of the info. Refactor the code to avoid the double-negation and add some tests which verify that the flag continues to work as expected. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-10Merge branch 'ja/i18n-similar-messages'Junio C Hamano1-2/+2
Similar message templates have been consolidated so that translators need to work on fewer number of messages. * ja/i18n-similar-messages: i18n: turn even more messages into "cannot be used together" ones i18n: ref-filter: factorize "%(foo) atom used without %(bar) atom" i18n: factorize "--foo outside a repository" i18n: refactor "unrecognized %(foo) argument" strings i18n: factorize "no directory given for --foo" i18n: factorize "--foo requires --bar" and the like i18n: tag.c factorize i18n strings i18n: standardize "cannot open" and "cannot read" i18n: turn "options are incompatible" into "cannot be used together" i18n: refactor "%s, %s and %s are mutually exclusive" i18n: refactor "foo and bar are mutually exclusive"
2022-01-05Merge branch 'ds/repack-fixlets'Junio C Hamano1-3/+5
Two fixes around "git repack". * ds/repack-fixlets: repack: make '--quiet' disable progress repack: respect kept objects with '--write-midx -b'
2022-01-05i18n: turn "options are incompatible" into "cannot be used together"Jean-Noël Avila1-2/+2
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr> Reviewed-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-20repack: make '--quiet' disable progressDerrick Stolee1-2/+4
While testing some ideas in 'git repack', I ran it with '--quiet' and discovered that some progress output was still shown. Specifically, the output for writing the multi-pack-index showed the progress. The 'show_progress' variable in cmd_repack() is initialized with isatty(2) and is not modified at all by the '--quiet' flag. The '--quiet' flag modifies the po_args.quiet option which is translated into a '--quiet' flag for the 'git pack-objects' child process. However, 'show_progress' is used to directly send progress information to the multi-pack-index writing logic which does not use a child process. The fix here is to modify 'show_progress' to be false if po_opts.quiet is true, and isatty(2) otherwise. This new expectation simplifies a later condition that checks both. Update the documentation to make it clear that '-q' will disable all progress in addition to ensuring the 'git pack-objects' child process will receive the flag. Use 'test_terminal' to check that this works to get around the isatty(2) check. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-20repack: respect kept objects with '--write-midx -b'Derrick Stolee1-1/+1
Historically, we needed a single packfile in order to have reachability bitmaps. This introduced logic that when 'git repack' had a '-b' option that we should stop sending the '--honor-pack-keep' option to the 'git pack-objects' child process, ensuring that we create a packfile containing all reachable objects. In the world of multi-pack-index bitmaps, we no longer need to repack all objects into a single pack to have valid bitmaps. Thus, we should continue sending the '--honor-pack-keep' flag to 'git pack-objects'. The fix is very simple: only disable the flag when writing bitmaps but also _not_ writing the multi-pack-index. This opens the door to new repacking strategies that might want to keep some historical set of objects in a stable pack-file while only repacking more recent objects. To test, create a new 'test_subcommand_inexact' helper that is more flexible than 'test_subcommand'. This allows us to look for the --honor-pack-keep flag without over-indexing on the exact set of arguments. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-10Merge branch 'po/size-t-for-vs'Junio C Hamano1-1/+1
On platforms where ulong is shorter than size_t, code paths that shifted 1 or 1U to the left lacked the necessary cast to size_t, which have been corrected. * po/size-t-for-vs: object-file.c: LLP64 compatibility, upcast unity for left shift diffcore-delta.c: LLP64 compatibility, upcast unity for left shift repack.c: LLP64 compatibility, upcast unity for left shift
2021-12-01repack.c: LLP64 compatibility, upcast unity for left shiftPhilip Oakley1-1/+1
Visual Studio reports C4334 "was 64-bit shift intended" warning because of size mismatch. Promote unity to the matching type to fit with the `&` operator. Signed-off-by: Philip Oakley <philipoakley@iee.email> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-28builtin/repack.c: avoid leaking child argumentsTaylor Blau1-1/+3
`git repack` invokes a handful of child processes: one to write the actual pack, and optionally ones to repack promisor objects and update the MIDX. Most of these are freed automatically by calling `start_command()` (which invokes it on error) and `finish_command()` which calls it automatically. But repack_promisor_objects() can initialize a child_process, populate its array of arguments, and then return from the function before even calling start_command(). Make sure that the prepared list of arguments is freed by calling child_process_clear() ourselves to avoid leaking memory along this path. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-18Merge branch 'tb/repack-write-midx'Junio C Hamano1-34/+248
"git repack" has been taught to generate multi-pack reachability bitmaps. * tb/repack-write-midx: test-read-midx: fix leak of bitmap_index struct builtin/repack.c: pass `--refs-snapshot` when writing bitmaps builtin/repack.c: make largest pack preferred builtin/repack.c: support writing a MIDX while repacking builtin/repack.c: extract showing progress to a variable builtin/repack.c: rename variables that deal with non-kept packs builtin/repack.c: keep track of existing packs unconditionally midx: preliminary support for `--refs-snapshot` builtin/multi-pack-index.c: support `--stdin-packs` mode midx: expose `write_midx_file_only()` publicly