git - Mirror of https://git.kernel.org/pub/scm/git/git.git/

Age	Commit message (Collapse)	Author	Files	Lines
2024-11-24	Makefile(s): avoid recipe prefix in conditional statements	Taylor Blau	1	-2/+2
	In GNU Make commit 07fcee35 ([SV 64815] Recipe lines cannot contain conditional statements, 2023-05-22) and following, conditional statements may no longer be preceded by a tab character (which Make refers to as the recipe prefix). There are a handful of spots in our various Makefile(s) which will break in a future release of Make containing 07fcee35. For instance, trying to compile the pre-image of this patch with the tip of make.git results in the following: $ make -v \| head -1 && make GNU Make 4.4.90 config.mak.uname:842: *** missing 'endif'. Stop. The kernel addressed this issue in 82175d1f9430 (kbuild: Replace tabs with spaces when followed by conditionals, 2024-01-28). Address the issues in Git's tree by applying the same strategy. When a conditional word (ifeq, ifneq, ifdef, etc.) is preceded by one or more tab characters, replace each tab character with 8 space characters with the following: find . -type f -not -path './.git/' -name Makefile -or -name '.mak' \| xargs perl -i -pe ' s/(\t+)(ifn?eq\|ifn?def\|else\|endif)/" " x (length($1) * 8) . $2/ge unless /\\$/ ' The "unless /\\$/" removes any false-positives (like "\telse \" appearing within a shell script as part of a recipe). After doing so, Git compiles on newer versions of Make: $ make -v \| head -1 && make GNU Make 4.4.90 GIT_VERSION = 2.44.0.414.gfac1dc44ca9 [...] $ echo $? 0 Reported-by: Dario Gjorgjevski <dario.gjorgjevski@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> Cherry-picked-from: 728b9ac0c3b93aaa4ea80280c591deb198051785 Signed-off-by: Johannes Sixt <j6t@kdbg.org>
2024-11-24	doc: switch links to https	Josh Soref	1	-1/+1
	These sites offer https versions of their content. Using the https versions provides some protection for users. Signed-off-by: Josh Soref <jsoref@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> Cherry-picked-from: d05b08cd52cfda627f1d865bdfe6040a2c9521b5 Signed-off-by: Johannes Sixt <j6t@kdbg.org>
2024-11-24	doc: update links to current pages	Josh Soref	1	-1/+1
	It's somewhat traditional to respect sites' self-identification. Signed-off-by: Josh Soref <jsoref@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> Cherry-picked-from: 65175d9ea26bebeb9d69977d0e75efc0e88dbced Signed-off-by: Johannes Sixt <j6t@kdbg.org>
2024-11-22	The eleventh batch	Junio C Hamano	1	-0/+14
	Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21	t: remove TEST_PASSES_SANITIZE_LEAK annotations	Patrick Steinhardt	928	-928/+0
	Now that the default value for TEST_PASSES_SANITIZE_LEAK is `true` there is no longer a need to have that variable declared in all of our tests. Drop it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21	test-lib: unconditionally enable leak checking	Patrick Steinhardt	4	-97/+1
	Over the last two releases we have plugged a couple hundred of memory leaks exposed by the Git test suite. With the preceding commits we have finally fixed the last leak exposed by our test suite, which means that we are now basically leak free wherever we have branch coverage. From hereon, the Git test suite should ideally stay free of memory leaks. Most importantly, any test suite that is being added should automatically be subject to the leak checker, and if that test does not pass it is a strong signal that the added code introduced new memory leaks and should not be accepted without further changes. Drop the infrastructure around TEST_PASSES_SANITIZE_LEAK to reflect this new requirement. Like this, all test suites will be subject to the leak checker by default. This is being intentionally strict, but we still have an escape hatch: the SANITIZE_LEAK prerequisite. There is one known case in t5601 where the leak sanitizer itself is buggy, so adding this prereq in such cases is acceptable. Another acceptable situation is when a newly added test uncovers preexisting memory leaks: when fixing that memory leak would be sufficiently complicated it is fine to annotate and document the leak accordingly. But in any case, the burden is now on the patch author to explain why exactly they have to add the SANITIZE_LEAK prerequisite. The TEST_PASSES_SANITIZE_LEAK annotations will be dropped in the next patch. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21	t: remove unneeded !SANITIZE_LEAK prerequisites	Patrick Steinhardt	3	-11/+11
	We have a couple of !SANITIZE_LEAK prerequisites for tests that used to fail due to memory leaks. These have all been fixed by now, so let's drop the prerequisite. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21	t: mark some tests as leak free	Patrick Steinhardt	2	-0/+2
	Both t5558 and t5601 are leak-free starting with 6dab49b9fb (bundle-uri: plug leak in unbundle_from_file(), 2024-10-10). Mark them accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21	t5601: work around leak sanitizer issue	Patrick Steinhardt	1	-11/+15
	When running t5601 with the leak checker enabled we can see a hang in our CI systems. This hang seems to be system-specific, as I cannot reproduce it on my own machine. As it turns out, the issue is in those testcases that exercise cloning of `~repo`-style paths. All of the testcases that hang eventually end up interpreting "repo" as the username and will call getpwnam(3p) with that username. That should of course be fine, and getpwnam(3p) should just return an error. But instead, the leak sanitizer seems to be recursing while handling a call to `free()` in the NSS modules: #0 0x00007ffff7fd98d5 in _dl_update_slotinfo (req_modid=1, new_gen=2) at ../elf/dl-tls.c:720 #1 0x00007ffff7fd9ac4 in update_get_addr (ti=0x7ffff7a91d80, gen=<optimized out>) at ../elf/dl-tls.c:916 #2 0x00007ffff7fdc85c in __tls_get_addr () at ../sysdeps/x86_64/tls_get_addr.S:55 #3 0x00007ffff7a27e04 in __lsan::GetAllocatorCache () at ../../../../src/libsanitizer/lsan/lsan_linux.cpp:27 #4 0x00007ffff7a2b33a in __lsan::Deallocate (p=0x0) at ../../../../src/libsanitizer/lsan/lsan_allocator.cpp:127 #5 __lsan::lsan_free (p=0x0) at ../../../../src/libsanitizer/lsan/lsan_allocator.cpp:220 ... #261505 0x00007ffff7fd99f2 in free (ptr=<optimized out>) at ../include/rtld-malloc.h:50 #261506 _dl_update_slotinfo (req_modid=1, new_gen=2) at ../elf/dl-tls.c:822 #261507 0x00007ffff7fd9ac4 in update_get_addr (ti=0x7ffff7a91d80, gen=<optimized out>) at ../elf/dl-tls.c:916 #261508 0x00007ffff7fdc85c in __tls_get_addr () at ../sysdeps/x86_64/tls_get_addr.S:55 #261509 0x00007ffff7a27e04 in __lsan::GetAllocatorCache () at ../../../../src/libsanitizer/lsan/lsan_linux.cpp:27 #261510 0x00007ffff7a2b33a in __lsan::Deallocate (p=0x5020000001e0) at ../../../../src/libsanitizer/lsan/lsan_allocator.cpp:127 #261511 __lsan::lsan_free (p=0x5020000001e0) at ../../../../src/libsanitizer/lsan/lsan_allocator.cpp:220 #261512 0x00007ffff793da25 in module_load (module=0x515000000280) at ./nss/nss_module.c:188 #261513 0x00007ffff793dee5 in __nss_module_load (module=0x515000000280) at ./nss/nss_module.c:302 #261514 __nss_module_get_function (module=0x515000000280, name=name@entry=0x7ffff79b9128 "getpwnam_r") at ./nss/nss_module.c:328 #261515 0x00007ffff793e741 in __GI___nss_lookup_function (fct_name=<optimized out>, ni=<optimized out>) at ./nss/nsswitch.c:137 #261516 __GI___nss_next2 (ni=ni@entry=0x7fffffffa458, fct_name=fct_name@entry=0x7ffff79b9128 "getpwnam_r", fct2_name=fct2_name@entry=0x0, fctp=fctp@entry=0x7fffffffa460, status=status@entry=0, all_values=all_values@entry=0) at ./nss/nsswitch.c:120 #261517 0x00007ffff794c6a7 in __getpwnam_r (name=name@entry=0x501000000060 "repo", resbuf=resbuf@entry=0x7ffff79fb320 <resbuf>, buffer=<optimized out>, buflen=buflen@entry=1024, result=result@entry=0x7fffffffa4b0) at ../nss/getXXbyYY_r.c:343 #261518 0x00007ffff794c4d8 in getpwnam (name=0x501000000060 "repo") at ../nss/getXXbyYY.c:140 #261519 0x00005555557e37ff in getpw_str (username=0x5020000001a1 "repo", len=4) at path.c:613 #261520 0x00005555557e3937 in interpolate_path (path=0x5020000001a0 "~repo", real_home=0) at path.c:654 #261521 0x00005555557e3aea in enter_repo (path=0x501000000040 "~repo", strict=0) at path.c:718 #261522 0x000055555568f0ba in cmd_upload_pack (argc=1, argv=0x502000000100, prefix=0x0, repo=0x0) at builtin/upload-pack.c:57 #261523 0x0000555555575ba8 in run_builtin (p=0x555555a20c98 <commands+3192>, argc=2, argv=0x502000000100, repo=0x555555a53b20 <the_repo>) at git.c:481 #261524 0x0000555555576067 in handle_builtin (args=0x7fffffffaab0) at git.c:742 #261525 0x000055555557678d in cmd_main (argc=2, argv=0x7fffffffac58) at git.c:912 #261526 0x00005555556963cd in main (argc=2, argv=0x7fffffffac58) at common-main.c:64 Note that this stack is more than 260000 function calls deep. Run under the debugger this will eventually segfault, but in our CI systems it seems like this just hangs forever. I assume that this is a bug either in the leak sanitizer or in glibc, as I cannot reproduce it on my machine. In any case, let's work around the bug for now by marking those tests with the "!SANITIZE_LEAK" prereq. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21	git-compat-util: drop now-unused `UNLEAK()` macro	Patrick Steinhardt	2	-35/+0
	The `UNLEAK()` macro has been introduced with 0e5bba53af (add UNLEAK annotation for reducing leak false positives, 2017-09-08) to help us reduce the amount of reported memory leaks in cases we don't care about, e.g. when exiting immediately afterwards. We have since removed all of its users in favor of freeing the memory and thus don't need the macro anymore. Remove it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21	global: drop `UNLEAK()` annotation	Patrick Steinhardt	2	-2/+0
	There are two users of `UNLEAK()` left in our codebase: - In "builtin/clone.c", annotating the `repo` variable. That leak has already been fixed though as you can see in the context, where we do know to free `repo_to_free`. - In "builtin/diff.c", to unleak entries of the `blob[]` array. That leak has also been fixed, because the entries we assign to that array come from `rev.pending.objects`, and we do eventually release `rev`. This neatly demonstrates one of the issues with `UNLEAK()`: it is quite easy for the annotation to become stale. A second issue is that its whole intent is to paper over leaks. And while that has been a necessary evil in the past, because Git was leaking left and right, it isn't really much of an issue nowadays where our test suite has no known leaks anymore. Remove the last two users of this macro. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21	t/helper: fix leaking commit graph in "read-graph" subcommand	Patrick Steinhardt	1	-2/+1
	We're leaking the commit-graph in the "test-helper read-graph" subcommand, but as the leak is annotated with `UNLEAK()` the leak sanitizer doesn't complain. Fix the leak by calling `free_commit_graph()`. Besides getting rid of the `UNLEAK()` annotation, it also increases code coverage because we properly release resources as Git would do it, as well. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21	builtin/branch: fix leaking sorting options	Patrick Steinhardt	1	-11/+22
	The sorting options are leaking, but given that they are marked with `UNLEAK()` the leak sanitizer doesn't complain. Fix the leak by creating a common exit path and clearing the vector such that we can get rid of the `UNLEAK()` annotation entirely. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21	builtin/init-db: fix leaking directory paths	Patrick Steinhardt	1	-15/+19
	We've got a couple of leaking directory paths in git-init(1), all of which are marked with `UNLEAK()`. Fixing them is trivial, so let's do that instead so that we can get rid of `UNLEAK()` entirely. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21	builtin/help: fix leaks in `check_git_cmd()`	Patrick Steinhardt	1	-6/+7
	The `check_git_cmd()` function is declared to return a string constant. And while it sometimes does return a constant, it may also return an allocated string in two cases: - When handling aliases. This case is already marked with `UNLEAK()` to work around the leak. - When handling unknown commands in case "help.autocorrect" is enabled. This one is not marked with `UNLEAK()`. The function only has a single caller, so let's fix its return type to be non-constant, consistently return an allocated string and free it at its callsite to plug the leak. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21	help: fix leaking return value from `help_unknown_cmd()`	Patrick Steinhardt	3	-6/+7
	While `help_unknown_cmd()` would usually die on an unknown command, it instead returns an autocorrected command when "help.autocorrect" is set. But while the function is declared to return a string constant, it actually returns an allocated string in that case. Callers thus aren't aware that they have to free the string, leading to a memory leak. Fix the function return type to be non-constant and free the returned value at its only callsite. Note that we cannot simply take ownership of `main_cmds.names[0]->name` and then eventually free it. This is because the `struct cmdname` is using a flex array to allocate the name, so the name pointer points into the middle of the structure and thus cannot be freed. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21	help: fix leaking `struct cmdnames`	Patrick Steinhardt	1	-0/+4
	We're populating multiple `struct cmdnames`, but don't ever free them. Plug this memory leak. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21	help: refactor to not use globals for reading config	Patrick Steinhardt	1	-23/+24
	We're reading the "help.autocorrect" and "alias.*" configuration into global variables, which makes it hard to manage their lifetime correctly. Refactor the code to use a struct instead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21	builtin/sparse-checkout: fix leaking sanitized patterns	Patrick Steinhardt	1	-22/+39
	Both `git sparse-checkout add` and `git sparse-checkout set` accept a list of additional directories or patterns. These get massaged via calls to `sanitize_paths()`, which may end up modifying the passed-in array by updating its pointers to be prefixed paths. This allocates memory that we never free. Refactor the code to instead use a `struct strvec`, which makes it way easier for us to track the lifetime correctly. The couple of extra memory allocations likely do not matter as we only ever populate it with command line arguments. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21	split-index: fix memory leak in `move_cache_to_base_index()`	Patrick Steinhardt	2	-1/+6
	In `move_cache_to_base_index()` we move the index cache of the main index into the split index, which is used when writing a shared index. But we don't release the old split index base in case we already had a split index before this operation, which can thus leak memory. Plug the leak by releasing the previous base. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21	git: refactor builtin handling to use a `struct strvec`	Patrick Steinhardt	2	-36/+32
	Similar as with the preceding commit, `handle_builtin()` does not properly track lifetimes of the `argv` array and its strings. As it may end up modifying the array this can lead to memory leaks in case it contains allocated strings. Refactor the function to use a `struct strvec` instead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21	git: refactor alias handling to use a `struct strvec`	Patrick Steinhardt	2	-26/+33
	In `handle_alias()` we use both `argcp` and `argv` as in-out parameters. Callers mostly pass through the static array from `main()`, but once we handle an alias we replace it with an allocated array that may contain some allocated strings. Callers do not handle this scenario at all and thus leak memory. We could in theory handle the lifetime of `argv` in a hacky fashion by letting callers free it in case they see that an alias was handled. But while that would likely work, we still wouldn't be able to easily handle the lifetime of strings referenced by `argv`. Refactor the code to instead use a `struct strvec`, which effectively removes the need for us to manually track lifetimes. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21	strvec: introduce new `strvec_splice()` function	Patrick Steinhardt	3	-0/+93
	Introduce a new `strvec_splice()` function that can replace a range of strings in the vector with another array of strings. This function will be used in subsequent commits. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21	line-log: fix leak when rewriting commit parents	Patrick Steinhardt	2	-0/+2
	In `process_ranges_merge_commit()` we try to figure out which of the parents can be blamed for the given line changes. When we figure out that none of the files in the line-log have changed we assign the complete blame to that commit and rewrite the parents of the current commit to only use that single parent. This is done via `commit_list_append()`, which is misleadingly _not_ appending to the list of parents. Instead, we overwrite the parents with the blamed parent. This makes us lose track of the old pointers, creating a memory leak. Fix this issue by freeing the parents before we overwrite them. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21	bisect: fix various cases where we leak commit list items	Patrick Steinhardt	2	-8/+23
	There are various cases where we leak commit list items because we evict items from the list, but don't free them. Plug those. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21	bisect: fix leaking commit list items in `check_merge_base()`	Patrick Steinhardt	1	-2/+2
	While we free the result commit list at the end of `check_merge_base()`, we forget to free any items that we have already iterated over. Fix this by using a separate variable to iterate through them. This leak is exposed by t6030, but plugging it does not make the whole test suite pass. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21	bisect: fix multiple leaks in `bisect_next_all()`	Patrick Steinhardt	1	-2/+3
	There are multiple leaks in `bisect_next_all()`. For one we don't free the `tried` commit list. Second, one of the branches uses a direct return instead of jumping to the cleanup code. Fix these by freeing the commit list and converting the return to a goto. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21	bisect: fix leaking `current_bad_oid`	Patrick Steinhardt	1	-0/+1
	When reading bisect refs we read the reference mapping to the "bad" term into the global `current_bad_oid` variable. This is an allocated string, but because it is global we never have to free it. This changes though when `register_ref()` is being called multiple times, at which point we'll overwrite the previous pointer and thus make it unreachable. Fix this issue by freeing the previous value. This leak is exposed by t6030, but plugging it does not make the whole test suite pass. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21	bisect: fix leaking string in `handle_bad_merge_base()`	Patrick Steinhardt	1	-0/+2
	When handling a bad merge base we print an error, which includes the set of good revisions joined by spaces. This string is allocated, but never freed. Fix this memory leak. Note that the local `bad_hex` varible also looks like a string that we should free. But in fact, `oid_to_hex()` returns an address to a static variable even though it is declared to return a non-constant string. The function signature is thus quite misleading and really should be fixed, but doing so is outside of the scope of this patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21	bisect: fix leaking good/bad terms when reading multipe times	Patrick Steinhardt	3	-8/+12
	Even though `read_bisect_terms()` is declared as assigning string constants, it in fact assigns allocated strings to the `read_bad` and `read_good` out parameters. The only callers of this function assign the result to global variables and thus don't have to free them in order to be leak-free. But that changes when executing the function multiple times because we'd then overwrite the previous value and thus make it unreachable. Fix the function signature and free the previous values. This leak is exposed by t0630, but plugging it does not make the whole test suite pass. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21	builtin/blame: fix leaking blame entries with `--incremental`	Patrick Steinhardt	4	-7/+10
	When passing `--incremental` to git-blame(1) we exit early by jumping to the `cleanup` label. But some of the cleanups we perform are handled between the `goto` and its label, and thus we leak the data. Move the cleanups after the `cleanup` label. While at it, move the logic to free the scoreboard's `final_buf` into `cleanup_scoreboard()` and drop its `const` declaration. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-20	The tenth batch	Junio C Hamano	1	-5/+10
	Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-20	Prepare for 2.47.1	Junio C Hamano	3	-2/+28
	Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-16	Clean up RelNotes for 2.48	Junio C Hamano	1	-15/+0
	There somehow ended up too many bogus "merge X later to maint" comments for topics that cannot be merged ever down to 'maint' because they were forked from more recent integration branches in the draft release notes. Remove them, as they are inviting for mistakes later. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-15	pack-objects: only perform verbatim reuse on the preferred pack	Taylor Blau	2	-56/+41
	When reusing objects from source pack(s), write_reused_pack_verbatim() is responsible for reusing objects whole eword_t's at a time. It works by taking the longest continuous run of objects from the beginning of each source pack that the caller wants, and reuses the entirety of that section from each pack. This is based on the assumption that we don't have any gaps within the region. This assumption relieves us from having to patch any OFS_DELTAs, since we know that there aren't any gaps between any delta and its base in that region. To illustrate why this assumption is necessary, suppose we have some pack P, which has objects X, Y, and Z. If the MIDX's copy of Y was selected from a pack other than P, then the bit corresponding to object Y will appear earlier in the bitmap than the bits corresponding to X and Z. If pack-objects already has or will use the copy of Y from the pack it was selected from in the MIDX, then it is an error to reuse all objects between X and Z in the source pack. Doing so will cause us to reuse Y from a different pack than the one which represents Y in the MIDX, causing us to either: - include the object twice, assuming that the caller wants Y in the pack, or - include the object once, resulting in us packing more objects than necessary. This regression comes from ca0fd69e37 (pack-objects: prepare `write_reused_pack_verbatim()` for multi-pack reuse, 2023-12-14), which incorrectly assumed that there would be no gaps in reusable regions of non-preferred packs. Instead, we can only safely perform the whole-word reuse optimization on the preferred pack, where we know with certainty that no gaps exist in that region of the bitmap. We can still reuse objects from non-preferred packs, but we have to inspect them individually in write_reused_pack() to ensure that any gaps that may exist are accounted for. This allows us to simplify the implementation of write_reused_pack_verbatim() back to almost its pre-multi-pack reuse form, since we can now assume that the beginning of the pack appears at the beginning of the bitmap, meaning that we don't have to account for any bits up to the first word boundary (like we had to special case in ca0fd69e37). The only significant changes from the pre-ca0fd69e37 implementation are: - that we can no longer inspect words up to the end of reuse_packfile_bitmap->word_alloc, since we only want to look at words whose bits all correspond to objects in the given packfile, and - that we return early when given a reuse_packfile which is not preferred, making the call a noop. In the future, it might be possible to restore this optimization if we could guarantee that some reuse packs don't contain any gaps by construction (similar to the "disjoint packs" idea in very early versions of multi-pack reuse). Helped-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-15	t5332-multi-pack-reuse.sh: demonstrate duplicate packing failure	Taylor Blau	1	-0/+22
	In the multi-pack reuse code, there are two paths for reusing the on-disk representation of an object, handled by: - builtin/pack-objects.c::write_reused_pack_one() - builtin/pack-objects.c::write_reused_pack_verbatim() The former is responsible for copying the bytes for a single object out of an existing source pack. The latter does the same but for a region of objects aligned at eword_t boundaries. Demonstrate a bug whereby write_reused_pack_verbatim() can be tricked into writing out objects from some source pack, even when those objects were selected from a different source pack in the MIDX bitmap. When the caller wants at least one of the objects in that region, pack-objects will write the same object twice as a result of this bug. In the other case where the caller doesn't want any of the objects in the region of interest, we will write out objects that weren't requested. Demonstrate this bug by creating two packs, where the preferred one of those packs contains a single object which also appears in the main (non-preferred) pack. A separate bug[^1] prevents us from triggering the main bug when the duplicated object is the last one in the main pack, but any earlier object will suffice. We could fix that separate bug, but the following commit will simplify write_reused_pack_verbatim() and only call it on the preferred pack, so doing so would have little point. [^1]: Because write_reused_pack_verbatim() only reuses bits in the range off_t pack_start_off = pack_pos_to_offset(reuse_packfile->p, 0); off_t pack_end_off = pack_pos_to_offset(reuse_packfile->p, pos - reuse_packfile->bitmap_pos); written += pos - reuse_packfile->bitmap_pos; /* We're recording one chunk, not one object. */ record_reused_object(pack_start_off, pack_start_off - (hashfile_total(out) - pack_start)); , or in other words excluding the object beginning at position 'pos - reuse_packfile->bitmap_pos' in the source pack. But since reuse_packfile->bitmap_pos is '1' in the non-preferred pack (accounting for the single-object pack which is preferred), we don't actually copy the bytes from the last object. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-14	test-lib: move malloc-debug setup after $PATH setup	Jeff King	1	-50/+50
	Originally, the conditional definition of the setup/teardown functions for malloc checking could be run at any time, because they depended only on command-line options and the system getconf function. But since 02d900361c (test-lib: check malloc debug LD_PRELOAD before using, 2024-11-11), we probe the system by running "git version". Since this code runs before we've set $PATH to point to the version of Git we intend to test, we actually run the system version of git. This mostly works, since what we really care about is whether the LD_PRELOAD works, and it should work the same with any program. But there are some corner cases: 1. You might not have a system git at all, in which case the preload will appear to fail, even though it could work with the actual built version of git. 2. Your system git could be linked in a different way. For example, if it was built statically, then it will ignore LD_PRELOAD entirely, and we might assume that the preload works, even though it might not when used with a dynamic build. We could give a more complete path to the version of Git we intend to test, but features like GIT_TEST_INSTALLED make that not entirely trivial. So instead, let's just bump the setup until after we've set up the $PATH. There's no need for us to do it early, as long as it is done before the first test runs. Reported-by: Toon Claes <toon@iotcl.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-13	The ninth batch	Junio C Hamano	1	-0/+15
	Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-13	builtin/difftool: intialize some hashmap variables	Simon Marchi	2	-4/+6
	When running a dir-diff command that produces no diff, variables `wt_modified` and `tmp_modified` are used while uninitialized, causing: $ /home/smarchi/src/git/git-difftool --dir-diff master free(): invalid pointer [1] 334004 IOT instruction (core dumped) /home/smarchi/src/git/git-difftool --dir-diff master $ valgrind --track-origins=yes /home/smarchi/src/git/git-difftool --dir-diff master ... Invalid free() / delete / delete[] / realloc() at 0x48478EF: free (vg_replace_malloc.c:989) by 0x422CAC: hashmap_clear_ (hashmap.c:208) by 0x283830: run_dir_diff (difftool.c:667) by 0x284103: cmd_difftool (difftool.c:801) by 0x238E0F: run_builtin (git.c:484) by 0x2392B9: handle_builtin (git.c:750) by 0x2399BC: cmd_main (git.c:921) by 0x356FEF: main (common-main.c:64) Address 0x1ffefff180 is on thread 1's stack in frame #2, created by run_dir_diff (difftool.c:358) ... If taking any `goto finish` path before these variables are initialized, `hashmap_clear_and_free()` operates on uninitialized data, sometimes causing a crash. This regression was introduced in 7f795a1715 (builtin/difftool: plug several trivial memory leaks, 2024-09-26). Fix it by initializing those variables with the `HASHMAP_INIT` macro. Add a test comparing the main branch to itself, resulting in no diff. Signed-off-by: Simon Marchi <simon.marchi@efficios.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-12	refspec: store raw refspecs inside refspec_item	Jeff King	5	-31/+19
	The refspec struct keeps two matched arrays: one for the refspec_item structs and one for the original raw refspec strings. The main reason for this is that there are other users of refspec_item that do not care about the raw strings. But it does make managing the refspec struct awkward, as we must keep the two arrays in sync. This has led to bugs in the past (both leaks and double-frees). Let's just store a copy of the raw refspec string directly in each refspec_item struct. This simplifies the handling at a small cost: 1. Direct callers of refspec_item_init() will now get an extra copy of the refspec string, even if they don't need it. This should be negligible, as the struct is already allocating two strings for the parsed src/dst values (and we tend to only do it sparingly anyway for things like the TAG_REFSPEC literal). 2. Users of refspec_appendf() will now generate a temporary string, copy it, and then free the result (versus handing off ownership of the temporary string). We could get around this by having a "nodup" variant of refspec_item_init(), but it doesn't seem worth the extra complexity for something that is not remotely a hot code path. Code which accesses refspec->raw now needs to look at refspec->item.raw. Other callers which just use refspec_item directly can remain the same. We'll free the allocated string in refspec_item_clear(), which they should be calling anyway to free src/dst. One subtle note: refspec_item_init() can return an error, in which case we'll still have set its "raw" field. But that is also true of the "src" and "dst" fields, so any caller which does not _clear() the failed item is already potentially leaking. In practice most code just calls die() on an error anyway, but you can see the exception in valid_fetch_refspec(), which does correctly call _clear() even on error. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-12	refspec: drop separate raw_nr count	Jeff King	5	-15/+14
	A refspec struct contains zero or more refspec_item structs, along with matching "raw" strings. The items and raw strings are kept in separate arrays, but those arrays will always have the same length (because we write them only via refspec_append_nodup(), which grows both). This can lead to bugs when manipulating the array, since the arrays and lengths must be modified in lockstep. For example, the bug fixed in the previous commit, which forgot to decrement raw_nr. So let's get rid of "raw_nr" and have only "nr", making this kind of bug impossible (and also making it clear that the two are always matched, something that existing code already assumed but was not guaranteed by the interface). Even though we'd expect "alloc" and "raw_alloc" to likewise move in lockstep, we still need to keep separate counts there if we want to continue to use ALLOC_GROW() for both. Conceptually this would all be simpler if refspec_item just held onto its own raw string, and we had a single array. But there are callers which use refspec_item outside of "struct refspec" (and so don't hold on to a matching "raw" string at all), which we'd possibly need to adjust. So let's not worry about refactoring that for now, and just get rid of the redundant count variable. That is the first step on the road to combining them anyway. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-12	fetch: adjust refspec->raw_nr when filtering prefetch refspecs	Jeff King	2	-0/+5
	In filter_prefetch_refspecs(), we may remove one or more refspecs if they point into refs/tags/. When we do, we remove the item from the refspec->items array, shifting subsequent items down, and then decrement the refspec->nr count. We also remove the item from the refspec->raw array, but fail to decrement refspec->raw_nr. This leaves us with a count that is too high, and anybody looking at the "raw" array will erroneously see either: 1. The removed entry, if there were no subsequent items to shift down. 2. A duplicate of the final entry, as everything is shifted down but there was nothing to overwrite the final item. The obvious culprit to run into this is calling refspec_clear(), which will try to free the removed entry (case 1) or double-free the final entry (case 2). But even though the bug has existed since the function was added in 2e03115d0c (fetch: add --prefetch option, 2021-04-16), we did not trigger it in the test suite. The --prefetch option is normally only used with configured refspecs, and we never bother to call refspec_clear() on those (they are stored as part of a struct remote, which is held in a global variable). But you could trigger case 2 manually like: git fetch --prefetch . refs/tags/foo refs/tags/bar Ironically you couldn't trigger case 1, because the code accidentally leaked the string in the raw array, and the two bugs (the leak and the double-free) cancelled out. But when we fixed the leak in ea4780307c (fetch: free "raw" string when shrinking refspec, 2024-09-24), it became possible to trigger that, too, with a single item: git fetch --prefetch . refs/tags/foo We can fix both cases by just correctly decrementing "raw_nr" when we shrink the array. Even though we don't expect people to use --prefetch with command-line refspecs, we'll add a test to make sure it behaves well (like the test just before it, we're just confirming that the filtered prefetch succeeds at all). Reported-by: Eric Mills <ermills@epic.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-12	index-pack: repack local links into promisor packs	Jonathan Tan	4	-2/+172
	Teach index-pack to, when processing the objects in a pack with --promisor specified on the CLI, repack local objects (and the local objects that they refer to, recursively) referenced by these objects into promisor packs. This prevents the situation in which, when fetching from a promisor remote, we end up with promisor objects (newly fetched) referring to non-promisor objects (locally created prior to the fetch). This situation may arise if the client had previously pushed objects to the remote, for example. One issue that arises in this situation is that, if the non-promisor objects become inaccessible except through promisor objects (for example, if the branch pointing to them has moved to point to the promisor object that refers to them), then GC will garbage collect them. There are other ways to solve this, but the simplest seems to be to enforce the invariant that we don't have promisor objects referring to non-promisor objects. This repacking is done from index-pack to minimize the performance impact. During a fetch, the only time most objects are fully inflated in memory is when their object ID is computed, so we also scan the objects (to see which objects they refer to) during this time. Also to minimize the performance impact, an object is calculated to be local if it's a loose object or present in a non-promisor pack. (If it's also in a promisor pack or referred to by an object in a promisor pack, it is technically already a promisor object. But a misidentification of a promisor object as a non-promisor object is relatively benign here - we will thus repack that promisor object into a promisor pack, duplicating it in the object store, but there is no correctness issue, just an issue of inefficiency.) Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-12	doc: git-add.txt: convert to new style convention	Jean-Noël Avila	2	-62/+68
	Signed-off-by: Jean-Noël Avila <jn.avila@free.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-12	test-lib: check malloc debug LD_PRELOAD before using	Jeff King	1	-2/+5
	This fixes test failures across the suite on glibc platforms that don't have libc_malloc_debug.so.0. We added support for glibc's malloc checking routines long ago in a731fa916e (Add MALLOC_CHECK_ and MALLOC_PERTURB_ libc env to the test suite for detecting heap corruption, 2012-09-14). Back then we didn't need to do any checks to see if the platform supported it. We were just setting some environment variables which would either enable it or not. That changed in 131b94a10a (test-lib.sh: Use GLIBC_TUNABLES instead of MALLOC_CHECK_ on glibc >= 2.34, 2022-03-04). Now that glibc split this out into libc_malloc_debug.so, we have to add it to LD_PRELOAD. We only do that when we detect glibc, but it's possible to have glibc but not the malloc debug library. In that case LD_PRELOAD will complain to stderr, and tests which check for an empty stderr will fail. You can work around this by setting TEST_NO_MALLOC_CHECK, which disables the feature entirely. But it's not obvious to know you need to do that. Instead, since this malloc checking is best-effort anyway, let's just automatically disable it when the LD_PRELOAD appears not to work. We can check it by running something simple that should work (and produce nothing on stderr) like "git version". Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-08	The eighth batch	Junio C Hamano	1	-0/+15
	Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-06	compat/mingw: support POSIX semantics for atomic renames	Patrick Steinhardt	2	-7/+88
	By default, Windows restricts access to files when those files have been opened by another process. As explained in the preceding commits, these restrictions can be loosened such that reads, writes and/or deletes of files with open handles _are_ allowed. While we set up those sharing flags in most relevant code paths now, we still don't properly handle POSIX-style atomic renames in case the target path is open. This is failure demonstrated by t0610, where one of our tests spawns concurrent writes in a reftable-enabled repository and expects all of them to succeed. This test fails most of the time because the process that has acquired the "tables.list" lock is unable to rename it into place while other processes are busy reading that file. Windows 10 has introduced the `FILE_RENAME_FLAG_POSIX_SEMANTICS` flag that allows us to fix this usecase [1]. When set, it is possible to rename a file over a preexisting file even when the target file still has handles open. Those handles must have been opened with the `FILE_SHARE_DELETE` flag, which we have ensured in the preceding commits. Careful readers might have noticed that [1] does not mention the above flag, but instead mentions `FILE_RENAME_POSIX_SEMANTICS`. This flag is not for use with `SetFileInformationByHandle()` though, which is what we use. And while the `FILE_RENAME_FLAG_POSIX_SEMANTICS` flag exists, it is not documented on [2] or anywhere else as far as I can tell. Unfortunately, we still support Windows systems older than Windows 10 that do not yet have this new flag. Our `_WIN32_WINNT` SDK version still targets 0x0600, which is Windows Vista and later. And even though that Windows version is out-of-support, bumping the SDK version all the way to 0x0A00, which is Windows 10 and later, is not an option as it would make it impossible to compile on Windows 8.1, which is still supported. Instead, we have to manually declare the relevant infrastructure to make this feature available and have fallback logic in place in case we run on a Windows version that does not yet have this flag. On another note: `mingw_rename()` has a retry loop that is used in case deleting a file failed because it's still open in another process. One might be pressed to not use this loop anymore when we can use POSIX semantics. But unfortunately, we have to keep it around due to our dependence on the `FILE_SHARE_DELETE` flag. While we know to set that sharing flag now, other applications may not do so and may thus still cause sharing violations when we try to rename a file. This fixes concurrent writes in the reftable backend as demonstrated in t0610, but may also end up fixing other usecases where Git wants to perform renames. [1]: https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/ns-ntifs-_file_rename_information [2]: https://learn.microsoft.com/en-us/windows/win32/api/winbase/ns-winbase-file_rename_info Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-05	fetch-pack: die if in commit graph but not obj db	Jonathan Tan	2	-5/+18
	When fetching, there is a step in which sought objects are first checked against the local repository; only objects that are not in the local repository are then fetched. This check first looks up the commit graph file, and returns "present" if the object is in there. However, the action of first looking up the commit graph file is not done everywhere in Git, especially if the type of the object at the time of lookup is not known. This means that in a repo corruption situation, a user may encounter an "object missing" error, attempt to fetch it, and still encounter the same error later when they reattempt their original action, because the object is present in the commit graph file but not in the object DB. Therefore, make it a fatal error when this occurs. (Note that we cannot proceed to include this object in the list of objects to be fetched without changing at least the fetch negotiation code: what would happen is that the client will send "want X" and "have X" and when I tested at $DAYJOB with a work server that uses JGit, the server reasonably returned an empty packfile. And changing the fetch negotiation code to only use the object DB when deciding what to report as "have" would be an unnecessary slowdown, I think.) This was discovered when a lazy fetch of a missing commit completed with nothing actually fetched, and the writing of the commit graph file after every fetch then attempted to read said missing commit, triggering a lazy fetch of said missing commit, resulting in an infinite loop with no user-visible indication (until they check the list of processes running on their computer). With this fix, there is no infinite loop. Note that although the repo corruption we discovered was caused by a bug in GC in a partial clone, the behavior that this patch teaches Git to warn about applies to any repo with commit graph enabled and with a missing commit, whether it is a partial clone or not. t5330, introduced in 3a1ea94a49 (commit-graph.c: no lazy fetch in lookup_commit_in_graph(), 2022-07-01), tests that an interaction between fetch and the commit graph does not cause an infinite loop. This patch changes the exit code in that situation, so that test had to be changed. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-05	Revert "fetch-pack: add a deref_without_lazy_fetch_extended()"	Jonathan Tan	1	-18/+7
	This reverts commit a6e65fb39caf18259c660c1c7910d5bf80bc15cb. This revert simplifies the next patch in this patch set. The commit message of that commit mentions that the new function "will be used for the bundle-uri client in a subsequent commit", but it seems that eventually it wasn't used. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-04	doc: correct misleading descriptions for --shallow-exclude	Elijah Newren	6	-9/+9
	The documentation for the --shallow-exclude option to clone/fetch/etc. claims that the option takes a revision, but it does not. As per upload-pack.c's process_deepen_not(), it passes the option to expand_ref() and dies if it does not find exactly one ref matching the name passed. Further, this has always been the case ever since these options were introduced by the commits merged in a460ea4a3cb1 (Merge branch 'nd/shallow-deepen', 2016-10-10). Fix the documentation to match the implementation. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>