Age | Commit message | Author | Files | Lines
2021-05-21 | t4013: test "git log -m --stat" | Sergey Organov | 2 | -0/+67

This is to ensure we won't break different diff formats when we start to imply "-p" by "-m".

Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-21 | t4013: test "git log -m --raw" | Sergey Organov | 2 | -0/+62

This is to ensure we won't break different diff formats when we start to imply "-p" by "-m".

Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-21 | t4013: test that "-m" alone has no effect in "git log" | Sergey Organov | 1 | -0/+8

This is to notice the current behavior that we are going to change when we start to imply "-p" by "-m".

Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-21 | simple-ipc: correct ifdefs when NO_PTHREADS is defined | Jeff Hostetler | 6 | -14/+48

Simple IPC always requires threads (in addition to various platform-specific IPC support). Fix the ifdefs in the Makefile to define SUPPORTS_SIMPLE_IPC when appropriate.

Previously, the Unix version of the code would only verify that Unix domain sockets were available.

This problem was reported here:
https://lore.kernel.org/git/YKN5lXs4AoK%2FJFTO@coredump.intra.peff.net/T/#m08be8f1942ea8a2c36cfee0e51cdf06489fdeafc

Reported-by: Randall S. Becker <rsbecker@nexbridge.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-21 | help: fix small typo in error message | Jean-Noël Avila | 1 | -1/+1

Classic string concatenation while forgetting a space character.

Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-21 | trace2: refactor to avoid gcc warning under -O3 | Ævar Arnfjörð Bjarmason | 1 | -10/+8

Refactor tr2_dst_try_uds_connect() to avoid a gcc warning[1] that appears under -O3 (but not -O2). This makes the build pass under DEVELOPER=1 without needing a DEVOPTS=no-error.

This can be reproduced with GCC Debian 8.3.0-6, but not e.g. with clang 7.0.1-8+deb10u2. We've had this warning since ee4512ed481 (trace2: create new combined trace facility, 2019-02-22).

As noted in [2] this warning happens because the compiler doesn't assume that errno must be non-zero after a failed syscall. Let's work around it by using the well-established "saved_errno" pattern, along with returning -1 ourselves instead of "errno". The caller can thus rely on our "errno" on failure.

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61846 for a related bug report against GCC.

1.
    trace2/tr2_dst.c: In function ‘tr2_dst_get_trace_fd.part.5’:
    trace2/tr2_dst.c:296:10: warning: ‘fd’ may be used uninitialized in this function [-Wmaybe-uninitialized]
      dst->fd = fd;
      ~~~~~~~~^~~~
    trace2/tr2_dst.c:229:6: note: ‘fd’ was declared here
      int fd;
      ^~

2. https://lore.kernel.org/git/20200404142131.GA679473@coredump.intra.peff.net/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
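For reference, the "saved_errno" pattern mentioned above looks roughly like this; a minimal sketch with illustrative names, not the actual tr2_dst code:

    #include <errno.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static int try_connect_sketch(int fd, const struct sockaddr *sa, socklen_t len)
    {
        if (connect(fd, sa, len) < 0) {
            int saved_errno = errno; /* snapshot errno before cleanup */
            close(fd);               /* close() may clobber errno */
            errno = saved_errno;     /* restore it for the caller */
            return -1;               /* return -1, not errno itself */
        }
        return 0;
    }

The caller checks for a -1 return and only then reads errno, so the compiler no longer needs to prove that errno was set by the failed syscall.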
2021-05-21 | Merge branch 'ds/sparse-index-protections' | Junio C Hamano | 1 | -1/+1

Fix access to uninitialized piece of memory, introduced during this cycle.

* ds/sparse-index-protections:
  sparse-index: fix uninitialized jump
2021-05-21 | Merge branch 'tz/c-locale-output-is-no-more' | Junio C Hamano | 1 | -1/+1

Test update.

* tz/c-locale-output-is-no-more:
  t7500: remove non-existant C_LOCALE_OUTPUT prereq
2021-05-21 | Merge branch 'cs/http-use-basic-after-failed-negotiate' | Junio C Hamano | 3 | -13/+48

Regression fix for a change made during this cycle.

* cs/http-use-basic-after-failed-negotiate:
  Revert "remote-curl: fall back to basic auth if Negotiate fails"
  t5551: test http interaction with credential helpers
2021-05-20 | l10n: sv.po: Update Swedish translation (5204t0f0u) | Peter Krefting | 1 | -3192/+3684
Signed-off-by: Peter Krefting <peter@softwolves.pp.se>
2021-05-20 | merge-ort, diffcore-rename: employ cached renames when possible | Elijah Newren | 4 | -30/+90

When there are many renames between the old base of a series of commits and the new base, the way sequencer.c, merge-recursive.c, and diffcore-rename.c have traditionally split the work resulted in redetecting the same renames with each and every commit being transplanted. To address this, the last several commits have been creating a cache of rename detection results, determining when it was safe to use such a cache in subsequent merge operations, adding helper functions, and so on. See the previous half dozen commit messages for additional discussion of this optimization, particularly the message a few commits ago entitled "add code to check for whether cached renames can be reused". This commit finally ties all of that work together, modifying the merge algorithm to make use of these cached renames.

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows:

                        Before                 After
    no-renames:       5.665 s ± 0.129 s     5.622 s ± 0.059 s
    mega-renames:    11.435 s ± 0.158 s    10.127 s ± 0.073 s
    just-one-mega:    494.2 ms ± 6.1 ms     500.3 ms ± 3.8 ms

That's a fairly small improvement, but mostly because the previous optimizations were so effective for these particular testcases; this optimization only kicks in when the others don't. If we undid the basename-guided rename detection and skip-irrelevant-renames optimizations, then we'd see that this series by itself improved performance as follows:

                     Before Basename Series    After Just This Series
    no-renames:        13.815 s ± 0.062 s        5.697 s ± 0.080 s
    mega-renames:    1799.937 s ± 0.493 s      205.709 s ± 0.457 s

Since this optimization kicks in to help accelerate cases where the previous optimizations do not apply, this last comparison shows that this cached-renames optimization has the potential to help significantly in cases that don't meet the requirements for the other optimizations to be effective.

The changes made in this optimization also lay some important groundwork for a future optimization around having collect_merge_info() avoid recursing into subtrees in more cases.

However, for this optimization to be effective, merge_switch_to_result() should only be called when the rebase or cherry-pick operation has either completed or hit a case where the user needs to resolve a conflict or edit the result. If it is called after every commit, as sequencer.c does, then the working tree and index are needlessly updated with every commit and the cached metadata is tossed, defeating this optimization. Refactoring sequencer.c to only call merge_switch_to_result() at the end of the operation is a bigger undertaking, and the practical benefits of this optimization will not be realized until that work is performed. Since `test-tool fast-rebase` only updates at the end of the operation, it was used to obtain the timings above.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-20 | merge-ort: handle interactions of caching and rename/rename(1to1) cases | Elijah Newren | 1 | -1/+29

As documented in Documentation/technical/remembering-renames.txt, and as tested for in the two testcases in t6429 with "rename same file identically" in their description, there is one case where we need to have renames in one commit NOT be cached for the next commit in our rebase sequence -- namely, rename/rename(1to1) cases. Rather than specifically trying to uncache those and fix up dir_rename_counts() to match (which would also be valid but more work), we simply disable the optimization when this really rare type of rename occurs.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-20 | merge-ort: add helper functions for using cached renames | Elijah Newren | 1 | -0/+47

If we have a usable rename cache, then we can remove from relevant_sources all the paths that were cached; diffcore_rename_extended() can then consider an even smaller set of relevant_sources in its rename detection.

However, when diffcore_rename_extended() is done, we will need to take the renames it detected and then add back in all the ones we had cached from before.

Add helper functions for doing these two operations; the next commit will make use of them.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-20 | merge-ort: preserve cached renames for the appropriate side | Elijah Newren | 1 | -9/+11

Previous commits created an in-memory cache of the results of rename detection, and added logic to detect when that cache could appropriately be used in a subsequent merge operation -- but we were still unconditionally clearing the cache with each new merge operation anyway. If it is valid to reuse the cache from one of the two sides of history, preserve that side.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-20 | merge-ort: avoid accidental API mis-use | Elijah Newren | 2 | -0/+9

Previously, callers of the merge-ort API could have passed an uninitialized value for struct merge_result *result. However, we want to check result to see if it has cached renames from a previous merge that we can reuse; such values would be found behind result->priv. However, if result->priv is uninitialized, attempting to access behind it will give a segfault. So, we need result->priv to be NULL (which will be the case if the caller does a memset(&result, 0)), or be written by a previous call to the merge-ort machinery.

Documenting this requirement may help, but despite being the person who introduced this requirement, I still missed it once; it did not fail in a very clear way and led to a long debugging session.

Add a _properly_initialized field to merge_result; that value will be 0 if the caller zero'ed the merge_result, it will be set to a very specific value by a previous run of the merge-ort machinery, and if it's uninitialized it will most likely either be 0 or some value that does not match the specific one we'd expect, allowing us to throw a much more meaningful error.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
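The check described above amounts to a magic-number sentinel. A minimal sketch, with an illustrative constant and a local stand-in for git's BUG() macro:

    #include <stdio.h>
    #include <stdlib.h>

    /* stand-in for git's BUG() from usage.h */
    #define BUG(msg) do { fprintf(stderr, "BUG: %s\n", msg); abort(); } while (0)

    #define RESULT_INITIALIZED 0x1abe11ed /* illustrative magic value */

    struct merge_result_sketch {
        unsigned _properly_initialized;
        void *priv;
    };

    static void check_initialized(struct merge_result_sketch *result)
    {
        /* zero means "caller memset the struct"; the magic value means
         * "written by a previous merge-ort run"; anything else is junk */
        if (result->_properly_initialized &&
            result->_properly_initialized != RESULT_INITIALIZED)
            BUG("uninitialized merge_result passed to merge-ort API");
        result->_properly_initialized = RESULT_INITIALIZED;
    }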
2021-05-20 | merge-ort: add code to check for whether cached renames can be reused | Elijah Newren | 1 | -2/+64

We need to know when renames detected in a previous merge operation can be reused in a later merge operation. Consider the following setup (from the git-rebase manpage):

             A---B---C topic
            /
       D---E---F---G master

After rebasing, this will appear as:

                     A'--B'--C' topic
                    /
       D---E---F---G master

Further, let's say that 'oldfile' was renamed to 'newfile' between E and G. The rebase or cherry-pick of A onto G will involve a three-way merge between E (as the merge base) and G and A. After detecting the rename between E:oldfile and G:newfile, there will be a three-way content merge of the following:

    E:oldfile
    G:newfile
    A:oldfile

and produce a new result:

    A':newfile

Now, when we want to pick B onto A', we will need to do a three-way merge between A (as the merge-base) and A' and B. This will involve a three-way content merge of

    A:oldfile
    A':newfile
    B:oldfile

but only if we can detect that A:oldfile is similar enough to A':newfile to be used together in a three-way content merge, i.e. only if we can detect that A:oldfile and A':newfile are a rename. But we already know that A:oldfile and A':newfile are similar enough to be used in a three-way content merge, because that is precisely where A':newfile came from in the previous merge.

Note that A & A' both appear in both merges. That gives us the condition under which we can reuse renames.

There are a couple important points about this optimization:

  - If the rebase or cherry-pick halts for user conflicts, these caches are NOT saved anywhere. Thus, resuming a halted rebase or cherry-pick will result in no reused renames for the next commit. This is intentional, as user resolution can change files significantly and in ways that violate the similarity assumptions here.

  - Technically, in a *very* narrow case this might give slightly different results for rename detection. Using the example above, if:

      * E:oldfile had 20 lines
      * G:newfile added 10 new lines at the beginning of the file
      * A:oldfile deleted all but the first three lines of the file

    then => A':newfile would have 13 lines, 3 of which match those in A:oldfile.

    Consider the two cases:

      * Without this optimization:
        - the next step of the rebase operation (moving B to B') would not detect the rename between A:oldfile and A':newfile
        - we'd thus get a modify/delete conflict with the rebase operation halting for the user to resolve, and have both A':newfile and B:oldfile sitting in the working tree.

      * With this optimization:
        - the rename between A:oldfile and A':newfile would be detected via the cache of renames
        - a three-way merge between A:oldfile, A':newfile, and B:oldfile would commence and be written to A':newfile

    Now, is the difference in behavior a bug...or a bugfix? I can't tell. Given that A:oldfile and A':newfile are not very similar, when we three-way merge with B:oldfile it seems likely we'll hit a conflict for the user to resolve. And it shouldn't be too hard for users to see why we did that three-way merge; oldfile and newfile *were* renames somewhere in the sequence. So, most of these corner cases will still behave similarly -- namely, a conflict given to the user to resolve.

    Also, consider the interesting case when commit B is a clean revert of commit A. Without this optimization, a rebase could not both apply a weird patch like A and then immediately revert it; users would be forced to resolve merge conflicts. With this optimization, it would successfully apply the clean revert. So, there is certainly at least one case that behaves better.

Even if it's considered a "difference in behavior", I think both behaviors are reasonable, and the time savings provided by this optimization justify using the slightly altered rename heuristics.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-20 | merge-ort: populate caches of rename detection results | Elijah Newren | 1 | -1/+72

Fill in cache_pairs, cached_target_names, and cached_irrelevant based on rename detection results. Future commits will make use of these values.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-20 | merge-ort: add data structures for in-memory caching of rename detection | Elijah Newren | 1 | -0/+53

When there are many renames between the old base of a series of commits and the new base for a series of commits, the sequence of merges employed to transplant those commits (from a cherry-pick or rebase operation) will repeatedly detect the exact same renames. This is wasted effort.

Add data structures which will be used to cache rename detection results, along with the initialization and deallocation of these data structures. Future commits will populate these caches, detect the appropriate circumstances when they can be used, and employ them to avoid re-detecting the same renames repeatedly.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
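Based on the cache names used throughout this series, the data structures could be laid out along these lines; a sketch assuming git's internal strmap.h/strset.h APIs, with illustrative array sizes and comments:

    #include "strmap.h" /* declares both strmap and strset in the git tree */

    struct cached_renames_sketch {
        /* cached_pairs: maps source path -> destination path (or a
         * NULL value for a cached delete), one map per merge side */
        struct strmap cached_pairs[2];

        /* cached_target_names: every rename destination, so a later
         * merge can cheaply ask "was anything renamed to this path?" */
        struct strset cached_target_names[2];

        /* cached_irrelevant: paths known to need no rename detection */
        struct strset cached_irrelevant[2];
    };

This only compiles inside the git tree; the exact field names and indexing in merge-ort.c may differ.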
2021-05-20 | t6429: testcases for remembering renames | Elijah Newren | 2 | -6/+700

We will soon be adding an optimization that caches (in memory only, never written to disk) upstream renames during a sequence of merges such as occurs during a cherry-pick or rebase operation. Add several tests meant to stress such an implementation to ensure it does the right thing, and include a test whose outcome we will later change due to this optimization as well.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-20 | fast-rebase: write conflict state to working tree, index, and HEAD | Elijah Newren | 1 | -19/+32

Previously, when fast-rebase hit a conflict, it simply aborted and left HEAD, the index, and the working tree where they were before the operation started. While fast-rebase does not support restarting from a conflicted state, write the conflicted state out anyway as it gives us a way to see what the conflicts are and write tests that check for them.

This will be important in the upcoming commits, because sequencer.c is only superficially integrated with merge-ort.c; in particular, it calls merge_switch_to_result() after EACH merge instead of only calling it at the end of all the sequence of merges (or when a conflict is hit). This not only causes needless updates to the working copy and index, but also causes all intermediate data to be freed and tossed, preventing caching information from one merge to the next.

However, integrating sequencer.c more deeply with merge-ort.c is a big task, and making this small extension to fast-rebase.c provides us with a simple way to test the edge and corner cases that we want to make sure continue working.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-20 | fast-rebase: change assert() to BUG() | Elijah Newren | 1 | -1/+2

assert() can succinctly document expectations for the code, and do so in a way that may be useful to future folks trying to refactor the code and change basic assumptions; it allows them to more quickly find some places where their violations of previous assumptions trips things up.

Unfortunately, assert() can surround a function call with important side-effects, which is a huge mistake since some users will compile with assertions disabled. I've had to debug such mistakes before in other codebases, so I should know better. Luckily, this was only in test code, but it's still very embarrassing.

Change an assert() to an if (...) BUG(...).

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
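The difference between the two patterns, sketched with a hypothetical checked_update() call that has an important side effect:

    #include <assert.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* stand-in for git's BUG() from usage.h */
    #define BUG(msg) do { fprintf(stderr, "BUG: %s\n", msg); abort(); } while (0)

    extern int checked_update(void); /* hypothetical call with a side effect */

    void broken(void)
    {
        /* under -DNDEBUG the whole call disappears, side effect and all */
        assert(checked_update());
    }

    void fixed(void)
    {
        /* the call always runs; only the check reports the violation */
        if (!checked_update())
            BUG("checked_update() failed");
    }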
2021-05-20 | Documentation/technical: describe remembering renames optimization | Elijah Newren | 1 | -0/+669

Remembering renames on the upstream side of history in an early merge of a rebase or cherry-pick for re-use in a later merge of the same operation makes pretty good intuitive sense. However, trying to show that it doesn't cause some subtle behavioral difference or some funny edge or corner case is much more involved. And, in fact, it does introduce a subtle behavioral change.

Document all the assumptions, special cases, and logic involved in such an optimization, and describe why this optimization is safe under the current optimizations/features/etc. -- even when the subtle behavioral change is triggered.

Part of the point of adding this document that goes over the optimization in such laborious detail, is that it is possible that significant future changes (optimizations or feature changes) could interact with this optimization in interesting ways; this document is here to help folks making big changes sanity check that the assumptions and arguments underlying this optimization are still valid.

(As a side note, creating this document forced me to review things in sufficient detail that I found I was not properly caching directory-rename-induced renames, resulting in the code not being aware of those renames and causing unnecessary diffcore_rename_extended() calls in subsequent merges.)

A subsequent commit will add several testcases based on this document meant to stress-test the implementation and also document the case with the subtle behavioral change.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-20 | doc: explain the use of color.pager | Jeff King | 1 | -2/+3

The current documentation for color.pager is technically correct, but slightly misleading and doesn't really clarify the purpose of the variable. As explained in the original thread which added it:

https://lore.kernel.org/git/E1G6zPH-00062L-Je@moooo.ath.cx/

the point is to deal with pagers that don't understand colors. And hence it being set to "true" is necessary for colorizing output to the pager, but not sufficient by itself (you must also have enabled one of the other color options, though note that these are set to "auto" by default these days).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-20 | l10n: fix typos in po/TEAMS | Jiang Xin | 1 | -2/+2

Find typos in "po/TEAMS" file using the "git-po-helper" program. These typos were introduced from commit v2.24.0-1-g9917eca794 (l10n: zh_TW: add translation for v2.24.0, 2019-11-20 19:14:22 +0800).

Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
2021-05-20 | setup: split "extensions found" messages into singular and plural | Alex Henrie | 1 | -2/+6

It's easier to translate this way.

Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
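Git's Q_() helper (an ngettext wrapper from gettext.h) is the usual tool for such a split; a minimal sketch with illustrative message text, not the exact setup.c strings:

    #include <stdio.h>
    #include "gettext.h" /* Q_() in the git tree */

    static void report_extensions(int n)
    {
        /* Q_() picks the singular or plural string based on n, letting
         * translators provide separate forms for each */
        fprintf(stderr, "%s\n",
                Q_("unknown repository extension found:",
                   "unknown repository extensions found:", n));
    }

Splitting the message this way avoids constructions like "extension(s)" that many languages cannot translate naturally.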
2021-05-20 | fetch: improve grammar of "shallow roots" message | Alex Henrie | 1 | -1/+1

Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-20 | A handful more topics before -rc1 | Junio C Hamano | 1 | -0/+19
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-20 | Merge branch 'jk/test-chainlint-softer' | Junio C Hamano | 4 | -3/+21

The "chainlint" feature in the test framework is a handy way to catch common mistakes in writing new tests, but tends to get expensive. A knob to selectively disable it has been introduced to help running tests that the developer has not modified.

* jk/test-chainlint-softer:
  t: avoid sed-based chain-linting in some expensive cases
2021-05-20 | Merge branch 'en/prompt-under-set-u' | Junio C Hamano | 1 | -3/+3

The bash prompt script (in contrib/) did not work under "set -u".

* en/prompt-under-set-u:
  git-prompt: work under set -u
2021-05-20 | Merge branch 'zh/ref-filter-push-remote-fix' | Junio C Hamano | 2 | -1/+20

The handling of "%(push)" formatting element of "for-each-ref" and friends was broken when the same codepath started handling "%(push:<what>)", which has been corrected.

* zh/ref-filter-push-remote-fix:
  ref-filter: fix read invalid union member bug
2021-05-20 | Merge branch 'ew/sha256-clone-remote-curl-fix' | Junio C Hamano | 1 | -0/+2

"git clone" from SHA256 repository by Git built with SHA-1 as the default hash algorithm over the dumb HTTP protocol did not correctly set up the resulting repository, which has been corrected.

* ew/sha256-clone-remote-curl-fix:
  remote-curl: fix clone on sha256 repos
2021-05-20 | Merge branch 'en/dir-traversal' | Junio C Hamano | 18 | -172/+298

"git clean" and "git ls-files -i" had confusion around working on or showing ignored paths inside an ignored directory, which has been corrected.

* en/dir-traversal:
  dir: introduce readdir_skip_dot_and_dotdot() helper
  dir: update stale description of treat_directory()
  dir: traverse into untracked directories if they may have ignored subfiles
  dir: avoid unnecessary traversal into ignored directory
  t3001, t7300: add testcase showcasing missed directory traversal
  t7300: add testcase showing unnecessary traversal into ignored directory
  ls-files: error out on -i unless -o or -c are specified
  dir: report number of visited directories and paths with trace2
  dir: convert trace calls to trace2 equivalents
2021-05-20 | Merge branch 'ab/perl-makefile-cleanup' | Junio C Hamano | 2 | -4/+19

Build procedure clean-up.

* ab/perl-makefile-cleanup:
  Makefile: make PERL_DEFINES recursively expanded
  perl: use mock i18n functions under NO_GETTEXT=Y
  Makefile: regenerate *.pm on NO_PERL_CPAN_FALLBACKS change
  Makefile: regenerate perl/build/* if GIT-PERL-DEFINES changes
  Makefile: don't re-define PERL_DEFINES
2021-05-20 | fetch-pack: signal v2 server that we are done making requests | Jeff King | 2 | -2/+13

When fetching with the v0 protocol over ssh (or a local upload-pack with pipes), the server closes the connection as soon as it is finished sending the pack. So even though the client may still be operating on the data via index-pack (e.g., resolving deltas, checking connectivity, etc), the server has released all resources.

With the v2 protocol, however, the server considers the ssh session only as a transport, with individual requests coming over it. After sending the pack, it goes back to its main loop, waiting for another request to come from the client. As a result, the ssh session hangs around until the client process ends, which may be much later (because resolving deltas, etc, may consume a lot of CPU).

This is bad for two reasons:

  - it's consuming resources on the server to leave open a connection that won't see any more use

  - if something bad happens to the ssh connection in the meantime (say, it gets killed by the network because it's idle, as happened in a real-world report), then ssh will exit non-zero, and we'll propagate the error up the stack.

The server is correct here not to hang up after serving the pack. The v2 protocol's design is meant to allow multiple requests like this, and hanging up would be the wrong thing for a hypothetical client which was planning to make more requests (though in practice, the git.git client never would, and I doubt any other implementations would either).

The right thing is instead for the client to signal to the server that it's not interested in making more requests. We can do that by closing the pipe descriptor we use to write to ssh. This will propagate to the server upload-pack as an EOF when it tries to read the next request (and then it will close its half, and the whole connection will go away).

It's important to do this "half duplex" shutdown, because we have to do it _before_ we actually receive the pack. This is an artifact of the way fetch-pack and index-pack (or unpack-objects) interact. We hand the connection off to index-pack (really, a sideband demuxer which feeds it), and then wait until it returns. And it doesn't do that until it has resolved all of the deltas in the pack, even though it was done reading from the server long before.

So just closing the connection fully after index-pack returns would be too late; we'd have held it open much longer than was necessary. And teaching index-pack to close the connection is awkward. It's not even seeing the whole conversation (the sideband demuxer is, but it doesn't actually know what's in the packets, or when the end comes).

Note that this close() is happening deep within the transport code. It's possible that a caller would want to perform other operations over the same ssh transport after receiving the pack. But as of the current code, none of the callers do, and there haven't been discussions of any plans to change this. If we need to support that later, we can probably do so by passing down a flag for "you're the last request on the transport; it's OK to close" instead of the code just assuming that's true.

The description above all discusses v2 ssh, so it's worth thinking about how this interacts with other protocols:

  - in v0 protocols, we could do the same half-duplex shutdown (it just goes into the v0 do_fetch_pack() instead). This does work, but since it doesn't have the same persistence problem in the first place, there's little reason to change it at this point.

  - local fetches against git-upload-pack on the same machine will behave the same as ssh (they are talking over two pipes, and see EOF on their input pipe)

  - fetches against git-daemon will run this same code, and close one of the descriptors. In practice, this won't do anything, since there our two descriptors are dups of each other, and not part of a half-duplex pair. The right thing would probably be to call shutdown(SHUT_WR) on it. I didn't bother with that here. It doesn't face the same error-code problem (since it's just a TCP connection), so it's really only an optimization problem. And git:// is not that widely used these days, and has less impact on server resources than an ssh termination.

  - v2 http doesn't suffer from this problem in the first place, as our pipes terminate at a local git-remote-https, which is passing data along as individual requests via curl. Probably curl is keeping the TCP/TLS connection open for more requests, and we might be able to tell it manually "hey, we are done making requests now". But I think that's much less important. It again doesn't suffer from the error-code problem, and HTTP keepalive is pretty well understood (importantly, the timeouts can be set low, because clients like curl know how to reconnect for subsequent requests if necessary). So it's probably not worth figuring out how to tell curl that we're done (though if we do, this patch is probably the first step anyway; fetch-pack closes the pipe back to remote-https, which would be the signal that it should tell curl we're done).

The code is pretty straightforward. We close the pipe at the right moment, and set it to -1 to mark it as invalid. I modified the later cleanup code to avoid calling close(-1). That's not strictly necessary, since close(-1) is a noop, but hopefully makes things a bit more obvious to a reader.

I suspect that trying to call more transport functions after the close() (e.g., calling transport_fetch_refs() again) would fail, as it's not smart enough to realize we need to re-open the ssh connection. But that's already true when v0 is in use. And no current callers want to do that (and again, the solution is probably a flag in the transport code to keep things open, which can be added later).

There's no test here, as the situation it covers is inherently racy (the question is when upload-pack exits, compared to when index-pack finishes resolving deltas and exits). The rather gross shell snippet below does recreate the problematic situation; when run on a sufficiently-large repository (git.git works fine), it kills an "idle" upload-pack while the client is resolving deltas, leading to a failed clone.

    (
        git clone --no-local --progress . foo.git 2>&1
        echo >&2 "clone exit code=$?"
    ) |
    tr '\r' '\n' |
    while read line
    do
        case "$done,$line" in
        ,Resolving*)
            echo "hit resolving deltas; killing upload-pack"
            killall -9 git-upload-pack
            done=t
            ;;
        esac
    done

Reported-by: Greg Pflaum <greg.pflaum@pnp-hcl.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
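The half-duplex shutdown described above boils down to closing only the write side of the conversation; a minimal sketch with an illustrative struct and field names, not the actual transport code:

    #include <unistd.h>

    struct conn_sketch {
        int in;   /* read side: pack data arriving from the server */
        int out;  /* write side: our requests to the server */
    };

    static void done_making_requests(struct conn_sketch *c)
    {
        if (c->out >= 0) {
            close(c->out); /* upload-pack sees EOF on its next read */
            c->out = -1;   /* mark invalid so later cleanup skips it */
        }
        /* c->in stays open: the sideband demuxer and index-pack are
         * still consuming the pack on this descriptor */
    }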
2021-05-19 | l10n: fr: v2.32.0 round 1 | Jean-Noël Avila | 1 | -5526/+4574
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
2021-05-19 | clone: clean up directory after transport_fetch_refs() failure | Jeff King | 2 | -7/+11

git-clone started respecting errors from the transport subsystem in aab179d937 (builtin/clone.c: don't ignore transport_fetch_refs() errors, 2020-12-03). However, that commit didn't handle the cleanup of the filesystem quite right.

The cleanup of the directory that cmd_clone() creates is done by an atexit() handler, which we control with a flag. It starts as JUNK_LEAVE_NONE ("clean up everything"), then progresses to JUNK_LEAVE_REPO when we know we have a valid repo but not working tree, and then finally JUNK_LEAVE_ALL when we have a successful checkout.

Most errors cause us to die(), which then triggers the handler to do the right thing based on how far into cmd_clone() we got. But the checks added by aab179d937 instead set the "err" variable and then jump to a new "cleanup" label, which then returns our non-zero status. However, the code after the cleanup label includes setting the flag to JUNK_LEAVE_ALL, and so we accidentally leave the repository and working tree in place.

One obvious option to fix this is to reorder the end of the function to set the flag first, before cleanup code, and put the label between them. But we can observe another small bug: the error return from transport_fetch_refs() is generally "-1", and we propagate that to the return value of cmd_clone(), which ultimately becomes the exit code of the process. And we try to avoid transmitting negative values via exit codes (only the low 8 bits are passed along as an unsigned value, though in practice for "-1" this at least retains the property that it's non-zero).

Instead, let's just die(). That makes us consistent with the rest of the code in the function. It does add a new "fatal:" line to the output, but I'd argue that's a good thing:

  - in the rare case that the transport code didn't say anything, now the user gets _some_ error message

  - even if the transport code said something like "error: ssh died of signal 9", it's nice to also say "fatal" to indicate that we considered that to be a show-stopper.

Triggering this in the test suite turns out to be surprisingly difficult. Almost every error we'd encounter, including ones deep inside the transport code, causes us to just die() right there! However, one way is to put a fake wrapper around git-upload-pack that sends the complete packfile but exits with a failure code.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
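For reference, the flag-guarded atexit() cleanup described above follows this general pattern; the enum names mirror the message, the rest is an illustrative sketch rather than the builtin/clone.c code:

    #include <stdlib.h>

    enum junk_state { JUNK_LEAVE_NONE, JUNK_LEAVE_REPO, JUNK_LEAVE_ALL };
    static enum junk_state junk_mode = JUNK_LEAVE_NONE;

    static void remove_junk(void)
    {
        switch (junk_mode) {
        case JUNK_LEAVE_ALL:  /* success: keep repo and work tree */
            return;
        case JUNK_LEAVE_REPO: /* remove only the work tree */
            break;
        case JUNK_LEAVE_NONE: /* remove repo and work tree */
            break;
        }
        /* ... recursively delete the recorded directories ... */
    }

    /* in cmd_clone(): register atexit(remove_junk) up front, die() on
     * any error, and only set junk_mode = JUNK_LEAVE_ALL after a fully
     * successful clone; dying early thus cleans up automatically */

The bug fixed here was that the "cleanup" label's exit path set JUNK_LEAVE_ALL even on failure, defeating the handler.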
2021-05-19 | docs: improve fast-forward in glossary content | Reuven Y | 1 | -2/+2

The text was somewhat confusing between the revision itself and the author.

Signed-off-by: Reuven Yagel <robi@post.jce.ac.il>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-19 | read-cache: delete unused hashing methods | Derrick Stolee | 1 | -64/+0

These methods were marked as MAYBE_UNUSED in the previous change to avoid a complicated diff. Delete them entirely, since we now use the hashfile API instead of this custom hashing code.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-19 | read-cache: use hashfile instead of git_hash_ctx | Derrick Stolee | 1 | -71/+66

The do_write_index() method in read-cache.c has its own hashing logic and buffering mechanism. Specifically, the ce_write() method was introduced by 4990aadc (Speed up index file writing by chunking it nicely, 2005-04-20) and similar mechanisms were introduced a few months later in c38138cd (git-pack-objects: write the pack files with a SHA1 csum, 2005-06-26). Based on the timing, in the early days of the Git codebase, I figured that these roughly equivalent code paths were never unified only because it got lost in the shuffle.

The hashfile API has since been used extensively in other file formats, such as pack-indexes, multi-pack-indexes, and commit-graphs. Therefore, it seems prudent to unify the index writing code to use the same mechanism. I discovered this disparity while trying to create a new index format that uses the chunk-format API. That API uses a hashfile as its base, so it is incompatible with the custom code in read-cache.c.

This rewrite is rather straightforward. It replaces all writes to the temporary file with writes to the hashfile struct. This takes care of many of the direct interactions with the_hash_algo.

There are still some git_hash_ctx uses remaining: the extension headers are hashed for use in the End of Index Entries (EOIE) extension. This use of the git_hash_ctx is left as-is. There are multiple reasons to not use a hashfile here, including the fact that the data is not actually writing to a file, just a hash computation. These hashes do not block our adoption of the chunk-format API in a future change to the index, so leave it as-is.

The internals of the algorithms are mostly identical. Previously, the hashfile API used a smaller 8KB buffer instead of the 128KB buffer from read-cache.c. The previous change already unified these sizes.

There is one subtle point: we do not pass the CSUM_FSYNC to the finalize_hashfile() method, which differs from most consumers of the hashfile API. The extra fsync() call indicated by this flag causes a significant performance degradation that is noticeable for quick commands that write the index, such as "git add". Other consumers can absorb this cost with their more complicated data structure organization, and further writing structures such as pack-files and commit-graphs is rarely in the critical path for common user interactions.

Some static methods become orphaned in this diff, so I marked them as MAYBE_UNUSED. The diff is much harder to read if they are deleted during this change. Instead, they will be deleted in the following change.

In addition to the test suite passing, I computed indexes using the previous binaries and the binaries compiled after this change, and found the index data to be exactly equal. Finally, I did extensive performance testing of "git update-index --force-write" on repos of various sizes, including one with over 2 million paths at HEAD. These tests demonstrated less than 1% difference in behavior. As expected, the performance should be considered unchanged. The previous changes to increase the hashfile buffer size from 8K to 128K ensured this change would not create a performance regression.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
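In use, the hashfile API that do_write_index() moves to has roughly this shape; a sketch assuming csum-file.h as of this series, and note that the index path deliberately omits CSUM_FSYNC per the message above:

    #include "cache.h"
    #include "csum-file.h"

    static void write_with_hashfile_sketch(int fd, const void *data,
                                           unsigned int len)
    {
        unsigned char trailer[GIT_MAX_RAWSZ];
        struct hashfile *f = hashfd(fd, "index");

        hashwrite(f, data, len);       /* buffered write + hash update */
        finalize_hashfile(f, trailer,  /* flush, append trailing hash */
                          CSUM_HASH_IN_STREAM | CSUM_CLOSE);
    }

Every consumer of the format gets the same buffering and the same trailing checksum for free, which is what makes the chunk-format API possible on top.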
2021-05-19 | csum-file.h: increase hashfile buffer size | Derrick Stolee | 3 | -25/+68

The hashfile API uses a hard-coded buffer size of 8KB and has ever since it was introduced in c38138c (git-pack-objects: write the pack files with a SHA1 csum, 2005-06-26). It performs a similar function to the hashing buffers in read-cache.c, but that code was updated from 8KB to 128KB in f279894 (read-cache: make the index write buffer size 128K, 2021-02-18). The justification there was that do_write_index() improves from 1.02s to 0.72s. Since our end goal is to have the index writing code use the hashfile API, we need to unify this buffer size to avoid a performance regression.

There is a buffer, 'check_buffer', that is used to verify the check_fd file descriptor. When this buffer increases to 128K to fit the data being flushed, it causes the stack to overflow the limits placed in the test suite. To avoid issues with stack size, move both 'buffer' and 'check_buffer' to be heap pointers within 'struct hashfile'. The 'check_buffer' member is left as NULL unless check_fd is set in hashfd_check(). Both buffers are cleared as part of finalize_hashfile() which also frees the full structure.

Since these buffers are now on the heap, we can adjust their size based on the needs of the consumer. In particular, callers to hashfd_throughput() are expecting to report progress indicators as the buffer flushes. These callers would prefer the smaller 8k buffer to avoid large delays between updates, especially for users with slower networks. When the progress indicator is not used, the larger buffer is preferable.

By adding a new trace2 region in the chunk-format API, we can see that the writing portion of 'git multi-pack-index write' lowers from ~1.49s to ~1.47s on a Linux machine. These effects may be more pronounced or diminished on other filesystems. The end-to-end timing is too noisy to have a definitive change either way.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
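A sketch of the resulting struct layout, simplified from csum-file.h with illustrative comments:

    #include <stddef.h>

    struct hashfile_sketch {
        int fd;
        int check_fd;                /* optional descriptor to verify against */
        size_t buffer_len;           /* 8K with progress, 128K without */
        unsigned char *buffer;       /* heap: avoids a 128K stack frame */
        unsigned char *check_buffer; /* NULL unless hashfd_check() was used */
        /* ... offset counters, hash context, name, progress state ... */
    };

Keeping the size a runtime field rather than a compile-time array bound is what lets hashfd_throughput() callers pick the smaller buffer for smoother progress updates.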
2021-05-19 | xsize_t: avoid implementation defined behavior when len < 0 | Jonathan Nieder | 1 | -4/+2

The xsize_t helper aims to safely convert an off_t to a size_t, erroring out when a file offset is too large to fit into a memory address. It does this by using two casts:

    size_t size = (size_t) len;
    if (len != (off_t) size)
        ... error out ...

On a platform with sizeof(size_t) < sizeof(off_t), this check is safe and correct. The first cast truncates to a size_t by finding the remainder modulo SIZE_MAX+1 (see C99 section 6.3.1.3 Signed and unsigned integers) and the second promotes to an off_t, meaning the result is true if and only if len is representable as a size_t.

On other platforms, this two-casts strategy still works well (always succeeds) for len >= 0. But for len < 0, when the first cast succeeds and produces SIZE_MAX + 1 + len, the resulting value is too large to be represented as an off_t, so the second cast produces implementation defined behavior. In practice, it is likely to produce a result of true despite len not being representable as size_t.

Simplify by replacing with a more straightforward check: compare len to the relevant bounds and then cast it. (To avoid a -Wsign-compare warning, after checking that len >= 0, we explicitly convert to a sufficiently-large unsigned type before comparing to SIZE_MAX.)

In practice, this is not likely to come up since typical callers use nonnegative len. Still, it's helpful to handle this case to make the behavior easy to reason about.

Historical note: the original bounds-checking in 46be82dfd0 (xsize_t: check whether we lose bits, 2010-07-28) did not produce this implementation-defined behavior, though it still did not handle negative offsets. It was not until 73560c793a (git-compat-util.h: xsize_t() - avoid -Wsign-compare warnings, 2017-09-21) introduced the double cast that the implementation-defined behavior was triggered.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
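The straightforward check described above can be written like this; a sketch close in spirit to the fix, with die() standing in for git's usage.h helper:

    #include <stddef.h>
    #include <stdint.h>
    #include <sys/types.h>

    extern void die(const char *fmt, ...); /* git's usage.h helper */

    static inline size_t xsize_t_sketch(off_t len)
    {
        /* check the sign first, then compare through a wide unsigned
         * type so the size comparison avoids -Wsign-compare */
        if (len < 0 || (uintmax_t) len > SIZE_MAX)
            die("Cannot handle files this big");
        return (size_t) len;
    }

Both rejected cases are now explicit, so there is no value for which the cast itself has to do the checking.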
2021-05-19Revert "remote-curl: fall back to basic auth if Negotiate fails"Jeff King2-9/+8
This reverts commit 1b0d9545bb85912a16b367229d414f55d140d3be. That commit does fix the situation it intended to (avoiding Negotiate even when the credentials were provided in the URL), but it creates a more serious regression: we now never hit the conditional for "we had a username and password, tried them, but the server still gave us a 401". That has two bad effects: 1. we never call credential_reject(), and thus a bogus credential stored by a helper will live on forever 2. we never return HTTP_NOAUTH, so the error message the user gets is "The requested URL returned error: 401", instead of "Authentication failed". Doing this correctly seems non-trivial, as we don't know whether the Negotiate auth was a problem. Since this is a regression in the upcoming v2.23.0 release (for which we're in -rc0), let's revert for now and work on a fix separately. (Note that this isn't a pure revert; the previous commit added a test showing the regression, so we can now flip it to expect_success). Reported-by: Ben Humphreys <behumphreys@atlassian.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-19 | t5551: test http interaction with credential helpers | Jeff King | 1 | -0/+41

We test authentication with http, and we independently test that credential helpers work, but we don't have any tests that cover the two features working together. Let's add two:

  1. Make sure that a successful request asks the helper to save the credential. This works as expected.

  2. Make sure that a failed request asks the helper to forget the credential. This is marked as expect_failure, as it was recently regressed by 1b0d9545bb (remote-curl: fall back to basic auth if Negotiate fails, 2021-03-22). The symptom here is that the second request should prompt the user, but doesn't.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-18 | revisions(7): clarify that most commands take a single revision range | Junio C Hamano | 1 | -0/+23

Sometimes new people are confused by how a revision "range" works, in that it is not a random collection of commits but a set of commits that are all connected to each other, and most Git commands work on a single such "range". Give an example to clarify it.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-18 | hashfile: use write_in_full() | Derrick Stolee | 1 | -12/+5

The flush() logic in csum-file.c was introduced originally by c38138c (git-pack-objects: write the pack files with a SHA1 csum, 2005-06-26) and a portion of the logic performs similar utility to write_in_full() in wrapper.c. The history of write_in_full() is full of moves and renames, but it was originally introduced by 7230e6d (Add write_or_die(), a helper function, 2006-08-21).

The point of these sections of code is to flush a write buffer using xwrite() and report errors in the case of disk space issues or other generic input/output errors. The logic in flush() can interpret the output of write_in_full() to provide the correct error messages to users.

The logic in the hashfile API has an additional set of logic to augment the progress indicator between calls to xwrite(). This was introduced by 2a128d6 (add throughput display to git-push, 2007-10-30). It seems that since the hashfile's buffer is only 8KB, these additional progress indicators might not be incredibly necessary. Instead, update the progress only when write_in_full() completes.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
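write_in_full()-style semantics amount to retrying short writes until the buffer is drained; a simplified sketch of the idea, not the exact wrapper.c code:

    #include <errno.h>
    #include <unistd.h>

    static ssize_t write_in_full_sketch(int fd, const void *buf, size_t count)
    {
        const char *p = buf;
        ssize_t total = 0;

        while (count > 0) {
            ssize_t written = write(fd, p, count);
            if (written < 0) {
                if (errno == EAGAIN || errno == EINTR)
                    continue; /* xwrite() retries these */
                return -1;    /* real I/O error; caller reports it */
            }
            if (!written) {
                errno = ENOSPC; /* a zero-length write means disk full */
                return -1;
            }
            p += written;
            count -= written;
            total += written;
        }
        return total;
    }

Because the helper either writes everything or returns -1, flush() only has to distinguish "out of space" from other errors rather than managing partial writes itself.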
2021-05-18 | sparse-index: fix uninitialized jump | Derrick Stolee | 1 | -1/+1

While testing the sparse-index, I verified a test with --valgrind and it complained about an uninitialized value being used in a jump in the path_matches_pattern_list() method. The line was this one:

    if (*dtype == DT_UNKNOWN)

In the call stack, the culprit was the initialization of the dtype variable in convert_to_sparse_rec().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-18 | parallel-checkout: send the new object_id algo field to the workers | Matheus Tavares | 2 | -7/+22

An object_id storing a SHA-1 name has some unused bytes at the end of the hash array. Since these bytes are not used, they are usually not initialized to any value either. However, at parallel_checkout.c:send_one_item() the object_id of a cache entry is copied into a buffer which is later sent to a checkout worker through a pipe write(). This makes Valgrind complain about passing uninitialized bytes to a syscall. The worker won't use these uninitialized bytes either, but the warning could confuse someone trying to debug this code; so instead of using oidcpy(), send_one_item() uses hashcpy() to only copy the used/initialized bytes of the object_id, and leave the remaining part with zeros.

However, since cf0983213c ("hash: add an algo member to struct object_id", 2021-04-26), using hashcpy() is no longer sufficient here as it won't copy the new algo field from the object_id. Let's add and use a new function which meets both our requirements of copying all the important object_id data while still avoiding the uninitialized bytes, by padding the end of the hash array in the destination object_id. With this change, we also no longer need the destination buffer from send_one_item() to be initialized with zeros, so let's switch from xcalloc() to xmalloc() to make this clear.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
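Following the description above, the new helper would copy the used bytes plus the algo field and zero-pad the rest; a sketch assuming git's hash.h types, with an illustrative function name:

    #include <string.h>
    #include "cache.h" /* struct object_id, hash_algos, the_hash_algo */

    static inline void oidcpy_with_padding_sketch(struct object_id *dst,
                                                  const struct object_id *src)
    {
        size_t hashsz = src->algo ? hash_algos[src->algo].rawsz
                                  : the_hash_algo->rawsz;

        memcpy(dst->hash, src->hash, hashsz);                  /* used bytes */
        memset(dst->hash + hashsz, 0, GIT_MAX_RAWSZ - hashsz); /* zero pad */
        dst->algo = src->algo;                                 /* new field */
    }

With every byte of the destination deterministically written, the subsequent pipe write() carries no uninitialized data for Valgrind to flag.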
2021-05-18 | t7500: remove non-existant C_LOCALE_OUTPUT prereq | Todd Zullinger | 1 | -1/+1

The C_LOCALE_OUTPUT prerequisite was removed in b1e079807b (tests: remove last uses of C_LOCALE_OUTPUT, 2021-02-11), where Ævar noted:

    I'm not leaving the prerequisite itself in place for in-flight
    changes as there currently are none that introduce new tests that
    rely on it, and because C_LOCALE_OUTPUT is currently a noop on the
    master branch we likely won't have any new submissions that use it.

One more use of C_LOCALE_OUTPUT did creep in with 3d1bda6b5b (t7500: add tests for --fixup=[amend|reword] options, 2021-03-15). This causes a number of the tests to be skipped by default:

    ok 35 # SKIP --fixup=reword: incompatible with --all (missing C_LOCALE_OUTPUT)
    ok 36 # SKIP --fixup=reword: incompatible with --include (missing C_LOCALE_OUTPUT)
    ok 37 # SKIP --fixup=reword: incompatible with --only (missing C_LOCALE_OUTPUT)
    ok 38 # SKIP --fixup=reword: incompatible with --interactive (missing C_LOCALE_OUTPUT)
    ok 39 # SKIP --fixup=reword: incompatible with --patch (missing C_LOCALE_OUTPUT)

Remove the C_LOCALE_OUTPUT prerequisite from these tests so they are not skipped.

Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-17 | l10n: tr: v2.32.0-r1 | Emir Sarı | 1 | -3177/+3642
Signed-off-by: Emir Sarı <bitigchi@me.com>
2021-05-17 | l10n: fr: fixed inconsistencies | rlespinasse | 1 | -1/+1
Signed-off-by: rlespinasse <romain.lespinasse@gmail.com>