<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/commit-reach.c, branch v2.28.1</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://www.git.shady.money/git/atom?h=v2.28.1</id>
<link rel='self' href='https://www.git.shady.money/git/atom?h=v2.28.1'/>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/'/>
<updated>2020-07-07T05:09:16Z</updated>
<entry>
<title>Merge branch 'cb/is-descendant-of'</title>
<updated>2020-07-07T05:09:16Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2020-07-07T05:09:15Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=0258ed1e08e7973f1d6829db8d33109851067a91'/>
<id>urn:sha1:0258ed1e08e7973f1d6829db8d33109851067a91</id>
<content type='text'>
Code clean-up.

* cb/is-descendant-of:
  commit-reach: avoid is_descendant_of() shim
</content>
</entry>
<entry>
<title>Merge branch 'ak/commit-graph-to-slab'</title>
<updated>2020-07-07T05:09:14Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2020-07-07T05:09:13Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=d80bea479daa1d49a6b2332c4731e2647c896c29'/>
<id>urn:sha1:d80bea479daa1d49a6b2332c4731e2647c896c29</id>
<content type='text'>
A few fields in "struct commit" that do not have to always be
present have been moved to commit slabs.

* ak/commit-graph-to-slab:
  commit-graph: minimize commit_graph_data_slab access
  commit: move members graph_pos, generation to a slab
  commit-graph: introduce commit_graph_data_slab
  object: drop parsed_object_pool-&gt;commit_count
</content>
</entry>
<entry>
<title>Merge branch 'rs/commit-reach-leakfix'</title>
<updated>2020-06-29T21:17:27Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2020-06-29T21:17:27Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=fa2c57d562921cdd3be5c7af1ee59622439bc8cd'/>
<id>urn:sha1:fa2c57d562921cdd3be5c7af1ee59622439bc8cd</id>
<content type='text'>
Leakfix.

* rs/commit-reach-leakfix:
  commit-reach: plug minor memory leak after using is_descendant_of()
</content>
</entry>
<entry>
<title>commit-reach: avoid is_descendant_of() shim</title>
<updated>2020-06-23T23:36:53Z</updated>
<author>
<name>Carlo Marcelo Arenas Belón</name>
<email>carenas@gmail.com</email>
</author>
<published>2020-06-23T18:42:22Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=c1ea625f72110fda87fd225d7dff002befb82d9f'/>
<id>urn:sha1:c1ea625f72110fda87fd225d7dff002befb82d9f</id>
<content type='text'>
d91d6fbf26 (commit-reach: create repo_is_descendant_of(), 2020-06-17)
adds a repository aware version of is_descendant_of() and a backward
compatibility shim that is barely used.

Update all callers to directly use the new repo_is_descendant_of()
function instead; making the codebase simpler and pushing more
the_repository references higher up the stack.

Helped-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Carlo Marcelo Arenas Belón &lt;carenas@gmail.com&gt;
Reviewed-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-reach: plug minor memory leak after using is_descendant_of()</title>
<updated>2020-06-19T18:06:01Z</updated>
<author>
<name>René Scharfe</name>
<email>l.s.r@web.de</email>
</author>
<published>2020-06-19T13:13:46Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=d546fe2874ce8dc73cb0ac7541640fd202ec27c8'/>
<id>urn:sha1:d546fe2874ce8dc73cb0ac7541640fd202ec27c8</id>
<content type='text'>
ref_newer() builds a commit_list to pass a single potential ancestor to
is_descendant_of().  The latter leaves the list intact.  Release the
allocated memory after the call.

Signed-off-by: René Scharfe &lt;l.s.r@web.de&gt;
Acked-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-graph: minimize commit_graph_data_slab access</title>
<updated>2020-06-17T21:37:52Z</updated>
<author>
<name>Abhishek Kumar</name>
<email>abhishekkumar8222@gmail.com</email>
</author>
<published>2020-06-17T09:14:11Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=c752ad09c4ea479e8d54d08637cc0e5709723208'/>
<id>urn:sha1:c752ad09c4ea479e8d54d08637cc0e5709723208</id>
<content type='text'>
In an earlier patch, multiple struct acccesses to `graph_pos` and
`generation` were auto-converted to multiple method calls.

Since the values are fixed and commit-slab access costly, we would be
better off with storing the values as a local variable and reusing it.

Signed-off-by: Abhishek Kumar &lt;abhishekkumar8222@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit: move members graph_pos, generation to a slab</title>
<updated>2020-06-17T21:37:30Z</updated>
<author>
<name>Abhishek Kumar</name>
<email>abhishekkumar8222@gmail.com</email>
</author>
<published>2020-06-17T09:14:10Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=c49c82aa4c1036ba1629f73223cff53230e695f3'/>
<id>urn:sha1:c49c82aa4c1036ba1629f73223cff53230e695f3</id>
<content type='text'>
We remove members `graph_pos` and `generation` from the struct commit.
The default assignments in init_commit_node() are no longer valid,
which is fine as the slab helpers return appropriate default values and
the assignments are removed.

We will replace existing use of commit-&gt;generation and commit-&gt;graph_pos
by commit_graph_data_slab helpers using
`contrib/coccinelle/commit.cocci'.

Signed-off-by: Abhishek Kumar &lt;abhishekkumar8222@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-reach: use fast logic in repo_in_merge_base</title>
<updated>2020-06-17T20:49:38Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2020-06-17T17:24:29Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=80b8ada54756e972b01d20b98a388fbbc3a2fe58'/>
<id>urn:sha1:80b8ada54756e972b01d20b98a388fbbc3a2fe58</id>
<content type='text'>
The repo_is_descendant_of() method is aware of the existence of the
commit-graph file. It checks for generation_numbers_enabled() before
deciding on using can_all_from_reach() or repo_in_merge_bases()
depending on the situation. The reason here is that can_all_from_reach()
uses a depth-first search that is limited by the minimum generation
number of the target commits, and that algorithm can be very slow when
generation numbers are not present. The alternative uses
paint_down_to_common() which will walk the entire merge-base boundary,
which is typically slower.

This method is used by commands like "git tag --contains" and "git
branch --contains" for very fast results when a commit-graph file
exists. Unfortunately, it is _not_ used in commands like "git merge-base
--is-ancestor" which is doing an even simpler request.

This issue was raised recently [1] with respect to a change to how
generation numbers are stored, but was also reported much earlier [2]
before commit-reach.c existed to simplify these reachability queries.

[1] https://lore.kernel.org/git/20200607195347.GA8232@szeder.dev/
[2] https://lore.kernel.org/git/87608bawoa.fsf@evledraar.gmail.com/

The root cause is that builtin/merge-base.c has a method
handle_is_ancestor() that calls in_merge_bases(), an older version of
repo_in_merge_bases(). It would be better if we have every caller to
in_merge_bases() use the logic in can_all_from_reach() when possible.

This is where things get a little tricky: repo_is_descendant_of() calls
repo_in_merge_bases() in the non-generation numbers enabled case! If we
simply update repo_in_merge_bases() to call repo_is_descendant_of()
instead of repo_in_merge_bases_many(), then we will get a recursive call
loop. Thankfully, this is caught by the test suite in the default mode
(i.e. GIT_TEST_COMMIT_GRAPH=0).

The trick, then, is to make the non-generation number case for
repo_is_descendant_of() call repo_in_merge_bases_many() directly,
skipping the non-_many version. This allows us to take advantage of this
faster code path, when possible.

The easiest way to measure the performance impact is to test the
following command on the Linux kernel repository:

	git merge-base --is-ancestor &lt;A&gt; &lt;B&gt;

  | A    | B    | Time Before | Time After |
  |------|------|-------------|------------|
  | v3.0 | v5.7 | 0.459s      | 0.028s     |
  | v4.0 | v5.7 | 0.267s      | 0.021s     |
  | v5.0 | v5.7 | 0.074s      | 0.013s     |

Note that each of these samples return success. The old code performed
the same operation when &lt;A&gt; and &lt;B&gt; are swapped. However,
can_all_from_reach() will return immediately if the generation numbers
show that &lt;A&gt; has larger generation number than &lt;B&gt;. Thus, the time for
the swapped case is universally 0.004s in each case.

Reported-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Reported-by: SZEDER Gábor &lt;szeder.dev@gmail.com&gt;
Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-reach: create repo_is_descendant_of()</title>
<updated>2020-06-17T20:49:36Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2020-06-17T17:24:28Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=d91d6fbf26663ab64c453411c2de6cf4ea754246'/>
<id>urn:sha1:d91d6fbf26663ab64c453411c2de6cf4ea754246</id>
<content type='text'>
The next change will make repo_in_merge_bases() depend on the logic in
is_descendant_of(), but we need to make the method independent of
the_repository first.

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-graph: fix writing first commit-graph during fetch</title>
<updated>2019-10-25T02:19:16Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2019-10-24T13:40:42Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=cb99a34e23e32ca8e94bafaa9699cfd133a17fd3'/>
<id>urn:sha1:cb99a34e23e32ca8e94bafaa9699cfd133a17fd3</id>
<content type='text'>
The previous commit includes a failing test for an issue around
fetch.writeCommitGraph and fetching in a repo with a submodule. Here, we
fix that bug and set the test to "test_expect_success".

The problem arises with this set of commands when the remote repo at
&lt;url&gt; has a submodule. Note that --recurse-submodules is not needed to
demonstrate the bug.

	$ git clone &lt;url&gt; test
	$ cd test
	$ git -c fetch.writeCommitGraph=true fetch origin
	Computing commit graph generation numbers: 100% (12/12), done.
	BUG: commit-graph.c:886: missing parent &lt;hash1&gt; for commit &lt;hash2&gt;
	Aborted (core dumped)

As an initial fix, I converted the code in builtin/fetch.c that calls
write_commit_graph_reachable() to instead launch a "git commit-graph
write --reachable --split" process. That code worked, but is not how we
want the feature to work long-term.

That test did demonstrate that the issue must be something to do with
internal state of the 'git fetch' process.

The write_commit_graph() method in commit-graph.c ensures the commits we
plan to write are "closed under reachability" using close_reachable().
This method walks from the input commits, and uses the UNINTERESTING
flag to mark which commits have already been visited. This allows the
walk to take O(N) time, where N is the number of commits, instead of
O(P) time, where P is the number of paths. (The number of paths can be
exponential in the number of commits.)

However, the UNINTERESTING flag is used in lots of places in the
codebase. This flag usually means some barrier to stop a commit walk,
such as in revision-walking to compare histories. It is not often
cleared after the walk completes because the starting points of those
walks do not have the UNINTERESTING flag, and clear_commit_marks() would
stop immediately.

This is happening during a 'git fetch' call with a remote. The fetch
negotiation is comparing the remote refs with the local refs and marking
some commits as UNINTERESTING.

I tested running clear_commit_marks_many() to clear the UNINTERESTING
flag inside close_reachable(), but the tips did not have the flag, so
that did nothing.

It turns out that the calculate_changed_submodule_paths() method is at
fault. Thanks, Peff, for pointing out this detail! More specifically,
for each submodule, the collect_changed_submodules() runs a revision
walk to essentially do file-history on the list of submodules. That
revision walk marks commits UNININTERESTING if they are simplified away
by not changing the submodule.

Instead, I finally arrived on the conclusion that I should use a flag
that is not used in any other part of the code. In commit-reach.c, a
number of flags were defined for commit walk algorithms. The REACHABLE
flag seemed like it made the most sense, and it seems it was not
actually used in the file. The REACHABLE flag was used in early versions
of commit-reach.c, but was removed by 4fbcca4 (commit-reach: make
can_all_from_reach... linear, 2018-07-20).

Add the REACHABLE flag to commit-graph.c and use it instead of
UNINTERESTING in close_reachable(). This fixes the bug in manual
testing.

Reported-by: Johannes Schindelin &lt;johannes.schindelin@gmx.de&gt;
Helped-by: Jeff King &lt;peff@peff.net&gt;
Helped-by: Szeder Gábor &lt;szeder.dev@gmail.com&gt;
Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
