<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/commit-graph.h, branch v2.28.1</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://www.git.shady.money/git/atom?h=v2.28.1</id>
<link rel='self' href='https://www.git.shady.money/git/atom?h=v2.28.1'/>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/'/>
<updated>2020-06-17T21:37:23Z</updated>
<entry>
<title>commit-graph: introduce commit_graph_data_slab</title>
<updated>2020-06-17T21:37:23Z</updated>
<author>
<name>Abhishek Kumar</name>
<email>abhishekkumar8222@gmail.com</email>
</author>
<published>2020-06-17T09:14:09Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=4844812b9ec9bc6ffdc2da2c54d9146ff4c97d94'/>
<id>urn:sha1:4844812b9ec9bc6ffdc2da2c54d9146ff4c97d94</id>
<content type='text'>
The struct commit is used in many contexts. However, members
`generation` and `graph_pos` are only used for commit-graph related
operations and otherwise waste memory.

This wastage would have been more pronounced as we transition to
generation number v2, which uses 64-bit generation number instead of
current 32-bits.

As they are often accessed together, let's introduce struct
commit_graph_data and move them to a commit_graph_data slab.

While the overall test suite runs just as fast as master,
(series: 26m48s, master: 27m34s, faster by 2.87%), certain commands
like `git merge-base --is-ancestor` were slowed by 40% as discovered
by Szeder Gábor [1]. After minimizing commit-slab access, the slow down
persists but is closer to 20%.

Derrick Stolee believes the slow down is attributable to the underlying
algorithm rather than the slowness of commit-slab access [2] and we will
follow-up in a later series.

[1]: https://lore.kernel.org/git/20200607195347.GA8232@szeder.dev/
[2]: https://lore.kernel.org/git/13db757a-9412-7f1e-805c-8a028c4ab2b1@gmail.com/

Signed-off-by: Abhishek Kumar &lt;abhishekkumar8222@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag</title>
<updated>2020-05-18T19:51:11Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2020-05-13T21:59:55Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=2f00c355cb79ee86bddc9f2fef91ac380a6023fc'/>
<id>urn:sha1:2f00c355cb79ee86bddc9f2fef91ac380a6023fc</id>
<content type='text'>
Since 7c5c9b9c57 (commit-graph: error out on invalid commit oids in
'write --stdin-commits', 2019-08-05), the commit-graph builtin dies on
receiving non-commit OIDs as input to '--stdin-commits'.

This behavior can be cumbersome to work around in, say, the case of
piping 'git for-each-ref' to 'git commit-graph write --stdin-commits' if
the caller does not want to cull out non-commits themselves. In this
situation, it would be ideal if 'git commit-graph write' wrote the graph
containing the inputs that did pertain to commits, and silently ignored
the remainder of the input.

Some options have been proposed to the effect of '--[no-]check-oids'
which would allow callers to have the commit-graph builtin do just that.
After some discussion, it is difficult to imagine a caller who wouldn't
want to pass '--no-check-oids', suggesting that we should get rid of the
behavior of complaining about non-commit inputs altogether.

If callers do wish to retain this behavior, they can easily work around
this change by doing the following:

     git for-each-ref --format='%(objectname) %(objecttype) %(*objecttype)' |
     awk '
       !/commit/ { print "not-a-commit:"$1 }
        /commit/ { print $1 }
     ' |
     git commit-graph write --stdin-commits

To make it so that valid OIDs that refer to non-existent objects are
indeed an error after loosening the error handling, perform an extra
lookup to make sure that object indeed exists before sending it to the
commit-graph internals.

Helped-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ds/blame-on-bloom'</title>
<updated>2020-05-01T20:39:54Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2020-05-01T20:39:54Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=6d56d4c7dcd667d28aec28498591723c6febea1c'/>
<id>urn:sha1:6d56d4c7dcd667d28aec28498591723c6febea1c</id>
<content type='text'>
"git blame" learns to take advantage of the "changed-paths" Bloom
filter stored in the commit-graph file.

* ds/blame-on-bloom:
  test-bloom: check that we have expected arguments
  test-bloom: fix some whitespace issues
  blame: drop unused parameter from maybe_changed_path
  blame: use changed-path Bloom filters
  tests: write commit-graph with Bloom filters
  revision: complicated pathspecs disable filters
</content>
</entry>
<entry>
<title>Merge branch 'gs/commit-graph-path-filter'</title>
<updated>2020-05-01T20:39:53Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2020-05-01T20:39:53Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=9b6606f43d55bbf33b9924d16e02e60e1c09660a'/>
<id>urn:sha1:9b6606f43d55bbf33b9924d16e02e60e1c09660a</id>
<content type='text'>
Introduce an extension to the commit-graph to make it efficient to
check for the paths that were modified at each commit using Bloom
filters.

* gs/commit-graph-path-filter:
  bloom: ignore renames when computing changed paths
  commit-graph: add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag
  t4216: add end to end tests for git log with Bloom filters
  revision.c: add trace2 stats around Bloom filter usage
  revision.c: use Bloom filters to speed up path based revision walks
  commit-graph: add --changed-paths option to write subcommand
  commit-graph: reuse existing Bloom filters during write
  commit-graph: write Bloom filters to commit graph file
  commit-graph: examine commits by generation number
  commit-graph: examine changed-path objects in pack order
  commit-graph: compute Bloom filters for changed paths
  diff: halt tree-diff early after max_changes
  bloom.c: core Bloom filter implementation for changed paths.
  bloom.c: introduce core Bloom filter constructs
  bloom.c: add the murmur3 hash implementation
  commit-graph: define and use MAX_NUM_CHUNKS
</content>
</entry>
<entry>
<title>commit-graph: close descriptors after mmap</title>
<updated>2020-04-25T05:25:50Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2020-04-23T21:41:13Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=c8828530b7797f5ab584c84dc2b86d3c14b39c8d'/>
<id>urn:sha1:c8828530b7797f5ab584c84dc2b86d3c14b39c8d</id>
<content type='text'>
We don't ever refer to the descriptor after mmap-ing it. And keeping it
open means we can run out of descriptors in degenerate cases (e.g.,
thousands of split chain files). Let's close it as soon as possible.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>tests: write commit-graph with Bloom filters</title>
<updated>2020-04-16T22:38:04Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2020-04-16T20:14:03Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=b23ea9790d30e80c7c79a77aab2e9d150a189463'/>
<id>urn:sha1:b23ea9790d30e80c7c79a77aab2e9d150a189463</id>
<content type='text'>
The GIT_TEST_COMMIT_GRAPH environment variable updates the commit-
graph file whenever "git commit" is run, ensuring that we always
have an updated commit-graph throughout the test suite. The
GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS environment variable was
introduced to write the changed-path Bloom filters whenever "git
commit-graph write" is run. However, the GIT_TEST_COMMIT_GRAPH
trick doesn't launch a separate process and instead writes it
directly.

To expand the number of tests that have commits in the commit-graph
file, add a helper method that computes the commit-graph and place
that helper inside "git commit" and "git merge".

In the helper method, check GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS
to ensure we are writing changed-path Bloom filters whenever
possible.

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-graph.h: replace 'commit_hex' with 'commits'</title>
<updated>2020-04-15T16:20:30Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2020-04-14T04:04:25Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=6830c360777468434184f60023e2562348c9dacc'/>
<id>urn:sha1:6830c360777468434184f60023e2562348c9dacc</id>
<content type='text'>
The 'write_commit_graph()' function takes in either a string list of
pack indices, or a string list of hexadecimal commit OIDs. These
correspond to the '--stdin-packs' and '--stdin-commits' mode(s) from
'git commit-graph write'.

Using a string_list of hexadecimal commit IDs is not the most efficient
use of memory, since we can instead use the 'struct oidset', which is
more well-suited for this case.

This has another benefit which will become apparent in the following
commit. This is that we are about to disambiguate the kinds of errors we
produce with '--stdin-commits' into "non-hex input" and "hex-input, but
referring to a non-commit object". By having 'write_commit_graph' take
in a 'struct oidset *' of commits, we place the burden on the caller (in
this case, the builtin) to handle the first case, and the commit-graph
machinery can handle the second case.

Suggested-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>builtin/commit-graph.c: introduce split strategy 'replace'</title>
<updated>2020-04-15T16:20:28Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2020-04-14T04:04:17Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=8a6ac287b2ba5f75bb2d9409dd97e9b501daf253'/>
<id>urn:sha1:8a6ac287b2ba5f75bb2d9409dd97e9b501daf253</id>
<content type='text'>
When using split commit-graphs, it is sometimes useful to completely
replace the commit-graph chain with a new base.

For example, consider a scenario in which a repository builds a new
commit-graph incremental for each push. Occasionally (say, after some
fixed number of pushes), they may wish to rebuild the commit-graph chain
with all reachable commits.

They can do so with

  $ git commit-graph write --reachable

but this removes the chain entirely and replaces it with a single
commit-graph in 'objects/info/commit-graph'. Unfortunately, this means
that the next push will have to move this commit-graph into the first
layer of a new chain, and then write its new commits on top.

Avoid such copying entirely by allowing the caller to specify that they
wish to replace the entirety of their commit-graph chain, while also
specifying that the new commit-graph should become the basis of a fresh,
length-one chain.

This addresses the above situation by making it possible for the caller
to instead write:

  $ git commit-graph write --reachable --split=replace

which writes a new length-one chain to 'objects/info/commit-graphs',
making the commit-graph incremental generated by the subsequent push
relatively cheap by avoiding the aforementioned copy.

In order to do this, remove an assumption in 'write_commit_graph_file'
that chains are always at least two incrementals long.

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>builtin/commit-graph.c: introduce split strategy 'no-merge'</title>
<updated>2020-04-15T16:20:27Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2020-04-14T04:04:12Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=fdbde82fe523374b3c5d1f0f01f3c43dcaca9465'/>
<id>urn:sha1:fdbde82fe523374b3c5d1f0f01f3c43dcaca9465</id>
<content type='text'>
In the previous commit, we laid the groundwork for supporting different
splitting strategies. In this commit, we introduce the first splitting
strategy: 'no-merge'.

Passing '--split=no-merge' is useful for callers which wish to write a
new incremental commit-graph, but do not want to spend effort condensing
the incremental chain [1]. Previously, this was possible by passing
'--size-multiple=0', but this no longer the case following 63020f175f
(commit-graph: prefer default size_mult when given zero, 2020-01-02).

When '--split=no-merge' is given, the commit-graph machinery will never
condense an existing chain, and it will always write a new incremental.

[1]: This might occur when, for example, a server administrator running
some program after each push may want to ensure that each job runs
proportional in time to the size of the push, and does not "jump" when
the commit-graph machinery decides to trigger a merge.

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>builtin/commit-graph.c: support for '--split[=&lt;strategy&gt;]'</title>
<updated>2020-04-15T16:20:26Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2020-04-14T04:04:08Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=4f027355f6b6b5b2ba69e23ff50cb7313d13dd70'/>
<id>urn:sha1:4f027355f6b6b5b2ba69e23ff50cb7313d13dd70</id>
<content type='text'>
With '--split', the commit-graph machinery writes new commits in another
incremental commit-graph which is part of the existing chain, and
optionally decides to condense the chain into a single commit-graph.
This is done to ensure that the asymptotic behavior of looking up a
commit in an incremental chain is not dominated by the number of
incrementals in that chain. It can be controlled by the '--max-commits'
and '--size-multiple' options.

In the next two commits, we will introduce additional splitting
strategies that can exert additional control over:

  - when a split commit-graph is and isn't written, and

  - when the existing commit-graph chain is discarded completely and
    replaced with another graph

To prepare for this, make '--split' take an optional strategy (as in
'--split[=&lt;strategy&gt;]'), and add a new enum to describe which strategy
is being used. For now, no strategies are given, and the only enumerated
value is 'COMMIT_GRAPH_SPLIT_UNSPECIFIED', indicating the absence of a
strategy.

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
