<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/commit-graph.h, branch v2.25.3</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://www.git.shady.money/git/atom?h=v2.25.3</id>
<link rel='self' href='https://www.git.shady.money/git/atom?h=v2.25.3'/>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/'/>
<updated>2019-09-12T19:30:08Z</updated>
<entry>
<title>upload-pack: disable commit graph more gently for shallow traversal</title>
<updated>2019-09-12T19:30:08Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2019-09-12T14:44:45Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=6abada1880156df9c6a0de91fb23716e378e13d7'/>
<id>urn:sha1:6abada1880156df9c6a0de91fb23716e378e13d7</id>
<content type='text'>
When the client has asked for certain shallow options like
"deepen-since", we do a custom rev-list walk that pretends to be
shallow. Before doing so, we have to disable the commit-graph, since it
is not compatible with the shallow view of the repository. That's
handled by 829a321569 (commit-graph: close_commit_graph before shallow
walk, 2018-08-20). That commit literally closes and frees our
repo-&gt;objects-&gt;commit_graph struct.

That creates an interesting problem for commits that have _already_ been
parsed using the commit graph. Their commit-&gt;object.parsed flag is set,
their commit-&gt;graph_pos is set, but their commit-&gt;maybe_tree may still
be NULL. When somebody later calls repo_get_commit_tree(), we see that
we haven't loaded the tree oid yet and try to get it from the commit
graph. But since it has been freed, we segfault!

So the root of the issue is a data dependency between the commit's
lazy-load of the tree oid and the fact that the commit graph can go
away mid-process. How can we resolve it?

There are a couple of general approaches:

  1. The obvious answer is to avoid loading the tree from the graph when
     we see that it's NULL. But then what do we return for the tree oid?
     If we return NULL, our caller in do_traverse() will rightly
     complain that we have no tree. We'd have to fallback to loading the
     actual commit object and re-parsing it. That requires teaching
     parse_commit_buffer() to understand re-parsing (i.e., not starting
     from a clean slate and not leaking any allocated bits like parent
     list pointers).

  2. When we close the commit graph, walk through the set of in-memory
     objects and clear any graph_pos pointers. But this means we also
     have to "unparse" any such commits so that we know they still need
     to open the commit object to fill in their trees. So it's no less
     complicated than (1), and is more expensive (since we clear objects
     we might not later need).

  3. Stop freeing the commit-graph struct. Continue to let it be used
     for lazy-loads of tree oids, but let upload-pack specify that it
     shouldn't be used for further commit parsing.

  4. Push the whole shallow rev-list out to its own sub-process, with
     the commit-graph disabled from the start, giving it a clean memory
     space to work from.

I've chosen (3) here. Options (1) and (2) would work, but are
non-trivial to implement. Option (4) is more expensive, and I'm not sure
how complicated it is (shelling out for the actual rev-list part is
easy, but we do then parse the resulting commits internally, and I'm not
clear which parts need to be handling shallow-ness).

The new test in t5500 triggers this segfault, but see the comments there
for how horribly intimate it has to be with how both upload-pack and
commit graphs work.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-graph: error out on invalid commit oids in 'write --stdin-commits'</title>
<updated>2019-08-05T21:33:39Z</updated>
<author>
<name>SZEDER Gábor</name>
<email>szeder.dev@gmail.com</email>
</author>
<published>2019-08-05T08:02:40Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=7c5c9b9c57d58273d17dfc3fec3ebdb25077a9de'/>
<id>urn:sha1:7c5c9b9c57d58273d17dfc3fec3ebdb25077a9de</id>
<content type='text'>
While 'git commit-graph write --stdin-commits' expects commit object
ids as input, it accepts and silently skips over any invalid commit
object ids, and still exits with success:

  # nonsense
  $ echo not-a-commit-oid | git commit-graph write --stdin-commits
  $ echo $?
  0
  # sometimes I forgot that refs are not good...
  $ echo HEAD | git commit-graph write --stdin-commits
  $ echo $?
  0
  # valid tree OID, but not a commit OID
  $ git rev-parse HEAD^{tree} | git commit-graph write --stdin-commits
  $ echo $?
  0
  $ ls -l .git/objects/info/commit-graph
  ls: cannot access '.git/objects/info/commit-graph': No such file or directory

Check that all input records are indeed valid commit object ids and
return with error otherwise, the same way '--stdin-packs' handles
invalid input; see e103f7276f (commit-graph: return with errors during
write, 2019-06-12).

Note that it should only return with error when encountering an
invalid commit object id coming from standard input.  However,
'--reachable' uses the same code path to process object ids pointed to
by all refs, and that includes tag object ids as well, which should
still be skipped over.  Therefore add a new flag to 'enum
commit_graph_write_flags' and a corresponding field to 'struct
write_commit_graph_context', so we can differentiate between those two
cases.

Signed-off-by: SZEDER Gábor &lt;szeder.dev@gmail.com&gt;
Acked-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-graph: turn a group of write-related macro flags into an enum</title>
<updated>2019-08-05T21:31:24Z</updated>
<author>
<name>SZEDER Gábor</name>
<email>szeder.dev@gmail.com</email>
</author>
<published>2019-08-05T08:02:39Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=39d88318569188c0544c3c0f8207f2e1b1829e60'/>
<id>urn:sha1:39d88318569188c0544c3c0f8207f2e1b1829e60</id>
<content type='text'>
Signed-off-by: SZEDER Gábor &lt;szeder.dev@gmail.com&gt;
Acked-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-graph: verify chains with --shallow mode</title>
<updated>2019-06-20T03:46:26Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2019-06-18T18:14:32Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=3da4b609bb14b13672f64af908706462617f53cb'/>
<id>urn:sha1:3da4b609bb14b13672f64af908706462617f53cb</id>
<content type='text'>
If we wrote a commit-graph chain, we only modified the tip file in
the chain. It is valuable to verify what we wrote, but not waste
time checking files we did not write.

Add a '--shallow' option to the 'git commit-graph verify' subcommand
and check that it does not read the base graph in a two-file chain.

Making the verify subcommand read from a chain of commit-graphs takes
some rearranging of the builtin code.

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-graph: create options for split files</title>
<updated>2019-06-20T03:46:26Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2019-06-18T18:14:32Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=c2bc6e6ab0ade70c475a73d06326e677e70840d2'/>
<id>urn:sha1:c2bc6e6ab0ade70c475a73d06326e677e70840d2</id>
<content type='text'>
The split commit-graph feature is now fully implemented, but needs
some more run-time configurability. Allow direct callers to 'git
commit-graph write --split' to specify the values used in the
merge strategy and the expire time.

Update the documentation to specify these values.

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-graph: allow cross-alternate chains</title>
<updated>2019-06-20T03:46:26Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2019-06-18T18:14:30Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=c523035cbd8515d095dc15e9661fd896733bedbc'/>
<id>urn:sha1:c523035cbd8515d095dc15e9661fd896733bedbc</id>
<content type='text'>
In an environment like a fork network, it is helpful to have a
commit-graph chain that spans both the base repo and the fork repo. The
fork is usually a small set of data on top of the large repo, but
sometimes the fork is much larger. For example, git-for-windows/git has
almost double the number of commits as git/git because it rebases its
commits on every major version update.

To allow cross-alternate commit-graph chains, we need a few pieces:

1. When looking for a graph-{hash}.graph file, check all alternates.

2. When merging commit-graph chains, do not merge across alternates.

3. When writing a new commit-graph chain based on a commit-graph file
   in another object directory, do not allow success if the base file
   has of the name "commit-graph" instead of
   "commit-graphs/graph-{hash}.graph".

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-graph: write commit-graph chains</title>
<updated>2019-06-20T03:46:26Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2019-06-18T18:14:27Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=6c622f9f0bbb38a23341dc4294f56d0d909b3d50'/>
<id>urn:sha1:6c622f9f0bbb38a23341dc4294f56d0d909b3d50</id>
<content type='text'>
Extend write_commit_graph() to write a commit-graph chain when given the
COMMIT_GRAPH_SPLIT flag.

This implementation is purposefully simplistic in how it creates a new
chain. The commits not already in the chain are added to a new tip
commit-graph file.

Much of the logic around writing a graph-{hash}.graph file and updating
the commit-graph-chain file is the same as the commit-graph file case.
However, there are several places where we need to do some extra logic
in the split case.

Track the list of graph filenames before and after the planned write.
This will be more important when we start merging graph files, but it
also allows us to upgrade our commit-graph file to the appropriate
graph-{hash}.graph file when we upgrade to a chain of commit-graphs.

Note that we use the eighth byte of the commit-graph header to store the
number of base graph files. This determines the length of the base
graphs chunk.

A subtle change of behavior with the new logic is that we do not write a
commit-graph if we our commit list is empty. This extends to the typical
case, which is reflected in t5318-commit-graph.sh.

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-graph: add base graphs chunk</title>
<updated>2019-06-20T03:46:26Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2019-06-18T18:14:26Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=118bd570029f5610abdbf1a220e87d3a5c241f5f'/>
<id>urn:sha1:118bd570029f5610abdbf1a220e87d3a5c241f5f</id>
<content type='text'>
To quickly verify a commit-graph chain is valid on load, we will
read from the new "Base Graphs Chunk" of each file in the chain.
This will prevent accidentally loading incorrect data from manually
editing the commit-graph-chain file or renaming graph-{hash}.graph
files.

The commit_graph struct already had an object_id struct "oid", but
it was never initialized or used. Add a line to read the hash from
the end of the commit-graph file and into the oid member.

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-graph: prepare for commit-graph chains</title>
<updated>2019-06-20T03:46:25Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2019-06-18T18:14:24Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=d4f4d60f6da8b93f768b2b4958c04cc1b3cea443'/>
<id>urn:sha1:d4f4d60f6da8b93f768b2b4958c04cc1b3cea443</id>
<content type='text'>
To prepare for a chain of commit-graph files, augment the
commit_graph struct to point to a base commit_graph. As we load
commits from the graph, we may actually want to read from a base
file according to the graph position.

The "graph position" of a commit is given by concatenating the
lexicographic commit orders from each of the commit-graph files in
the chain. This means that we must distinguish two values:

 * lexicographic index : the position within the lexicographic
   order in a single commit-graph file.

 * graph position: the position within the concatenated order
   of multiple commit-graph files

Given the lexicographic index of a commit in a graph, we can
compute the graph position by adding the number of commits in
the lower-level graphs. To find the lexicographic index of
a commit, we subtract the number of commits in lower-level graphs.

While here, change insert_parent_or_die() to take a uint32_t
position, as that is the type used by its only caller and that
makes more sense with the limits in the commit-graph format.

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-graph: use raw_object_store when closing</title>
<updated>2019-06-12T18:33:54Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2019-05-17T18:41:47Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=c3a3a964b29221a9b5fa305a08037c90b9f74be0'/>
<id>urn:sha1:c3a3a964b29221a9b5fa305a08037c90b9f74be0</id>
<content type='text'>
The close_commit_graph() method took a repository struct, but then
only uses the raw_object_store within. Change the function prototype
to make the method more flexible.

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
