From 4169d8964523198ca89f507824c07b70ba833732 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Mon, 9 Oct 2023 17:04:58 -0400 Subject: commit-graph: check consistency of fanout table We use bsearch_hash() to look up items in the oid index of a commit-graph. It also has a fanout table to reduce the initial range in which we'll search. But since the fanout comes from the on-disk file, a corrupted or malicious file can cause us to look outside of the allocated index memory. One solution here would be to pass the total table size to bsearch_hash(), which could then bounds check the values it reads from the fanout. But there's an inexpensive up-front check we can do, and it's the same one used by the midx and pack idx code (both of which likewise have fanout tables and use bsearch_hash(), but are not affected by this bug): 1. We can check the value of the final fanout entry against the size of the table we got from the index chunk. These must always match, since the fanout is just slicing up the index. As a side note, the midx and pack idx code compute it the other way around: they use the final fanout value as the object count, and check the index size against it. Either is valid; if they disagree we cannot know which is wrong (a corrupted fanout value, or a too-small table of oids). 2. We can quickly scan the fanout table to make sure it is monotonically increasing. If it is, then we know that every value is less than or equal to the final value, and therefore less than or equal to the table size. It would also be sufficient to just check that each fanout value is smaller than the final one, but the midx and pack idx code both do a full monotonicity check. It's the same cost, and it catches some other corruptions (though not all; the checks done by "commit-graph verify" are more complete but more expensive, and our goal here is to be fast and memory-safe). There are two new tests. One just checks the final fanout value (this is the mirror image of the "too small oid lookup" case added for the midx in the previous commit; it's flipped here because commit-graph considers the oid lookup chunk to be the source of truth). The other actually creates a fanout with many out-of-bounds entries, and prior to this patch, it does cause the segfault you'd expect. But note that the error is not "your fanout entry is out-of-bounds", but rather "fanout value out of order". That's because we leave the final fanout value in place (to get past the table size check), making the index non-monotonic (the second-to-last entry is big, but the last one must remain small to match the actual table). We need adjustments to a few existing tests, as well: - an earlier test in t5318 corrupts the fanout and runs "commit-graph verify". Its message is now changed, since we catch the problem earlier (during the load step, rather than the careful validation step). - in t5324, we test that "commit-graph verify --shallow" does not do expensive verification on the base file of the chain. But the corruption it uses (munging a byte at offset 1000) happens to be in the middle of the fanout table. And now we detect that problem in the cheaper checks that are performed for every part of the graph. We'll push this back to offset 1500, which is only caught by the more expensive checksum validation. Likewise, there's a later test in t5324 which munges an offset 100 bytes into a file (also in the fanout table) that is referenced by an alternates file. So we now find that corruption during the load step, rather than the verification step. At the very least we need to change the error message (like the case above in t5318). But it is probably good to make sure we handle all parts of the verification even for alternate graph files. So let's likewise corrupt byte 1500 and make sure we found the invalid checksum. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- t/t5324-split-commit-graph.sh | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 't/t5324-split-commit-graph.sh') diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh index 06bb897f02..55b5765e2d 100755 --- a/t/t5324-split-commit-graph.sh +++ b/t/t5324-split-commit-graph.sh @@ -317,7 +317,7 @@ test_expect_success 'verify --shallow does not check base contents' ' cd verify-shallow && git commit-graph verify && base_file=$graphdir/graph-$(head -n 1 $graphdir/commit-graph-chain).graph && - corrupt_file "$base_file" 1000 "\01" && + corrupt_file "$base_file" 1500 "\01" && git commit-graph verify --shallow && test_must_fail git commit-graph verify 2>test_err && grep -v "^+" test_err >err && @@ -391,10 +391,10 @@ test_expect_success 'verify across alternates' ' test_commit extra && git commit-graph write --reachable --split && tip_file=$graphdir/graph-$(tail -n 1 $graphdir/commit-graph-chain).graph && - corrupt_file "$tip_file" 100 "\01" && + corrupt_file "$tip_file" 1500 "\01" && test_must_fail git commit-graph verify --shallow 2>test_err && grep -v "^+" test_err >err && - test_i18ngrep "commit-graph has incorrect fanout value" err + test_i18ngrep "incorrect checksum" err ) ' -- cgit v1.2.3 From 6cf61d0db55291c3b8406a6ba8f20fdfb9a4a344 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Mon, 9 Oct 2023 17:05:41 -0400 Subject: commit-graph: bounds-check base graphs chunk When we are loading a commit-graph chain, we check that each slice of the chain points to the appropriate set of base graphs via its BASE chunk. But since we don't record the size of the chunk, we may access out-of-bounds memory if the file is corrupted. Since we know the number of entries we expect to find (based on the position within the commit-graph-chain file), we can just check the size up front. In theory this would also let us drop the st_mult() call a few lines later when we actually access the memory, since we know that the computed offset will fit in a size_t. But because the operands "g->hash_len" and "n" have types "unsigned char" and "int", we'd have to cast to size_t first. Leaving the st_mult() does that cast, and makes it more obvious that we don't have an overflow problem. Note that the test does not actually segfault before this patch, since it just reads garbage from the chunk after BASE (and indeed, it even rejects the file because that garbage does not have the expected hash value). You could construct a file with BASE at the end that did segfault, but corrupting the existing one is easy, and we can check stderr for the expected message. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- commit-graph.c | 8 +++++++- commit-graph.h | 1 + t/t5324-split-commit-graph.sh | 14 ++++++++++++++ 3 files changed, 22 insertions(+), 1 deletion(-) (limited to 't/t5324-split-commit-graph.sh') diff --git a/commit-graph.c b/commit-graph.c index e4860841fc..4377b547c8 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -435,7 +435,8 @@ struct commit_graph *parse_commit_graph(struct repo_settings *s, read_chunk(cf, GRAPH_CHUNKID_DATA, graph_read_commit_data, graph); pair_chunk(cf, GRAPH_CHUNKID_EXTRAEDGES, &graph->chunk_extra_edges, &graph->chunk_extra_edges_size); - pair_chunk_unsafe(cf, GRAPH_CHUNKID_BASE, &graph->chunk_base_graphs); + pair_chunk(cf, GRAPH_CHUNKID_BASE, &graph->chunk_base_graphs, + &graph->chunk_base_graphs_size); if (s->commit_graph_generation_version >= 2) { pair_chunk_unsafe(cf, GRAPH_CHUNKID_GENERATION_DATA, @@ -546,6 +547,11 @@ static int add_graph_to_chain(struct commit_graph *g, return 0; } + if (g->chunk_base_graphs_size / g->hash_len < n) { + warning(_("commit-graph base graphs chunk is too small")); + return 0; + } + while (n) { n--; diff --git a/commit-graph.h b/commit-graph.h index 1f8a9de4fb..e4248ea05d 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -97,6 +97,7 @@ struct commit_graph { const unsigned char *chunk_extra_edges; size_t chunk_extra_edges_size; const unsigned char *chunk_base_graphs; + size_t chunk_base_graphs_size; const unsigned char *chunk_bloom_indexes; const unsigned char *chunk_bloom_data; diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh index 55b5765e2d..3c8482d073 100755 --- a/t/t5324-split-commit-graph.sh +++ b/t/t5324-split-commit-graph.sh @@ -2,6 +2,7 @@ test_description='split commit graph' . ./test-lib.sh +. "$TEST_DIRECTORY"/lib-chunk.sh GIT_TEST_COMMIT_GRAPH=0 GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=0 @@ -398,6 +399,19 @@ test_expect_success 'verify across alternates' ' ) ' +test_expect_success 'reader bounds-checks base-graph chunk' ' + git clone --no-hardlinks . corrupt-base-chunk && + ( + cd corrupt-base-chunk && + tip_file=$graphdir/graph-$(tail -n 1 $graphdir/commit-graph-chain).graph && + corrupt_chunk_file "$tip_file" BASE clear 01020304 && + git -c core.commitGraph=false log >expect.out && + git -c core.commitGraph=true log >out 2>err && + test_cmp expect.out out && + grep "commit-graph base graphs chunk is too small" err + ) +' + test_expect_success 'add octopus merge' ' git reset --hard commits/10 && git merge commits/3 commits/4 && -- cgit v1.2.3