aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/technical (follow)
AgeCommit message (Collapse)AuthorFilesLines
2024-12-20path-walk: allow consumer to specify object typesDerrick Stolee1-0/+9
We add the ability to filter the object types in the path-walk API so the callback function is called fewer times. This adds the ability to ask for the commits in a list, as well. We re-use the empty string for this set of objects because these are passed directly to the callback function instead of being part of the 'path_stack'. Future changes will add the ability to visit annotated tags. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-20t6601: add helper for testing path-walk APIDerrick Stolee1-1/+2
Add some tests based on the current behavior, doing interesting checks for different sets of branches, ranges, and the --boundary option. This sets a baseline for the behavior and we can extend it as new options are introduced. Store and output a 'batch_nr' value so we can demonstrate that the paths are grouped together in a batch and not following some other ordering. This allows us to test the depth-first behavior of the path-walk API. However, we purposefully do not test the order of the objects in the batch, so the output is compared to the expected output through a sort. It is important to mention that the behavior of the API will change soon as we start to handle UNINTERESTING objects differently, but these tests will demonstrate the change in behavior. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-20path-walk: introduce an object walk by pathDerrick Stolee1-0/+45
In anticipation of a few planned applications, introduce the most basic form of a path-walk API. It currently assumes that there are no UNINTERESTING objects, and does not include any complicated filters. It calls a function pointer on groups of tree and blob objects as grouped by path. This only includes objects the first time they are discovered, so an object that appears at multiple paths will not be included in two batches. These batches are collected in 'struct type_and_oid_list' objects, which store an object type and an oid_array of objects. The data structures are documented in 'struct path_walk_context', but in summary the most important are: * 'paths_to_lists' is a strmap that connects a path to a type_and_oid_list for that path. To avoid conflicts in path names, we make sure that tree paths end in "/" (except the root path with is an empty string) and blob paths do not end in "/". * 'path_stack' is a string list that is added to in an append-only way. This stores the stack of our depth-first search on the heap instead of using recursion. * 'path_stack_pushed' is a strmap that stores path names that were already added to 'path_stack', to avoid repeating paths in the stack. Mostly, this saves us from quadratic lookups from doing unsorted checks into the string_list. The coupling of 'path_stack' and 'path_stack_pushed' is protected by the push_to_stack() method. Call this instead of inserting into these structures directly. The walk_objects_by_path() method initializes these structures and starts walking commits from the given rev_info struct. The commits are used to find the list of root trees which populate the start of our depth-first search. The core of our depth-first search is in a while loop that continues while we have not indicated an early exit and our 'path_stack' still has entries in it. The loop body pops a path off of the stack and "visits" the path via the walk_path() method. The walk_path() method gets the list of OIDs from the 'path_to_lists' strmap and executes the callback method on that list with the given path and type. If the OIDs correspond to tree objects, then iterate over all trees in the list and run add_children() to add the child objects to their own lists, adding new entries to the stack if necessary. In testing, this depth-first search approach was the one that used the least memory while iterating over the object lists. There is still a chance that repositories with too-wide path patterns could cause memory pressure issues. Limiting the stack size could be done in the future by limiting how many objects are being considered in-progress, or by visiting blob paths earlier than trees. There are many future adaptations that could be made, but they are left for future updates when consumers are ready to take advantage of those features. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-07Documentation: add comparison of build systemsPatrick Steinhardt1-0/+224
We're contemplating whether to eventually replace our build systems with a build system that is easier to use. Add a comparison of build systems to our technical documentation as a baseline for discussion. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-22doc: consolidate extensions in git-config documentationCaleb White3-44/+6
The `technical/repository-version.txt` document originally served as the master list for extensions, requiring that any new extensions be defined there. However, the `config/extensions.txt` file was introduced later and has since become the de facto location for describing extensions, with several extensions listed there but missing from `repository-version.txt`. This consolidates all extension definitions into `config/extensions.txt`, making it the authoritative source for extensions. The references in `repository-version.txt` are updated to point to `config/extensions.txt`, and cross-references to related documentation such as `gitrepository-layout[5]` and `git-config[1]` are added. Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Caleb White <cdwhite3@pm.me> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-07docs: fix the `maintain-git` links in `technical/platform-support`Johannes Schindelin1-2/+2
These links should point to `.html` files, not to `.txt` ones. Compare also to 4945f046c7f (api docs: link to html version of api-trace2, 2022-09-16). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-23Documentation/technical: fix a typoAndrew Kreimer1-1/+1
Fix a typo in documentation. Signed-off-by: Andrew Kreimer <algonell@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-18Merge branch 'ps/clar-unit-test'Junio C Hamano1-0/+2
Import clar unit tests framework libgit2 folks invented for our use. * ps/clar-unit-test: Makefile: rename clar-related variables to avoid confusion clar: add CMake support t/unit-tests: convert ctype tests to use clar t/unit-tests: convert strvec tests to use clar t/unit-tests: implement test driver Makefile: wire up the clar unit testing framework Makefile: do not use sparse on third-party sources Makefile: make hdr-check depend on generated headers Makefile: fix sparse dependency on GENERATED_H clar: stop including `shellapi.h` unnecessarily clar(win32): avoid compile error due to unused `fs_copy()` clar: avoid compile error with mingw-w64 t/clar: fix compatibility with NonStop t: import the clar unit testing framework t: do not pass GIT_TEST_OPTS to unit tests with prove
2024-09-04t: import the clar unit testing frameworkPatrick Steinhardt1-0/+2
Our unit testing framework is a homegrown solution. While it supports most of our needs, it is likely that the volume of unit tests will grow quite a bit in the future such that we can exercise low-level subsystems directly. This surfaces several shortcomings that the current solution has: - There is no way to run only one specific tests. While some of our unit tests wire this up manually, others don't. In general, it requires quite a bit of boilerplate to get this set up correctly. - Failures do not cause a test to stop execution directly. Instead, the test author needs to return manually whenever an assertion fails. This is rather verbose and is not done correctly in most of our unit tests. - Wiring up a new testcase requires both implementing the test function and calling it in the respective test suite's main function, which is creating code duplication. We can of course fix all of these issues ourselves, but that feels rather pointless when there are already so many unit testing frameworks out there that have those features. We line out some requirements for any unit testing framework in "Documentation/technical/unit-tests.txt". The "clar" unit testing framework, which isn't listed in that table yet, ticks many of the boxes: - It is licensed under ISC, which is compatible. - It is easily vendorable because it is rather tiny at around 1200 lines of code. - It is easily hackable due to the same reason. - It has TAP support. - It has skippable tests. - It preprocesses test files in order to extract test functions, which then get wired up automatically. While it's not perfect, the fact that clar originates from the libgit2 project means that it should be rather easy for us to collaborate with upstream to plug any gaps. Import the clar unit testing framework at commit 1516124 (Merge pull request #97 from pks-t/pks-whitespace-fixes, 2024-08-15). The framework will be wired up in subsequent commits. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-22trace2: implement trace2_printf() for event targetJosh Steadmon1-2/+15
The trace2 event target does not have an implementation for trace2_printf(). While the event target is for structured events, and trace2_printf() is for unstructured, human-readable messages, it may still be useful to wrap these unstructured messages in a structured JSON object. Among other things, it may reduce confusion when manually debugging using event trace data. Add a simple implementation for the event target that wraps trace2_printf() messages in a minimal JSON object. Document this in Documentation/technical/api-trace2.txt, and bump the event format version since we're adding a new event type. Signed-off-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-19Merge branch 'tb/incremental-midx-part-1'Junio C Hamano1-0/+103
Incremental updates of multi-pack index files. * tb/incremental-midx-part-1: midx: implement support for writing incremental MIDX chains t/t5313-pack-bounds-checks.sh: prepare for sub-directories t: retire 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' midx: implement verification support for incremental MIDXs midx: support reading incremental MIDX chains midx: teach `midx_fanout_add_midx_fanout()` about incremental MIDXs midx: teach `midx_preferred_pack()` about incremental MIDXs midx: teach `midx_contains_pack()` about incremental MIDXs midx: remove unused `midx_locate_pack()` midx: teach `fill_midx_entry()` about incremental MIDXs midx: teach `nth_midxed_offset()` about incremental MIDXs midx: teach `bsearch_midx()` about incremental MIDXs midx: introduce `bsearch_one_midx()` midx: teach `nth_bitmapped_pack()` about incremental MIDXs midx: teach `nth_midxed_object_oid()` about incremental MIDXs midx: teach `prepare_midx_pack()` about incremental MIDXs midx: teach `nth_midxed_pack_int_id()` about incremental MIDXs midx: add new fields for incremental MIDX chains Documentation: describe incremental MIDX format
2024-08-06Documentation: describe incremental MIDX formatTaylor Blau1-0/+103
Prepare to implement incremental multi-pack indexes (MIDXs) over the next several commits by first describing the relevant prerequisites (like a new chunk in the MIDX format, the directory structure for incremental MIDXs, etc.) The format is described in detail in the patch contents below, but the high-level description is as follows. Incremental MIDXs live in $GIT_DIR/objects/pack/multi-pack-index.d, and each `*.midx` within that directory has a single "parent" MIDX, which is the MIDX layer immediately before it in the MIDX chain. The chain order resides in a file 'multi-pack-index-chain' in the same directory. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-02Documentation: add platform support policyEmily Shaffer1-0/+190
Supporting many platforms is only possible when we have the right tools to ensure that support. Teach platform maintainers how they can help us to help them, by explaining what kind of tooling support we would like to have, and what level of support becomes available as a result. Provide examples so that platform maintainers can see what we're asking for in practice. With this policy in place, we can make changes with stronger assurance that we are not breaking anybody we promised not to. Instead, we can feel confident that our existing testing and integration practices protect those who care from breakage. Signed-off-by: Emily Shaffer <emilyshaffer@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14Documentation/technical/bitmap-format.txt: add missing position tableTaylor Blau1-0/+9
While investigating a benign Coverity warning on the new pseudo-merge implementation, I was struggling to understand the (paraphrased) below: ofs = index_end - 24 - (index->pseudo_merges.nr * sizeof(uint64_t)); for (i = 0; i < index->pseudo_merges.nr; i++) { index->pseudo_merges.v[i].at = get_be64(ofs); ofs += sizeof(uint64_t); } , in pack-bitmap.c::load_bitmap_header(). Looking at the documentation, the diagram describing the on-disk format (prior to this patch) suggested that the optional extended lookup table immediately preceded the trailing metadata portion. If that were the case, that would make the above code from load_bitmap_header() incorrect, as we'd be blindly reading into the extended offset table. But later on in the documentation there is a description of the pseudo-merge position table as immediately preceding the trailing metadata portion of the extension. And indeed, we do write the position table in pack-bitmap-write.c: /* write positions for all pseudo merges */ for (i = 0; i < writer->pseudo_merges_nr; i++) hashwrite_be64(f, pseudo_merge_ofs[i]); hashwrite_be32(f, writer->pseudo_merges_nr); hashwrite_be32(f, kh_size(writer->pseudo_merge_commits)); hashwrite_be64(f, table_start - start); hashwrite_be64(f, hashfile_total(f) - start + sizeof(uint64_t)); So this is purely a case of the diagram being out of sync with the textual description and actual implementation of the format specification. Add the missing component back to the format diagram to avoid further confusion in this area. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-24Documentation/technical: describe pseudo-merge bitmaps formatTaylor Blau1-0/+132
Prepare to implement pseudo-merge bitmaps over the next several commits by first describing the serialization format which will store the new pseudo-merge bitmaps themselves. This format is implemented as an optional extension within the bitmap v1 format, making it compatible with previous versions of Git, as well as the original .bitmap implementation within JGit. The format is described in detail in the patch contents below, but the high-level description is as follows: - An array of pseudo-merge bitmaps, each containing a pair of EWAH bitmaps: one describing the set of pseudo-merge "parents", and another describing the set of object(s) reachable from those parents. - A lookup table to determine which pseudo-merge(s) a given commit appears in. An optional extended lookup table follows when there is at least one commit which appears in multiple pseudo-merge groups. - Trailing metadata, including the number of pseudo-merge(s), number of unique parents, the offset within the .bitmap file for the pseudo-merge commit lookup table, and the size of the optional extension itself. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-07refs: introduce reftable backendPatrick Steinhardt1-2/+3
Due to scalability issues, Shawn Pearce has originally proposed a new "reftable" format more than six years ago [1]. Initially, this new format was implemented in JGit with promising results. Around two years ago, we have then added the "reftable" library to the Git codebase via a4bbd13be3 (Merge branch 'hn/reftable', 2021-12-15). With this we have landed all the low-level code to read and write reftables. Notably missing though was the integration of this low-level code into the Git code base in the form of a new ref backend that ties all of this together. This gap is now finally closed by introducing a new "reftable" backend into the Git codebase. This new backend promises to bring some notable improvements to Git repositories: - It becomes possible to do truly atomic writes where either all refs are committed to disk or none are. This was not possible with the "files" backend because ref updates were split across multiple loose files. - The disk space required to store many refs is reduced, both compared to loose refs and packed-refs. This is enabled both by the reftable format being a binary format, which is more compact, and by prefix compression. - We can ignore filesystem-specific behaviour as ref names are not encoded via paths anymore. This means there is no need to handle case sensitivity on Windows systems or Unicode precomposition on macOS. - There is no need to rewrite the complete refdb anymore every time a ref is being deleted like it was the case for packed-refs. This means that ref deletions are now constant time instead of scaling linearly with the number of refs. - We can ignore file/directory conflicts so that it becomes possible to store both "refs/heads/foo" and "refs/heads/foo/bar". - Due to this property we can retain reflogs for deleted refs. We have previously been deleting reflogs together with their refs to avoid file/directory conflicts, which is not necessary anymore. - We can properly enumerate all refs. With the "files" backend it is not easily possible to distinguish between refs and non-refs because they may live side by side in the gitdir. Not all of these improvements are realized with the current "reftable" backend implementation. At this point, the new backend is supposed to be a drop-in replacement for the "files" backend that is used by basically all Git repositories nowadays. It strives for 1:1 compatibility, which means that a user can expect the same behaviour regardless of whether they use the "reftable" backend or the "files" backend for most of the part. Most notably, this means we artificially limit the capabilities of the "reftable" backend to match the limits of the "files" backend. It is not possible to create refs that would end up with file/directory conflicts, we do not retain reflogs, we perform stricter-than-necessary checks. This is done intentionally due to two main reasons: - It makes it significantly easier to land the "reftable" backend as tests behave the same. It would be tough to argue for each and every single test that doesn't pass with the "reftable" backend. - It ensures compatibility between repositories that use the "files" backend and repositories that use the "reftable" backend. Like this, hosters can migrate their repositories to use the "reftable" backend without causing issues for clients that use the "files" backend in their clones. It is expected that these artificial limitations may eventually go away in the long term. Performance-wise things very much depend on the actual workload. The following benchmarks compare the "files" and "reftable" backends in the current version: - Creating N refs in separate transactions shows that the "files" backend is ~50% faster. This is not surprising given that creating a ref only requires us to create a single loose ref. The "reftable" backend will also perform auto compaction on updates. In real-world workloads we would likely also want to perform pack loose refs, which would likely change the picture. Benchmark 1: update-ref: create refs sequentially (refformat = files, refcount = 1) Time (mean ± σ): 2.1 ms ± 0.3 ms [User: 0.6 ms, System: 1.7 ms] Range (min … max): 1.8 ms … 4.3 ms 133 runs Benchmark 2: update-ref: create refs sequentially (refformat = reftable, refcount = 1) Time (mean ± σ): 2.7 ms ± 0.1 ms [User: 0.6 ms, System: 2.2 ms] Range (min … max): 2.4 ms … 2.9 ms 132 runs Benchmark 3: update-ref: create refs sequentially (refformat = files, refcount = 1000) Time (mean ± σ): 1.975 s ± 0.006 s [User: 0.437 s, System: 1.535 s] Range (min … max): 1.969 s … 1.980 s 3 runs Benchmark 4: update-ref: create refs sequentially (refformat = reftable, refcount = 1000) Time (mean ± σ): 2.611 s ± 0.013 s [User: 0.782 s, System: 1.825 s] Range (min … max): 2.597 s … 2.622 s 3 runs Benchmark 5: update-ref: create refs sequentially (refformat = files, refcount = 100000) Time (mean ± σ): 198.442 s ± 0.241 s [User: 43.051 s, System: 155.250 s] Range (min … max): 198.189 s … 198.670 s 3 runs Benchmark 6: update-ref: create refs sequentially (refformat = reftable, refcount = 100000) Time (mean ± σ): 294.509 s ± 4.269 s [User: 104.046 s, System: 190.326 s] Range (min … max): 290.223 s … 298.761 s 3 runs - Creating N refs in a single transaction shows that the "files" backend is significantly slower once we start to write many refs. The "reftable" backend only needs to update two files, whereas the "files" backend needs to write one file per ref. Benchmark 1: update-ref: create many refs (refformat = files, refcount = 1) Time (mean ± σ): 1.9 ms ± 0.1 ms [User: 0.4 ms, System: 1.4 ms] Range (min … max): 1.8 ms … 2.6 ms 151 runs Benchmark 2: update-ref: create many refs (refformat = reftable, refcount = 1) Time (mean ± σ): 2.5 ms ± 0.1 ms [User: 0.7 ms, System: 1.7 ms] Range (min … max): 2.4 ms … 3.4 ms 148 runs Benchmark 3: update-ref: create many refs (refformat = files, refcount = 1000) Time (mean ± σ): 152.5 ms ± 5.2 ms [User: 19.1 ms, System: 133.1 ms] Range (min … max): 148.5 ms … 167.8 ms 15 runs Benchmark 4: update-ref: create many refs (refformat = reftable, refcount = 1000) Time (mean ± σ): 58.0 ms ± 2.5 ms [User: 28.4 ms, System: 29.4 ms] Range (min … max): 56.3 ms … 72.9 ms 40 runs Benchmark 5: update-ref: create many refs (refformat = files, refcount = 1000000) Time (mean ± σ): 152.752 s ± 0.710 s [User: 20.315 s, System: 131.310 s] Range (min … max): 152.165 s … 153.542 s 3 runs Benchmark 6: update-ref: create many refs (refformat = reftable, refcount = 1000000) Time (mean ± σ): 51.912 s ± 0.127 s [User: 26.483 s, System: 25.424 s] Range (min … max): 51.769 s … 52.012 s 3 runs - Deleting a ref in a fully-packed repository shows that the "files" backend scales with the number of refs. The "reftable" backend has constant-time deletions. Benchmark 1: update-ref: delete ref (refformat = files, refcount = 1) Time (mean ± σ): 1.7 ms ± 0.1 ms [User: 0.4 ms, System: 1.2 ms] Range (min … max): 1.6 ms … 2.1 ms 316 runs Benchmark 2: update-ref: delete ref (refformat = reftable, refcount = 1) Time (mean ± σ): 1.8 ms ± 0.1 ms [User: 0.4 ms, System: 1.3 ms] Range (min … max): 1.7 ms … 2.1 ms 294 runs Benchmark 3: update-ref: delete ref (refformat = files, refcount = 1000) Time (mean ± σ): 2.0 ms ± 0.1 ms [User: 0.5 ms, System: 1.4 ms] Range (min … max): 1.9 ms … 2.5 ms 287 runs Benchmark 4: update-ref: delete ref (refformat = reftable, refcount = 1000) Time (mean ± σ): 1.9 ms ± 0.1 ms [User: 0.5 ms, System: 1.3 ms] Range (min … max): 1.8 ms … 2.1 ms 217 runs Benchmark 5: update-ref: delete ref (refformat = files, refcount = 1000000) Time (mean ± σ): 229.8 ms ± 7.9 ms [User: 182.6 ms, System: 46.8 ms] Range (min … max): 224.6 ms … 245.2 ms 6 runs Benchmark 6: update-ref: delete ref (refformat = reftable, refcount = 1000000) Time (mean ± σ): 2.0 ms ± 0.0 ms [User: 0.6 ms, System: 1.3 ms] Range (min … max): 2.0 ms … 2.1 ms 3 runs - Listing all refs shows no significant advantage for either of the backends. The "files" backend is a bit faster, but not by a significant margin. When repositories are not packed the "reftable" backend outperforms the "files" backend because the "reftable" backend performs auto-compaction. Benchmark 1: show-ref: print all refs (refformat = files, refcount = 1, packed = true) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.5 ms … 2.0 ms 1729 runs Benchmark 2: show-ref: print all refs (refformat = reftable, refcount = 1, packed = true) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.5 ms … 1.8 ms 1816 runs Benchmark 3: show-ref: print all refs (refformat = files, refcount = 1000, packed = true) Time (mean ± σ): 4.3 ms ± 0.1 ms [User: 0.9 ms, System: 3.3 ms] Range (min … max): 4.1 ms … 4.6 ms 645 runs Benchmark 4: show-ref: print all refs (refformat = reftable, refcount = 1000, packed = true) Time (mean ± σ): 4.5 ms ± 0.2 ms [User: 1.0 ms, System: 3.3 ms] Range (min … max): 4.2 ms … 5.9 ms 643 runs Benchmark 5: show-ref: print all refs (refformat = files, refcount = 1000000, packed = true) Time (mean ± σ): 2.537 s ± 0.034 s [User: 0.488 s, System: 2.048 s] Range (min … max): 2.511 s … 2.627 s 10 runs Benchmark 6: show-ref: print all refs (refformat = reftable, refcount = 1000000, packed = true) Time (mean ± σ): 2.712 s ± 0.017 s [User: 0.653 s, System: 2.059 s] Range (min … max): 2.692 s … 2.752 s 10 runs Benchmark 7: show-ref: print all refs (refformat = files, refcount = 1, packed = false) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.5 ms … 1.9 ms 1834 runs Benchmark 8: show-ref: print all refs (refformat = reftable, refcount = 1, packed = false) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.4 ms … 2.0 ms 1840 runs Benchmark 9: show-ref: print all refs (refformat = files, refcount = 1000, packed = false) Time (mean ± σ): 13.8 ms ± 0.2 ms [User: 2.8 ms, System: 10.8 ms] Range (min … max): 13.3 ms … 14.5 ms 208 runs Benchmark 10: show-ref: print all refs (refformat = reftable, refcount = 1000, packed = false) Time (mean ± σ): 4.5 ms ± 0.2 ms [User: 1.2 ms, System: 3.3 ms] Range (min … max): 4.3 ms … 6.2 ms 624 runs Benchmark 11: show-ref: print all refs (refformat = files, refcount = 1000000, packed = false) Time (mean ± σ): 12.127 s ± 0.129 s [User: 2.675 s, System: 9.451 s] Range (min … max): 11.965 s … 12.370 s 10 runs Benchmark 12: show-ref: print all refs (refformat = reftable, refcount = 1000000, packed = false) Time (mean ± σ): 2.799 s ± 0.022 s [User: 0.735 s, System: 2.063 s] Range (min … max): 2.769 s … 2.836 s 10 runs - Printing a single ref shows no real difference between the "files" and "reftable" backends. Benchmark 1: show-ref: print single ref (refformat = files, refcount = 1) Time (mean ± σ): 1.5 ms ± 0.1 ms [User: 0.4 ms, System: 1.0 ms] Range (min … max): 1.4 ms … 1.8 ms 1779 runs Benchmark 2: show-ref: print single ref (refformat = reftable, refcount = 1) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.4 ms … 2.5 ms 1753 runs Benchmark 3: show-ref: print single ref (refformat = files, refcount = 1000) Time (mean ± σ): 1.5 ms ± 0.1 ms [User: 0.3 ms, System: 1.1 ms] Range (min … max): 1.4 ms … 1.9 ms 1840 runs Benchmark 4: show-ref: print single ref (refformat = reftable, refcount = 1000) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.5 ms … 2.0 ms 1831 runs Benchmark 5: show-ref: print single ref (refformat = files, refcount = 1000000) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.5 ms … 2.1 ms 1848 runs Benchmark 6: show-ref: print single ref (refformat = reftable, refcount = 1000000) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.5 ms … 2.1 ms 1762 runs So overall, performance depends on the usecases. Except for many sequential writes the "reftable" backend is roughly on par or significantly faster than the "files" backend though. Given that the "files" backend has received 18 years of optimizations by now this can be seen as a win. Furthermore, we can expect that the "reftable" backend will grow faster over time when attention turns more towards optimizations. The complete test suite passes, except for those tests explicitly marked to require the REFFILES prerequisite. Some tests in t0610 are marked as failing because they depend on still-in-flight bug fixes. Tests can be run with the new backend by setting the GIT_TEST_DEFAULT_REF_FORMAT environment variable to "reftable". There is a single known conceptual incompatibility with the dumb HTTP transport. As "info/refs" SHOULD NOT contain the HEAD reference, and because the "HEAD" file is not valid anymore, it is impossible for the remote client to figure out the default branch without changing the protocol. This shortcoming needs to be handled in a subsequent patch series. As the reftable library has already been introduced a while ago, this commit message will not go into the details of how exactly the on-disk format works. Please refer to our preexisting technical documentation at Documentation/technical/reftable for this. [1]: https://public-inbox.org/git/CAJo=hJtyof=HRy=2sLP0ng0uZ4=S-DpZ5dR1aF+VHVETKG20OQ@mail.gmail.com/ Original-idea-by: Shawn Pearce <spearce@spearce.org> Based-on-patch-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-02setup: introduce "extensions.refStorage" extensionPatrick Steinhardt1-0/+5
Introduce a new "extensions.refStorage" extension that allows us to specify the ref storage format used by a repository. For now, the only supported format is the "files" format, but this list will likely soon be extended to also support the upcoming "reftable" format. There have been discussions on the Git mailing list in the past around how exactly this extension should look like. One alternative [1] that was discussed was whether it would make sense to model the extension in such a way that backends are arbitrarily stackable. This would allow for a combined value of e.g. "loose,packed-refs" or "loose,reftable", which indicates that new refs would be written via "loose" files backend and compressed into "packed-refs" or "reftable" backends, respectively. It is arguable though whether this flexibility and the complexity that it brings with it is really required for now. It is not foreseeable that there will be a proliferation of backends in the near-term future, and the current set of existing formats and formats which are on the horizon can easily be configured with the much simpler proposal where we have a single value, only. Furthermore, if we ever see that we indeed want to gain the ability to arbitrarily stack the ref formats, then we can adapt the current extension rather easily. Given that Git clients will refuse any unknown value for the "extensions.refStorage" extension they would also know to ignore a stacked "loose,packed-refs" in the future. So let's stick with the easy proposal for the time being and wire up the extension. [1]: <pull.1408.git.1667846164.gitgitgadget@gmail.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-09Merge branch 'js/doc-unit-tests'Junio C Hamano1-0/+240
Process to add some form of low-level unit tests has started. * js/doc-unit-tests: ci: run unit tests in CI unit tests: add TAP unit test framework unit tests: add a project plan document
2023-11-10unit tests: add a project plan documentJosh Steadmon1-0/+240
In our current testing environment, we spend a significant amount of effort crafting end-to-end tests for error conditions that could easily be captured by unit tests (or we simply forgo some hard-to-setup and rare error conditions). Describe what we hope to accomplish by implementing unit tests, and explain some open questions and milestones. Discuss desired features for test frameworks/harnesses, and provide a comparison of several different frameworks. Finally, document our rationale for implementing a custom framework. Co-authored-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-09documentation: add missing quotesElijah Newren1-1/+1
Diff best viewed with --color-diff. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-09documentation: add some commas where they are helpfulElijah Newren2-3/+3
Diff best viewed with --color-diff. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-09documentation: fix capitalizationElijah Newren1-1/+1
Diff best viewed with --color-diff. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-09documentation: add missing articleElijah Newren2-2/+2
Diff best viewed with --color-diff. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-09documentation: fix choice of articleElijah Newren1-1/+1
Diff best viewed with --color-diff. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-09documentation: fix singular vs. pluralElijah Newren3-3/+3
Diff best viewed with --color-diff. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-09documentation: fix verb tenseElijah Newren1-1/+1
Diff best viewed with --color-diff. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-09documentation: fix subject/verb agreementElijah Newren2-3/+3
Diff best viewed with --color-diff. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-09documentation: remove extraneous wordsElijah Newren3-3/+3
Diff best viewed with --color-diff. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-09documentation: add missing wordsElijah Newren4-4/+4
Diff best viewed with --color-diff. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-09documentation: fix apostrophe usageElijah Newren1-1/+1
Diff best viewed with --color-diff. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-09documentation: fix typosElijah Newren2-3/+3
Diff best viewed with --color-diff. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-09documentation: wording improvementsElijah Newren6-8/+8
Diff best viewed with --color-diff. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-29Merge branch 'en/header-split-cache-h-part-3'Junio C Hamano1-2/+2
Header files cleanup. * en/header-split-cache-h-part-3: (28 commits) fsmonitor-ll.h: split this header out of fsmonitor.h hash-ll, hashmap: move oidhash() to hash-ll object-store-ll.h: split this header out of object-store.h khash: name the structs that khash declares merge-ll: rename from ll-merge git-compat-util.h: remove unneccessary include of wildmatch.h builtin.h: remove unneccessary includes list-objects-filter-options.h: remove unneccessary include diff.h: remove unnecessary include of oidset.h repository: remove unnecessary include of path.h log-tree: replace include of revision.h with simple forward declaration cache.h: remove this no-longer-used header read-cache*.h: move declarations for read-cache.c functions from cache.h repository.h: move declaration of the_index from cache.h merge.h: move declarations for merge.c from cache.h diff.h: move declaration for global in diff.c from cache.h preload-index.h: move declarations for preload-index.c from elsewhere sparse-index.h: move declarations for sparse-index.c from cache.h name-hash.h: move declarations for name-hash.c from cache.h run-command.h: move declarations for run-command.c from cache.h ...
2023-06-21merge-ll: rename from ll-mergeElijah Newren1-2/+2
A long term (but rather minor) pet-peeve of mine was the name ll-merge.[ch]. I thought it made it harder to realize what stuff was related to merging when I was working on the merge machinery and trying to improve it. Further, back in d1cbe1e6d8a ("hash-ll.h: split out of hash.h to remove dependency on repository.h", 2023-04-22), we have split the portions of hash.h that do not depend upon repository.h into a "hash-ll.h" (due to the recommendation to use "ll" for "low-level" in its name[1], but which I used as a suffix precisely because of my distaste for "ll-merge"). When we discussed adding additional "*-ll.h" files, a request was made that we use "ll" consistently as either a prefix or a suffix. Since it is already in use as both a prefix and a suffix, the only way to do so is to rename some files. Besides my distaste for the ll-merge.[ch] name, let me also note that the files ll-fsmonitor.h, ll-hash.h, ll-merge.h, ll-object-store.h, ll-read-cache.h would have essentially nothing to do with each other and make no sense to group. But giving them the common "ll-" prefix would group them. Using "-ll" as a suffix thus seems just much more logical to me. Rename ll-merge.[ch] to merge-ll.[ch] to achieve this consistency, and to ensure we get a more logical grouping of files. [1] https://lore.kernel.org/git/kl6lsfcu1g8w.fsf@chooglen-macbookpro.roam.corp.google.com/ Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-12docs: typofixesLinus Arver1-1/+1
These were found with an automated CLI tool [1]. Only the "Documentation" subfolder (and not source code files) was considered because the docs are user-facing. [1]: https://crates.io/crates/typos-cli Signed-off-by: Linus Arver <linusa@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-15Merge branch 'ds/bundle-uri-5'Junio C Hamano1-4/+4
The bundle-URI subsystem adds support for creation-token heuristics to help incremental fetches. * ds/bundle-uri-5: bundle-uri: test missing bundles with heuristic bundle-uri: store fetch.bundleCreationToken fetch: fetch from an external bundle URI bundle-uri: drop bundle.flag from design doc clone: set fetch.bundleURI if appropriate bundle-uri: download in creationToken order bundle-uri: parse bundle.<id>.creationToken values bundle-uri: parse bundle.heuristic=creationToken t5558: add tests for creationToken heuristic bundle: verify using check_connected() bundle: test unbundling with incomplete history
2023-01-31bundle-uri: drop bundle.flag from design docDerrick Stolee1-4/+4
The Implementation Plan section lists a 'bundle.flag' option that is not documented anywhere else. What is documented elsewhere in the document and implemented by previous changes is the 'bundle.heuristic' config key. For now, a heuristic is required to indicate that a bundle list is organized for use during 'git fetch', and it is also sufficient for all existing designs. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-23Documentation: render dash correctlyAndrei Rybak2-2/+2
Three hyphens are rendered verbatim in documentation, so "--" has to be used to produce a dash. Fix asciidoc output for dashes. This is similar to previous commits f0b922473e (Documentation: render special characters correctly, 2021-07-29) and de82095a95 (doc hash-function-transition: fix asciidoc output, 2021-02-05). Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-11-18Merge branch 'en/sparse-checkout-design'Taylor Blau1-0/+1103
Design doc. * en/sparse-checkout-design: sparse-checkout.txt: new document with sparse-checkout directions
2022-11-11repository-version.txt: partialClone casing changeKousik Sanagavarapu1-2/+2
Remotes are considered "promisor" if extensions.partialClone and some other configuration variables are set. The casing for this in Documentation/technical/repository-version.txt is not proper and may cause confusion. This change corrects this casing. Signed-off-by: Kousik Sanagavarapu <five231003@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-08Merge branch 'po/glossary-around-traversal'Taylor Blau2-5/+5
The glossary entries for "commit-graph file" and "reachability bitmap" have been added. * po/glossary-around-traversal: glossary: add reachability bitmap description glossary: add "commit graph" description doc: use 'object database' not ODB or abbreviation doc: use "commit-graph" hyphenation consistently
2022-11-07sparse-checkout.txt: new document with sparse-checkout directionsElijah Newren1-0/+1103
Once upon a time, Matheus wrote some patches to make git grep [--cached | <REVISION>] ... restrict its output to the sparsity specification when working in a sparse checkout[1]. That effort got derailed by two things: (1) The --sparse-index work just beginning which we wanted to avoid creating conflicts for (2) Never deciding on flag and config names and planned high level behavior for all commands. More recently, Shaoxuan implemented a more limited form of Matheus' patches that only affected --cached, using a different flag name, but also changing the default behavior in line with what Matheus did. This again highlighted the fact that we never decided on command line flag names, config option names, and the big picture path forward. The --sparse-index work has been mostly complete (or at least released into production even if some small edges remain) for quite some time now. We have also had several discussions on flag and config names, though we never came to solid conclusions. Stolee once upon a time suggested putting all these into some document in Documentation/technical[3], which Victoria recently also requested[4]. I'm behind the times, but here's a patch attempting to finally do that. [1] https://lore.kernel.org/git/5f3f7ac77039d41d1692ceae4b0c5df3bb45b74a.1612901326.git.matheus.bernardino@usp.br/ (See his second link in that email in particular) [2] https://lore.kernel.org/git/20220908001854.206789-2-shaoxuan.yuan02@gmail.com/ [3] https://lore.kernel.org/git/CABPp-BHwNoVnooqDFPAsZxBT9aR5Dwk5D9sDRCvYSb8akxAJgA@mail.gmail.com/ (Scroll to the very end for the final few paragraphs) [4] https://lore.kernel.org/git/cafcedba-96a2-cb85-d593-ef47c8c8397c@github.com/ Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-10-30doc: use 'object database' not ODB or abbreviationPhilip Oakley2-2/+2
The abbreviation 'ODB' is used in the technical documentation sections for commit-graph and parallel-checkout, along with an 'odb' option in `git-pack-redundant`, without expansion. Use 'object database' in full, in those entries. The text has not been reflowed to keep the changes minimal. While in the glossary for `object` terms, add the common`oid` abbreviation to its entry. Signed-off-by: Philip Oakley <philipoakley@iee.email> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-10-30doc: use "commit-graph" hyphenation consistentlyPhilip Oakley1-3/+3
Note, historical release notes have not been updated. Signed-off-by: Philip Oakley <philipoakley@iee.email> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-10-24trace2: add global counter mechanismJeff Hostetler1-0/+31
Add global counters mechanism to Trace2. The Trace2 counters mechanism adds the ability to create a set of global counter variables and an API to increment them efficiently. Counters can optionally report per-thread usage in addition to the sum across all threads. Counter events are emitted to the Trace2 logs when a thread exits and at process exit. Counters are an alternative to `data` and `data_json` events. Counters are useful when you want to measure something across the life of the process, when you don't want per-measurement events for performance reasons, when the data does not fit conveniently within a region, or when your control flow does not easily let you write the final total. For example, you might use this to report the number of calls to unzip() or the number of de-delta steps during a checkout. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-24trace2: add stopwatch timersJeff Hostetler1-0/+90
Add stopwatch timer mechanism to Trace2. Timers are an alternative to Trace2 Regions. Regions are useful for measuring the time spent in various computation phases, such as the time to read the index, time to scan for unstaged files, time to scan for untracked files, and etc. However, regions are not appropriate in all places. For example, during a checkout, it would be very inefficient to use regions to measure the total time spent inflating objects from the ODB from across the entire lifetime of the process; a per-unzip() region would flood the output and significantly slow the command; and some form of post-processing would be requried to compute the time spent in unzip(). Timers can be used to measure a series of timer intervals and emit a single summary event (at thread and/or process exit). Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-24api-trace2.txt: elminate section describing the public trace2 APIJeff Hostetler1-54/+7
Eliminate the mostly obsolete `Public API` sub-section from the `Trace2 API` section in the documentation. Strengthen the referral to `trace2.h`. Most of the technical information in this sub-section was moved to `trace2.h` in 6c51cb525d (trace2: move doc to trace2.h, 2019-11-17) to be adjacent to the function prototypes. The remaining text wasn't that useful by itself. Furthermore, the text would need a bit of overhaul to add routines that do not immediately generate a message, such as stopwatch timers. So it seemed simpler to just get rid of it. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-24tr2tls: clarify TLS terminologyJeff Hostetler1-4/+4
Reduce or eliminate use of the term "TLS" in the Trace2 code. The term "TLS" has two popular meanings: "thread-local storage" and "transport layer security". In the Trace2 source, the term is associated with the former. There was concern on the mailing list about it refering to the latter. Update the source and documentation to eliminate the use of the "TLS" term or replace it with the phrase "thread-local storage" to reduce ambiguity. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-07bundle-uri: fix technical doc issuesDerrick Stolee1-4/+4
Two documentation issues exist in the technical docs for the bundle URI feature. First, there is an extraneous "the" across a linebreak, making the nonsensical phrase "the bundle the list" which should just be "the bundle list". Secondly, the asciidoc update treats the string "`have`s" as starting a "<code>" block, but the second tick is interpreted as an apostrophe instead of a closing "</code>" tag. This causes entire sentences to be formatted as code until the next one comes along. Simply adding a space here does not work properly as the rendered HTML keeps that space. Instead, restructure the sentence slightly to avoid using a plural, allowing the HTML to render correctly. Reported-by: Philip Oakley <philipoakley@iee.email> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-21Merge branch 'js/typofix'Junio C Hamano4-7/+7
* js/typofix: Documentation: clean up various typos in technical docs Documentation: clean up a few misspelled word typos