Diffstat (limited to 'Documentation/technical')
| -rw-r--r-- | Documentation/technical/api-index-skel.txt | 2 |
| -rw-r--r-- | Documentation/technical/api-simple-ipc.txt | 10 |
| -rw-r--r-- | Documentation/technical/api-trace2.txt | 17 |
| -rw-r--r-- | Documentation/technical/bitmap-format.txt | 147 |
| -rw-r--r-- | Documentation/technical/commit-graph.txt | 2 |
| -rw-r--r-- | Documentation/technical/hash-function-transition.txt | 4 |
| -rw-r--r-- | Documentation/technical/multi-pack-index.txt | 103 |
| -rw-r--r-- | Documentation/technical/parallel-checkout.txt | 10 |
| -rw-r--r-- | Documentation/technical/partial-clone.txt | 10 |
| -rw-r--r-- | Documentation/technical/platform-support.txt | 190 |
| -rw-r--r-- | Documentation/technical/racy-git.txt | 10 |
| -rw-r--r-- | Documentation/technical/reftable.txt | 10 |
| -rw-r--r-- | Documentation/technical/repository-version.txt | 38 |
| -rw-r--r-- | Documentation/technical/rerere.txt | 6 |
| -rw-r--r-- | Documentation/technical/sparse-checkout.txt | 2 |
| -rw-r--r-- | Documentation/technical/unit-tests.txt | 242 |
16 files changed, 730 insertions, 73 deletions
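The bitmap-format.txt hunk in this diff describes a pseudo-merge lookup table whose 8-byte `offset` field (network byte order) uses its most-significant bit to distinguish a direct bitmap offset from an offset into the extended lookup table. The following is a minimal sketch of that check in C, not the code Git itself uses. The names `PSEUDO_MERGE_EXT_FLAG`, `read_be64`, `offset_is_extended`, and `offset_value` are hypothetical, and the sketch assumes that the remaining 63 bits carry the file offset once the flag bit is cleared, which the text does not state outright. Note the parentheses around the mask test: in C, `==` binds tighter than `&`, so the informal `offset & ((uint64_t)1<<63) == 0` would not compile to the intended comparison if copied as-is.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name: mask for the most-significant bit of `offset`. */
#define PSEUDO_MERGE_EXT_FLAG ((uint64_t)1 << 63)

/* Read an 8-byte unsigned value stored in network (big-endian) order. */
static uint64_t read_be64(const unsigned char *p)
{
	uint64_t v = 0;
	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/*
 * MSB set: the commit appears in multiple pseudo-merges and the offset
 * points at the extended lookup table. MSB unset: the offset points
 * directly at the single pseudo-merge bitmap containing the commit.
 */
static bool offset_is_extended(uint64_t offset)
{
	return (offset & PSEUDO_MERGE_EXT_FLAG) != 0;
}

/* Assumption: the low 63 bits hold the offset from the start of the file. */
static uint64_t offset_value(uint64_t offset)
{
	return offset & ~PSEUDO_MERGE_EXT_FLAG;
}
```

A reader would call `read_be64()` on the last 8 bytes of a 12-byte lookup entry, then branch on `offset_is_extended()` before seeking to `offset_value()`.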
diff --git a/Documentation/technical/api-index-skel.txt b/Documentation/technical/api-index-skel.txt index eda8c195c1..7780a76b08 100644 --- a/Documentation/technical/api-index-skel.txt +++ b/Documentation/technical/api-index-skel.txt @@ -1,7 +1,7 @@ Git API Documents ================= -Git has grown a set of internal API over time. This collection +Git has grown a set of internal APIs over time. This collection documents them. //////////////////////////////////////////////////////////////// diff --git a/Documentation/technical/api-simple-ipc.txt b/Documentation/technical/api-simple-ipc.txt index d44ada98e7..c4fb152b23 100644 --- a/Documentation/technical/api-simple-ipc.txt +++ b/Documentation/technical/api-simple-ipc.txt @@ -2,7 +2,7 @@ Simple-IPC API ============== The Simple-IPC API is a collection of `ipc_` prefixed library routines -and a basic communication protocol that allow an IPC-client process to +and a basic communication protocol that allows an IPC-client process to send an application-specific IPC-request message to an IPC-server process and receive an application-specific IPC-response message. @@ -20,12 +20,12 @@ IPC-client. The IPC-client routines within a client application process connect to the IPC-server and send a request message and wait for a response. -When received, the response is returned back the caller. +When received, the response is returned back to the caller. For example, the `fsmonitor--daemon` feature will be built as a server application on top of the IPC-server library routines. It will have threads watching for file system events and a thread pool waiting for -client connections. Clients, such as `git status` will request a list +client connections. Clients, such as `git status`, will request a list of file system events since a point in time and the server will respond with a list of changed files and directories. 
The formats of the request and response are application-specific; the IPC-client and @@ -37,7 +37,7 @@ Comparison with sub-process model The Simple-IPC mechanism differs from the existing `sub-process.c` model (Documentation/technical/long-running-process-protocol.txt) and -used by applications like Git-LFS. In the LFS-style sub-process model +used by applications like Git-LFS. In the LFS-style sub-process model, the helper is started by the foreground process, communication happens via a pair of file descriptors bound to the stdin/stdout of the sub-process, the sub-process only serves the current foreground @@ -102,4 +102,4 @@ stateless request, receive an application-specific response, and disconnect. It is a one round trip facility for querying the server. The Simple-IPC routines hide the socket, named pipe, and thread pool details and allow the application -layer to focus on the application at hand. +layer to focus on the task at hand. diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt index de5fc25059..5817b18310 100644 --- a/Documentation/technical/api-trace2.txt +++ b/Documentation/technical/api-trace2.txt @@ -128,7 +128,7 @@ yields ------------ $ cat ~/log.event -{"event":"version","sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.620713Z","file":"common-main.c","line":38,"evt":"3","exe":"2.20.1.155.g426c96fcdb"} +{"event":"version","sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.620713Z","file":"common-main.c","line":38,"evt":"4","exe":"2.20.1.155.g426c96fcdb"} {"event":"start","sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.621027Z","file":"common-main.c","line":39,"t_abs":0.001173,"argv":["git","version"]} 
{"event":"cmd_name","sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.621122Z","file":"git.c","line":432,"name":"version","hierarchy":"version"} {"event":"exit","sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.621236Z","file":"git.c","line":662,"t_abs":0.001227,"code":0} @@ -344,7 +344,7 @@ only present on the "start" and "atexit" events. { "event":"version", ... - "evt":"3", # EVENT format version + "evt":"4", # EVENT format version "exe":"2.20.1.155.g426c96fcdb" # git version } ------------ @@ -835,6 +835,19 @@ The "value" field may be an integer or a string. } ------------ +`"printf"`:: + This event logs a human-readable message with no particular formatting + guidelines. ++ +------------ +{ + "event":"printf", + ... + "t_abs":0.015905, # elapsed time in seconds + "msg":"Hello world" # optional +} +------------ + == Example Trace2 API Usage diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt index c2e652b71a..bfb0ec7beb 100644 --- a/Documentation/technical/bitmap-format.txt +++ b/Documentation/technical/bitmap-format.txt @@ -114,7 +114,7 @@ result in an empty bitmap (no bits set). * N entries with compressed bitmaps, one for each indexed commit + -Where `N` is the total amount of entries in this bitmap index. +Where `N` is the total number of entries in this bitmap index. Each entry contains the following: ** {empty} @@ -126,7 +126,7 @@ Each entry contains the following: ** {empty} 1-byte XOR-offset: :: The xor offset used to compress this bitmap. For an entry - in position `x`, a XOR offset of `y` means that the actual + in position `x`, an XOR offset of `y` means that the actual bitmap representing this commit is composed by XORing the bitmap for this entry with the bitmap in entry `x-y` (i.e. the bitmap `y` entries before this one). @@ -239,7 +239,7 @@ bitmaps. 
For a `.bitmap` containing `nr_entries` reachability bitmaps, the table contains a list of `nr_entries` <commit_pos, offset, xor_row> triplets -(sorted in the ascending order of `commit_pos`). The content of i'th +(sorted in the ascending order of `commit_pos`). The content of the i'th triplet is - * {empty} @@ -255,3 +255,144 @@ triplet is - xor_row (4 byte integer, network byte order): :: The position of the triplet whose bitmap is used to compress this one, or `0xffffffff` if no such bitmap exists. + +Pseudo-merge bitmaps +-------------------- + +If the `BITMAP_OPT_PSEUDO_MERGES` flag is set, a variable number of +bytes (preceding the name-hash cache, commit lookup table, and trailing +checksum) of the `.bitmap` file is used to store pseudo-merge bitmaps. + +For more information on what pseudo-merges are, why they are useful, and +how to configure them, see the information in linkgit:gitpacking[7]. + +=== File format + +If enabled, pseudo-merge bitmaps are stored in an optional section at +the end of a `.bitmap` file. The format is as follows: + +.... ++-------------------------------------------+ +| .bitmap File | ++-------------------------------------------+ +| | +| Pseudo-merge bitmaps (Variable Length) | +| +---------------------------+ | +| | commits_bitmap (EWAH) | | +| +---------------------------+ | +| | merge_bitmap (EWAH) | | +| +---------------------------+ | +| | ++-------------------------------------------+ +| | +| Lookup Table | +| +---------------------------+ | +| | commit_pos (4 bytes) | | +| +---------------------------+ | +| | offset (8 bytes) | | +| +------------+--------------+ | +| | +| Offset Cases: | +| ------------- | +| | +| 1. MSB Unset: single pseudo-merge bitmap | +| + offset to pseudo-merge bitmap | +| | +| 2. 
MSB Set: multiple pseudo-merges | +| + offset to extended lookup table | +| | ++-------------------------------------------+ +| | +| Extended Lookup Table (Optional) | +| +----+----------+----------+----------+ | +| | N | Offset 1 | .... | Offset N | | +| +----+----------+----------+----------+ | +| | | 8 bytes | .... | 8 bytes | | +| +----+----------+----------+----------+ | +| | ++-------------------------------------------+ +| | +| Pseudo-merge position table | +| +----+----------+----------+----------+ | +| | N | Offset 1 | .... | Offset N | | +| +----+----------+----------+----------+ | +| | | 8 bytes | .... | 8 bytes | | +| +----+----------+----------+----------+ | +| | ++-------------------------------------------+ +| | +| Pseudo-merge Metadata | +| +-----------------------------------+ | +| | # pseudo-merges (4 bytes) | | +| +-----------------------------------+ | +| | # commits (4 bytes) | | +| +-----------------------------------+ | +| | Lookup offset (8 bytes) | | +| +-----------------------------------+ | +| | Extension size (8 bytes) | | +| +-----------------------------------+ | +| | ++-------------------------------------------+ +.... + +* One or more pseudo-merge bitmaps, each containing: + + ** `commits_bitmap`, an EWAH-compressed bitmap describing the set of + commits included in the this psuedo-merge. + + ** `merge_bitmap`, an EWAH-compressed bitmap describing the union of + the set of objects reachable from all commits listed in the + `commits_bitmap`. + +* A lookup table, mapping pseudo-merged commits to the pseudo-merges + they belong to. Entries appear in increasing order of each commit's + bit position. Each entry is 12 bytes wide, and is comprised of the + following: + + ** `commit_pos`, a 4-byte unsigned value (in network byte-order) + containing the bit position for this commit. 
+ + ** `offset`, an 8-byte unsigned value (also in network byte-order) + containing either one of two possible offsets, depending on whether or + not the most-significant bit is set. + + *** If unset (i.e. `offset & ((uint64_t)1<<63) == 0`), the offset + (relative to the beginning of the `.bitmap` file) at which the + pseudo-merge bitmap for this commit can be read. This indicates + only a single pseudo-merge bitmap contains this commit. + + *** If set (i.e. `offset & ((uint64_t)1<<63) != 0`), the offset + (again relative to the beginning of the `.bitmap` file) at which + the extended offset table can be located describing the set of + pseudo-merge bitmaps which contain this commit. This indicates + that multiple pseudo-merge bitmaps contain this commit. + +* An (optional) extended lookup table (written if and only if there is + at least one commit which appears in more than one pseudo-merge). + There are as many entries as commits which appear in multiple + pseudo-merges. Each entry contains the following: + + ** `N`, a 4-byte unsigned value equal to the number of pseudo-merges + which contain a given commit. + + ** An array of `N` 8-byte unsigned values, each of which is + interpreted as an offset (relative to the beginning of the + `.bitmap` file) at which a pseudo-merge bitmap for this commit can + be read. These values occur in no particular order. + +* Positions for all pseudo-merges, each stored as an 8-byte unsigned + value (in network byte-order) containing the offset (relative to the + beginning of the `.bitmap` file) of each consecutive pseudo-merge. + +* A 4-byte unsigned value (in network byte-order) equal to the number of + pseudo-merges. + +* A 4-byte unsigned value (in network byte-order) equal to the number of + unique commits which appear in any pseudo-merge. + +* An 8-byte unsigned value (in network byte-order) equal to the number + of bytes between the start of the pseudo-merge section and the + beginning of the lookup table. 
+ +* An 8-byte unsigned value (in network byte-order) equal to the number + of bytes in the pseudo-merge section (including this field). diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt index 86fed0de0f..2c26e95e51 100644 --- a/Documentation/technical/commit-graph.txt +++ b/Documentation/technical/commit-graph.txt @@ -136,7 +136,7 @@ Design Details - Commit grafts and replace objects can change the shape of the commit history. The latter can also be enabled/disabled on the fly using - `--no-replace-objects`. This leads to difficultly storing both possible + `--no-replace-objects`. This leads to difficulty storing both possible interpretations of a commit id, especially when computing generation numbers. The commit-graph will not be read or written when replace-objects or grafts are present. diff --git a/Documentation/technical/hash-function-transition.txt b/Documentation/technical/hash-function-transition.txt index ed57481089..7102c7c8f5 100644 --- a/Documentation/technical/hash-function-transition.txt +++ b/Documentation/technical/hash-function-transition.txt @@ -148,8 +148,8 @@ Detailed Design Repository format extension ~~~~~~~~~~~~~~~~~~~~~~~~~~~ A SHA-256 repository uses repository format version `1` (see -Documentation/technical/repository-version.txt) with extensions -`objectFormat` and `compatObjectFormat`: +linkgit:gitrepository-layout[5]) with `extensions.objectFormat` and +`extensions.compatObjectFormat` (see linkgit:git-config[1]) set to: [core] repositoryFormatVersion = 1 diff --git a/Documentation/technical/multi-pack-index.txt b/Documentation/technical/multi-pack-index.txt index f2221d2b44..cc063b30be 100644 --- a/Documentation/technical/multi-pack-index.txt +++ b/Documentation/technical/multi-pack-index.txt @@ -61,6 +61,109 @@ Design Details - The MIDX file format uses a chunk-based approach (similar to the commit-graph file) that allows optional data to be added. 
+Incremental multi-pack indexes +------------------------------ + +As repositories grow in size, it becomes more expensive to write a +multi-pack index (MIDX) that includes all packfiles. To accommodate +this, the "incremental multi-pack indexes" feature allows for combining +a "chain" of multi-pack indexes. + +Each individual component of the chain need only contain a small number +of packfiles. Appending to the chain does not invalidate earlier parts +of the chain, so repositories can control how much time is spent +updating the MIDX chain by determining the number of packs in each layer +of the MIDX chain. + +=== Design state + +At present, the incremental multi-pack indexes feature is missing two +important components: + + - The ability to rewrite earlier portions of the MIDX chain (i.e., to + "compact" some collection of adjacent MIDX layers into a single + MIDX). At present the only supported way of shrinking a MIDX chain + is to rewrite the entire chain from scratch without the `--split` + flag. ++ +There are no fundamental limitations that stand in the way of being able +to implement this feature. It is omitted from the initial implementation +in order to reduce the complexity, but will be added later. + + - Support for reachability bitmaps. The classic single MIDX + implementation does support reachability bitmaps (see the section + titled "multi-pack-index reverse indexes" in + linkgit:gitformat-pack[5] for more details). ++ +As above, there are no fundamental limitations that stand in the way of +extending the incremental MIDX format to support reachability bitmaps. +The design below specifically takes this into account, and support for +reachability bitmaps will be added in a future patch series. It is +omitted from the current implementation for the same reason as above. 
++ +In brief, to support reachability bitmaps with the incremental MIDX +feature, the concept of the pseudo-pack order is extended across each +layer of the incremental MIDX chain to form a concatenated pseudo-pack +order. This concatenation takes place in the same order as the chain +itself (in other words, the concatenated pseudo-pack order for a chain +`{$H1, $H2, $H3}` would be the pseudo-pack order for `$H1`, followed by +the pseudo-pack order for `$H2`, followed by the pseudo-pack order for +`$H3`). ++ +The layout will then be extended so that each layer of the incremental +MIDX chain can write a `*.bitmap`. The objects in each layer's bitmap +are offset by the number of objects in the previous layers of the chain. + +=== File layout + +Instead of storing a single `multi-pack-index` file (with an optional +`.rev` and `.bitmap` extension) in `$GIT_DIR/objects/pack`, incremental +MIDXs are stored in the following layout: + +---- +$GIT_DIR/objects/pack/multi-pack-index.d/ +$GIT_DIR/objects/pack/multi-pack-index.d/multi-pack-index-chain +$GIT_DIR/objects/pack/multi-pack-index.d/multi-pack-index-$H1.midx +$GIT_DIR/objects/pack/multi-pack-index.d/multi-pack-index-$H2.midx +$GIT_DIR/objects/pack/multi-pack-index.d/multi-pack-index-$H3.midx +---- + +The `multi-pack-index-chain` file contains a list of the incremental +MIDX files in the chain, in order. The above example shows a chain whose +`multi-pack-index-chain` file would contain the following lines: + +---- +$H1 +$H2 +$H3 +---- + +The `multi-pack-index-$H1.midx` file contains the first layer of the +multi-pack-index chain. The `multi-pack-index-$H2.midx` file contains +the second layer of the chain, and so on. + +When both an incremental- and non-incremental MIDX are present, the +non-incremental MIDX is always read first. 
+ +=== Object positions for incremental MIDXs + +In the original multi-pack-index design, we refer to objects via their +lexicographic position (by object IDs) within the repository's singular +multi-pack-index. In the incremental multi-pack-index design, we refer +to objects via their index into a concatenated lexicographic ordering +among each component in the MIDX chain. + +If `objects_nr()` is a function that returns the number of objects in a +given MIDX layer, then the index of an object at lexicographic position +`i` within, say, $H3 is defined as: + +---- +objects_nr($H2) + objects_nr($H1) + i +---- + +(in the C implementation, this is often computed as `i + +m->num_objects_in_base`). + Future Work ----------- diff --git a/Documentation/technical/parallel-checkout.txt b/Documentation/technical/parallel-checkout.txt index 47c9b6183c..b4a144e5f4 100644 --- a/Documentation/technical/parallel-checkout.txt +++ b/Documentation/technical/parallel-checkout.txt @@ -63,7 +63,7 @@ improvements over the sequential code, but there was still too much lock contention. A `perf` profiling indicated that around 20% of the runtime during a local Linux clone (on an SSD) was spent in locking functions. For this reason this approach was rejected in favor of using multiple -child processes, which led to a better performance. +child processes, which led to better performance. Multi-Process Solution ---------------------- @@ -126,7 +126,7 @@ Then, for each assigned item, each worker: * W5: Writes the result to the file descriptor opened at W2. -* W6: Calls `fstat()` or lstat()` on the just-written path, and sends +* W6: Calls `fstat()` or `lstat()` on the just-written path, and sends the result back to the main process, together with the end status of the operation and the item's identification number. @@ -148,7 +148,7 @@ information, the main process handles the results in two steps: - First, it updates the in-memory index with the `lstat()` information sent by the workers. 
(This must be done first as this information - might me required in the following step.) + might be required in the following step.) - Then it writes the items which collided on disk (i.e. items marked with `PC_ITEM_COLLIDED`). More on this below. @@ -185,7 +185,7 @@ quite straightforward: for each parallel-eligible entry, the main process must remove all files that prevent this entry from being written (before enqueueing it). This includes any non-directory file in the leading path of the entry. Later, when a worker gets assigned the entry, -it looks again for the non-directories files and for an already existing +it looks again for the non-directory files and for an already existing file at the entry's path. If any of these checks finds something, the worker knows that there was a path collision. @@ -232,7 +232,7 @@ conversion and re-encoding, are eligible for parallel checkout. Ineligible entries are checked out by the classic sequential codepath *before* spawning workers. -Note: submodules's files are also eligible for parallel checkout (as +Note: submodules' files are also eligible for parallel checkout (as long as they don't fall into any of the excluding categories mentioned above). But since each submodule is checked out in its own child process, we don't mix the superproject's and the submodules' files in diff --git a/Documentation/technical/partial-clone.txt b/Documentation/technical/partial-clone.txt index 92fcee2bff..bf5ec5c82d 100644 --- a/Documentation/technical/partial-clone.txt +++ b/Documentation/technical/partial-clone.txt @@ -3,7 +3,7 @@ Partial Clone Design Notes The "Partial Clone" feature is a performance optimization for Git that allows Git to function without having a complete copy of the repository. -The goal of this work is to allow Git better handle extremely large +The goal of this work is to allow Git to better handle extremely large repositories. 
During clone and fetch operations, Git downloads the complete contents @@ -102,7 +102,7 @@ or commits that reference missing trees. - On the client a repository extension is added to the local config to prevent older versions of git from failing mid-operation because of missing objects that they cannot handle. - See "extensions.partialClone" in Documentation/technical/repository-version.txt" + See `extensions.partialClone` in linkgit:git-config[1]. Handling Missing Objects @@ -256,7 +256,7 @@ remote in a specific order. - Dynamic object fetching currently uses the existing pack protocol V0 which means that each object is requested via fetch-pack. The server will send a full set of info/refs when the connection is established. - If there are large number of refs, this may incur significant overhead. + If there are a large number of refs, this may incur significant overhead. Future Work @@ -265,7 +265,7 @@ Future Work - Improve the way to specify the order in which promisor remotes are tried. + -For example this could allow to specify explicitly something like: +For example this could allow specifying explicitly something like: "When fetching from this remote, I want to use these promisor remotes in this order, though, when pushing or fetching to that remote, I want to use those promisor remotes in that order." @@ -322,7 +322,7 @@ Footnotes [a] expensive-to-modify list of missing objects: Earlier in the design of partial clone we discussed the need for a single list of missing objects. - This would essentially be a sorted linear list of OIDs that the were + This would essentially be a sorted linear list of OIDs that were omitted by the server during a clone or subsequent fetches. This file would need to be loaded into memory on every object lookup. 
diff --git a/Documentation/technical/platform-support.txt b/Documentation/technical/platform-support.txt new file mode 100644 index 0000000000..0a2fb28d62 --- /dev/null +++ b/Documentation/technical/platform-support.txt @@ -0,0 +1,190 @@ +Platform Support Policy +======================= + +Git has a history of providing broad "support" for exotic platforms and older +platforms, without an explicit commitment. Stakeholders of these platforms may +want a more predictable support commitment. This is only possible when platform +stakeholders supply Git developers with adequate tooling, so we can test for +compatibility or develop workarounds for platform-specific quirks on our own. +Various levels of platform-specific tooling will allow us to make more solid +commitments around Git's compatibility with that platform. + +Note that this document is about maintaining existing support for a platform +that has generally worked in the past; for adding support to a platform which +doesn't generally work with Git, the stakeholders for that platform are expected +to do the bulk of that work themselves. We will consider such patches if they +don't make life harder for other supported platforms or for Git contributors. +Some contributors may volunteer to help with the initial or continued support, +but that's not a given. Support work which is too intrusive or difficult for the +project to maintain may still not be accepted. + +Minimum Requirements +-------------------- + +The rest of this doc describes best practices for platforms to make themselves +easy to support. 
However, before considering support at all, platforms need to +meet the following minimum requirements: + +* Has C99 or C11 + +* Uses versions of dependencies which are generally accepted as stable and + supportable, e.g., in line with the version used by other long-term-support + distributions + +* Has active security support (taking security releases of dependencies, etc) + +These requirements are a starting point, and not sufficient on their own for the +Git community to be enthusiastic about supporting your platform. Maintainers of +platforms which do meet these requirements can follow the steps below to make it +more likely that Git updates will respect the platform's needs. + +Compatible by next release +-------------------------- + +To increase probability that compatibility issues introduced in a release +will be fixed in a later release: + +* You should send a bug report as soon as you notice the breakage on your + platform. The sooner you notice, the better; watching `seen` means you can + notice problems before they are considered "done with review"; whereas + watching `master` means the stable branch could break for your platform, but + you have a decent chance of avoiding a tagged release breaking you. See "The + Policy" in link:../howto/maintain-git.html["How to maintain Git"] for an + overview of which branches are used in the Git project, and how. + +* The bug report should include information about what platform you are using. + +* You should also use linkgit:git-bisect[1] and determine which commit + introduced the breakage. + +* Please include any information you have about the nature of the breakage: is + it a memory alignment issue? Is an underlying library missing or broken for + your platform? Is there some quirk about your platform which means typical + practices (like malloc) behave strangely? 
+ +* If possible, build Git from the exact same source both for your platform and + for a mainstream platform, to see if the problem you noticed appears only + on your platform. If the problem appears in both, then it's not a + compatibility issue, but we of course appreciate hearing about it in a bug + report anyway, to benefit users of every platform. If it appears only on your + platform, mention clearly that it is a compatibility issue in your report. + +* Once we begin to fix the issue, please work closely with the contributor + working on it to test the proposed fix against your platform. + +Example: NonStop +https://lore.kernel.org/git/01bd01da681a$b8d70a70$2a851f50$@nexbridge.com/[reports +problems] when they're noticed. + +Compatible on `master` and releases +----------------------------------- + +To make sure all stable builds and regular releases work for your platform the +first time, help us avoid breaking `master` for your platform: + +* You should run regular tests against the `next` branch and + publish breakage reports to the mailing list immediately when they happen. + +** Ideally, these tests should run daily. They must run more often than + weekly, as topics generally spend at least 7 days in `next` before graduating + to `master`, and it takes time to put the brakes on a patch once it lands in + `next`. + +** You may want to ask to join the mailto:git-security@googlegroups.com[security + mailing list] in order to run tests against the fixes proposed there, too. + +* It may make sense to automate these; if you do, make sure they are not noisy + (you don't need to send a report when everything works, only when something + breaks; you don't need to send repeated reports for the same breakage night + after night). + +* Breakage reports should be actionable - include clear error messages that can + help developers who may not have access to test directly on your platform. 
+ +* You should use git-bisect and determine which commit introduced the breakage; + if you can't do this with automation, you should do this yourself manually as + soon as you notice a breakage report was sent. + +* You should either: + +** Provide on-demand access to your platform to a trusted developer working to + fix the issue, so they can test their fix, OR + +** Work closely with the developer fixing the issue; the turnaround to check + that their proposed fix works for your platform should be fast enough that it + doesn't hinder the developer working on that fix. Slow testing turnarounds + may cause the fix to miss the next release, or the developer may lose + interest in working on the fix at all. + +Example: +https://lore.kernel.org/git/CAHd-oW6X4cwD_yLNFONPnXXUAFPxgDoccv2SOdpeLrqmHCJB4Q@mail.gmail.com/[AIX] +provides a build farm and runs tests against release candidates. + +Compatible on `next` +-------------------- + +To avoid reactive debugging and fixing when changes hit a release or stable, you +can aim to ensure `next` always works for your platform. (See "The Policy" in +link:../howto/maintain-git.html["How to maintain Git"] for an overview of how +`next` is used in the Git project.) To do that: + +* You should add a runner for your platform to the GitHub Actions or GitLab CI + suite. This suite is run when any Git developer proposes a new patch, and + having a runner for your platform/configuration means every developer will + know if they break you, immediately. + +** If adding it to an existing CI suite is infeasible (due to architecture + constraints or for performance reasons), any other method which runs as + automatically and quickly as possible works, too. For example, a service + which snoops on the mailing list and automatically runs tests on new [PATCH] + emails, replying to the author with the results, would also be within the + spirit of this requirement. 
+ +* If you rely on Git avoiding a specific pattern that doesn't work well with + your platform (like a certain malloc pattern), raise it on the mailing list. + We'll work case-by-case to look for a solution that doesn't unnecessarily + constrain other platforms to keep compatibility with yours. + +* If you rely on some configuration or behavior, add a test for it. Untested + behavior is subject to breakage at any time. + +** Clearly label these tests as necessary for platform compatibility. Add them + to an isolated compatibility-related test suite, like a new t* file or unit + test suite, so that they're easy to remove when compatibility is no longer + required. If the specific compatibility need is gated behind an issue with + another project, link to documentation of that issue (like a bug or email + thread) to make it easier to tell when that compatibility need goes away. + +** Include a comment with an expiration date for these tests no more than 1 year + from now. You can update the expiration date if your platform still needs + that assurance down the road, but we need to know you still care about that + compatibility case and are working to make it unnecessary. + +Example: We run our +https://git.kernel.org/pub/scm/git/git.git/tree/.github/workflows/main.yml[CI +suite] on Windows, Ubuntu, Mac, and others. + +Getting help writing platform support patches +--------------------------------------------- + +In general, when sending patches to fix platform support problems, follow +these guidelines to make sure the patch is reviewed with the appropriate level +of urgency: + +* Clearly state in the commit message that you are fixing a platform breakage, + and for which platform. + +* Use the CI and test suite to ensure that the fix for your platform doesn't + break other platforms. + +* If possible, add a test ensuring this regression doesn't happen again. If + it's not possible to add a test, explain why in the commit message. 
+ +Platform Maintainers +-------------------- + +If you maintain a platform, or Git for that platform, and intend to work with +the Git project to ensure compatibility, please send a patch to add yourself to +this list. + +NonStop: Randall S. Becker <rsbecker@nexbridge.com> diff --git a/Documentation/technical/racy-git.txt b/Documentation/technical/racy-git.txt index ceda4bbfda..59bea66c0f 100644 --- a/Documentation/technical/racy-git.txt +++ b/Documentation/technical/racy-git.txt @@ -11,7 +11,7 @@ write out the next tree object to be committed. The state is "virtual" in the sense that it does not necessarily have to, and often does not, match the files in the working tree. -There are cases Git needs to examine the differences between the +There are cases where Git needs to examine the differences between the virtual working tree state in the index and the files in the working tree. The most obvious case is when the user asks `git diff` (or its low level implementation, `git diff-files`) or @@ -165,9 +165,9 @@ Avoiding runtime penalty In order to avoid the above runtime penalty, post 1.4.2 Git used to have a code that made sure the index file -got timestamp newer than the youngest files in the index when -there are many young files with the same timestamp as the -resulting index file would otherwise would have by waiting +got a timestamp newer than the youngest files in the index when +there were many young files with the same timestamp as the +resulting index file otherwise would have by waiting before finishing writing the index file out. I suspected that in practice the situation where many paths in the @@ -190,7 +190,7 @@ In a large project where raciness avoidance cost really matters, however, the initial computation of all object names in the index takes more than one second, and the index file is written out after all that happens. 
Therefore the timestamp of the -index file will be more than one seconds later than the +index file will be more than one second later than the youngest file in the working tree. This means that in these cases there actually will not be any racily clean entry in the resulting index. diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt index 6a67cc4174..dd0b37c4e3 100644 --- a/Documentation/technical/reftable.txt +++ b/Documentation/technical/reftable.txt @@ -46,7 +46,7 @@ search lookup, and range scans. Storage in the file is organized into variable sized blocks. Prefix compression is used within a single block to reduce disk space. Block -size and alignment is tunable by the writer. +size and alignment are tunable by the writer. Performance ^^^^^^^^^^^ @@ -115,7 +115,7 @@ Varint encoding Varint encoding is identical to the ofs-delta encoding method used within pack files. -Decoder works such as: +Decoder works as follows: .... val = buf[ptr] & 0x7f @@ -175,7 +175,7 @@ log_index* footer .... -in a log-only file the first log block immediately follows the file +In a log-only file, the first log block immediately follows the file header, without padding to block alignment. Block size @@ -247,7 +247,7 @@ uint32( hash_id ) .... The header is identical to `version_number=1`, with the 4-byte hash ID -("sha1" for SHA1 and "s256" for SHA-256) append to the header. +("sha1" for SHA1 and "s256" for SHA-256) appended to the header. For maximum backward compatibility, it is recommended to use version 1 when writing SHA1 reftables. @@ -288,7 +288,7 @@ The 2-byte `restart_count` stores the number of entries in the `restart_count` to binary search between restarts before starting a linear scan. -Exactly `restart_count` 3-byte `restart_offset` values precedes the +Exactly `restart_count` 3-byte `restart_offset` values precede the `restart_count`. 
Offsets are relative to the start of the block and refer to the first byte of any `ref_record` whose name has not been prefix compressed. Entries in the `restart_offset` list must be sorted, diff --git a/Documentation/technical/repository-version.txt b/Documentation/technical/repository-version.txt index 8ef664b0b9..b9bb81a81f 100644 --- a/Documentation/technical/repository-version.txt +++ b/Documentation/technical/repository-version.txt @@ -65,38 +65,6 @@ Note that if no extensions are specified in the config file, then provides no benefit, and makes the repository incompatible with older implementations of git). -This document will serve as the master list for extensions. Any -implementation wishing to define a new extension should make a note of -it here, in order to claim the name. - -The defined extensions are: - -==== `noop` - -This extension does not change git's behavior at all. It is useful only -for testing format-1 compatibility. - -==== `preciousObjects` - -When the config key `extensions.preciousObjects` is set to `true`, -objects in the repository MUST NOT be deleted (e.g., by `git-prune` or -`git repack -d`). - -==== `partialClone` - -When the config key `extensions.partialClone` is set, it indicates -that the repo was created with a partial clone (or later performed -a partial fetch) and that the remote may have omitted sending -certain unwanted objects. Such a remote is called a "promisor remote" -and it promises that all such omitted objects can be fetched from it -in the future. - -The value of this key is the name of the promisor remote. - -==== `worktreeConfig` - -If set, by default "git config" reads from both "config" and -"config.worktree" file from GIT_DIR in that order. In -multiple working directory mode, "config" file is shared while -"config.worktree" is per-working directory (i.e., it's in -GIT_COMMON_DIR/worktrees/<id>/config.worktree) +The defined extensions are given in the `extensions.*` section of +linkgit:git-config[1]. 
Any implementation wishing to define a new +extension should make a note of it there, in order to claim the name. diff --git a/Documentation/technical/rerere.txt b/Documentation/technical/rerere.txt index be58f1bee3..580f23360a 100644 --- a/Documentation/technical/rerere.txt +++ b/Documentation/technical/rerere.txt @@ -60,7 +60,7 @@ By resolving this conflict, to leave line D, the user declares: what AB and AC wanted to do. As branch AC2 refers to the same commit as AC, the above implies that -this is also compatible what AB and AC2 wanted to do. +this is also compatible with what AB and AC2 wanted to do. By extension, this means that rerere should recognize that the above conflicts are the same. To do this, the labels on the conflict @@ -76,7 +76,7 @@ examples would both result in the following normalized conflict: Sorting hunks ~~~~~~~~~~~~~ -As before, lets imagine that a common ancestor had a file with line A +As before, let's imagine that a common ancestor had a file with line A its early part, and line X in its late part. And then four branches are forked that do these things: @@ -145,7 +145,7 @@ Nested conflicts Nested conflicts are handled very similarly to "simple" conflicts. Similar to simple conflicts, the conflict is first normalized by stripping the labels from conflict markers, stripping the common ancestor -version, and the sorting the conflict hunks, both for the outer and the +version, and sorting the conflict hunks, both for the outer and the inner conflict. This is done recursively, so any number of nested conflicts can be handled. diff --git a/Documentation/technical/sparse-checkout.txt b/Documentation/technical/sparse-checkout.txt index fa0d01cbda..d968659354 100644 --- a/Documentation/technical/sparse-checkout.txt +++ b/Documentation/technical/sparse-checkout.txt @@ -287,7 +287,7 @@ everything behaves like a dense checkout with a few exceptions (e.g. 
branch checkouts and switches write fewer things, knowing the VFS will lazily write the rest on an as-needed basis). -Since there is no publically available VFS-related code for folks to try, +Since there is no publicly available VFS-related code for folks to try, the number of folks who can test such a usecase is limited. The primary reason to note the Behavior C usecase is that as we fix things diff --git a/Documentation/technical/unit-tests.txt b/Documentation/technical/unit-tests.txt new file mode 100644 index 0000000000..5a432b7b29 --- /dev/null +++ b/Documentation/technical/unit-tests.txt @@ -0,0 +1,242 @@ += Unit Testing + +In our current testing environment, we spend a significant amount of effort +crafting end-to-end tests for error conditions that could easily be captured by +unit tests (or we simply forgo some hard-to-setup and rare error conditions). +Unit tests additionally provide stability to the codebase and can simplify +debugging through isolation. Writing unit tests in pure C, rather than with our +current shell/test-tool helper setup, simplifies test setup, simplifies passing +data around (no shell-isms required), and reduces testing runtime by not +spawning a separate process for every test invocation. + +We believe that a large body of unit tests, living alongside the existing test +suite, will improve code quality for the Git project. + +== Definitions + +For the purposes of this document, we'll use *test framework* to refer to +projects that support writing test cases and running tests within the context +of a single executable. *Test harness* will refer to projects that manage +running multiple executables (each of which may contain multiple test cases) and +aggregating their results. + +In reality, these terms are not strictly defined, and many of the projects +discussed below contain features from both categories. + +For now, we will evaluate projects solely on their framework features. 
Since we +are relying on having TAP output (see below), we can assume that any framework +can be made to work with a harness that we can choose later. + + +== Summary + +We believe the best way forward is to implement a custom TAP framework for the +Git project. We use a version of the framework originally proposed in +https://lore.kernel.org/git/c902a166-98ce-afba-93f2-ea6027557176@gmail.com/[1]. + +See the <<framework-selection,Framework Selection>> section below for the +rationale behind this decision. + + +== Choosing a test harness + +During upstream discussion, it was occasionally noted that `prove` provides many +convenient features, such as scheduling slower tests first, or re-running +previously failed tests. + +While we already support the use of `prove` as a test harness for the shell +tests, it is not strictly required. The t/Makefile allows running shell tests +directly (though with interleaved output if parallelism is enabled). Git +developers who wish to use `prove` as a more advanced harness can do so by +setting DEFAULT_TEST_TARGET=prove in their config.mak. + +We will follow a similar approach for unit tests: by default the test +executables will be run directly from the t/Makefile, but `prove` can be +configured with DEFAULT_UNIT_TEST_TARGET=prove. + + +[[framework-selection]] +== Framework selection + +There are a variety of features we can use to rank the candidate frameworks, and +those features have different priorities: + +* Critical features: we probably won't consider a framework without these +** Can we legally / easily use the project? +*** <<license,License>> +*** <<vendorable-or-ubiquitous,Vendorable or ubiquitous>> +*** <<maintainable-extensible,Maintainable / extensible>> +*** <<major-platform-support,Major platform support>> +** Does the project support our bare-minimum needs? 
+*** <<tap-support,TAP support>> +*** <<diagnostic-output,Diagnostic output>> +*** <<runtime-skippable-tests,Runtime-skippable tests>> +* Nice-to-have features: +** <<parallel-execution,Parallel execution>> +** <<mock-support,Mock support>> +** <<signal-error-handling,Signal & error-handling>> +* Tie-breaker stats +** <<project-kloc,Project KLOC>> +** <<adoption,Adoption>> + +[[license]] +=== License + +We must be able to legally use the framework in connection with Git. As Git is +licensed only under GPLv2, we must eliminate any LGPLv3, GPLv3, or Apache 2.0 +projects. + +[[vendorable-or-ubiquitous]] +=== Vendorable or ubiquitous + +We want to avoid forcing Git developers to install new tools just to run unit +tests. Any prospective frameworks and harnesses must either be vendorable +(meaning, we can copy their source directly into Git's repository), or so +ubiquitous that it is reasonable to expect that most developers will have the +tools installed already. + +[[maintainable-extensible]] +=== Maintainable / extensible + +It is unlikely that any pre-existing project perfectly fits our needs, so any +project we select will need to be actively maintained and open to accepting +changes. Alternatively, assuming we are vendoring the source into our repo, it +must be simple enough that Git developers can feel comfortable making changes as +needed to our version. + +In the comparison table below, "True" means that the framework seems to have +active developers, that it is simple enough that Git developers can make changes +to it, and that the project seems open to accepting external contributions (or +that it is vendorable). "Partial" means that at least one of the above +conditions holds. + +[[major-platform-support]] +=== Major platform support + +At a bare minimum, unit-testing must work on Linux, MacOS, and Windows. + +In the comparison table below, "True" means that it works on all three major +platforms with no issues. 
"Partial" means that there may be annoyances on one or +more platforms, but it is still usable in principle. + +[[tap-support]] +=== TAP support + +The https://testanything.org/[Test Anything Protocol] is a text-based interface +that allows tests to communicate with a test harness. It is already used by +Git's integration test suite. Supporting TAP output is a mandatory feature for +any prospective test framework. + +In the comparison table below, "True" means this is natively supported. +"Partial" means TAP output must be generated by post-processing the native +output. + +Frameworks that do not have at least Partial support will not be evaluated +further. + +[[diagnostic-output]] +=== Diagnostic output + +When a test case fails, the framework must generate enough diagnostic output to +help developers find the appropriate test case in source code in order to debug +the failure. + +[[runtime-skippable-tests]] +=== Runtime-skippable tests + +Test authors may wish to skip certain test cases based on runtime circumstances, +so the framework should support this. + +[[parallel-execution]] +=== Parallel execution + +Ideally, we will build up a significant collection of unit test cases, most +likely split across multiple executables. It will be necessary to run these +tests in parallel to enable fast develop-test-debug cycles. + +In the comparison table below, "True" means that individual test cases within a +single test executable can be run in parallel. We assume that executable-level +parallelism can be handled by the test harness. + +[[mock-support]] +=== Mock support + +Unit test authors may wish to test code that interacts with objects that may be +inconvenient to handle in a test (e.g. interacting with a network service). +Mocking allows test authors to provide a fake implementation of these objects +for more convenient tests. 
+ +[[signal-error-handling]] +=== Signal & error handling + +The test framework should fail gracefully when test cases are themselves buggy +or when they are interrupted by signals during runtime. + +[[project-kloc]] +=== Project KLOC + +The size of the project, in thousands of lines of code as measured by +https://dwheeler.com/sloccount/[sloccount] (rounded up to the next multiple of +1,000). As a tie-breaker, we probably prefer a project with fewer LOC. + +[[adoption]] +=== Adoption + +As a tie-breaker, we prefer a more widely-used project. We use the number of +GitHub / GitLab stars to estimate this. + + +=== Comparison + +:true: [lime-background]#True# +:false: [red-background]#False# +:partial: [yellow-background]#Partial# + +:gpl: [lime-background]#GPL v2# +:isc: [lime-background]#ISC# +:mit: [lime-background]#MIT# +:expat: [lime-background]#Expat# +:lgpl: [lime-background]#LGPL v2.1# + +:custom-impl: https://lore.kernel.org/git/c902a166-98ce-afba-93f2-ea6027557176@gmail.com/[Custom Git impl.] 
+:greatest: https://github.com/silentbicycle/greatest[Greatest] +:criterion: https://github.com/Snaipe/Criterion[Criterion] +:c-tap: https://github.com/rra/c-tap-harness/[C TAP] +:check: https://libcheck.github.io/check/[Check] +:clar: https://github.com/clar-test/clar[Clar] + +[format="csv",options="header",width="33%",subs="specialcharacters,attributes,quotes,macros"] +|===== +Framework,"<<license,License>>","<<vendorable-or-ubiquitous,Vendorable or ubiquitous>>","<<maintainable-extensible,Maintainable / extensible>>","<<major-platform-support,Major platform support>>","<<tap-support,TAP support>>","<<diagnostic-output,Diagnostic output>>","<<runtime--skippable-tests,Runtime- skippable tests>>","<<parallel-execution,Parallel execution>>","<<mock-support,Mock support>>","<<signal-error-handling,Signal & error handling>>","<<project-kloc,Project KLOC>>","<<adoption,Adoption>>" +{custom-impl},{gpl},{true},{true},{true},{true},{true},{true},{false},{false},{false},1,0 +{greatest},{isc},{true},{partial},{true},{partial},{true},{true},{false},{false},{false},3,1400 +{criterion},{mit},{false},{partial},{true},{true},{true},{true},{true},{false},{true},19,1800 +{c-tap},{expat},{true},{partial},{partial},{true},{false},{true},{false},{false},{false},4,33 +{check},{lgpl},{false},{partial},{true},{true},{true},{false},{false},{false},{true},17,973 +{clar},{isc},{false},{partial},{true},{true},{true},{true},{false},{false},{true},1,192 +|===== + +=== Additional framework candidates + +Several suggested frameworks have been eliminated from consideration: + +* Incompatible licenses: +** https://github.com/zorgnax/libtap[libtap] (LGPL v3) +** https://cmocka.org/[cmocka] (Apache 2.0) +* Missing source: https://www.kindahl.net/mytap/doc/index.html[MyTap] +* No TAP support: +** https://nemequ.github.io/munit/[µnit] +** https://github.com/google/cmockery[cmockery] +** https://github.com/lpabon/cmockery2[cmockery2] +** https://github.com/ThrowTheSwitch/Unity[Unity] +** 
https://github.com/siu/minunit[minunit] +** https://cunit.sourceforge.net/[CUnit] + + +== Milestones + +* Add useful tests of library-like code +* Integrate with + https://lore.kernel.org/git/20230502211454.1673000-1-calvinwan@google.com/[stdlib + work] +* Run alongside regular `make test` target
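The unit-tests.txt document added above names TAP output, diagnostic output, and runtime-skippable tests as bare-minimum framework features. As a rough illustration of what those three requirements look like on the wire, here is a minimal sketch in C. The function names (`tap_plan`, `tap_result`, `tap_diag`) are hypothetical and chosen only for this example; they are not the API of the custom Git framework referenced in the patch. Each helper formats one line into a caller-supplied buffer, so the output can be inspected before it is printed.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helpers for illustration only; these are NOT the
 * functions of Git's unit-test framework.
 */

/* Format the TAP plan line ("1..N"). Returns snprintf()'s result. */
int tap_plan(char *buf, size_t len, int count)
{
	assert(buf && len > 0 && count >= 0);
	return snprintf(buf, len, "1..%d\n", count);
}

/*
 * Format one TAP result line.
 *
 * passed:      nonzero yields "ok", zero yields "not ok".
 * skip_reason: if non-NULL, the test was skipped at runtime and the
 *              line carries a "# SKIP" directive; TAP reports skipped
 *              tests as "ok".
 */
int tap_result(char *buf, size_t len, int number, int passed,
	       const char *desc, const char *skip_reason)
{
	assert(buf && len > 0 && desc);
	if (skip_reason)
		return snprintf(buf, len, "ok %d - %s # SKIP %s\n",
				number, desc, skip_reason);
	return snprintf(buf, len, "%s %d - %s\n",
			passed ? "ok" : "not ok", number, desc);
}

/*
 * Format a diagnostic line. TAP harnesses treat lines starting with
 * "#" as comments, so a file:line location can be attached to a
 * failure to help locate the test case in the source.
 */
int tap_diag(char *buf, size_t len, const char *file, int line,
	     const char *msg)
{
	assert(buf && len > 0 && file && msg);
	return snprintf(buf, len, "# %s:%d: %s\n", file, line, msg);
}
```

A test executable built on helpers like these would write each formatted line to stdout with `fputs(buf, stdout)`. Because the output is plain TAP, either direct execution or a harness such as `prove` can consume it.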
