| Age | Commit message (Collapse) | Author | Files | Lines |
|
The previous change allowed for a Git server to advertise the
'bundle-uri' command as a capability based on the
uploadPack.advertiseBundleURIs config option. Create a set of tests that
check that this capability is advertised using 'git ls-remote'.
In order to test this functionality across three protocols (file, git,
and http), create lib-bundle-uri-protocol.sh to generalize the tests,
allowing the other test scripts to set an environment variable and
otherwise inherit the setup and tests from this script.
The tests currently only test that the 'bundle-uri' command is
advertised or not. Other actions will be tested as the Git client learns
to request the 'bundle-uri' command and parse its response.
To help with URI escaping, specifically for file paths with a space in
them, extract a 'sed' invocation from t9199-git-svn-info.sh into a
helper function for use here, too.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add a skeleton server-side implementation of a new "bundle-uri" command
to protocol v2. This will allow conforming clients to optionally seed
their initial clones or incremental fetches from URLs containing
"*.bundle" files created with "git bundle create".
This change only performs the basic boilerplate of advertising a new
protocol v2 capability. The new 'bundle-uri' capability allows a client
to request a list of bundles. Right now, the server only returns a flush
packet, which corresponds to an empty advertisement. The bundle.* config
namespace describes which key-value pairs will be communicated across
this interface in future updates.
The critical bit right now is that the new boolean
uploadPack.adverstiseBundleURIs config value signals whether or not this
capability should be advertised at all.
An earlier version of this patch [1] used a different transfer format
than the "key=value" pairs in the current implementation. The change was
made to unify the protocol v2 command with the bundle lists provided by
independent bundle servers. Further, the standard allows for the server
to advertise a URI that contains a bundle list. This allows users
automatically discovering bundle providers that are loosely associated
with the origin server, but without the origin server knowing exactly
which bundles are currently available.
[1] https://lore.kernel.org/git/RFC-patch-v2-01.13-2fc87ce092b-20220311T155841Z-avarab@gmail.com/
The very-deep headings needed to be modified to stop at level 4 due to
documentation build issues. These were not recognized in earlier builds
since the file was previously in the Documentation/technical/ directory
and was built in a different way. With its current location, the
heavily-nested details were causing build issues and they are now
replaced with a bulletted list of details.
Co-authored-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Since aef7d75e580 (builtin/bundle.c: let parse-options parse
subcommands, 2022-08-19) we've been segfaulting if no argument was
provided.
The fix is easy, as all of the "git bundle" subcommands require a
non-option argument we can check that we have arguments left after
calling parse-options().
This makes use of code added in 73c3253d75e (bundle: framework for
options before bundle file, 2019-11-10), before this change that code
has always been unreachable. In 73c3253d75e we'd never reach it as we
already checked "argc < 2" in cmd_bundle() itself.
Then when aef7d75e580 (whose segfault we're fixing here) migrated this
code to the subcommand API it removed that "argc < 2" check, but we
were still checking the wrong "argc" in parse_options_cmd_bundle(), we
need to check the "newargc". The "argc" will always be >= 1, as it
will necessarily contain at least the subcommand name
itself (e.g. "create").
As an aside, this could be safely squashed into this, but let's not do
that for this minimal segfault fix, as it's an unrelated refactoring:
--- a/builtin/bundle.c
+++ b/builtin/bundle.c
@@ -55,13 +55,12 @@ static int parse_options_cmd_bundle(int argc,
const char * const usagestr[],
const struct option options[],
char **bundle_file) {
- int newargc;
- newargc = parse_options(argc, argv, NULL, options, usagestr,
+ argc = parse_options(argc, argv, NULL, options, usagestr,
PARSE_OPT_STOP_AT_NON_OPTION);
- if (!newargc)
+ if (!argc)
usage_with_options(usagestr, options);
*bundle_file = prefix_filename(prefix, argv[0]);
- return newargc;
+ return argc;
}
static int cmd_bundle_create(int argc, const char **argv, const char *prefix) {
Reported-by: Hubert Jasudowicz <hubertj@stmcyber.pl>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Tested-by: Hubert Jasudowicz <hubertj@stmcyber.pl>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Even though the cat-file command with `--batch-check` option does not
complain when `--use-mailmap` option is given, the latter option is
ignored. Compute the size of the object after replacing the idents and
report it instead.
In order to make `--batch-check` option honour the mailmap mechanism we
have to read the contents of the commit/tag object.
There were two ways to do it:
1. Make two calls to `oid_object_info_extended()`. If `--use-mailmap`
option is given, the first call will get us the type of the object
and second call will only be made if the object type is either a
commit or tag to get the contents of the object.
2. Make one call to `oid_object_info_extended()` to get the type of the
object. Then, if the object type is either of commit or tag, make a
call to `repo_read_object_file()` to read the contents of the object.
I benchmarked the following command with both the above approaches and
compared against the current implementation where `--use-mailmap`
option is ignored:
`git cat-file --use-mailmap --batch-all-objects --batch-check --buffer
--unordered`
The results can be summarized as follows:
Time (mean ± σ)
default 827.7 ms ± 104.8 ms
first approach 6.197 s ± 0.093 s
second approach 1.975 s ± 0.217 s
Since, the second approach is faster than the first one, I implemented
it in this patch.
The command git cat-file can now use the mailmap mechanism to replace
idents with canonical versions for commit and tag objects. There are
several options like `--batch`, `--batch-check` and `--batch-command`
that can be combined with `--use-mailmap`. But the documentation for
`--batch`, `--batch-check` and `--batch-command` doesn't say so. This
patch fixes that documentation.
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: John Cai <johncai86@gmail.com>
Helped-by: Taylor Blau <me@ttaylorr.com>
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Even though the cat-file command with `-s` option does not complain when
`--use-mailmap` option is given, the latter option is ignored. Compute
the size of the object after replacing the idents and report it instead.
In order to make `-s` option honour the mailmap mechanism we have to
read the contents of the commit/tag object. Make use of the call to
`oid_object_info_extended()` to get the contents of the object and store
in `buf`. `buf` is later freed in the function.
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: John Cai <johncai86@gmail.com>
Helped-by: Taylor Blau <me@ttaylorr.com>
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When the -L argument to "git log" is passed the zero-width regular
expression "$" (as in "-L :$:line-range.c"), this results in an
infinite loop in find_funcname_matching_regexp().
Modify find_funcname_matching_regexp to correctly match the entire line
instead of the zero-width match at eol and update the loop condition to
prevent an infinite loop in the event of other undiscovered corner cases.
The primary change is that we pre-decrement the beginning-of-line marker
('bol') before comparing it to '\n'. In the case of '$', where we match the
'\n' at the end of the line and start the loop with bol == eol, this
ensures that bol will find the beginning of the line on which the match
occurred.
Originally reported in <https://stackoverflow.com/q/74690545/147356>.
Signed-off-by: Lars Kellogg-Stedman <lars@oddbit.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Fix a pair of bugs in 'git branch'.
* rj/branch-copy-and-rename:
branch: force-copy a branch to itself via @{-1} is a no-op
|
|
The advice message given by "git status" when it takes long time to
enumerate untracked paths has been updated.
* rr/status-untracked-advice:
status: modernize git-status "slow untracked files" advice
|
|
Introduce a case insensitive mode to the Bash completion helpers.
* aw/complete-case-insensitive:
completion: add case-insensitive match of pseudorefs
completion: add optional ignore-case when matching refs
|
|
Test fix.
* rs/t3920-crlf-eating-grep-fix:
t3920: support CR-eating grep
|
|
Test fix.
* js/t3920-shell-and-or-fix:
t3920: don't ignore errors of more than one command with `|| true`
|
|
Test fix.
* ab/t4023-avoid-losing-exit-status-of-diff:
t4023: fix ignored exit codes of git
|
|
Test fix.
* ab/t7600-avoid-losing-exit-status-of-git:
t7600: don't ignore "rev-parse" exit code in helper
|
|
Test fix.
* ab/t5314-avoid-losing-exit-status:
t5314: check exit code of "git"
|
|
Test fix.
* rs/t4205-do-not-exit-in-test-script:
t4205: don't exit test script on failure
|
|
Unlike other test helper functions, 'test_oid' doesn't terminate its
output with a LF, but, alas, the reason for this, if any, is not
mentioned in 2c02b110da (t: add test functions to translate
hash-related values, 2018-09-13)).
Now, in the vast majority of cases 'test_oid' is invoked in a command
substitution that is part of a heredoc or supplies an argument to a
command or the value to a variable, and the command substitution would
chop off any trailing LFs, so in these cases the lack or presence of a
trailing LF in its output doesn't matter. However:
- There appear to be only three cases where 'test_oid' is not
invoked in a command substitution:
$ git grep '\stest_oid ' -- ':/t/*.sh'
t0000-basic.sh: test_oid zero >actual &&
t0000-basic.sh: test_oid zero >actual &&
t0000-basic.sh: test_oid zero >actual &&
These are all in test cases checking that 'test_oid' actually
works, and that the size of its output matches the size of the
corresponding hash function with conditions like
test $(wc -c <actual) -eq 40
In these cases the lack of trailing LF does actually matter,
though they could be trivially updated to account for the presence
of a trailing LF.
- There are also a few cases where the lack of trailing LF in
'test_oid's output actually hurts, because tests need to compare
its output with LF terminated file contents, forcing developers to
invoke it as 'echo $(test_oid ...)' to append the missing LF:
$ git grep 'echo "\?$(test_oid ' -- ':/t/*.sh'
t1302-repo-version.sh: echo $(test_oid version) >expect &&
t1500-rev-parse.sh: echo "$(test_oid algo)" >expect &&
t4044-diff-index-unique-abbrev.sh: echo "$(test_oid val1)" > foo &&
t4044-diff-index-unique-abbrev.sh: echo "$(test_oid val2)" > foo &&
t5313-pack-bounds-checks.sh: echo $(test_oid oidfff) >file &&
And there is yet another similar case in an in-flight topic at:
https://public-inbox.org/git/813e81a058227bd373cec802e443fcd677042fb4.1670862677.git.gitgitgadget@gmail.com/
Arguably we would be better off if 'test_oid' terminated its output
with a LF. So let's update 'test_oid' accordingly, update its tests
in t0000 to account for the extra character in those size tests, and
remove the now unnecessary 'echo $(...)' command substitutions around
'test_oid' invocations as well.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The editor program used by Git when editing the sequencer "todo" file
is determined by examining a few environment variables and also
affected by configuration variables. Introduce "git var
GIT_SEQUENCE_EDITOR" to give users access to the final result of the
logic without having to know the exact details.
This is very similar in spirit to 44fcb497 (Teach git var about
GIT_EDITOR, 2009-11-11) that introduced "git var GIT_EDITOR".
Signed-off-by: Sean Allred <allred.sean@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
If you pass a bogus argument to %(refname), you may end up with a
message like this:
$ git for-each-ref --format='%(refname:foo)'
fatal: unrecognized %(refname:foo) argument: foo
which is confusing. It should just say:
fatal: unrecognized %(refname) argument: foo
which is clearer, and is consistent with most other atom parsers. Those
other parsers do not have the same problem because they pass the atom
name from a string literal in the parser function. But because the
parser for %(refname) also handles %(upstream) and %(push), it instead
uses atom->name, which includes the arguments. The oid atom parser which
handles %(tree), %(parent), etc suffers from the same problem.
It seems like the cleanest fix would be for atom->name to be _just_ the
name, since there's already a separate "args" field. But since that
field is also used for other things, we can't change it easily (e.g.,
it's how we find things in the used_atoms array, and clearly %(refname)
and %(refname:short) are not the same thing).
Instead, we'll teach our error_bad_arg() function to stop at the first
":". This is a little hacky, as we're effectively re-parsing the name,
but the format is simple enough to do this as a one-liner, and this
localizes the change to the error-reporting code.
We'll give the same treatment to err_no_arg(). None of its callers use
this atom->name trick, but it's worth future-proofing it while we're
here.
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Atom parsers that take arguments generally have a catch-all for "this
arg is not recognized". Most of them use the same printf template, which
is good, because it makes life easier for translators. Let's pull this
template into a helper function, which makes the code in the parsers
shorter and avoids any possibility of differences.
As with the previous commit, we'll pick an arbitrary atom to make sure
the test suite covers this code.
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Many atom parsers give the same error message, differing only in the
name of the atom. If we use "%s does not take arguments", that should
make life easier for translators, as they only need to translate one
string. And in doing so, we can easily pull it into a helper function to
make sure they are all using the exact same string.
I've added a basic test here for %(HEAD), just to make sure this code is
exercised at all in the test suite. We could cover each such atom, but
the effort-to-reward ratio of trying to maintain an exhaustive list
doesn't seem worth it.
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A regression was introduced in
12fc4ad89e (diff.c: use utf8_strwidth() to count display width, 2022-09-14)
that causes missing newlines after "Unmerged" entries in `git diff
--cached --stat` output.
This problem affects v2.39.0-rc0 through v2.39.0.
Add the missing newline along with a new test to cover this
behavior.
Signed-off-by: Peter Grayson <pete@jpgrayson.net>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Test fix.
* js/t0021-windows-pwd:
t0021: use Windows-friendly `pwd`
|
|
"git var UNKNOWN_VARIABLE" and "git var VARIABLE" with the variable
given an empty value used to behave identically. Now the latter
just gives an empty output, while the former still gives an error
message.
* sa/git-var-empty:
var: allow GIT_EDITOR to return null
var: do not print usage() with a correct invocation
|
|
Fix a bug where `pack-objects` would not respect multiple `--filter`
arguments when invoked directly.
* rs/multi-filter-args:
list-objects-filter: remove OPT_PARSE_LIST_OBJECTS_FILTER_INIT()
pack-objects: simplify --filter handling
pack-objects: fix handling of multiple --filter options
t5317: demonstrate failure to handle multiple --filter options
t5317: stop losing return codes of git ls-files
|
|
The pack-bitmap machinery is taught to log the paths of redundant
bitmap(s) to trace2 instead of stderr.
* tl/pack-bitmap-absolute-paths:
pack-bitmap.c: trace bitmap ignore logs when midx-bitmap is found
pack-bitmap.c: break out of the bitmap loop early if not tracing
pack-bitmap.c: avoid exposing absolute paths
pack-bitmap.c: remove unnecessary "open_pack_index()" calls
|
|
Various leak fixes.
* ab/various-leak-fixes:
built-ins: use free() not UNLEAK() if trivial, rm dead code
revert: fix parse_options_concat() leak
cherry-pick: free "struct replay_opts" members
rebase: don't leak on "--abort"
connected.c: free the "struct packed_git"
sequencer.c: fix "opts->strategy" leak in read_strategy_opts()
ls-files: fix a --with-tree memory leak
revision API: call graph_clear() in release_revisions()
unpack-file: fix ancient leak in create_temp_file()
built-ins & libs & helpers: add/move destructors, fix leaks
dir.c: free "ident" and "exclude_per_dir" in "struct untracked_cache"
read-cache.c: clear and free "sparse_checkout_patterns"
commit: discard partial cache before (re-)reading it
{reset,merge}: call discard_index() before returning
tests: mark tests as passing with SANITIZE=leak
|
|
"merge-tree" learns a new `--merge-base` option.
* kz/merge-tree-merge-base:
docs: fix description of the `--merge-base` option
merge-tree.c: allow specifying the merge-base when --stdin is passed
merge-tree.c: add --merge-base=<commit> option
|
|
`git bisect` becomes a builtin.
* dd/git-bisect-builtin:
bisect; remove unused "git-bisect.sh" and ".gitignore" entry
Turn `git bisect` into a full built-in
bisect--helper: log: allow arbitrary number of arguments
bisect--helper: handle states directly
bisect--helper: emit usage for "git bisect"
bisect test: test exit codes on bad usage
bisect--helper: identify as bisect when report error
bisect-run: verify_good: account for non-negative exit status
bisect run: keep some of the post-v2.30.0 output
bisect: fix output regressions in v2.30.0
bisect: refactor bisect_run() to match CodingGuidelines
bisect tests: test for v2.30.0 "bisect run" regressions
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Currently, auto correction doesn't work reliably for commands which must
run in a work tree (e.g. `git status`) in Git work trees which are
created from a bare repository.
As far as I'm able to determine, this has been broken since commit
659fef199f (help: use early config when autocorrecting aliases,
2017-06-14), where the call to `git_config()` in `help_unknown_cmd()`
was replaced with a call to `read_early_config()`. From what I can tell,
the actual cause for the unexpected error is that we call
`git_default_config()` in the `git_unknown_cmd_config` callback instead
of simply returning `0` for config entries which we aren't interested
in.
Calling `git_default_config()` in this callback to `read_early_config()`
seems like a bad idea since those calls will initialize a bunch of state
in `environment.c` (among other things `is_bare_repository_cfg`) before
we've properly detected that we're running in a work tree.
All other callbacks provided to `read_early_config()` appear to only
extract their configurations while simply returning `0` for all other
config keys.
This commit changes the `git_unknown_cmd_config` callback to not call
`git_default_config()`. Instead we also simply return `0` for config
keys which we're not interested in.
Additionally the commit adds a new test case covering `help.autocorrect`
in a work tree created from a bare clone.
Signed-off-by: Simon Gerber <gesimu@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When Git's test suite uses `test_cmp`, it is not actually trying to
compare binary files as the name `cmp` would suggest to users familiar
with Unix' tools, but the tests instead verify that actual output
matches the expected text.
On Unix, `cmp` works well enough for Git's purposes because only Line
Feed characters are used as line endings. However, on Windows, while
most tools accept Line Feeds as line endings, many tools produce
Carriage Return + Line Feed line endings, including some of the tools
used by the test suite (which are therefore provided via Git for Windows
SDK). Therefore, `cmp` would frequently fail merely due to different
line endings.
To accommodate for that, the `mingw_test_cmp` function was introduced
into Git's test suite to perform a line-by-line comparison that ignores
line endings. This function is a Bash function that is only used on
Windows, everywhere else `cmp` is used.
This is a double whammy because `cmp` is fast, and `mingw_test_cmp` is
slow, even more so on Windows because it is a Bash script function, and
Bash scripts are known to run particularly slowly on Windows due to
Bash's need for the POSIX emulation layer provided by the MSYS2 runtime.
The commit message of 32ed3314c104 (t5351: avoid using `test_cmp` for
binary data, 2022-07-29) provides an illuminating account of the
consequences: On Windows, the platform on which Git could really use all
the help it can get to improve its performance, the time spent on one
entire test script was reduced from half an hour to less than half a
minute merely by avoiding a single call to `mingw_test_cmp` in but a
single test case.
Learning the lesson to avoid shell scripting wherever possible, the Git
for Windows project implemented a minimal replacement for
`mingw_test_cmp` in the form of a `test-tool` subcommand that parses the
input files line by line, ignoring line endings, and compares them.
Essentially the same thing as `mingw_test_cmp`, but implemented in
C instead of Bash. This solution served the Git for Windows project
well, over years.
However, when this solution was finally upstreamed, the conclusion was
reached that a change to use `git diff --no-index` instead of
`mingw_test_cmp` was more easily reviewed and hence should be used
instead.
The reason why this approach was not even considered in Git for Windows
is that in 2007, there was already a motion on the table to use Git's
own diff machinery to perform comparisons in Git's test suite, but it
was dismissed in https://lore.kernel.org/git/xmqqbkrpo9or.fsf@gitster.g/
as undesirable because tests might potentially succeed due to bugs in
the diff machinery when they should not succeed, and those bugs could
therefore hide regressions that the tests try to prevent.
By the time Git for Windows' `mingw-test-cmp` in C was finally
contributed to the Git mailing list, reviewers agreed that the diff
machinery had matured enough and should be used instead.
When the concern was raised that the diff machinery, due to its
complexity, would perform substantially worse than the test helper
originally implemented in the Git for Windows project, a test
demonstrated that these performance differences are well lost within the
100+ minutes it takes to run Git's test suite on Windows.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In b3b1a21d1a5 (sequencer: rewrite update-refs as user edits todo list,
2022-07-19), the 'todo_list_filter_update_refs()' step was added to handle
the removal of 'update-ref' lines from a 'rebase-todo'. Specifically, it
removes potential ref updates from the "update refs state" if a ref does not
have a corresponding 'update-ref' line.
However, because 'write_update_refs_state()' will not update the state if
the 'refs_to_oids' list was empty, removing *all* 'update-ref' lines will
result in the state remaining unchanged from how it was initialized (with
all refs' "after" OID being null). Then, when the ref update is applied, all
refs will be updated to null and consequently deleted.
To fix this, delete the 'update-refs' state file when 'refs_to_oids' is
empty. Additionally, add a tests covering "all update-ref lines removed"
cases.
Reported-by: herr.kaste <herr.kaste@gmail.com>
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
|
|
Recently, a vulnerability was reported that can lead to an out-of-bounds
write when reading an unreasonably large gitattributes file. The root
cause of this error are multiple integer overflows in different parts of
the code when there are either too many lines, when paths are too long,
when attribute names are too long, or when there are too many attributes
declared for a pattern.
As all of these are related to size, it seems reasonable to restrict the
size of the gitattributes file via git-fsck(1). This allows us to both
stop distributing known-vulnerable objects via common hosting platforms
that have fsck enabled, and users to protect themselves by enabling the
`fetch.fsckObjects` config.
There are basically two checks:
1. We verify that size of the gitattributes file is smaller than
100MB.
2. We verify that the maximum line length does not exceed 2048
bytes.
With the preceding commits, both of these conditions would cause us to
either ignore the complete gitattributes file or blob in the first case,
or the specific line in the second case. Now with these consistency
checks added, we also grow the ability to stop distributing such files
in the first place when `receive.fsckObjects` is enabled.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
|
|
|
|
Both the padding and wrapping formatting directives allow the caller to
specify an integer that ultimately leads to us adding this many chars to
the result buffer. As a consequence, it is trivial to e.g. allocate 2GB
of RAM via a single formatting directive and cause resource exhaustion
on the machine executing this logic. Furthermore, it is debatable
whether there are any sane usecases that require the user to pad data to
2GB boundaries or to indent wrapped data by 2GB.
Restrict the input sizes to 16 kilobytes at a maximum to limit the
amount of bytes that can be requested by the user. This is not meant
as a fix because there are ways to trivially amplify the amount of
data we generate via formatting directives; the real protection is
achieved by the changes in previous steps to catch and avoid integer
wraparound that causes us to under-allocate and access beyond the
end of allocated memory reagions. But having such a limit
significantly helps fuzzing the pretty format, because the fuzzer is
otherwise quite fast to run out-of-memory as it discovers these
formatters.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In `strbuf_utf8_replace()`, we call `utf8_width()` to compute the width
of the current glyph. If the glyph is a control character though it can
be that `utf8_width()` returns `-1`, but because we assign this value to
a `size_t` the conversion will cause us to underflow. This bug can
easily be triggered with the following command:
$ git log --pretty='format:xxx%<|(1,trunc)%x10'
>From all I can see though this seems to be a benign underflow that has
no security-related consequences.
Fix the bug by using an `int` instead. When we see a control character,
we now copy it into the target buffer but don't advance the current
width of the string.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The return type of both `utf8_strwidth()` and `utf8_strnwidth()` is
`int`, but we operate on string lengths which are typically of type
`size_t`. This means that when the string is longer than `INT_MAX`, we
will overflow and thus return a negative result.
This can lead to an out-of-bounds write with `--pretty=format:%<1)%B`
and a commit message that is 2^31+1 bytes long:
=================================================================
==26009==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000001168 at pc 0x7f95c4e5f427 bp 0x7ffd8541c900 sp 0x7ffd8541c0a8
WRITE of size 2147483649 at 0x603000001168 thread T0
#0 0x7f95c4e5f426 in __interceptor_memcpy /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:827
#1 0x5612bbb1068c in format_and_pad_commit pretty.c:1763
#2 0x5612bbb1087a in format_commit_item pretty.c:1801
#3 0x5612bbc33bab in strbuf_expand strbuf.c:429
#4 0x5612bbb110e7 in repo_format_commit_message pretty.c:1869
#5 0x5612bbb12d96 in pretty_print_commit pretty.c:2161
#6 0x5612bba0a4d5 in show_log log-tree.c:781
#7 0x5612bba0d6c7 in log_tree_commit log-tree.c:1117
#8 0x5612bb691ed5 in cmd_log_walk_no_free builtin/log.c:508
#9 0x5612bb69235b in cmd_log_walk builtin/log.c:549
#10 0x5612bb6951a2 in cmd_log builtin/log.c:883
#11 0x5612bb56c993 in run_builtin git.c:466
#12 0x5612bb56d397 in handle_builtin git.c:721
#13 0x5612bb56db07 in run_argv git.c:788
#14 0x5612bb56e8a7 in cmd_main git.c:923
#15 0x5612bb803682 in main common-main.c:57
#16 0x7f95c4c3c28f (/usr/lib/libc.so.6+0x2328f)
#17 0x7f95c4c3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
#18 0x5612bb5680e4 in _start ../sysdeps/x86_64/start.S:115
0x603000001168 is located 0 bytes to the right of 24-byte region [0x603000001150,0x603000001168)
allocated by thread T0 here:
#0 0x7f95c4ebe7ea in __interceptor_realloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:85
#1 0x5612bbcdd556 in xrealloc wrapper.c:136
#2 0x5612bbc310a3 in strbuf_grow strbuf.c:99
#3 0x5612bbc32acd in strbuf_add strbuf.c:298
#4 0x5612bbc33aec in strbuf_expand strbuf.c:418
#5 0x5612bbb110e7 in repo_format_commit_message pretty.c:1869
#6 0x5612bbb12d96 in pretty_print_commit pretty.c:2161
#7 0x5612bba0a4d5 in show_log log-tree.c:781
#8 0x5612bba0d6c7 in log_tree_commit log-tree.c:1117
#9 0x5612bb691ed5 in cmd_log_walk_no_free builtin/log.c:508
#10 0x5612bb69235b in cmd_log_walk builtin/log.c:549
#11 0x5612bb6951a2 in cmd_log builtin/log.c:883
#12 0x5612bb56c993 in run_builtin git.c:466
#13 0x5612bb56d397 in handle_builtin git.c:721
#14 0x5612bb56db07 in run_argv git.c:788
#15 0x5612bb56e8a7 in cmd_main git.c:923
#16 0x5612bb803682 in main common-main.c:57
#17 0x7f95c4c3c28f (/usr/lib/libc.so.6+0x2328f)
SUMMARY: AddressSanitizer: heap-buffer-overflow /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:827 in __interceptor_memcpy
Shadow bytes around the buggy address:
0x0c067fff81d0: fd fd fd fa fa fa fd fd fd fa fa fa fd fd fd fa
0x0c067fff81e0: fa fa fd fd fd fd fa fa fd fd fd fd fa fa fd fd
0x0c067fff81f0: fd fa fa fa fd fd fd fa fa fa fd fd fd fa fa fa
0x0c067fff8200: fd fd fd fa fa fa fd fd fd fd fa fa 00 00 00 fa
0x0c067fff8210: fa fa fd fd fd fa fa fa fd fd fd fa fa fa fd fd
=>0x0c067fff8220: fd fa fa fa fd fd fd fa fa fa 00 00 00[fa]fa fa
0x0c067fff8230: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c067fff8240: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c067fff8250: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c067fff8260: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c067fff8270: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==26009==ABORTING
Now the proper fix for this would be to convert both functions to return
an `size_t` instead of an `int`. But given that this commit may be part
of a security release, let's instead do the minimal viable fix and die
in case we see an overflow.
Add a test that would have previously caused us to crash.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `utf8_strnwidth()` function calls `utf8_width()` in a loop and adds
its returned width to the end result. `utf8_width()` can return `-1`
though in case it reads a control character, which means that the
computed string width is going to be wrong. In the worst case where
there are more control characters than non-control characters, we may
even return a negative string width.
Fix this bug by treating control characters as having zero width.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `%w(width,indent1,indent2)` formatting directive can be used to
rewrap text to a specific width and is designed after git-shortlog(1)'s
`-w` parameter. While the three parameters are all stored as `size_t`
internally, `strbuf_add_wrapped_text()` accepts integers as input. As a
result, the casted integers may overflow. As these now-negative integers
are later on passed to `strbuf_addchars()`, we will ultimately run into
implementation-defined behaviour due to casting a negative number back
to `size_t` again. On my platform, this results in trying to allocate
9000 petabyte of memory.
Fix this overflow by using `cast_size_t_to_int()` so that we reject
inputs that cannot be represented as an integer.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When a formatting directive has a `+` or ` ` after the `%`, then we add
either a line feed or space if the placeholder expands to a non-empty
string. In specific cases though this logic doesn't work as expected,
and we try to add the character even in the case where the formatting
directive is empty.
One such pattern is `%w(1)%+d%+w(2)`. `%+d` expands to reference names
pointing to a certain commit, like in `git log --decorate`. For a tagged
commit this would for example expand to `\n (tag: v1.0.0)`, which has a
leading newline due to the `+` modifier and a space added by `%d`. Now
the second wrapping directive will cause us to rewrap the text to
`\n(tag:\nv1.0.0)`, which is one byte shorter due to the missing leading
space. The code that handles the `+` magic now notices that the length
has changed and will thus try to insert a leading line feed at the
original posititon. But as the string was shortened, the original
position is past the buffer's boundary and thus we die with an error.
Now there are two issues here:
1. We check whether the buffer length has changed, not whether it
has been extended. This causes us to try and add the character
past the string boundary.
2. The current logic does not make any sense whatsoever. When the
string got expanded due to the rewrap, putting the separator into
the original position is likely to put it somewhere into the
middle of the rewrapped contents.
It is debatable whether `%+w()` makes any sense in the first place.
Strictly speaking, the placeholder never expands to a non-empty string,
and consequentially we shouldn't ever accept this combination. We thus
fix the bug by simply refusing `%+w()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|