<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/run-command.c, branch v2.38.4</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://www.git.shady.money/git/atom?h=v2.38.4</id>
<link rel='self' href='https://www.git.shady.money/git/atom?h=v2.38.4'/>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/'/>
<updated>2022-08-17T16:21:41Z</updated>
<entry>
<title>pipe_command(): mark stdin descriptor as non-blocking</title>
<updated>2022-08-17T16:21:41Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2022-08-17T06:10:22Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=716c1f649e372a0784b9826cd3839e7b373e2ea9'/>
<id>urn:sha1:716c1f649e372a0784b9826cd3839e7b373e2ea9</id>
<content type='text'>
Our pipe_command() helper lets you both write to and read from a child
process on its stdin/stdout. It's supposed to work without deadlocks
because we use poll() to check when descriptors are ready for reading or
writing. But there's a bug: if both the data to be written and the data
to be read back exceed the pipe buffer, we'll deadlock.

The issue is that the code assumes that if you have, say, a 2MB buffer
to write and poll() tells you that the pipe descriptor is ready for
writing, that calling:

  write(cmd-&gt;in, buf, 2*1024*1024);

will do a partial write, filling the pipe buffer and then returning what
it did write. And that is what it would do on a socket, but not for a
pipe. When writing to a pipe, at least on Linux, it will block waiting
for the child process to read() more. And now we have a potential
deadlock, because the child may be writing back to us, waiting for us to
read() ourselves.

An easy way to trigger this is:

  git -c add.interactive.useBuiltin=true \
      -c interactive.diffFilter=cat \
      checkout -p HEAD~200

The diff against HEAD~200 will be big, and the filter wants to write all
of it back to us (obviously this is a dummy filter, but in the real
world something like diff-highlight would similarly stream back a big
output).

If you set add.interactive.useBuiltin to false, the problem goes away,
because now we're not using pipe_command() anymore (instead, that part
happens in perl). But this isn't a bug in the interactive code at all.
It's the underlying pipe_command() code which is broken, and has been
all along.

We presumably didn't notice because most calls only do input _or_
output, not both. And the few that do both, like gpg calls, may have
large inputs or outputs, but never both at the same time (e.g., consider
signing, which has a large payload but a small signature comes back).

The obvious fix is to put the descriptor into non-blocking mode, and
indeed, that makes the problem go away. Callers shouldn't need to
care, because they never see the descriptor (they hand us a buffer to
feed into it).

The included test fails reliably on Linux without this patch. Curiously,
it doesn't fail in our Windows CI environment, but has been reported to
do so for individual developers. It should pass in any environment after
this patch (courtesy of the compat/ layers added in the last few
commits).

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>pipe_command(): handle ENOSPC when writing to a pipe</title>
<updated>2022-08-17T16:21:41Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2022-08-17T06:09:42Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=c6d3cce6f3c4d1a8d9ebc556c38f1335afdfeb6c'/>
<id>urn:sha1:c6d3cce6f3c4d1a8d9ebc556c38f1335afdfeb6c</id>
<content type='text'>
When write() to a non-blocking pipe fails because the buffer is full,
POSIX says we should see EAGAIN. But our mingw_write() compat layer on
Windows actually returns ENOSPC for this case. This is probably
something we want to correct, but given that we don't plan to use
non-blocking descriptors in a lot of places, we can work around it by
just catching ENOSPC alongside EAGAIN. If we ever do fix mingw_write(),
then this patch can be reverted.

We don't actually use a non-blocking pipe yet, so this is still just
preparation.

Helped-by: René Scharfe &lt;l.s.r@web.de&gt;
Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>pipe_command(): avoid xwrite() for writing to pipe</title>
<updated>2022-08-17T16:21:40Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2022-08-17T06:08:06Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=14eab817e499cb047dd8ba21e688257a06d043f0'/>
<id>urn:sha1:14eab817e499cb047dd8ba21e688257a06d043f0</id>
<content type='text'>
If xwrite() sees an EAGAIN response, it will loop forever until the
write succeeds (or encounters a real error). This is due to ef1cf0167a
(xwrite: poll on non-blocking FDs, 2016-06-26), with the idea that we
won't be surprised by a descriptor unexpectedly set as non-blocking.

But that will make things awkward when we do want a non-blocking
descriptor, and a future patch will switch pipe_command() to using one.
In that case, looping on EAGAIN is bad, because the process on the other
end of the pipe may be waiting on us before doing another read() on the
pipe, which would mean we deadlock.

In practice we're not supposed to ever see EAGAIN here, since poll()
will have just told us the descriptor is ready for writing. But our
Windows emulation of poll() will always return "ready" for writing to a
pipe descriptor! This is due to 94f4d01932 (mingw: workaround for hangs
when sending STDIN, 2020-02-17).

Our best bet in that case is to keep handling other descriptors, as any
read() we do may allow the child command to make forward progress (i.e.,
its write() finishes, and then it read()s from its stdin, freeing up
space in the pipe buffer). This means we might busy-loop between poll()
and write() on Windows if the child command is slow to read our input,
but it's much better than the alternative of deadlocking.

In practice, this busy-looping should be rare:

  - for small inputs, we'll just write the whole thing in a single
    write() anyway, non-blocking or not

  - for larger inputs where the child reads input and then processes it
    before writing (e.g., gpg verifying a signature), we may make a few
    extra write() calls that get EAGAIN during the initial write, but
    once it has taken in the whole input, we'll correctly block waiting
    to read back the data.

  - for larger inputs where the child process is streaming output back
    (like a diff filter), we'll likewise see some extra EAGAINs, but
    most of them will be followed immediately by a read(), which will
    let the child command make forward progress.

Of course it won't happen at all for now, since we don't yet use a
non-blocking pipe. This is just preparation for when we do.

Helped-by: René Scharfe &lt;l.s.r@web.de&gt;
Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'js/wait-or-whine-can-fail'</title>
<updated>2022-06-13T22:53:44Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2022-06-13T22:53:44Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=da4827056acd8acac882fdfc93fd5b70472927ea'/>
<id>urn:sha1:da4827056acd8acac882fdfc93fd5b70472927ea</id>
<content type='text'>
We used to log an error return from wait_or_whine() as process
termination of the waited child, which was incorrect.

* js/wait-or-whine-can-fail:
  run-command: don't spam trace2_child_exit()
</content>
</entry>
<entry>
<title>Merge branch 'ab/hooks-regression-fix'</title>
<updated>2022-06-13T22:53:41Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2022-06-13T22:53:41Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=1a7f6be5b17f572fc68ff2a2e0c079d50c671c74'/>
<id>urn:sha1:1a7f6be5b17f572fc68ff2a2e0c079d50c671c74</id>
<content type='text'>
In Git 2.36 we revamped the way how hooks are invoked.  One change
that is end-user visible is that the output of a hook is no longer
directly connected to the standard output of "git" that spawns the
hook, which was noticed post release.  This is getting corrected.

* ab/hooks-regression-fix:
  hook API: fix v2.36.0 regression: hooks should be connected to a TTY
  run-command: add an "ungroup" option to run_process_parallel()
</content>
</entry>
<entry>
<title>run-command: don't spam trace2_child_exit()</title>
<updated>2022-06-07T19:48:19Z</updated>
<author>
<name>Josh Steadmon</name>
<email>steadmon@google.com</email>
</author>
<published>2022-06-07T18:21:57Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=ce3986bb224067ecdb372d7cb7a56d22682e9a31'/>
<id>urn:sha1:ce3986bb224067ecdb372d7cb7a56d22682e9a31</id>
<content type='text'>
In rare cases[1], wait_or_whine() cannot determine a child process's
status (and will return -1 in this case). This can cause Git to issue
trace2 child_exit events despite the fact that the child may still be
running. In pathological cases, we've seen &gt; 80 million exit events in
our trace logs for a single child process.

Fix this by only issuing trace2 events in finish_command_in_signal() if
we get a value other than -1 from wait_or_whine(). This can lead to
missing child_exit events in such a case, but that is preferable to
duplicating events on a scale that threatens to fill the user's
filesystem with invalid trace logs.

[1]: This can happen when:

* waitpid() returns -1 and errno != EINTR
* waitpid() returns an invalid PID
* the status set by waitpid() has neither the WIFEXITED() nor
  WIFSIGNALED() flags

Signed-off-by: Josh Steadmon &lt;steadmon@google.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>run-command: add an "ungroup" option to run_process_parallel()</title>
<updated>2022-06-07T17:01:41Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2022-06-07T08:48:19Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=fd3aaf53f713d424d3c08cffa7e76e29b31638ba'/>
<id>urn:sha1:fd3aaf53f713d424d3c08cffa7e76e29b31638ba</id>
<content type='text'>
Extend the parallel execution API added in c553c72eed6 (run-command:
add an asynchronous parallel child processor, 2015-12-15) to support a
mode where the stdout and stderr of the processes isn't captured and
output in a deterministic order, instead we'll leave it to the kernel
and stdio to sort it out.

This gives the API same functionality as GNU parallel's --ungroup
option. As we'll see in a subsequent commit the main reason to want
this is to support stdout and stderr being connected to the TTY in the
case of jobs=1, demonstrated here with GNU parallel:

	$ parallel --ungroup 'test -t {} &amp;&amp; echo TTY || echo NTTY' ::: 1 2
	TTY
	TTY
	$ parallel 'test -t {} &amp;&amp; echo TTY || echo NTTY' ::: 1 2
	NTTY
	NTTY

Another is as GNU parallel's documentation notes a potential for
optimization. As demonstrated in next commit our results with "git
hook run" will be similar, but generally speaking this shows that if
you want to run processes in parallel where the exact order isn't
important this can be a lot faster:

	$ hyperfine -r 3 -L o ,--ungroup 'parallel {o} seq ::: 10000000 &gt;/dev/null '
	Benchmark 1: parallel  seq ::: 10000000 &gt;/dev/null
	  Time (mean ± σ):     220.2 ms ±   9.3 ms    [User: 124.9 ms, System: 96.1 ms]
	  Range (min … max):   212.3 ms … 230.5 ms    3 runs

	Benchmark 2: parallel --ungroup seq ::: 10000000 &gt;/dev/null
	  Time (mean ± σ):     154.7 ms ±   0.9 ms    [User: 136.2 ms, System: 25.1 ms]
	  Range (min … max):   153.9 ms … 155.7 ms    3 runs

	Summary
	  'parallel --ungroup seq ::: 10000000 &gt;/dev/null ' ran
	    1.42 ± 0.06 times faster than 'parallel  seq ::: 10000000 &gt;/dev/null '

A large part of the juggling in the API is to make the API safer for
its maintenance and consumers alike.

For the maintenance of the API we e.g. avoid malloc()-ing the
"pp-&gt;pfd", ensuring that SANITIZE=address and other similar tools will
catch any unexpected misuse.

For API consumers we take pains to never pass the non-NULL "out"
buffer to an API user that provided the "ungroup" option. The
resulting code in t/helper/test-run-command.c isn't typical of such a
user, i.e. they'd typically use one mode or the other, and would know
whether they'd provided "ungroup" or not.

We could also avoid the strbuf_init() for "buffered_output" by having
"struct parallel_processes" use a static PARALLEL_PROCESSES_INIT
initializer, but let's leave that cleanup for later.

Using a global "run_processes_parallel_ungroup" variable to enable
this option is rather nasty, but is being done here to produce as
minimal of a change as possible for a subsequent regression fix. This
change is extracted from a larger initial version[1] which ends up
with a better end-state for the API, but in doing so needed to modify
all existing callers of the API. Let's defer that for now, and
narrowly focus on what we need for fixing the regression in the
subsequent commit.

It's safe to do this with a global variable because:

 A) hook.c is the only user of it that sets it to non-zero, and before
    we'll get any other API users we'll refactor away this method of
    passing in the option, i.e. re-roll [1].

 B) Even if hook.c wasn't the only user we don't have callers of this
    API that concurrently invoke this parallel process starting API
    itself in parallel.

As noted above "A" &amp;&amp; "B" are rather nasty, and we don't want to live
with those caveats long-term, but for now they should be an acceptable
compromise.

1. https://lore.kernel.org/git/cover-v2-0.8-00000000000-20220518T195858Z-avarab@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>run-command API users: use "env" not "env_array" in comments &amp; names</title>
<updated>2022-06-02T21:31:27Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2022-06-02T09:09:51Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=b3193252c4278e9039fbb896a35f84abc1fb5aac'/>
<id>urn:sha1:b3193252c4278e9039fbb896a35f84abc1fb5aac</id>
<content type='text'>
Follow-up on a preceding commit which changed all references to the
"env_array" when referring to the "struct child_process" member. These
changes are all unnecessary for the compiler, but help the code's
human readers.

All the comments that referred to "env_array" have now been updated,
as well as function names and variables that had "env_array" in their
name, they now refer to "env".

In addition the "out" name for the submodule.h prototype was
inconsistent with the function definition's use of "env_array" in
submodule.c. Both of them use "env" now.

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>run-command API: rename "env_array" to "env"</title>
<updated>2022-06-02T21:31:16Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2022-06-02T09:09:50Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=29fda24dd11e90583f3ea9ff2f90ee9acacd7792'/>
<id>urn:sha1:29fda24dd11e90583f3ea9ff2f90ee9acacd7792</id>
<content type='text'>
Start following-up on the rename mentioned in c7c4bdeccf3 (run-command
API: remove "env" member, always use "env_array", 2021-11-25) of
"env_array" to "env".

The "env_array" name was picked in 19a583dc39e (run-command: add
env_array, an optional argv_array for env, 2014-10-19) because "env"
was taken. Let's not forever keep the oddity of "*_array" for this
"struct strvec", but not for its "args" sibling.

This commit is almost entirely made with a coccinelle rule[1]. The
only manual change here is in run-command.h to rename the struct
member itself and to change "env_array" to "env" in the
CHILD_PROCESS_INIT initializer.

The rest of this is all a result of applying [1]:

 * make contrib/coccinelle/run_command.cocci.patch
 * patch -p1 &lt;contrib/coccinelle/run_command.cocci.patch
 * git add -u

1. cat contrib/coccinelle/run_command.pending.cocci
   @@
   struct child_process E;
   @@
   - E.env_array
   + E.env

   @@
   struct child_process *E;
   @@
   - E-&gt;env_array
   + E-&gt;env

I've avoided changing any comments and derived variable names here,
that will all be done in the next commit.

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ab/config-based-hooks-2'</title>
<updated>2022-02-09T22:21:00Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2022-02-09T22:21:00Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=c70bc338e9a35b45263c3c68913ad516e9e70d62'/>
<id>urn:sha1:c70bc338e9a35b45263c3c68913ad516e9e70d62</id>
<content type='text'>
More "config-based hooks".

* ab/config-based-hooks-2:
  run-command: remove old run_hook_{le,ve}() hook API
  receive-pack: convert push-to-checkout hook to hook.h
  read-cache: convert post-index-change to use hook.h
  commit: convert {pre-commit,prepare-commit-msg} hook to hook.h
  git-p4: use 'git hook' to run hooks
  send-email: use 'git hook run' for 'sendemail-validate'
  git hook run: add an --ignore-missing flag
  hooks: convert worktree 'post-checkout' hook to hook library
  hooks: convert non-worktree 'post-checkout' hook to hook library
  merge: convert post-merge to use hook.h
  am: convert applypatch-msg to use hook.h
  rebase: convert pre-rebase to use hook.h
  hook API: add a run_hooks_l() wrapper
  am: convert {pre,post}-applypatch to use hook.h
  gc: use hook library for pre-auto-gc hook
  hook API: add a run_hooks() wrapper
  hook: add 'run' subcommand
</content>
</entry>
</feed>
