<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/pack-objects.c, branch v1.2.5</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://www.git.shady.money/git/atom?h=v1.2.5</id>
<link rel='self' href='https://www.git.shady.money/git/atom?h=v1.2.5'/>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/'/>
<updated>2006-04-04T06:42:25Z</updated>
<entry>
<title>safe_fgets() - even more anal fgets()</title>
<updated>2006-04-04T06:42:25Z</updated>
<author>
<name>Junio C Hamano</name>
<email>junkio@cox.net</email>
</author>
<published>2006-04-04T06:41:09Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=687dd75c95f9212244b6cf4fe60b40db44de01ba'/>
<id>urn:sha1:687dd75c95f9212244b6cf4fe60b40db44de01ba</id>
<content type='text'>
This is from Linus -- the previous round forgot to clear the error
after the EINTR case.

Signed-off-by: Junio C Hamano &lt;junkio@cox.net&gt;
</content>
</entry>
<entry>
<title>pack-objects: be incredibly anal about stdio semantics</title>
<updated>2006-04-02T20:46:27Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@osdl.org</email>
</author>
<published>2006-04-02T20:31:54Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=da93d12b00425a37e81e227671f13130efcfe93f'/>
<id>urn:sha1:da93d12b00425a37e81e227671f13130efcfe93f</id>
<content type='text'>
This is the "letter of the law" version of using fgets() properly in the
face of incredibly broken stdio implementations.  We can work around the
Solaris breakage with SA_RESTART, but in case anybody else is ever that
stupid, here's the "safe" (read: "insanely anal") way to use fgets.

It probably goes without saying that I'm not terribly impressed by
Solaris libc.

Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
Signed-off-by: Junio C Hamano &lt;junkio@cox.net&gt;
</content>
</entry>
<entry>
<title>Fix Solaris stdio signal handling stupidities</title>
<updated>2006-04-02T20:41:56Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@osdl.org</email>
</author>
<published>2006-04-02T20:28:27Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=fb7a6531e67333b22967bf5b96ef22a28f3b2552'/>
<id>urn:sha1:fb7a6531e67333b22967bf5b96ef22a28f3b2552</id>
<content type='text'>
This uses sigaction() to install the SIGALRM handler with SA_RESTART, so
that Solaris stdio doesn't break completely when a signal interrupts a
read.

Thanks to Jason Riedy for confirming the silly Solaris signal behaviour.

Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
Signed-off-by: Junio C Hamano &lt;junkio@cox.net&gt;
</content>
</entry>
<entry>
<title>pack-objects eye-candy: finishing touches.</title>
<updated>2006-02-23T00:02:59Z</updated>
<author>
<name>Junio C Hamano</name>
<email>junkio@cox.net</email>
</author>
<published>2006-02-23T00:02:59Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=183bdb2cccff792f11fd9e825df67af446aff171'/>
<id>urn:sha1:183bdb2cccff792f11fd9e825df67af446aff171</id>
<content type='text'>
This updates the progress output to match the "every one second or
every percent, whichever comes first" rule used by unpack-objects, as
discussed on the list.
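
As a rough illustration only (this is not the code in this commit; the
class, its field names and the output format are made up for the
example), the update rule can be sketched in Python.  It assumes that
something else, such as a once-a-second timer, sets the timer_fired flag:

```python
class Progress:
    """Hypothetical helper: report on a percent change or a timer tick."""

    def __init__(self, total):
        self.total = total
        self.last_percent = -1     # forces a report on the first call
        self.timer_fired = False   # assumed to be set by a 1-second timer

    def tick(self, done):
        percent = done * 100 // self.total
        # Report when the percentage moved or the timer fired,
        # whichever happens first.
        if percent != self.last_percent or self.timer_fired:
            self.last_percent = percent
            self.timer_fired = False
            return "%3d%% (%d/%d) done" % (percent, done, self.total)
        return None
```

With a total of 200, tick(1) right after tick(0) returns None because
the percentage is still 0; it reports again once timer_fired is set or
the percentage reaches 1.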

Signed-off-by: Junio C Hamano &lt;junkio@cox.net&gt;
</content>
</entry>
<entry>
<title>also adds progress when actually writing a pack</title>
<updated>2006-02-22T22:51:58Z</updated>
<author>
<name>Nicolas Pitre</name>
<email>nico@cam.org</email>
</author>
<published>2006-02-22T22:41:32Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=5e8dc750ee56d8c295ecd7478a6bd5d148cb7177'/>
<id>urn:sha1:5e8dc750ee56d8c295ecd7478a6bd5d148cb7177</id>
<content type='text'>
If that pack is big, it takes significant time to write and might
benefit from some more eye candy as well.  This is however disabled
when the pack is written to stdout, since in that case the output is
usually piped into unpack-objects, which already does its own progress
reporting.

Signed-off-by: Nicolas Pitre &lt;nico@cam.org&gt;
Signed-off-by: Junio C Hamano &lt;junkio@cox.net&gt;
</content>
</entry>
<entry>
<title>nicer eye candies for pack-objects</title>
<updated>2006-02-22T21:15:26Z</updated>
<author>
<name>Nicolas Pitre</name>
<email>nico@cam.org</email>
</author>
<published>2006-02-22T21:00:08Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=b2504a0d2ff5a51feb516f7732beb9549b5db454'/>
<id>urn:sha1:b2504a0d2ff5a51feb516f7732beb9549b5db454</id>
<content type='text'>
This provides a stable and simpler progress reporting mechanism that
updates progress as often as possible, but never more than once a
second.  The deltification phase is also made more
interesting to watch (since repacking a big repository and only seeing a
dot appear once every many seconds is rather boring and doesn't provide
much food for anticipation).

Signed-off-by: Nicolas Pitre &lt;nico@cam.org&gt;
Signed-off-by: Junio C Hamano &lt;junkio@cox.net&gt;
</content>
</entry>
<entry>
<title>pack-objects: avoid delta chains that are too long.</title>
<updated>2006-02-22T21:14:57Z</updated>
<author>
<name>Junio C Hamano</name>
<email>junkio@cox.net</email>
</author>
<published>2006-02-18T04:58:45Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=15b4d577ae2e0117b7b5a4add2217442a8458812'/>
<id>urn:sha1:15b4d577ae2e0117b7b5a4add2217442a8458812</id>
<content type='text'>
This tries to rework the solution for the excess delta chain
problem. An earlier commit worked around it ``cheaply'', but
repeated repacking risks unbounded growth of delta chains.

This version counts the length of the delta chain we are reusing
from the existing pack, and makes sure a base object that has a
sufficiently long delta chain does not get deltified.

Signed-off-by: Junio C Hamano &lt;junkio@cox.net&gt;
</content>
</entry>
<entry>
<title>pack-objects: finishing touches.</title>
<updated>2006-02-22T21:14:57Z</updated>
<author>
<name>Junio C Hamano</name>
<email>junkio@cox.net</email>
</author>
<published>2006-02-16T19:55:51Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=ab7cd7bb8c02dc40ca3a909653e8f56226f9e440'/>
<id>urn:sha1:ab7cd7bb8c02dc40ca3a909653e8f56226f9e440</id>
<content type='text'>
This introduces the --no-reuse-delta option to disable reuse of
existing deltas, which is a large part of the optimization
introduced by this series.  This may become necessary if
repeated repacking makes delta chains too long.  With this, the
output of the command becomes identical to that of the older
implementation, but the performance suffers greatly.

It still allows reusing non-deltified representations; there is
no point uncompressing and recompressing the whole text.

It also adds a couple more statistics to the output, and squelches
them under the -q flag, which the last round forgot to do.

  $ time old-git-pack-objects --stdout &gt;/dev/null &lt;RL
  Generating pack...
  Done counting 184141 objects.
  Packing 184141 objects....................
  real    12m8.530s       user    11m1.450s       sys     0m57.920s
  $ time git-pack-objects --stdout &gt;/dev/null &lt;RL
  Generating pack...
  Done counting 184141 objects.
  Packing 184141 objects.....................
  Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081)
  real    0m59.549s       user    0m56.670s       sys     0m2.400s
  $ time git-pack-objects --stdout --no-reuse-delta &gt;/dev/null &lt;RL
  Generating pack...
  Done counting 184141 objects.
  Packing 184141 objects.....................
  Total 184141, written 184141 (delta 134833), reused 47904 (delta 0)
  real    11m13.830s      user    9m45.240s       sys     0m44.330s

There is one remaining issue when the --no-reuse-delta option is not
used.  It can create delta chains that are deeper than specified.

    A&lt;--B&lt;--C&lt;--D   E   F   G

Suppose we have a delta chain A to D (A is stored in full, either
in a pack or as a loose object; B is a depth-1 delta relative to A,
C is a depth-2 delta relative to B, and so on) with loose objects
E, F, G.  And we are going to pack all of them.

B, C and D are left as delta against A, B and C respectively.
So A, E, F, and G are examined for deltification, and let's say
we decided to keep E expanded, and store the rest as deltas like
this:

    E&lt;--F&lt;--G&lt;--A

Oops.  We ended up making D a bit too deep, didn't we?  B, C and
D form a chain on top of A!

This is because we did not know what the final depth of A would
be, when we checked objects and decided to keep the existing
delta.  Unfortunately, deferring the decision until just before
the deltification is not an option.  To be able to make B, C,
and D candidates for deltification with the rest, we need to
know the type and final unexpanded size of them, but the major
part of the optimization comes from the fact that we do not read
the delta data to do so -- getting the final size is quite an
expensive operation.

To prevent this from happening, we should keep A from being
deltified.  But how would we tell that, cheaply?

To do this most precisely, after check_object() runs, each
object that is used as the base object of some existing delta
needs to be marked with the maximum depth of the objects we
decided to keep deltified (in this case, D is depth 3 relative
to A, so if no other delta chain that is longer than 3 based on
A exists, mark A with 3).  Then when attempting to deltify A, we
would take that number into account to see if the final delta
chain that leads to D becomes too deep.

However, this is a bit cumbersome to compute, so we would cheat
and reduce the maximum depth for A arbitrarily to depth/4 in
this implementation.
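
To make the two approaches concrete, here is a small Python sketch.
It is not the code in this commit: the function names, the dictionary
layout and the depth limit of 10 in the note below are assumptions
chosen for illustration.

```python
# Hypothetical sketch: names and data layout are illustrative, not Git's.

def chain_height(base_of):
    """Map each base to the longest reused delta chain stacked on it."""
    height = {}
    for obj in base_of:
        steps = 0
        cur = obj
        # Walk from each reused delta down to its full-object root.
        while cur in base_of:
            cur = base_of[cur]
            steps += 1
            height[cur] = max(height.get(cur, 0), steps)
    return height


def precise_limit(max_depth, height, obj):
    """Depth budget left for obj once the chain above it is counted."""
    return max(max_depth - height.get(obj, 0), 0)


def cheap_limit(max_depth, height, obj):
    """The shortcut described above: a reused base gets max_depth // 4."""
    return max_depth // 4 if obj in height else max_depth


# The A to D chain from the example, stored as child to base.
reused = {"B": "A", "C": "B", "D": "C"}
height = chain_height(reused)
```

Here height marks A with 3, as in the text.  With an assumed limit of
10, the precise rule would let A sit at depth 7 at most, while the
cheap rule caps it at 10 // 4, that is 2.  Loose objects such as E
keep the full limit.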

Signed-off-by: Junio C Hamano &lt;junkio@cox.net&gt;
</content>
</entry>
<entry>
<title>pack-objects: reuse data from existing packs.</title>
<updated>2006-02-22T21:14:56Z</updated>
<author>
<name>Junio C Hamano</name>
<email>junkio@cox.net</email>
</author>
<published>2006-02-16T01:34:29Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=3f9ac8d259fb919e001671c5e403e5fceaabf0d8'/>
<id>urn:sha1:3f9ac8d259fb919e001671c5e403e5fceaabf0d8</id>
<content type='text'>
When generating a new pack, notice if the objects we need are
already in existing packs.  If an object is stored deltified,
and its base object is also what we are going to pack, then
reuse the existing deltified representation unconditionally,
bypassing all the expensive find_deltas() and try_deltas()
calls.

Also, notice if what we are going to write out exactly matches
what is already in an existing pack (either deltified or just
compressed).  In such a case, we can just copy it instead of
going through the usual uncompressing &amp; recompressing cycle.
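
The first rule can be sketched in Python as follows.  This is only an
illustration of the decision described above, not the code in this
patch; split_for_packing and the delta_base mapping are hypothetical
names.

```python
def split_for_packing(to_pack, delta_base):
    """Split objects into reusable deltas and delta-search candidates.

    delta_base maps an object to the base of its existing packed
    delta; objects stored in full or loose are simply absent.
    """
    wanted = set(to_pack)
    reuse, search = [], []
    for obj in to_pack:
        base = delta_base.get(obj)
        # Reuse the stored delta only if its base is being packed too.
        if base is not None and base in wanted:
            reuse.append(obj)
        else:
            search.append(obj)
    return reuse, search
```

Only the objects in the second list would still go through the
expensive delta search; a delta whose base is not part of the new
pack cannot be reused and lands there as well.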

Without this patch, in a linux-2.6 repository with about 1500
loose objects and a single mega pack:

    $ git-rev-list --objects v2.6.16-rc3 &gt;RL
    $ wc -l RL
    184141 RL
    $ time git-pack-objects p &lt;RL
    Generating pack...
    Done counting 184141 objects.
    Packing 184141 objects....................
    a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2

    real    12m4.323s
    user    11m2.560s
    sys     0m55.950s

With this patch, the same input:

    $ time ../git.junio/git-pack-objects q &lt;RL
    Generating pack...
    Done counting 184141 objects.
    Packing 184141 objects.....................
    a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
    Total 184141, written 184141, reused 182441

    real    1m2.608s
    user    0m55.090s
    sys     0m1.830s

Signed-off-by: Junio C Hamano &lt;junkio@cox.net&gt;
</content>
</entry>
<entry>
<title>Make pack-objects chattier.</title>
<updated>2006-02-12T21:01:54Z</updated>
<author>
<name>Junio C Hamano</name>
<email>junkio@cox.net</email>
</author>
<published>2006-02-12T21:01:54Z</published>
<link rel='alternate' type='text/html' href='https://www.git.shady.money/git/commit/?id=024701f1d88d79f3777bf45c82437f40a80b6eaa'/>
<id>urn:sha1:024701f1d88d79f3777bf45c82437f40a80b6eaa</id>
<content type='text'>
You could give -q to squelch it, but currently no tool does it.
This would make 'git clone host:repo here' over ssh not silent
again.

Signed-off-by: Junio C Hamano &lt;junkio@cox.net&gt;
</content>
</entry>
</feed>
