git/diff.c, branch v2.10.1

Merge branch 'js/regexec-buf' into maint

2016-09-29T23:49:45Z

Some codepaths in "git diff" used regexec(3) on a buffer that was mmap(2)ed, which may not have a terminating NUL, leading to a read beyond the end of the mapped region. This was fixed by introducing a regexec_buf() helper that takes a pair with REG_STARTEND extension. * js/regexec-buf: regex: use regexec_buf() regex: add regexec_buf() that can work on a non NUL-terminated string regex: -G feeds a non NUL-terminated string to regexec() and fails

regex: use regexec_buf()

2016-09-21T20:56:15Z

The new regexec_buf() function operates on buffers with an explicitly specified length, rather than NUL-terminated strings. We need to use this function whenever the buffer we want to pass to regexec(3) may have been mmap(2)ed (and is hence not NUL-terminated). Note: the original motivation for this patch was to fix a bug where `git diff -G ` would crash. This patch converts more callers, though, some of which allocated to construct NUL-terminated strings, or worse, modified buffers to temporarily insert NULs while calling regexec(3). By converting them to use regexec_buf(), the code has become much cleaner. Signed-off-by: Johannes Schindelin Signed-off-by: Junio C Hamano

diff: remove dead code

2016-09-08T20:54:37Z

When `len < 1`, len has to be 0 or negative, emit_line will then remove the first character and by then `len` would be negative. As this doesn't happen, it is safe to assume it is dead code. This continues to simplify the code, which was started in b8d9c1a66b (2009-09-03, diff.c: the builtin_diff() deals with only two-file comparison). Signed-off-by: Stefan Beller Signed-off-by: Junio C Hamano

diff: omit found pointer from emit_callback

2016-09-08T20:54:23Z

We keep the actual data in the diff options, which are just as accessible. Remove the pointer stored in struct emit_callback for readability. Signed-off-by: Stefan Beller Signed-off-by: Junio C Hamano

diff.c: use diff_options directly

2016-09-08T20:46:46Z

The value of `ecbdata->opt` is accessible via the short variable `o` already, so let's use that instead. Signed-off-by: Stefan Beller Signed-off-by: Junio C Hamano

Merge branch 'kw/patch-ids-optim'

2016-08-12T16:47:39Z

When "git rebase" tries to compare set of changes on the updated upstream and our own branch, it computes patch-id for all of these changes and attempts to find matches. This has been optimized by lazily computing the full patch-id (which is expensive) to be compared only for changes that touch the same set of paths. * kw/patch-ids-optim: rebase: avoid computing unnecessary patch IDs patch-ids: add flag to create the diff patch id using header only data patch-ids: replace the seen indicator with a commit pointer patch-ids: stop using a hand-rolled hashmap implementation

Merge branch 'jk/diff-do-not-reuse-wtf-needs-cleaning'

2016-08-03T22:10:29Z

There is an optimization used in "git diff $treeA $treeB" to borrow an already checked-out copy in the working tree when it is known to be the same as the blob being compared, expecting that open/mmap of such a file is faster than reading it from the object store, which involves inflating and applying delta. This however kicked in even when the checked-out copy needs to go through the convert-to-git conversion (including the clean filter), which defeats the whole point of the optimization. The optimization has been disabled when the conversion is necessary. * jk/diff-do-not-reuse-wtf-needs-cleaning: diff: do not reuse worktree files that need "clean" conversion

patch-ids: add flag to create the diff patch id using header only data

2016-07-29T21:10:01Z

This will allow a diff patch id to be created using only the header data so that the contents of the file will not have to be loaded. Signed-off-by: Kevin Willford Signed-off-by: Junio C Hamano

diff: do not reuse worktree files that need "clean" conversion

2016-07-22T19:31:24Z

When accessing a blob for a diff, we may try to reuse file contents in the working tree, under the theory that it is faster to mmap those file contents than it would be to extract the content from the object database. When we have to filter those contents, though, that assumption does not hold. Even for our internal conversions like CRLF, we have to allocate and fill a new buffer anyway. But much worse, for external clean filters we have to exec an arbitrary script, and we have no idea how expensive it may be to run. So let's skip this optimization when conversion into git's "clean" form is required. This applies whenever the "want_file" flag is false. When it's true, the caller actually wants the smudged worktree contents, which the reused file by definition already has (in fact, this is a key optimization going the other direction, since reusing the worktree file there lets us skip smudge filters). Signed-off-by: Jeff King Signed-off-by: Junio C Hamano

Merge branch 'bc/cocci'

2016-07-19T20:22:16Z

Conversion from unsigned char sha1[20] to struct object_id continues. * bc/cocci: diff: convert prep_temp_blob() to struct object_id merge-recursive: convert merge_recursive_generic() to object_id merge-recursive: convert leaf functions to use struct object_id merge-recursive: convert struct merge_file_info to object_id merge-recursive: convert struct stage_data to use object_id diff: rename struct diff_filespec's sha1_valid member diff: convert struct diff_filespec to struct object_id coccinelle: apply object_id Coccinelle transformations coccinelle: convert hashcpy() with null_sha1 to hashclr() contrib/coccinelle: add basic Coccinelle transforms hex: add oid_to_hex_r()