Does increasing the number of unified diff context lines have any downsides?

Question

By default, diff -u and git diff produce unified diffs with three lines of context. Apart from the size of the diff file itself, is there any disadvantage to increasing the number of context lines? I assume that it may help in cases where the file(s) to be patched have been modified since the patch was made. Specifically, if you increase the number of context lines, are there cases where patch will fail where it would have succeeded with less context?

Answer 1

Yes. Consider the following case:

There's a file f1:

a
b
c
d
e
f
g

You modify the f line, and get either

--- f1  2013-04-15 13:23:57.524966109 +0200
+++ f2  2013-04-15 13:25:17.832965720 +0200
@@ -5,3 +5,3 @@
 e
-f
+f2
 g

or

--- f1  2013-04-15 13:23:57.524966109 +0200
+++ f2  2013-04-15 13:25:17.832965720 +0200
@@ -1,7 +1,7 @@
 a
 b
 c
 d
 e
-f
+f2
 g

depending on whether you use the -U1 or the -U5 option with diff. Assume now that someone else edited the upper section of the file as follows:

a
b1
c
d
e
f
g

Here's the output of the two patch commands:

$ patch f3 < u1.patch 
patching file f3
$ patch f3 < u5.patch 
patching file f3
Hunk #1 succeeded at 1 with fuzz 2.

The patch applied successfully in both cases, but in the second run patch had to use a fuzz factor of 2. What does that mean?

First patch looks for a place where all lines of the context match. If no such place is found, and it's a context diff, and the maximum fuzz factor is set to 1 or more, then another scan takes place ignoring the first and last line of context. If that fails, and the maximum fuzz factor is set to 2 or more, the first two and last two lines of context are ignored, and another scan is made.

As you can see from this description from man patch, the patch created with -U5 will take longer to apply in such a scenario. Even worse, if the maximum fuzz factor allowed by patch isn't big enough (GNU patch defaults to 2, adjustable with -F), applying the patch will fail.
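
To make the fuzz rule concrete, here is a minimal Python sketch of the search that the man page excerpt describes. It is a simplified model and not how patch is actually implemented: the function name locate_hunk and its parameters are made up for illustration, and the scan simply runs from the top of the file instead of starting near the line number recorded in the hunk. Indices are 0-based, so index 0 corresponds to "succeeded at 1".

```python
def locate_hunk(file_lines, old_lines, n_prefix, n_suffix, max_fuzz=2):
    """Return (index, fuzz) for the first place the hunk fits, else None.

    old_lines is the "old" side of the hunk (context plus removed lines);
    n_prefix and n_suffix count its leading and trailing context lines.
    """
    for fuzz in range(max_fuzz + 1):
        # Ignore up to `fuzz` context lines at each end, never more than exist.
        skip_head = min(fuzz, n_prefix)
        skip_tail = min(fuzz, n_suffix)
        pattern = old_lines[skip_head:len(old_lines) - skip_tail]
        for i in range(len(file_lines) - len(pattern) + 1):
            if file_lines[i:i + len(pattern)] == pattern:
                return i - skip_head, fuzz
    return None


f3 = ["a", "b1", "c", "d", "e", "f", "g"]      # the file edited by someone else
u1_old = ["e", "f", "g"]                       # old side of the -U1 hunk
u5_old = ["a", "b", "c", "d", "e", "f", "g"]   # old side of the -U5 hunk

print(locate_hunk(f3, u1_old, 1, 1))               # (4, 0)
print(locate_hunk(f3, u5_old, 5, 1))               # (0, 2)
print(locate_hunk(f3, u5_old, 5, 1, max_fuzz=1))   # None
```

The -U1 hunk fits without any fuzz, the -U5 hunk only fits once two leading context lines are ignored, and with a maximum fuzz of 1 it does not fit at all.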

Answer 2

The size of context has two major impacts on the patch:

The bigger the context, the more you are sure you are applying the patch in the correct context.
The bigger the context, the more changes may be grouped in one hunk.

Think of the following example (misspellings intentional):

Original file:                     Changed file:

This is                            This is
some tex.                          some text.
You are on                         You are on
stackoverflo.com                   stackoverflo.com
Completely unrelated               Completely unrelated
tet here.                          text here.
Goodbye.                           Goodbye.

With a context size of 1 line, you will get a patch with two hunks:

@@ -1,3 +1,3 @@
 This is
-some tex.
+some text.
 You are on
@@ -5,3 +5,3 @@
 Completely unrelated
-tet here.
+text here.
 Goodbye.

And with a context size of 3 lines, you will get a patch with one hunk:

@@ -1,7 +1,7 @@
 This is
-some tex.
+some text.
 You are on
 stackoverflo.com
 Completely unrelated
-tet here.
+text here.
 Goodbye.
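
If you want to reproduce the grouping without creating files, a small sketch with Python's standard difflib module shows the same effect. Note that difflib is a different implementation from GNU diff, so treat this as an illustration only; the helper name hunk_headers is made up for this example.

```python
import difflib

original = [
    "This is", "some tex.", "You are on", "stackoverflo.com",
    "Completely unrelated", "tet here.", "Goodbye.",
]
changed = [
    "This is", "some text.", "You are on", "stackoverflo.com",
    "Completely unrelated", "text here.", "Goodbye.",
]


def hunk_headers(old, new, context):
    """Return only the '@@ ... @@' lines of a unified diff with `context` lines."""
    diff = difflib.unified_diff(old, new, n=context, lineterm="")
    return [line for line in diff if line.startswith("@@")]


print(hunk_headers(original, changed, 1))  # ['@@ -1,3 +1,3 @@', '@@ -5,3 +5,3 @@']
print(hunk_headers(original, changed, 3))  # ['@@ -1,7 +1,7 @@']
```

The two changes are three lines apart, so one line of context on each side leaves a gap between them, while three lines of context overlap and the changes end up in a single hunk.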

Now imagine a second fix:

Changed file:                      Further changed file:

This is                            This is
some text.                         some text.
You are on                         You are on
stackoverflo.com                   stackoverflow.com
Completely unrelated               Completely unrelated
text here.                         text here.
Goodbye.                           Goodbye.

The patch here is:

@@ -3,3 +3,3 @@
 You are on
-stackoverflo.com
+stackoverflow.com
 Completely unrelated

Now, let's say you reverse the order of patching. So you first apply the second patch (fixing flow in the address) and then one of the first patches (either with -U 1 or -U 3).

In the case of -U 1 the patches are applied cleanly.
In the case of -U 3 the patch no longer applies cleanly: patching may fail or be accepted only with fuzz.
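
The difference can be checked with a rough Python sketch that asks whether the lines each hunk expects (its context and removed lines) still occur as one contiguous run in the file. This only models an exact match with no fuzz, and the helper names old_side and fits_exactly are made up for illustration.

```python
def old_side(hunk_body):
    """Lines the hunk expects to find: context (' ') and removed ('-') lines."""
    return [line[1:] for line in hunk_body if line[:1] in (" ", "-")]


def fits_exactly(file_lines, expected):
    """True if `expected` occurs as a contiguous run somewhere in file_lines."""
    n = len(expected)
    return any(file_lines[i:i + n] == expected
               for i in range(len(file_lines) - n + 1))


# The original file with only the second fix (stackoverflow.com) applied.
after_second_fix = [
    "This is", "some tex.", "You are on", "stackoverflow.com",
    "Completely unrelated", "tet here.", "Goodbye.",
]

u1_hunks = [
    [" This is", "-some tex.", "+some text.", " You are on"],
    [" Completely unrelated", "-tet here.", "+text here.", " Goodbye."],
]
u3_hunk = [
    " This is", "-some tex.", "+some text.", " You are on",
    " stackoverflo.com", " Completely unrelated",
    "-tet here.", "+text here.", " Goodbye.",
]

print([fits_exactly(after_second_fix, old_side(h)) for h in u1_hunks])  # [True, True]
print(fits_exactly(after_second_fix, old_side(u3_hunk)))                # False
```

The -U 3 hunk carries the old stackoverflo.com line as context, so it no longer matches once the address has been fixed.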

Conclusion

Think of the extremes first. If you have zero context, it is very easy to get the patch wrong or have the hunks applied to the wrong lines. If you have infinite context, then every patch basically becomes a whole replacement of the file, making it hard to reorder patches.

So it's easy to see that both too little and too much context are bad. There has to be a trade-off between more reliable matching and being able to apply individual changes independently.

Unfortunately, there is no optimal number of lines and one could say the default context size is what the collective developer minds have come to accept as a fair trade-off. You could increase it if it helps your cause, but be careful of the implications.
