
Python Diff Two Multiline Strings Like GitHub

I want to produce diff output like GitHub's commit diff view. I tried this:

import difflib

first = """
def
baz
"""

second = """
deff
ba
bar
foo
"""

diff = ''
for text in difflib.unified_diff(first, second):
    for prefix in ('---', '+++', '@@'):
        if text.startswith(prefix):
            break
    else:
        diff += text

The output is:

 d e f+f 
 b a-z 
+b+a+r+
+f+o+o+

How can I get output like the following instead?

1 def+f
2 ba-z
+
3 bar
4 foo
# -
# 5 line
# 6 line

Thanks.

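I'm not quite sure which GitHub format you mean; I've not seen character-by-character diffs there like your example. If you want a more standard line-by-line output, you just have to pass lists of lines to the diff function. unified_diff compares the elements of the two sequences it is given, and the elements of a string are its individual characters: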

for text in difflib.unified_diff(first.split("\n"), second.split("\n")):
    # Skip the file headers and hunk markers
    if text[:3] not in ('+++', '---', '@@ '):
        print(text)

Since every line differs between the two strings in your example, the diff treats each line as entirely replaced and gives you output like:

-def
-baz
+deff
+ba
+bar
+foo

If you want something fancier, you can diff the data as a single string (as you were doing) and then split the result on newlines. Each returned item has the form "{operation}{char}" (newline characters included), so you can group the items into lines, detect the lines whose characters all share the same operation, and apply the appropriate logic to each.

I can't quite work out the rules you're trying to apply from your example (are you grouping all mixed lines, then added lines, then removed lines, or something else?), so I can't give you an exact example.
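That said, here is a rough sketch of the grouping idea under one assumed set of rules, which may not be the rules the question intends: diff the two strings character by character, cut the result at newlines, and mark each line as unchanged, added, removed, or mixed. It uses difflib.SequenceMatcher directly instead of parsing the "{operation}{char}" items from unified_diff, which avoids the header lines and the limited context window. The function name char_diff_by_line and the "~" marker for mixed lines are made up for illustration.

```python
import difflib


def _render(line):
    """Turn a list of (op, char) pairs for one line into (marker, text)."""
    ops = {op for op, _ in line}
    if len(ops) == 1:
        # Every character (including the newline) shares one operation.
        return ops.pop(), "".join(c for _, c in line if c != "\n")
    # Mixed line: keep unchanged chars as-is, prefix changed chars with +/-.
    text = "".join(c if op == " " else op + c for op, c in line if c != "\n")
    return "~", text  # "~" is a hypothetical marker for "partly changed"


def char_diff_by_line(old, new):
    """Character-level diff of two strings, grouped into lines."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)

    # Flatten the opcodes into (operation, char) pairs.
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            pairs += [(" ", c) for c in old[i1:i2]]
        if tag in ("delete", "replace"):
            pairs += [("-", c) for c in old[i1:i2]]
        if tag in ("insert", "replace"):
            pairs += [("+", c) for c in new[j1:j2]]

    # Cut the pair stream into lines at every newline character.
    lines, current = [], []
    for pair in pairs:
        current.append(pair)
        if pair[1] == "\n":
            lines.append(_render(current))
            current = []
    if current:  # trailing text without a final newline
        lines.append(_render(current))
    return lines


first = "\ndef\nbaz\n"
second = "\ndeff\nba\nbar\nfoo\n"

for marker, text in char_diff_by_line(first, second):
    print(marker, text)
```

For the strings in the question this reports def+f and ba-z as mixed lines, followed by bar and foo as added lines. Line numbers, or a different ordering of mixed, added, and removed lines, could be layered on top of the returned list.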

