How to use Python's difflib to produce side-by-side comparison of two files similar to Unix sdiff command?

I am using Python 2.6 and I want to create a simple GUI with two side-by-side text panes comparing two text files (file1.txt and file2.txt).

I am using difflib, but it is not clear to me how to produce a result similar to the Unix sdiff command.

To reproduce a side-by-side comparison, I need difflib to return two variables, for instance file1_diff and file2_diff.

I have also considered using the sdiff output directly and parsing it to separate the panes, but that turned out not to be as easy as it seems. Any hints?

How about something like this?

>>> import difflib
>>> a = ['cat', 'dog', 'horse']
>>> b = ['cat', 'horse', 'chicken']
>>> # Drop the '?' hint lines, which belong to neither input
>>> comparison = [l for l in difflib.Differ().compare(a, b) if not l.startswith('?')]
>>> left = [l[2:] if l.startswith((' ', '-')) else '' for l in comparison]
>>> right = [l[2:] if l.startswith((' ', '+')) else '' for l in comparison]
>>> left
['cat', 'dog', 'horse', '']
>>> right
['cat', '', 'horse', 'chicken']

You can use difflib.Differ to return a single sequence of lines, each starting with a marker that describes the line. The markers tell you the following about the line:

Marker   Description
'- '     line unique to file 1
'+ '     line unique to file 2
'  '     line common to both files
'? '     line not present in either input file
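
To see the markers in practice, here is a small sketch that pairs each marker with its line. The helper name marked_lines is made up for this example and is not part of difflib.

```python
import difflib

def marked_lines(a, b):
    """Return (marker, text) pairs for every line Differ emits."""
    # Each Differ line is a one-character marker, a space, then the text.
    return [(line[0], line[2:]) for line in difflib.Differ().compare(a, b)]

for marker, text in marked_lines(['cat', 'dog', 'horse'],
                                 ['cat', 'horse', 'chicken']):
    print(repr(marker), text)
```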

You can use this information to decide how to display the data. For example, if the marker is a space, you put the line in both the left and right widgets. If it's +, you could put a blank line on the left and the actual line on the right, showing that the line is unique to the text on the right. Likewise, - means the line is unique to the left.

For example, you can create two text widgets, t1 and t2, one for the left and one for the right. You can compare two files by creating a list of lines for each, passing both lists to the compare method of the differ, and then iterating over the results.

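import difflib
import tkinter as tk  # on Python 2: import Tkinter as tk

t1 = tk.Text(...)
t2 = tk.Text(...)

# Read each file into a list of lines (line endings are kept)
with open("file1.txt", "r") as f:
    f1 = f.readlines()
with open("file2.txt", "r") as f:
    f2 = f.readlines()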

differ = difflib.Differ()
for line in differ.compare(f1, f2):
    marker = line[0]
    if marker == " ":
        # line is same in both
        t1.insert("end", line[2:])
        t2.insert("end", line[2:])

    elif marker == "-":
        # line is only on the left
        t1.insert("end", line[2:])
        t2.insert("end", "\n")

    elif marker == "+":
        # line is only on the right
        t1.insert("end", "\n")
        t2.insert("end", line[2:])

The above code ignores lines with the marker ? since those are extra lines that attempt to draw attention to the differing characters on the previous line and aren't actually part of either file. You could use that information to highlight the individual characters if you wish.
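
As a rough sketch of how the ? lines could drive character-level highlighting, the function below attaches to each line the character ranges flagged by the ? line that follows it. The name char_spans and the (marker, text, spans) tuple layout are assumptions of this example, not something difflib provides.

```python
import difflib
import re

def char_spans(a, b):
    """Return (marker, text, spans) tuples; spans are (start, end)
    index pairs into text that the following '?' line flagged."""
    result = []
    for line in difflib.Differ().compare(a, b):
        marker, body = line[0], line[2:]
        if marker == "?":
            # The hint line puts -, + or ^ under the characters of the
            # previous line that differ; record those column ranges.
            if result:
                prev_marker, prev_text, _ = result[-1]
                spans = [m.span() for m in re.finditer(r"[-+^]+", body)]
                result[-1] = (prev_marker, prev_text, spans)
        else:
            result.append((marker, body, []))
    return result

for marker, text, spans in char_spans(["hello world\n"], ["hello wurld\n"]):
    print(marker, repr(text), spans)
```

In a Tk text widget, each span could then be turned into a tag range on the line that was just inserted, with the tag configured to use a different background color.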

Building on @Bryan Oakley's answer, I wrote a quick Gist:

https://gist.github.com/jlumbroso/3ef433b4402b4f157728920a66cc15ed

It contains a side-by-side diff method (including the code that produces the side-by-side arrangement using the textwrap library) that you can call on two lists of lines:

print(better_diff(
    ["a", "c",      "a", "a", "a", "a",      "a", "a", "e"],
    ["a", "c", "b", "a", "a", "a", "a", "d", "a", "a"],
    width=20,
    as_string=True,
    left_title="  LEFT",
))

will produce:

  LEFT   | 
-------- | --------
a        | a
c        | c
         | b
a        | a
a        | a
a        | a
a        | a
         | d
a        | a
a        | a
e        | 
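
The Gist's code is not reproduced here. As a rough idea of how such a text layout could be built on top of Differ, here is a minimal sketch. The function side_by_side and its width handling are assumptions of this example, and unlike the Gist it truncates long lines instead of wrapping them with textwrap.

```python
import difflib

def side_by_side(a, b, width=20):
    """Render two lists of lines as 'left | right' rows."""
    col = (width - 3) // 2  # three characters are used by " | "
    rows = []
    for line in difflib.Differ().compare(a, b):
        marker, text = line[0], line[2:].rstrip("\n")
        if marker == "?":
            continue  # hint lines belong to neither input
        left = text if marker in (" ", "-") else ""
        right = text if marker in (" ", "+") else ""
        # Pad the left column so the separator lines up on every row.
        rows.append(left[:col].ljust(col) + " | " + right[:col])
    return "\n".join(rows)

print(side_by_side(["cat", "dog", "horse"], ["cat", "horse", "chicken"]))
```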

I've tried diffing files with difflib.context_diff:

import difflib
import sys

diff = difflib.context_diff(fromlines, tolines, fromfile='file1.txt', tofile='file2.txt')
sys.stdout.writelines(diff)

In this case your output will be something like this:

*** file1.txt
--- file2.txt
***************
*** 1,6 ****
! aasdf
  qwer
  123
! poiu
! xzcv34
  xzcv
--- 1,6 ----
! asdf
  qwer
+ mnbv
  123
! cvnn
  xzcv

In this case you can easily separate each file's diff, but I'm not sure whether the output of context_diff will satisfy you. You haven't mentioned how you're using difflib.
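
A minimal sketch of that separation, assuming the default context_diff layout shown above. The helper split_context_diff is a name invented for this example, and the sample lines are reconstructed from the output above. Note that the two halves are not aligned row by row, so blank padding lines would still be needed for a true side-by-side view.

```python
import difflib

def split_context_diff(diff_lines):
    """Split context_diff output into (from_side, to_side) line lists."""
    from_side, to_side = [], []
    target = None  # list currently being filled; None before the first hunk
    for line in diff_lines:
        if line.startswith("***************"):
            target = None  # hunk separator
        elif line.startswith("*** ") and line.rstrip().endswith("****"):
            target = from_side  # "*** 1,6 ****" opens the first file's part
        elif line.startswith("--- ") and line.rstrip().endswith("----"):
            target = to_side  # "--- 1,6 ----" opens the second file's part
        elif target is not None:
            target.append(line)
    return from_side, to_side

fromlines = ["aasdf\n", "qwer\n", "123\n", "poiu\n", "xzcv34\n", "xzcv\n"]
tolines = ["asdf\n", "qwer\n", "mnbv\n", "123\n", "cvnn\n", "xzcv\n"]
diff = difflib.context_diff(fromlines, tolines,
                            fromfile="file1.txt", tofile="file2.txt")
file1_diff, file2_diff = split_context_diff(diff)
```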
