How to use Python's difflib to produce a side-by-side comparison of two files, similar to the Unix sdiff command?
I am using Python 2.6 and I want to create a simple GUI with two side-by-side text panes comparing two text files (file1.txt and file2.txt).
I am using difflib, but it is not clear to me how to produce a result similar to the Unix sdiff command.
To reproduce a side-by-side comparison, I need difflib to return two variables, for instance file1_diff and file2_diff.
I have also considered using the sdiff output directly and parsing it to separate the panes, but that turned out not to be as easy as it seems... Any hints?
How about something like this?
>>> import difflib
>>> a = ['cat', 'dog', 'horse']
>>> b = ['cat', 'horse', 'chicken']
>>> comparison = [l for l in difflib.Differ().compare(a, b) if not l.startswith('?')]
>>> left = [l[2:] if l.startswith((' ', '-')) else '' for l in comparison]
>>> right = [l[2:] if l.startswith((' ', '+')) else '' for l in comparison]
>>> left
['cat', 'dog', 'horse', '']
>>> right
['cat', '', 'horse', 'chicken']
You can use difflib.Differ to return a single sequence of lines, each starting with a two-character marker that describes the line. The markers tell you the following about each line:
Marker | Description
---|---
'- ' | line unique to file 1
'+ ' | line unique to file 2
'  ' | line common to both files
'? ' | line not present in either input file
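To see the markers in practice, here is a small sketch that runs Differ on two short lists and splits each result line into its two-character marker and the original text. The sample lists are the ones used earlier in this thread; the variable name `tagged` is only for illustration.

```python
import difflib

a = ["cat", "dog", "horse"]
b = ["cat", "horse", "chicken"]

# Each result line is a two-character marker followed by the original text.
tagged = [(line[:2], line[2:]) for line in difflib.Differ().compare(a, b)]

for marker, text in tagged:
    print(repr(marker) + " " + text)
```

Nothing here produces a '? ' line, because no line in one list is a near match of a line in the other; those hint lines only show up between lines that Differ considers similar.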
You can use this information to decide how to display the data. For example, if the marker is a space, you put the line in both the left and right widgets. If it's +, you could put a blank line on the left and the actual line on the right, showing that the line is unique to the text on the right. Likewise, - means the line is unique to the left.
For example, you can create two text widgets, t1 and t2, one for the left and one for the right. To compare two files, create a list of lines for each, pass both lists to the compare method of the differ, and then iterate over the results.
import difflib
import Tkinter as tk  # the module is named "tkinter" on Python 3

t1 = tk.Text(...)
t2 = tk.Text(...)

# Read each file into a list of lines (line endings are kept).
with open("file1.txt", "r") as f:
    f1 = f.readlines()
with open("file2.txt", "r") as f:
    f2 = f.readlines()

differ = difflib.Differ()
for line in differ.compare(f1, f2):
    marker = line[0]
    if marker == " ":
        # line is same in both
        t1.insert("end", line[2:])
        t2.insert("end", line[2:])
    elif marker == "-":
        # line is only on the left
        t1.insert("end", line[2:])
        t2.insert("end", "\n")
    elif marker == "+":
        # line is only on the right
        t1.insert("end", "\n")
        t2.insert("end", line[2:])
The above code ignores lines with the marker ?, since those are extra lines that try to draw attention to the characters that differ on the previous line and aren't actually part of either file. You could use that information to highlight the individual characters if you wish.
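As a rough sketch of how the ? lines could drive character highlighting: the guide text after the marker lines up column by column with the preceding line and flags changed characters with ^, + or -. The helper name `hint_columns` is hypothetical, and the sketch assumes the guide contains no tab padding.

```python
import difflib

def hint_columns(hint_line):
    """Return the column indices flagged by a Differ '? ' hint line."""
    guide = hint_line[2:]  # drop the two-character '? ' marker
    return [i for i, ch in enumerate(guide) if ch in "^+-"]

old = ["abcDefghiJkl\n"]
new = ["abcdefGhijkl\n"]

previous = None
for line in difflib.Differ().compare(old, new):
    if line.startswith("?") and previous is not None:
        # Columns refer to the text of the line just before the hint.
        print(repr(previous[2:]), hint_columns(line))
    previous = line
```

In a Tk text widget, the returned columns could then be turned into tag ranges on the row that holds the preceding line.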
Building on @Bryan Oakley's answer, I wrote a quick Gist:
https://gist.github.com/jlumbroso/3ef433b4402b4f157728920a66cc15ed
with a side-by-side diff method (including the method that produces this side-by-side arrangement using the textwrap library) that you can call on two lists of lines:
print(better_diff(
    ["a", "c", "a", "a", "a", "a", "a", "a", "e"],
    ["a", "c", "b", "a", "a", "a", "a", "d", "a", "a"],
    width=20,
    as_string=True,
    left_title=" LEFT",
))
will produce:
LEFT |
-------- | --------
a | a
c | c
| b
a | a
a | a
a | a
a | a
| d
a | a
a | a
e |
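For readers who only need the basic idea, a much smaller sketch of a two-column text layout follows. It is not the Gist's better_diff: the function name `side_by_side_text` and its `width` parameter are made up for this illustration, it pads with str.ljust instead of wrapping with textwrap, and lines longer than `width` simply overflow the left column.

```python
import difflib

def side_by_side_text(a, b, width=10):
    """Lay out a Differ comparison of two line lists as 'left | right' rows."""
    rows = []
    for line in difflib.Differ().compare(a, b):
        marker, text = line[0], line[2:]
        if marker == "?":
            continue  # hint lines are not part of either input
        left = text if marker in " -" else ""
        right = text if marker in " +" else ""
        rows.append(left.ljust(width) + " | " + right)
    return "\n".join(rows)

print(side_by_side_text(["cat", "dog", "horse"], ["cat", "horse", "chicken"]))
```

The input lines are assumed to carry no trailing newlines; lines read with readlines() would need stripping first.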
I've tried diffing the files with difflib.context_diff:
import difflib, sys
# fromlines and tolines are lists of lines, e.g. read with readlines()
diff = difflib.context_diff(fromlines, tolines, fromfile='file1.txt', tofile='file2.txt')
sys.stdout.writelines(diff)
In this case your output will be something like this:
*** file1.txt
--- file2.txt
***************
*** 1,6 ****
! aasdf
qwer
123
! poiu
! xzcv34
xzcv
--- 1,6 ----
! asdf
qwer
+ mnbv
123
! cvnn
xzcv
In this case you'll be able to separate each file's diff easily, but I'm not sure whether the output of context_diff will satisfy you. You haven't mentioned how you're using difflib.
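One possible way to do that separation, as a sketch only: walk the context_diff output and route lines to a left or right list depending on whether they follow a "*** n,m ****" or a "--- n,m ----" hunk header. The function name `split_context_diff` is hypothetical, and the file names are the placeholders used above.

```python
import difflib

def split_context_diff(fromlines, tolines):
    """Split context_diff output into the 'from' and 'to' parts of its hunks."""
    diff = list(difflib.context_diff(fromlines, tolines,
                                     fromfile="file1.txt", tofile="file2.txt"))
    left, right = [], []
    target = None
    for line in diff[2:]:  # skip the two file-header lines
        if line.startswith("*** ") and line.rstrip().endswith("****"):
            target = left       # start of a "from" section
        elif line.startswith("--- ") and line.rstrip().endswith("----"):
            target = right      # start of a "to" section
        elif line.startswith("***************"):
            target = None       # hunk separator
        elif target is not None:
            target.append(line)
    return left, right

left, right = split_context_diff(["a\n", "b\n", "c\n"], ["a\n", "x\n", "c\n"])
```

Each collected line keeps its two-character context_diff prefix ("  ", "! ", "+ " or "- "). The two lists are not guaranteed to have the same length, because a hunk with only insertions or only deletions omits the body of the other section, so they would still need padding before being shown in aligned panes.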