How to compare 2 txt files in Python
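I have written a program that compares the file new1.txt with new2.txt; the lines that are in new1.txt but not in new2.txt should be written to the file difference.txt.
Can someone please take a look and let me know what changes are required in the code given below? The code prints the same value multiple times.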
file1 = open("new1.txt",'r')
file2 = open("new2.txt",'r')
NewFile = open("difference.txt",'w')
for line1 in file1:
    for line2 in file2:
        if line2 != line1:
            NewFile.write(line1)
file1.close()
file2.close()
NewFile.close()
Here is an example that uses the with statement, assuming the files are not too large to fit in memory:
# Open 'new1.txt' as f1, 'new2.txt' as f2 and 'diff.txt' as outf
with open('new1.txt') as f1, open('new2.txt') as f2, open('diff.txt', 'w') as outf:
    # Read the lines from 'new2.txt' and store them into a python set
    lines = set(f2.readlines())
    # Loop through each line in 'new1.txt'
    for line in f1:
        # If the line was not in 'new2.txt'
        if line not in lines:
            # Write the line to the output file
            outf.write(line)
The with statement simply closes the opened files automatically. These two pieces of code are equivalent:
with open('temp.log', 'w') as temp:
    temp.write('Temporary logging.')
# equal to:
temp = open('temp.log', 'w')
temp.write('Temporary logging.')
temp.close()
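Strictly speaking, the `with` form also closes the file when an exception is raised inside the block, which a plain open/write/close sequence does not. A closer hand-written equivalent might look like the sketch below. The file is created in a temporary directory only so the example is self-contained; that path is an assumption for illustration.

```python
import os
import tempfile

# Hypothetical path inside a throwaway directory, used only for this example.
path = os.path.join(tempfile.mkdtemp(), "temp.log")

temp = open(path, "w")
try:
    temp.write("Temporary logging.")
finally:
    # Runs whether or not write() raised, like leaving a `with` block.
    temp.close()

# Read the file back to confirm the text was flushed on close.
with open(path) as check:
    content = check.read()
```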
However, there is another way that uses two sets, but this is again not very memory efficient. If your files are large, this won't work:
# Again, open the three files as f1, f2 and outf
with open('new1.txt') as f1, open('new2.txt') as f2, open('diff.txt', 'w') as outf:
    # Read the lines in 'new1.txt' and 'new2.txt'
    s1, s2 = set(f1.readlines()), set(f2.readlines())
    # `s1 - s2 | s2 - s1` returns the differences between two sets
    # Now we simply loop through the different lines
    for line in s1 - s2 | s2 - s1:
        # And output all the different lines
        outf.write(line)
Keep in mind that the last code might not preserve the order of your lines.
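If the order matters, one option is to keep the sets only for membership tests and walk the lines in file order. This is a minimal sketch, assuming both files fit in memory. The helper name `ordered_differences` is made up for this example, and it takes lists of lines rather than file objects so that it runs on its own.

```python
def ordered_differences(lines1, lines2):
    """Return the lines unique to either input, in their original order."""
    s1, s2 = set(lines1), set(lines2)
    # Lines of the first input that never occur in the second.
    only_in_1 = [line for line in lines1 if line not in s2]
    # Lines of the second input that never occur in the first.
    only_in_2 = [line for line in lines2 if line not in s1]
    return only_in_1 + only_in_2


result = ordered_differences(
    ["a\n", "b\n", "c\n"],
    ["b\n", "d\n", "e\n"],
)
print(result)  # ['a\n', 'c\n', 'd\n', 'e\n']
```

With real files, `lines1` and `lines2` could come from `f1.readlines()` and `f2.readlines()`. Unlike the pure set version, a line that is repeated inside one file is also repeated in the result.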
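For example, you have file1: line1 line2
and file2: line1 line3 line4
When you compare line1 and line3, you write a new line (line1) to your output file. Then you go on to compare line1 and line4; again they are not equal, so line1 is written to your output file again... You need to break out of both for loops if your condition is true. You can use a helper variable to break out of the outer for.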
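The duplication described above can be reproduced without any files. Here is a minimal sketch that uses in-memory lists in place of the two files; the names `file1_lines`, `file2_lines`, and `written` are made up for illustration.

```python
# Hypothetical in-memory stand-ins for the contents of new1.txt and new2.txt.
file1_lines = ["line1\n", "line2\n"]
file2_lines = ["line1\n", "line3\n", "line4\n"]

written = []  # collects what the nested loop would write to difference.txt
for line1 in file1_lines:
    for line2 in file2_lines:
        if line2 != line1:
            # Runs once for every non-matching line2, not once per line1.
            written.append(line1)

print(written)
# ['line1\n', 'line1\n', 'line2\n', 'line2\n', 'line2\n']
```

Here line1 is written twice even though it is present in both lists, and line2 is written three times. With real file objects there is a second problem: the inner loop exhausts file2 during the first pass, so every later line of file1 is compared against nothing.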
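This is because of your for loops.
If I understand correctly, you want to see which lines in file1 are not present in file2.
So for each line in file1, you have to check whether the same line appears in file2. But this is not what your code does: for each line in file1, you check every line in file2 (this is right), but each time the line in file2 differs from the line in file1, you print the line from file1! So you should print the line from file1 only AFTER having checked ALL the lines in file2, to be sure that the line does not appear even once.
It could look something like the following: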
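file1 = open("new1.txt",'r')
file2 = open("new2.txt",'r')
NewFile = open("difference.txt",'w')
# Read file2 once: testing `in` on the file object itself would consume it
lines2 = file2.readlines()
for line1 in file1:
    if line1 not in lines2:
        NewFile.write(line1)
file1.close()
file2.close()
NewFile.close()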
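If your files are large, you can use this for-else method:
The else clause below the second for loop is executed only when the second for loop completes without a break, that is, when no match was found.
Modified: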
with open('new1.txt') as file1, open('diff.txt', 'w') as NewFile:
    for line1 in file1:
        with open('new2.txt') as file2:
            for line2 in file2:
                if line2 == line1:
                    break
            else:
                NewFile.write(line1)
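The for-else control flow can be tried in isolation. The sketch below applies the same idea to in-memory lists; the function name `find_missing` is hypothetical and only serves the example.

```python
def find_missing(lines1, lines2):
    """Return the items of lines1 that do not occur in lines2."""
    missing = []
    for line1 in lines1:
        for line2 in lines2:
            if line2 == line1:
                break  # match found: the else clause is skipped
        else:
            # Reached only when the inner loop ended without a break.
            missing.append(line1)
    return missing


missing = find_missing(
    ["line1\n", "line2\n"],
    ["line1\n", "line3\n", "line4\n"],
)
print(missing)  # ['line2\n']
```

Each line is reported at most once, because the write happens after the inner loop has finished rather than inside it.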
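I always find that using sets makes comparing two collections easier, especially because the "does this set contain this" operation runs in O(1) on average, and most nested loops can be reduced to a single loop (easier to read, in my opinion).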
with open('test1.txt') as file1, open('test2.txt') as file2, open('diff.txt', 'w') as diff:
    s1 = set(file1)
    s2 = set(file2)
    for e in s1:
        if e not in s2:
            diff.write(e)
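One thing to watch with any of these line comparisons: lines read from a file keep their trailing newline, so a final line without one will not match the same text followed by `\n`. Below is a small sketch of one way to handle that, assuming trailing line endings are not significant for the comparison. `io.StringIO` stands in for real files, and the helper `normalized_lines` is hypothetical.

```python
import io


def normalized_lines(fileobj):
    """Return the set of lines with trailing line endings removed."""
    return {line.rstrip("\r\n") for line in fileobj}


text1 = "a\nb\nc"   # the last line has no trailing newline
text2 = "c\nb\n"

# Raw comparison: "c" and "c\n" are treated as different lines.
raw_diff = set(io.StringIO(text1)) - set(io.StringIO(text2))

# Normalized comparison: only "a" is really missing from the second text.
clean_diff = normalized_lines(io.StringIO(text1)) - normalized_lines(io.StringIO(text2))
print(clean_diff)  # {'a'}
```

When writing normalized lines back to a file, the newline has to be added again, for example with `diff.write(e + '\n')`.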
Your loop executes multiple times. To avoid that, use:
# izip comes from itertools in Python 2; in Python 3 the built-in zip is lazy
file1 = open("new1.txt",'r')
file2 = open("new2.txt",'r')
NewFile = open("difference.txt",'w')
for line1, line2 in zip(file1, file2):
    if line2 != line1:
        NewFile.write(line1)
file1.close()
file2.close()
NewFile.close()
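Note that pairing the lines like this compares the files position by position, which is a different question from "which lines of new1.txt do not occur anywhere in new2.txt". Here is a short sketch of the difference, using in-memory lists with made-up names.

```python
lines1 = ["a\n", "b\n", "c\n"]
lines2 = ["b\n", "a\n"]

# Position by position: zip stops at the end of the shorter input,
# so "c\n" is never examined.
positional = [l1 for l1, l2 in zip(lines1, lines2) if l1 != l2]

# Membership: a line counts as different only if it is absent from lines2.
seen = set(lines2)
by_membership = [l1 for l1 in lines1 if l1 not in seen]

print(positional)     # ['a\n', 'b\n']
print(by_membership)  # ['c\n']
```

The positional version suits files that are expected to line up row by row; the membership version matches what the question asks for.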
Print to NewFile only after comparing against all the lines of file2:
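with open("new1.txt") as file1, open("difference.txt", 'w') as NewFile:
    for line1 in file1:
        present = False
        # Reopen file2 so that every pass scans it from the first line
        with open("new2.txt") as file2:
            for line2 in file2:
                if line2 == line1:
                    present = True
        if not present:
            NewFile.write(line1)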
You can use basic set operations:
with open('new1.txt') as f1, open('new2.txt') as f2, open('diffs.txt', 'w') as diffs:
    diffs.writelines(set(f1).difference(f2))
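According to this reference, this will execute in O(n), where n is the number of lines in the first file. If you know that the second file is significantly smaller than the first, you can optimize with set.difference_update(). This has complexity O(n), where n is the number of lines in the second file. For example: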
with open('new1.txt') as f1, open('new2.txt') as f2, open('diffs.txt', 'w') as diffs:
    s = set(f1)
    s.difference_update(f2)
    diffs.writelines(s)
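The two set methods differ in how they return their result, which is easy to check on in-memory data. In this sketch `io.StringIO` objects stand in for the opened files; the variable names are assumptions for illustration.

```python
import io

f1 = io.StringIO("a\nb\nc\n")
f2 = io.StringIO("b\n")

# difference() accepts any iterable and returns a new set.
new_set = set(f1).difference(f2)

# Rewind both stand-in files before reading them a second time.
f1.seek(0)
f2.seek(0)

# difference_update() removes the items in place and returns None.
s = set(f1)
returned = s.difference_update(f2)
```

Both calls leave the same two lines, `a` and `c`. Since a set has no defined order, `writelines` will emit the remaining lines in arbitrary order.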