How to compare two txt files in Python

Question

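I have written a program that compares new1.txt with new2.txt; the lines that are in new1.txt but not in new2.txt should be written to the file difference.txt.

Could someone take a look and let me know what changes the code below needs? The code writes the same value multiple times.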

file1 = open("new1.txt",'r')        
file2 = open("new2.txt",'r')    
NewFile = open("difference.txt",'w')   
for line1 in file1:    
    for line2 in file2:    
        if line2 != line1:    
            NewFile.write(line1)    
file1.close()    
file2.close()
NewFile.close()

Answer 1

Here is an example using the with statement, assuming the files are small enough to fit in memory:

# Open 'new1.txt' as f1, 'new2.txt' as f2 and 'diff.txt' as outf
with open('new1.txt') as f1, open('new2.txt') as f2, open('diff.txt', 'w') as outf:

    # Read the lines from 'new2.txt' and store them into a python set
    lines = set(f2.readlines())

    # Loop through each line in 'new1.txt'
    for line in f1:

        # If the line was not in 'new2.txt'
        if line not in lines:

            # Write the line to the output file
            outf.write(line)

The with statement simply closes the opened files automatically. These two pieces of code do the same thing:

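# The file must be opened in write mode ('w') for write() to work
with open('temp.log', 'w') as temp:
    temp.write('Temporary logging.')

# equivalent to (as long as no exception is raised):

temp = open('temp.log', 'w')
temp.write('Temporary logging.')
temp.close()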
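
Strictly speaking, the with form also closes the file when the body raises an exception, so a closer manual equivalent uses try/finally. A minimal sketch, assuming a throwaway file in a temporary directory (the helper name write_log and the path are only for illustration):

```python
import os
import tempfile

def write_log(path, text):
    # Manual equivalent of:
    #     with open(path, 'w') as temp:
    #         temp.write(text)
    temp = open(path, 'w')
    try:
        temp.write(text)
    finally:
        # Runs whether or not write() raised, just like leaving a with block
        temp.close()
    return temp.closed

# Hypothetical location, chosen so the example does not touch real files
log_path = os.path.join(tempfile.mkdtemp(), 'temp.log')
closed = write_log(log_path, 'Temporary logging.')
```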

There is also another way, using two sets, but again this is not very memory efficient. If your files are large, this will not work:

# Again, open the three files as f1, f2 and outf
with open('new1.txt') as f1, open('new2.txt') as f2, open('diff.txt', 'w') as outf:

    # Read the lines in 'new1.txt' and 'new2.txt'
    s1, s2 = set(f1.readlines()), set(f2.readlines())

    # `s1 - s2 | s2 - s1` returns the differences between two sets
    # Now we simply loop through the different lines
    for line in s1 - s2 | s2 - s1:

        # And output all the different lines
        outf.write(line)

Keep in mind that the last snippet may not preserve the order of the lines.
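
If the order matters, one possible approach is to keep the sets only for membership tests and walk the original lines in order. This is a rough sketch working on lists of lines rather than files; the function name ordered_difference is made up for the example:

```python
def ordered_difference(lines1, lines2):
    """Return lines found in only one of the inputs, in first-seen order."""
    s1, s2 = set(lines1), set(lines2)
    seen = set()      # avoids emitting a repeated line twice
    result = []
    # Lines unique to the first input, in their original order
    for line in lines1:
        if line not in s2 and line not in seen:
            seen.add(line)
            result.append(line)
    # Then lines unique to the second input, in their original order
    for line in lines2:
        if line not in s1 and line not in seen:
            seen.add(line)
            result.append(line)
    return result

diff = ordered_difference(["a\n", "b\n", "c\n"], ["b\n", "d\n"])
print(diff)  # ['a\n', 'c\n', 'd\n']
```

Like the set version, this still holds both files in memory.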

Answer 2

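For example, suppose file1 contains: line1 line2

and file2 contains: line1 line3 line4

When you compare line1 with line3, you write line1 to your output file; then you go on to compare line1 with line4, and again they are not equal, so line1 is written to the output file once more... You need to break out of both for loops once your condition is true. You can use a helper variable to break out of the outer for.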
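
The repeated writes can be reproduced without touching the disk. The sketch below mirrors the nested loops from the question, with io.StringIO objects standing in for the two files (an assumption made only to keep the example self-contained; naive_diff is an illustrative name):

```python
import io

def naive_diff(text1, text2):
    # StringIO objects iterate line by line, like real text files
    file1 = io.StringIO(text1)
    file2 = io.StringIO(text2)
    written = []
    for line1 in file1:
        for line2 in file2:
            if line2 != line1:
                # Stands in for NewFile.write(line1) in the question
                written.append(line1)
    return written

result = naive_diff("line1\nline2\n", "line1\nline3\nline4\n")
print(result)  # ['line1\n', 'line1\n']
```

Note that line1 is reported twice even though it is present in file2, while line2 is never reported: the inner loop uses up file2 during the first pass of the outer loop, so there is nothing left to compare line2 against.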

Answer 3

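This is because of your for loops.

If I understand correctly, you want to see which lines in file1 are not present in file2.

So for each line in file1, you have to check whether the same line appears in file2. But that is not what your code does: for each line in file1, you check every line in file2 (this part is correct), but every time a line in file2 differs from the line in file1, you write the line from file1! You should write the line from file1 only after checking all the lines in file2, to be sure the line does not appear even once.

It could look something like this: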

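file1 = open("new1.txt", 'r')
file2 = open("new2.txt", 'r')
NewFile = open("difference.txt", 'w')

# Read file2 once into a list. Testing `line1 not in file2` on the file
# object itself would consume the file during the first check.
lines2 = file2.readlines()

for line1 in file1:
    if line1 not in lines2:
        NewFile.write(line1)

file1.close()
file2.close()
NewFile.close()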

Answer 4

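If your files are large, you can use this: the for-else method.

The else clause below the second for loop is executed only when the second for loop completes without hitting break, that is, when no match was found.

Modified: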

with open('new1.txt') as file1, open('diff.txt', 'w') as NewFile:
    for line1 in file1:
        with open('new2.txt') as file2:
            for line2 in file2:
                if line2 == line1:
                    break
            else:
                NewFile.write(line1)
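
The for-else behaviour can be checked in isolation on plain lists. A small sketch, using the sample lines from Answer 2 (the function name missing_lines is only illustrative):

```python
def missing_lines(lines1, lines2):
    missing = []
    for line1 in lines1:
        for line2 in lines2:
            if line2 == line1:
                break  # match found: the else clause below is skipped
        else:
            # Reached only when the inner loop finished without break
            missing.append(line1)
    return missing

result = missing_lines(["line1\n", "line2\n"],
                       ["line1\n", "line3\n", "line4\n"])
print(result)  # ['line2\n']
```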

For more information about the for-else construct, see this Stack Overflow question: for-else

Answer 5

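I always find that using sets makes it easier to compare two collections, especially because the "does this set contain this item" operation runs in O(1) on average, and most nested loops can be reduced to a single loop (which is easier to read, in my opinion).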

with open('test1.txt') as file1, open('test2.txt') as file2, open('diff.txt', 'w') as diff:
    s1 = set(file1)
    s2 = set(file2)
    for e in s1:
        if e not in s2:
            diff.write(e)

Answer 6

Your loop executes multiple times. To avoid this, use:

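# Python 3: the built-in zip is lazy. On Python 2, use
# `from itertools import izip` and call izip instead.
file1 = open("new1.txt", 'r')
file2 = open("new2.txt", 'r')
NewFile = open("difference.txt", 'w')
# Note: this compares the files position by position (first line with
# first line, and so on) and stops at the end of the shorter file.
for line1, line2 in zip(file1, file2):
    if line2 != line1:
        NewFile.write(line1)
file1.close()
file2.close()
NewFile.close()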
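
Pairing the lines this way answers a slightly different question from "which lines of new1.txt are missing from new2.txt". The sketch below contrasts the two on small in-memory lists; the sample data is invented for the illustration:

```python
new1 = ["a\n", "b\n", "c\n"]
new2 = ["b\n", "a\n"]

# Positional comparison: line i of new1 against line i of new2.
# zip stops at the shorter input, so "c\n" is never examined.
positional = [l1 for l1, l2 in zip(new1, new2) if l1 != l2]

# Membership comparison: is the line present anywhere in new2?
lookup = set(new2)
membership = [l1 for l1 in new1 if l1 not in lookup]

print(positional)  # ['a\n', 'b\n']
print(membership)  # ['c\n']
```

The positional version suits files that are expected to line up row by row; the membership version matches what the question asks for.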

Answer 7

Write to NewFile only after comparing against all the lines of file2:

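with open("new1.txt") as file1, open("difference.txt", "w") as NewFile:
    for line1 in file1:
        present = False
        # Reopen new2.txt so it is read from the start for every line1
        with open("new2.txt") as file2:
            for line2 in file2:
                if line2 == line1:
                    present = True
                    break
        if not present:
            NewFile.write(line1)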

Answer 8

You can use basic set operations:

with open('new1.txt') as f1, open('new2.txt') as f2, open('diffs.txt', 'w') as diffs:
    diffs.writelines(set(f1).difference(f2))

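According to this reference, this runs in O(n), where n is the number of lines in the first file. If you know that the second file is significantly smaller than the first, you can optimize by using set.difference_update(). This has complexity O(n), where n is the number of lines in the second file. For example: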

with open('new1.txt') as f1, open('new2.txt') as f2, open('diffs.txt', 'w') as diffs:
    s = set(f1)
    s.difference_update(f2)
    diffs.writelines(s)
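
Both variants should produce the same set of lines; they differ only in whether a new set is built or the existing one is modified in place. A quick check, again with io.StringIO objects standing in for the files (an assumption for the sake of a self-contained example, as are the function names):

```python
import io

def diff_with_difference(f1, f2):
    # Builds a new set holding the lines of f1 that are not in f2
    return set(f1).difference(f2)

def diff_with_update(f1, f2):
    # Removes the lines of f2 from the set in place
    s = set(f1)
    s.difference_update(f2)
    return s

text1 = "a\nb\nc\n"
text2 = "b\n"

r1 = diff_with_difference(io.StringIO(text1), io.StringIO(text2))
r2 = diff_with_update(io.StringIO(text1), io.StringIO(text2))
print(sorted(r1))  # ['a\n', 'c\n']
```

As with the other set-based answers, the order of the lines written by writelines is not guaranteed.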

8 answers

Answer 1
3 accepted 2015-07-30 10:46:06

Answer 2
1 2015-07-30 10:41:53

Answer 3
1 2015-07-30 10:44:37

Answer 4
1 2015-07-30 10:57:03

Answer 5
1 2015-07-30 11:05:22

Answer 6
0 2015-07-30 10:45:40

Answer 7
0 2015-07-30 10:47:53

Answer 8
0 2015-07-30 12:16:25
