How do I write the lines of the first text file that are not in the second text file?

Question

I would like to compare two text files. The first text file has lines that aren't in the second text file. I would like to copy these lines and write them to a new txt file. I would like a Python script for this, as I do this a lot and do not want to go online constantly to find these new lines. I do not need to know whether file2 contains anything that is not in file1.

I have written some code that seems to work inconsistently. I am unsure what I am doing wrong.

newLines = open("file1.txt", "r")
originalLines = open("file2.txt", "r")
output = open("output.txt", "w")

lines1 = newLines.readlines()
lines2 = originalLines.readlines()
newLines.close()
originalLines.close()

duplicate = False
for line in lines1:
    if line.isspace():
        continue
    for line2 in lines2:
        if line == line2:
            duplicate = True
            break

    if duplicate == False:
        output.write(line)
    else:
        duplicate = False

output.close()

For file1.txt:

Man
Dog
Axe
Cat
Potato
Farmer

file2.txt:

Man
Dog
Axe
Cat

The output.txt should be:

Potato
Farmer

but it is instead this:

Cat
Potato
Farmer

Any help would be much appreciated!

Answer 1

Based on the behavior you describe, file2.txt doesn't end with a newline, so the contents of lines2 are ['Man\n', 'Dog\n', 'Axe\n', 'Cat']. Note the lack of a newline for 'Cat'. The line 'Cat\n' read from file1.txt therefore never compares equal to 'Cat' and gets written out as if it were new.

I'd suggest normalizing your lines so they don't have newlines, replacing:
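To see the mismatch without touching any files, here is a minimal sketch that reproduces it in memory. It uses io.StringIO objects as stand-ins for file1.txt and file2.txt (an assumption for illustration; the question reads real files), with the second one missing its final newline.

```python
import io

# Stand-ins for file1.txt and file2.txt; file2 has no trailing newline.
file1 = io.StringIO("Man\nDog\nAxe\nCat\nPotato\nFarmer\n")
file2 = io.StringIO("Man\nDog\nAxe\nCat")

lines1 = file1.readlines()
lines2 = file2.readlines()

print(lines2)             # the last element has no '\n'
print("Cat\n" in lines2)  # False: 'Cat\n' != 'Cat'

# Lines of file1 that have no exact match in file2
unmatched = [line for line in lines1 if line not in lines2]
print(unmatched)
```

The unmatched list includes 'Cat\n' alongside the two lines that are really new, which matches the unexpected output in the question.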

lines1 = newLines.readlines()
lines2 = originalLines.readlines()

with:

lines1 = [line.rstrip('\n') for line in newLines]
# Set comprehension makes lookup cheaper and dedupes
lines2 = {line.rstrip('\n') for line in originalLines}

and changing:

output.write(line)

to:

print(line, file=output)

which will add the newline for you. Really, the best solution is to avoid the inner loop entirely, changing all of this:

for line2 in lines2:
    if line == line2:
        duplicate = True
        break

if duplicate == False:
    output.write(line)
else:
    duplicate = False

to just:

if line not in lines2:
    print(line, file=output)

which, if you use a set for lines2 as I suggest, makes the cost of each test drop from linear in the number of lines in file2.txt to roughly constant, no matter the size of file2.txt (as long as the set of unique lines can fit in memory at all).

Even better, use with statements for your open files, and stream file1.txt rather than holding it in memory at all, and you end up with:
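As a rough illustration of that point, the sketch below builds a list and a set from the same made-up lines and checks membership in each. The names and the size are arbitrary assumptions, and no timings are claimed; it only shows that both containers give the same answers while working differently.

```python
# Hypothetical data: 100,000 distinct lines, for illustration only.
lines_list = [f"line {i}" for i in range(100_000)]
lines_set = set(lines_list)

# Both containers answer the membership question identically...
assert ("line 99999" in lines_list) == ("line 99999" in lines_set)
assert ("missing" in lines_list) == ("missing" in lines_set)

# ...but the list is scanned element by element (worst case: all of them),
# while the set hashes the string and looks it up directly.
# A set also drops duplicates:
print(len({"Cat", "Cat", "Dog"}))  # 2
```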

with open("file2.txt") as origlines:
    lines2 = {line.rstrip('\n') for line in origlines}

with open("file1.txt") as newlines, open("output.txt", "w") as output:
    for line in newlines:
        line = line.rstrip('\n')
        # line.strip() is empty for blank lines, including the '' left after rstrip('\n')
        if line.strip() and line not in lines2:
            print(line, file=output)
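Since the question mentions doing this often, the same logic could be wrapped in a function. The following is only a sketch: the name write_new_lines, its parameters, and the returned count are hypothetical additions, not part of the original answer. It skips blank lines with line.strip(), and the demo recreates the question's sample data in a temporary directory.

```python
import os
import tempfile


def write_new_lines(new_path, original_path, output_path):
    """Write the lines of new_path that do not occur in original_path.

    Returns the number of lines written (hypothetical convenience).
    """
    with open(original_path) as original:
        seen = {line.rstrip('\n') for line in original}

    written = 0
    with open(new_path) as new, open(output_path, "w") as output:
        for line in new:
            line = line.rstrip('\n')
            # Skip empty or whitespace-only lines and lines already in the original
            if line.strip() and line not in seen:
                print(line, file=output)
                written += 1
    return written


# Demo with the sample data from the question, in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    file1 = os.path.join(tmp, "file1.txt")
    file2 = os.path.join(tmp, "file2.txt")
    out = os.path.join(tmp, "output.txt")
    with open(file1, "w") as f:
        f.write("Man\nDog\nAxe\nCat\nPotato\nFarmer\n")
    with open(file2, "w") as f:
        f.write("Man\nDog\nAxe\nCat")  # no trailing newline, as in the question
    count = write_new_lines(file1, file2, out)
    with open(out) as f:
        result = f.read()

print(count)
print(result, end="")
```

For the sample files this writes only Potato and Farmer, whether or not file2.txt ends with a newline.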

Answer 2

You can use numpy for a shorter and possibly faster solution. It relies on these numpy functions:

np.loadtxt Docs: https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html
np.setdiff1d Docs: https://docs.scipy.org/doc/numpy-1.14.5/reference/generated/numpy.setdiff1d.html
np.savetxt Docs: https://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html

Note that np.setdiff1d returns the sorted unique values, so the output lines are not kept in their original order.

import numpy as np


# Lines in file1.txt that are not in file2.txt (result is sorted and deduplicated)
arr = np.setdiff1d(np.loadtxt('file1.txt', dtype=str), np.loadtxt('file2.txt', dtype=str))
np.savetxt('output.txt', arr, fmt='%s')

2 answers

Answer 1: 2 2019-04-02 01:45:14
Answer 2: 0 2019-04-02 01:50:14
