How can I write the lines from the first text file that are not present in the second text file?

I would like to compare two text files. The first text file has lines that aren't in the second text file. I would like to copy these lines and write them to a new txt file. I would like a Python script for this, as I do this a lot and do not want to go online constantly to find these new lines. I do not need to know whether file2 contains anything that is not in file1.

I have written some code that seems to work inconsistently. I am unsure what I am doing wrong.

newLines = open("file1.txt", "r")
originalLines = open("file2.txt", "r")
output = open("output.txt", "w")

lines1 = newLines.readlines()
lines2 = originalLines.readlines()
newLines.close()
originalLines.close()

duplicate = False
for line in lines1:
    if line.isspace():
        continue
    for line2 in lines2:
        if line == line2:
            duplicate = True
            break

    if duplicate == False:
        output.write(line)
    else:
        duplicate = False

output.close()

For file1.txt:

Man
Dog
Axe
Cat
Potato
Farmer

file2.txt:

Man
Dog
Axe
Cat

The output.txt should be:

Potato
Farmer

but it is instead this:

Cat
Potato
Farmer

Any help would be much appreciated!

Based on the behavior, file2.txt doesn't end with a newline, so the contents of lines2 are ['Man\n', 'Dog\n', 'Axe\n', 'Cat']. Note the lack of a newline for 'Cat'. The 'Cat\n' read from file1.txt therefore never compares equal to 'Cat', so it is treated as a new line and written to the output.
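The mismatch can be reproduced without touching the disk. The sketch below uses io.StringIO objects as stand-ins for the two files; the assumption that file2.txt lacks a trailing newline is inferred from the observed output, not confirmed by the asker.

```python
import io

# Stand-in for file2.txt, assumed to have no newline after the last line
file2 = io.StringIO("Man\nDog\nAxe\nCat")
lines2 = file2.readlines()
print(lines2)  # ['Man\n', 'Dog\n', 'Axe\n', 'Cat']

# Stand-in for file1.txt: "Cat" is followed by more lines, so it keeps its newline
file1 = io.StringIO("Man\nDog\nAxe\nCat\nPotato\nFarmer\n")
lines1 = file1.readlines()

# The raw comparison fails only because of the trailing newline
print("Cat\n" in lines2)  # False

# Stripping the newline from both sides makes the lines comparable
stripped2 = [line.rstrip("\n") for line in lines2]
print("Cat\n".rstrip("\n") in stripped2)  # True
```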

I'd suggest normalizing your lines so they don't have newlines, replacing:

lines1 = newLines.readlines()
lines2 = originalLines.readlines()

with:

lines1 = [line.rstrip('\n') for line in newLines]
# Set comprehension makes lookup cheaper and dedupes
lines2 = {line.rstrip('\n') for line in originalLines}

and changing:

output.write(line)

to:

print(line, file=output)

which will add the newline for you. Really, the best solution is to avoid the inner loop entirely, changing all of this:

for line2 in lines2:
    if line == line2:
        duplicate = True
        break

if duplicate == False:
    output.write(line)
else:
    duplicate = False

to just:

if line not in lines2:
    print(line, file=output)

which, if you use a set for lines2 as I suggest, makes the cost of the test drop from linear in the number of lines in file2.txt to roughly constant no matter the size of file2.txt (as long as the set of unique lines can fit in memory at all).
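To get a feel for the difference, here is a rough timing sketch. The 100,000 generated lines are hypothetical stand-ins for a large file2.txt, and the exact timings will vary by machine, so no numbers are shown.

```python
import timeit

# Hypothetical data: 100,000 distinct lines standing in for file2.txt
lines = [f"line {i}" for i in range(100_000)]
as_list = lines
as_set = set(lines)

# Probe for the last element, the worst case for a linear list scan
probe = "line 99999"

# Same membership test against both containers, repeated 100 times
list_time = timeit.timeit(lambda: probe in as_list, number=100)
set_time = timeit.timeit(lambda: probe in as_set, number=100)

print(f"list lookup: {list_time:.6f}s")
print(f"set lookup:  {set_time:.6f}s")
```

Both containers give the same answer to the membership test; only the cost of getting it differs.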

Even better, use with statements for your open files, and stream file1.txt rather than holding it in memory at all, and you end up with:

with open("file2.txt") as origlines:
    lines2 = {line.rstrip('\n') for line in origlines}

with open("file1.txt") as newlines, open("output.txt", "w") as output:
    for line in newlines:
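        line = line.rstrip('\n')
        # After stripping the newline a blank line is '', and ''.isspace() is False,
        # so test line.strip() to skip both empty and whitespace-only lines
        if line.strip() and line not in lines2: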
            print(line, file=output)
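If you run this often, the same logic can be wrapped in a function. The following is only a sketch: the name write_new_lines and its parameters are made up for illustration, and the demo writes the question's sample data to a temporary directory instead of using real files.

```python
import os
import tempfile


def write_new_lines(new_path, original_path, output_path):
    """Write the lines of new_path that do not appear in original_path."""
    with open(original_path) as original:
        seen = {line.rstrip("\n") for line in original}

    with open(new_path) as new, open(output_path, "w") as output:
        for line in new:
            line = line.rstrip("\n")
            # Skip empty or whitespace-only lines, then test membership
            if line.strip() and line not in seen:
                print(line, file=output)


# Demo with the sample data from the question
with tempfile.TemporaryDirectory() as tmp:
    file1 = os.path.join(tmp, "file1.txt")
    file2 = os.path.join(tmp, "file2.txt")
    out = os.path.join(tmp, "output.txt")

    with open(file1, "w") as f:
        f.write("Man\nDog\nAxe\nCat\nPotato\nFarmer\n")
    with open(file2, "w") as f:
        f.write("Man\nDog\nAxe\nCat")  # no trailing newline, as assumed above

    write_new_lines(file1, file2, out)

    with open(out) as f:
        result = f.read()

print(result, end="")
```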

You can use numpy for a shorter and possibly faster solution. Here we are using these numpy functions:

np.loadtxt Docs: https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html
np.setdiff1d Docs: https://docs.scipy.org/doc/numpy-1.14.5/reference/generated/numpy.setdiff1d.html
np.savetxt Docs: https://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html

import numpy as np


# Values in file1.txt that are not in file2.txt
arr = np.setdiff1d(np.loadtxt('file1.txt', dtype=str), np.loadtxt('file2.txt', dtype=str))
# Write the result, one value per line
np.savetxt('output.txt', arr, fmt='%s')
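One thing to be aware of: np.setdiff1d returns the unique values in sorted order, so the lines will not necessarily come out in the order they appear in file1.txt. The sketch below uses only the standard library to show the difference on the question's sample data; sorted(set(a) - set(b)) is a plain-Python stand-in for the sorted result, not numpy's implementation.

```python
file1_lines = ["Man", "Dog", "Axe", "Cat", "Potato", "Farmer"]
file2_lines = ["Man", "Dog", "Axe", "Cat"]

# Sorted set difference: the same values, in the same order, that a
# sorted unique difference produces for these inputs
sorted_diff = sorted(set(file1_lines) - set(file2_lines))
print(sorted_diff)  # ['Farmer', 'Potato']

# Order-preserving alternative: keep file1's order, use a set only for lookup
file2_set = set(file2_lines)
ordered_diff = [line for line in file1_lines if line not in file2_set]
print(ordered_diff)  # ['Potato', 'Farmer']
```

If the order of the output matters, the order-preserving version matches the expected output.txt from the question.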
