How do I write the lines from the first text file that are not present in the second text file?

Question

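I want to compare two text files. Some lines in the first text file are not in the second text file, and I want to copy those lines and write them to a new txt file. I want a Python script for this because I do it often and don't want to keep looking these new lines up online. I don't need to check whether file2 contains anything that isn't in file1.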

I wrote some code that seems to behave inconsistently, and I'm not sure what I'm doing wrong.

newLines = open("file1.txt", "r")
originalLines = open("file2.txt", "r")
output = open("output.txt", "w")

lines1 = newLines.readlines()
lines2 = originalLines.readlines()
newLines.close()
originalLines.close()

duplicate = False
for line in lines1:
    if line.isspace():
        continue
    for line2 in lines2:
        if line == line2:
            duplicate = True
            break

    if duplicate == False:
        output.write(line)
    else:
        duplicate = False

output.close()

For file1.txt:

Man
Dog
Axe
Cat
Potato
Farmer

and file2.txt:

Man
Dog
Axe
Cat

output.txt should be:

Potato
Farmer

Instead, it looks like this:

Cat
Potato
Farmer

Any help would be greatly appreciated!

Answer 1

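Based on the behavior, file2.txt does not end with a newline, so lines2 contains ['Man\n', 'Dog\n', 'Axe\n', 'Cat']. Note the missing newline after 'Cat': the line 'Cat\n' read from file1.txt never equals 'Cat', so it is reported as missing.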
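To see the mismatch directly, here is a small sketch that simulates the two files with io.StringIO instead of real files (an assumption for illustration; file objects opened in text mode return lines the same way). It also assumes file1.txt ends with a newline, which the question does not say.

```python
import io

# Simulated file contents; file2 has no newline after its last line.
file1 = io.StringIO("Man\nDog\nAxe\nCat\nPotato\nFarmer\n")
file2 = io.StringIO("Man\nDog\nAxe\nCat")

lines1 = file1.readlines()
lines2 = file2.readlines()

print(lines2)               # ['Man\n', 'Dog\n', 'Axe\n', 'Cat']
print(repr(lines1[3]))      # 'Cat\n'
print(lines1[3] in lines2)  # False, because 'Cat\n' != 'Cat'

# The question's nested-loop logic, condensed: keep non-blank lines1 entries not in lines2.
missing = [line for line in lines1 if not line.isspace() and line not in lines2]
print(missing)              # ['Cat\n', 'Potato\n', 'Farmer\n']
```

This reproduces the reported output: Cat, Potato and Farmer.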

I suggest normalizing your lines so they have no trailing newlines, by replacing:

lines1 = newLines.readlines()
lines2 = originalLines.readlines()

with:

lines1 = [line.rstrip('\n') for line in newLines]
# Set comprehension makes lookup cheaper and dedupes
lines2 = {line.rstrip('\n') for line in originalLines}
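A minimal sketch of what that normalization does, again simulating the files with io.StringIO (an assumption; iterating a real text-mode file yields lines the same way). After stripping the newline, 'Cat' from both files compares equal.

```python
import io

file1 = io.StringIO("Man\nDog\nAxe\nCat\nPotato\nFarmer\n")
file2 = io.StringIO("Man\nDog\nAxe\nCat")  # no trailing newline

# Strip only the trailing newline, so 'Cat\n' and 'Cat' both become 'Cat'.
lines1 = [line.rstrip('\n') for line in file1]
# A set drops duplicate lines and makes `in` checks hash-based.
lines2 = {line.rstrip('\n') for line in file2}

print(lines1)          # ['Man', 'Dog', 'Axe', 'Cat', 'Potato', 'Farmer']
print(sorted(lines2))  # ['Axe', 'Cat', 'Dog', 'Man']
print('Cat' in lines2) # True
```

rstrip('\n') removes only the newline, so trailing spaces still count as differences; calling rstrip() with no argument would strip those too, if that is the behavior you want.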

and change:

output.write(line)

to:

print(line, file=output)

print adds the newline for you. Really, the best solution is to avoid the inner loop entirely, by changing all of this:

for line2 in lines2:
    if line == line2:
        duplicate = True
        break

if duplicate == False:
    output.write(line)
else:
    duplicate = False

to just:

if line not in lines2:
    print(line, file=output)
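A quick sketch of the simplified loop in isolation, using already-normalized sample data from the question and an io.StringIO standing in for output.txt (both assumptions for illustration). It shows that a single membership test replaces the flag logic and that print() terminates every line.

```python
import io

lines1 = ['Man', 'Dog', 'Axe', 'Cat', 'Potato', 'Farmer']
lines2 = {'Man', 'Dog', 'Axe', 'Cat'}

output = io.StringIO()  # stands in for output.txt
for line in lines1:
    # One membership test replaces the inner loop and the duplicate flag.
    if line not in lines2:
        # print() appends '\n' by default, so each written line is terminated.
        print(line, file=output)

print(repr(output.getvalue()))  # 'Potato\nFarmer\n'
```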

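If you use a set for lines2 as I suggested, the cost of that membership test drops from linear in the number of lines in file2.txt to roughly constant, regardless of the size of file2.txt (as long as the set of unique lines fits in memory).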
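To make the list-versus-set difference concrete, here is a rough timing sketch on made-up data (100,000 hypothetical lines standing in for a large file2.txt). The exact numbers depend on the machine, but both containers give the same answer while the set lookup avoids scanning every element.

```python
import timeit

# Hypothetical data: 100,000 distinct lines standing in for a large file2.txt.
lines2_list = [f"line {i}" for i in range(100_000)]
lines2_set = set(lines2_list)
probe = "Potato"  # not present, so the list has to be scanned to the end

# Both containers give the same membership answer...
assert (probe in lines2_list) == (probe in lines2_set)

# ...but the list compares against every element, while the set does a hash lookup.
list_time = timeit.timeit(lambda: probe in lines2_list, number=100)
set_time = timeit.timeit(lambda: probe in lines2_set, number=100)
print(f"list: {list_time:.4f} s, set: {set_time:.6f} s")
```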

Even better, use with statements to open the files, and stream file1.txt instead of holding it entirely in memory. The end result is:

with open("file2.txt") as origlines:
    lines2 = {line.rstrip('\n') for line in origlines}

with open("file1.txt") as newlines, open("output.txt", "w") as output:
    for line in newlines:
        line = line.rstrip('\n')
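        # After rstrip, a blank line is '' and ''.isspace() is False,
        # so test line.strip() to skip empty and whitespace-only lines.
        if line.strip() and line not in lines2: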
            print(line, file=output)
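If you run this often, it may help to wrap the same approach in a function that takes the paths as arguments. The name write_missing_lines is hypothetical, and the usage below writes the question's sample data into a temporary directory so it runs anywhere.

```python
import os
import tempfile


def write_missing_lines(new_path, original_path, output_path):
    """Write each non-blank line of new_path that does not appear in original_path."""
    # Load the reference file once into a set of newline-stripped lines.
    with open(original_path) as origlines:
        lines2 = {line.rstrip('\n') for line in origlines}

    # Stream the first file line by line instead of reading it all into memory.
    with open(new_path) as newlines, open(output_path, "w") as output:
        for line in newlines:
            line = line.rstrip('\n')
            # line.strip() is '' for empty or whitespace-only lines, which are skipped.
            if line.strip() and line not in lines2:
                print(line, file=output)


# Usage with throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    file1 = os.path.join(tmp, "file1.txt")
    file2 = os.path.join(tmp, "file2.txt")
    out = os.path.join(tmp, "output.txt")
    with open(file1, "w") as f:
        f.write("Man\nDog\nAxe\nCat\nPotato\nFarmer\n")
    with open(file2, "w") as f:
        f.write("Man\nDog\nAxe\nCat")  # no trailing newline, as in the question
    write_missing_lines(file1, file2, out)
    with open(out) as f:
        result = f.read()

print(repr(result))  # 'Potato\nFarmer\n'
```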

Answer 2

You can use numpy for a shorter, faster solution. It uses the following numpy functions: np.loadtxt (documentation: https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html), np.setdiff1d (documentation: https://docs.scipy.org/doc/numpy-1.14.5/reference/generated/numpy.setdiff1d.html) and np.savetxt (documentation: https://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html).

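import numpy as np

# Read each file into an array of strings (split on whitespace by default)
lines1 = np.loadtxt('file1.txt', dtype=str)
lines2 = np.loadtxt('file2.txt', dtype=str)

# Unique entries of lines1 that are not in lines2 (returned sorted)
arr = np.setdiff1d(lines1, lines2)
np.savetxt('output.txt', arr, fmt='%s')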
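One behavior worth checking before relying on this: np.setdiff1d returns the unique values of its first argument that are not in the second, in sorted order, so the output will not follow the order of file1.txt. Also, np.loadtxt splits on whitespace by default, so lines containing spaces are split into several columns. The sketch below uses in-memory arrays built from the question's sample data (an assumption, to skip file I/O) and contrasts the result with an order-preserving list comprehension.

```python
import numpy as np

lines1 = np.array(["Man", "Dog", "Axe", "Cat", "Potato", "Farmer"])
lines2 = np.array(["Man", "Dog", "Axe", "Cat"])

# setdiff1d returns the unique values of lines1 not in lines2, sorted.
diff = np.setdiff1d(lines1, lines2)
print(diff.tolist())  # ['Farmer', 'Potato']

# A plain-Python filter keeps the original order of lines1 instead.
lines2_set = set(lines2.tolist())
in_order = [line for line in lines1.tolist() if line not in lines2_set]
print(in_order)       # ['Potato', 'Farmer']
```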


2 answers

Answer 1
2 2019-04-02 01:45:14

Answer 2
0 2019-04-02 01:50:14
