Python: remove duplicates and write them to a new file

Question

I want to remove duplicate lines from a text file and write two new text files: one output file without duplicates, and another file that contains the lines that are duplicated in my original file.

lines_seen = set()  # holds lines already seen
dups = open("dups.txt", "w")
outfile = open("out.txt", "w")
for line in open("input.txt", "r"):
    if line not in lines_seen:  # first occurrence: not a duplicate
        outfile.write(line)
        lines_seen.add(line)
    else:  # line was seen before: it is a duplicate
        dups.write(line)
outfile.close()
dups.close()
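For comparison, here is a minimal sketch of the task described above, assuming the whole file fits in memory. It keeps the first occurrence of each line in its original order and collects every later repeat. `split_duplicates` and `split_file` are hypothetical helper names, and each output file is opened exactly once inside a `with` block so it is always closed.

```python
def split_duplicates(lines):
    """Return (unique, duplicates): first occurrences in order, plus every repeat."""
    seen = set()  # lines already placed in `unique`
    unique, duplicates = [], []
    for line in lines:
        if line in seen:
            duplicates.append(line)  # a repeat of an earlier line
        else:
            seen.add(line)
            unique.append(line)  # first time this exact line appears
    return unique, duplicates


def split_file(src, out_path, dups_path):
    """Read src once, then write both result files, each opened a single time."""
    with open(src, "r") as f:
        unique, duplicates = split_duplicates(f)  # iterating a file yields its lines
    with open(out_path, "w") as out, open(dups_path, "w") as dups:
        out.writelines(unique)
        dups.writelines(duplicates)
    return len(unique), len(duplicates)
```

Usage with the question's file names: `split_file("input.txt", "out.txt", "dups.txt")` returns the number of unique and duplicate lines written. Lines are compared exactly, including the trailing newline, so a final line without "\n" does not match an otherwise identical earlier line.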

The output file is smaller than the original, which means some lines were removed; however, the duplicates file is empty: no duplicate lines are written to it.
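Every line skipped from out.txt should have been written to dups.txt, so one way to check what dups.txt ought to contain is to count repeated lines directly. This is a sketch using `collections.Counter`; `duplicate_report` is a hypothetical helper name. The second returned value is how many lines the loop in the question should have written to dups.txt (a line occurring n times produces n - 1 writes).

```python
from collections import Counter


def duplicate_report(lines):
    """Return ({line: count} for repeated lines, number of lines expected in dups.txt)."""
    counts = Counter(lines)  # exact match, trailing newline included
    repeated = {line: n for line, n in counts.items() if n > 1}
    expected_dup_writes = sum(n - 1 for n in repeated.values())
    return repeated, expected_dup_writes
```

For example, `with open("input.txt") as f: print(duplicate_report(f))` prints the repeated lines and the expected number of duplicate writes.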

Answer 1

Because you're clearing the dups file and writing to it again, you need to open it in append mode instead:

dups=open("dups.txt", "a")
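The answer relies on how `open()` modes behave: "w" truncates the file every time it is opened, while "a" keeps the existing content and appends to it. The small sketch below shows the difference; `write_twice` is a hypothetical helper, and it works inside a temporary directory so no real file is overwritten.

```python
import os
import tempfile


def write_twice(mode):
    """Open the same file twice in `mode`, writing one line each time; return its final contents."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "dups.txt")
        for text in ("first\n", "second\n"):
            with open(path, mode) as f:  # "w" truncates on every open, "a" appends
                f.write(text)
        with open(path) as f:
            return f.read()


print(repr(write_twice("w")))  # 'second\n'
print(repr(write_twice("a")))  # 'first\nsecond\n'
```

One side effect of "a" is that lines from earlier runs of the script remain in dups.txt, so delete the file before a run if a fresh result is wanted.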

Answer 1
0 2020-05-13 11:30:00
