
Python removing duplicates and writing them to a new file

I want to remove duplicate lines from a text file and write two new files: one output file without duplicates, and another containing the lines that are duplicated in the original file.

lines_seen = set()  # lines already written to out.txt
dups = open("dups.txt", "w")
outfile = open("out.txt", "w")
for line in open("input.txt", "r"):
    if line not in lines_seen:  # first occurrence: keep it
        outfile.write(line)
        lines_seen.add(line)
    else:  # repeated occurrence: record it
        dups.write(line)
outfile.close()
dups.close()

The output file is smaller than the original, so duplicate lines are being removed; however, the duplicates file stays empty: no duplicate lines are ever written to it.

Opening dups.txt in "w" mode truncates it to zero length every time the script runs, so any duplicates written on a previous run are wiped out. Open it in append mode instead:

dups=open("dups.txt", "a")
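For reference, here is a minimal self-contained sketch of the same dedup-and-split approach, using `with` blocks so all three files are closed even if an error occurs. The sample input data is hypothetical, added only so the example can run on its own; the filenames are the ones from the question:

```python
# Hypothetical sample input, just so this sketch is runnable end to end.
with open("input.txt", "w") as f:
    f.write("alpha\nbeta\nalpha\ngamma\nbeta\n")

lines_seen = set()  # lines already written to out.txt
with open("input.txt") as infile, \
     open("out.txt", "w") as outfile, \
     open("dups.txt", "w") as dups:
    for line in infile:
        if line not in lines_seen:  # first occurrence: keep it
            outfile.write(line)
            lines_seen.add(line)
        else:                       # repeated occurrence: record it
            dups.write(line)
```

With this input, out.txt ends up holding alpha, beta, gamma (one copy each) and dups.txt holds the second occurrences, alpha and beta. Note that because everything happens in a single run here, "w" mode is fine; append mode only matters if you re-run the script and want to keep duplicates from earlier runs.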
