
How to delete the same words from a different text file using Python?

I am reading data from two text files and comparing them. If data in file1 is also in file2, then it should be deleted from file1.

import sys
File1 = open("file1.txt")
File2 = open("file2.txt")
for lines in File1:
    for line in File2:
        for lines in line:
            print lines
you to
you too

you to

My program shows the words that are in file2. How can I delete the entries from file1 that also exist in file2?

You can use the fileinput module with inplace=True and load the second file into a set for lookup purposes...

import fileinput

with open('file2.txt') as fin:
    exclude = set(line.rstrip() for line in fin)

for line in fileinput.input('file1.txt', inplace=True):
    if line.rstrip() not in exclude:
        print line,
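The snippet above uses the Python 2 print statement. A minimal Python 3 sketch of the same idea could look like the following; the function name remove_matching_lines and its parameters are made up for illustration and are not part of the original answer.

```python
import fileinput


def remove_matching_lines(target_path, exclude_path):
    """Rewrite target_path in place, dropping lines that also occur in exclude_path.

    Hypothetical helper: the name and signature are assumptions for this sketch.
    """
    with open(exclude_path) as fin:
        # Compare lines without trailing whitespace, as in the answer above.
        exclude = {line.rstrip() for line in fin}

    # With inplace=True, standard output is redirected into the file,
    # so every printed line is written back to target_path.
    with fileinput.input(files=[target_path], inplace=True) as lines:
        for line in lines:
            if line.rstrip() not in exclude:
                print(line, end="")
```

Calling remove_matching_lines('file1.txt', 'file2.txt') would then rewrite file1.txt itself, so it is worth keeping a copy of the file while experimenting.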

You could do something like this:

file2 = open('file2.txt').readlines()
with open('result.txt', 'w') as result:
    for line in open('file1.txt'):
        # Keep only the lines of file1 that do not appear in file2.
        if line not in file2:
            result.write(line)

It won't modify "file1.txt" but instead create another file "result.txt" containing lines of file1 that are not in file2.

import string

file1 = set(map(string.rstrip, open("f1").readlines()))
file2 = set(map(string.rstrip, open("f2").readlines()))

print ( file1 - file2 ) | file2


set(['other great lines of text', 'from a file', 'against.', 'keep text', 'which is wanted to compare', 'This is', 'some text', 'keep is wanted to compare'])


This is 
keep text
from a file 
keep is wanted to compare


This is 
some text
from a file 
which is wanted to compare
other great lines of text

Sets do not preserve the order of the lines, which may or may not be a problem.
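If the order matters, one option is to use the set only for membership tests and iterate over the first file's lines in their original order. This is a sketch in Python 3; ordered_difference is a hypothetical helper name, and the sample lists are borrowed from the text shown above purely for illustration.

```python
def ordered_difference(lines, exclude):
    """Return the items of `lines` that are not in `exclude`, in original order."""
    excluded = set(exclude)  # a set gives O(1) average-case membership tests
    return [line for line in lines if line not in excluded]


first = ["This is", "keep text", "from a file", "keep is wanted to compare"]
second = ["This is", "some text", "from a file",
          "which is wanted to compare", "other great lines of text"]

print(ordered_difference(first, second))
# prints ['keep text', 'keep is wanted to compare']
```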

If file2 fits in memory, you could use set() to avoid an O(n) lookup for each line:

with open('file2.txt') as file2:
    entries = set(file2.read().splitlines())

with open('file1.txt') as file1, open('output.txt', 'w') as outfile:
    outfile.writelines(line for line in file1
                       if line.rstrip("\n") not in entries)
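The code above leaves file1.txt untouched and writes the result to output.txt. If file1 itself should end up without the shared lines, one possible follow-up is to write to a temporary file and then swap it in. This is only a sketch for Python 3; filter_file_in_place is a hypothetical name, and using tempfile with os.replace is an assumption of this example rather than something the answer proposes.

```python
import os
import tempfile


def filter_file_in_place(target_path, exclude_path):
    """Drop from target_path every line that also appears in exclude_path."""
    with open(exclude_path) as file2:
        entries = set(file2.read().splitlines())

    # Create the temporary file next to the target so that os.replace
    # does not have to move data across filesystems.
    target_dir = os.path.dirname(os.path.abspath(target_path))
    with open(target_path) as file1, tempfile.NamedTemporaryFile(
        "w", dir=target_dir, delete=False
    ) as tmp:
        tmp.writelines(line for line in file1
                       if line.rstrip("\n") not in entries)

    # Both files are closed here; replace the original with the filtered copy.
    os.replace(tmp.name, target_path)
```

Writing to a separate file first means the original is only replaced once the filtered copy has been written completely.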

