
How to delete the same words from a different text file using Python?

I am reading data from two text files and comparing them. If data in file1 is also in file2, it should be deleted from file1.

import sys
File1 = open("file1.txt")
File2 = open("file2.txt")
for lines in File1:
    for line in File2:
        for lines in line:
            print lines

File1
you
you to
you too
why
toh

File2
you
you to

My program shows the words that are in file2. How can I delete the entries from file1 that also exist in file2?

You can use the fileinput module with inplace=True and load the second file into a set for lookup purposes...

import fileinput

with open('file2.txt') as fin:
    exclude = set(line.rstrip() for line in fin)

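# With inplace=True, anything printed inside the loop is written back to file1.txt.
for line in fileinput.input('file1.txt', inplace=True):
    if line.rstrip() not in exclude:
        # The trailing comma (Python 2) suppresses the extra newline.
        print line,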
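
For reference, here is a minimal Python 3 sketch of the same fileinput approach, run against the sample data from the question. The helper name `remove_matching_lines` and the temporary directory are illustrative assumptions, not part of the original answer.

```python
import fileinput
import os
import tempfile


def remove_matching_lines(target_path, exclude_path):
    """Rewrite target_path in place, dropping lines that also appear in exclude_path."""
    with open(exclude_path) as fin:
        exclude = set(line.rstrip() for line in fin)

    # With inplace=True, anything printed inside the loop is written
    # back to the file being read instead of to the console.
    with fileinput.input(files=[target_path], inplace=True) as lines:
        for line in lines:
            if line.rstrip() not in exclude:
                print(line, end="")


# Demo on the question's sample data, in a throwaway directory.
workdir = tempfile.mkdtemp()
file1 = os.path.join(workdir, "file1.txt")
file2 = os.path.join(workdir, "file2.txt")
with open(file1, "w") as f:
    f.write("you\nyou to\nyou too\nwhy\ntoh\n")
with open(file2, "w") as f:
    f.write("you\nyou to\n")

remove_matching_lines(file1, file2)

with open(file1) as f:
    remaining = f.read()  # "you too", "why" and "toh" are left
```

Note that `rstrip()` without arguments removes all trailing whitespace, so a line such as "you " would also be treated as a match for "you".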

You could do something like this:

file2 = open('file2.txt').readlines()
with open('result.txt', 'w') as result:
    for line in open('file1.txt'):
        if line not in file2:
            result.write(line)

It won't modify "file1.txt"; instead, it creates another file, "result.txt", containing the lines of file1 that are not in file2.
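
One caveat with comparing raw lines: `readlines()` keeps the trailing newline, so the last line of a file that does not end with a newline will not match the same text elsewhere. A small sketch of the effect, using in-memory lists in place of the files (the helper name `lines_not_in` is hypothetical):

```python
def lines_not_in(source_lines, exclude_lines):
    """Return source lines whose text (ignoring the line ending) is not in exclude_lines."""
    exclude = [line.rstrip("\n") for line in exclude_lines]
    return [line for line in source_lines if line.rstrip("\n") not in exclude]


file1_lines = ["you\n", "you to\n", "you too\n", "why\n", "toh\n"]
# Assumption for this demo: the last line of file2 has no trailing newline.
file2_lines = ["you\n", "you to"]

# Raw comparison: "you to\n" != "you to", so that line is wrongly kept.
raw = [line for line in file1_lines if line not in file2_lines]

# Comparison with line endings stripped removes both matching lines.
normalized = lines_not_in(file1_lines, file2_lines)
```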

import string

file1 = set(map(string.rstrip, open("f1").readlines()))
file2 = set(map(string.rstrip, open("f2").readlines()))

print ( file1 - file2 ) | file2

gives

set(['other great lines of text', 'from a file', 'against.', 'keep text', 'which is wanted to compare', 'This is', 'some text', 'keep is wanted to compare'])

f1

This is 
keep text
from a file 
keep is wanted to compare
against.

f2

This is 
some text
from a file 
which is wanted to compare
against.
other great lines of text

Sets do not preserve the order of the lines, which may or may not be a problem for you.
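
If the order matters, one option is to keep the set only for lookups and walk the first file's lines in their original order. A minimal sketch, where the list literals stand in for the contents of f1 and f2 shown above:

```python
f1_lines = [
    "This is ",
    "keep text",
    "from a file ",
    "keep is wanted to compare",
    "against.",
]
f2_lines = [
    "This is ",
    "some text",
    "from a file ",
    "which is wanted to compare",
    "against.",
    "other great lines of text",
]

# The set is used only for fast membership tests.
exclude = {line.rstrip() for line in f2_lines}

# The list comprehension preserves the order of f1.
kept = [line.rstrip() for line in f1_lines if line.rstrip() not in exclude]
print(kept)
```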

If file2 fits in memory, you could use set() to avoid an O(n) lookup for each line:

with open('file2.txt') as file2:
    entries = set(file2.read().splitlines())

with open('file1.txt') as file1, open('output.txt', 'w') as outfile:
    outfile.writelines(line for line in file1
                       if line.rstrip("\n") not in entries)
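
If file1 itself has to end up without the matching lines, one possible variation is to write to a temporary file and then swap it into place. This is only a sketch: the function name `delete_common_lines` and the demo paths are assumptions, and the temporary file is created with `tempfile.mkstemp`, so its permissions may differ from the original file's.

```python
import os
import tempfile


def delete_common_lines(target_path, exclude_path):
    """Remove from target_path every line that also appears in exclude_path."""
    with open(exclude_path) as exclude_file:
        entries = set(exclude_file.read().splitlines())

    # Write the surviving lines to a temporary file in the same directory,
    # then rename it over the original once writing has finished.
    directory = os.path.dirname(os.path.abspath(target_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    with os.fdopen(fd, "w") as outfile, open(target_path) as infile:
        outfile.writelines(line for line in infile
                           if line.rstrip("\n") not in entries)
    os.replace(tmp_path, target_path)


# Demo on the question's sample data, in a throwaway directory.
demo_dir = tempfile.mkdtemp()
path1 = os.path.join(demo_dir, "file1.txt")
path2 = os.path.join(demo_dir, "file2.txt")
with open(path1, "w") as f:
    f.write("you\nyou to\nyou too\nwhy\ntoh\n")
with open(path2, "w") as f:
    f.write("you\nyou to\n")

delete_common_lines(path1, path2)

with open(path1) as f:
    result = f.read()
```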
