
How to delete the same words from different text files using Python?

I am reading data from two text files and comparing them. If data in file1 is also in file2, it should be deleted from file1.

import sys
File1 = open("file1.txt")
File2 = open("file2.txt")
for lines in File1:
    for line in File2:
        for lines in line:
            print lines
File1
you
you to
you too
why
toh

File2
you
you to

My program shows the words that are in file2. How can I delete the entries from file1 that also exist in file2?

You can use the fileinput module with inplace=True and load the second file into a set for lookup purposes:

import fileinput

with open('file2.txt') as fin:
    exclude = set(line.rstrip() for line in fin)

for line in fileinput.input('file1.txt', inplace=True):
    if line.rstrip() not in exclude:
        print line,
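
The code above uses the Python 2 print statement. As a minimal sketch of the same idea for Python 3, the version below wraps it in a function; the name `remove_matching_lines` and its parameters are made up for illustration and are not part of the original answer.

```python
import fileinput


def remove_matching_lines(target_path, exclude_path):
    """Rewrite target_path in place, dropping lines that also appear in exclude_path.

    Hypothetical helper: the name and signature are assumptions for this sketch.
    """
    with open(exclude_path) as fin:
        exclude = {line.rstrip() for line in fin}

    # With inplace=True, anything printed inside the loop is written back
    # to the file being read instead of to the console.
    with fileinput.input(target_path, inplace=True) as lines:
        for line in lines:
            if line.rstrip() not in exclude:
                print(line, end='')
```

Calling `remove_matching_lines('file1.txt', 'file2.txt')` overwrites file1.txt, so it is worth trying on a copy first.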

You could do something like this:

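# Strip newlines so the last line of file2 still matches if it has no trailing newline
with open('file2.txt') as f2:
    file2 = [line.rstrip('\n') for line in f2]
with open('result.txt', 'w') as result, open('file1.txt') as file1:
    for line in file1:
        if line.rstrip('\n') not in file2:
            result.write(line)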

It won't modify "file1.txt" but will instead create another file, "result.txt", containing the lines of file1 that are not in file2.

import string

file1 = set(map(string.rstrip, open("f1").readlines()))
file2 = set(map(string.rstrip, open("f2").readlines()))

print ( file1 - file2 ) | file2

gives

set(['other great lines of text', 'from a file', 'against.', 'keep text', 'which is wanted to compare', 'This is', 'some text', 'keep is wanted to compare'])

f1

This is 
keep text
from a file 
keep is wanted to compare
against.

f2

This is 
some text
from a file 
which is wanted to compare
against.
other great lines of text

Sets are unordered, so the original line order is lost, which may or may not be an issue.
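
If the order does matter, one possible workaround (a sketch, not part of the original answer) is to keep the first file's lines in a list and use a set only for the membership test. The function name `lines_not_in` and the sample data, taken from the question's File1 and File2, are used here for illustration.

```python
def lines_not_in(lines, excluded):
    """Return the items of `lines` that are absent from `excluded`, in original order."""
    excluded_set = set(excluded)  # set gives fast membership checks
    return [line for line in lines if line not in excluded_set]


file1_lines = ["you", "you to", "you too", "why", "toh"]
file2_lines = ["you", "you to"]

remaining = lines_not_in(file1_lines, file2_lines)
print(remaining)  # ['you too', 'why', 'toh']
```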

If file2 fits in memory, you could use set() to avoid O(n) lookups for each line:

with open('file2.txt') as file2:
    entries = set(file2.read().splitlines())

with open('file1.txt') as file1, open('output.txt', 'w') as outfile:
    outfile.writelines(line for line in file1
                       if line.rstrip("\n") not in entries)
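
The snippet above writes the result to a separate output.txt. If file1 itself should end up without the matching lines, as the question asks, one option is to write to a temporary file and then swap it in. This is only a sketch under that assumption; `filter_file_in_place` is a hypothetical name.

```python
import os
import tempfile


def filter_file_in_place(target_path, exclude_path):
    """Drop from target_path every line that also appears in exclude_path.

    Hypothetical helper: the name and signature are assumptions for this sketch.
    """
    with open(exclude_path) as exclude_file:
        entries = set(exclude_file.read().splitlines())

    target_dir = os.path.dirname(os.path.abspath(target_path))
    # Write the kept lines to a temporary file in the same directory...
    with open(target_path) as src, tempfile.NamedTemporaryFile(
            'w', dir=target_dir, delete=False) as tmp:
        tmp.writelines(line for line in src
                       if line.rstrip("\n") not in entries)
    # ...then replace the original with it once both files are closed.
    os.replace(tmp.name, target_path)
```

Writing to a temporary file first means the original is only replaced after the filtered copy has been written completely.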
