Python: How do I remove text from a text file that's in another text file?

This is kind of hard to explain, but my script has a text file that contains a bunch of entries. I also have a master record of everything. I want to take the first file and remove every entry that matches one in the master record. Some entries in the master record will not appear in the first file at all. Here's an example:

First file:

Cow
Duck
Sheep

Master Record:

Duck
Sheep
Cat
Dog

Any help is appreciated!

Read through the master file and put its lines into a set, then compare each line of the other file (the "first file" in the question) against that set and keep only the lines that are not in it:

Code:

# read in the master file and put each stripped line into a set
with open('master') as f:
    master = {line.strip() for line in f}

# read through the other file and keep each stripped line not in master
with open('file1') as f:
    allowed = [w for w in (line.strip() for line in f) if w not in master]

# show the allowed lines
for w in allowed:
    print(w)
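
The same idea can be wrapped in a small function so it works on any iterable of lines, whether open files or plain lists. This is only a sketch: the name `lines_not_in_master` is made up for illustration, blank lines are skipped as an added assumption, and `io.StringIO` stands in for the two files using the sample data from the question.

```python
import io


def lines_not_in_master(first_lines, master_lines):
    """Return the stripped, non-empty lines of first_lines absent from master_lines."""
    # A set gives constant-time membership tests on average.
    master = {line.strip() for line in master_lines}
    kept = []
    for line in first_lines:
        entry = line.strip()
        # Skip blank lines (an assumption) and anything found in the master set.
        if entry and entry not in master:
            kept.append(entry)
    return kept


# Sample data from the question, wrapped in file-like objects.
first_file = io.StringIO("Cow\nDuck\nSheep\n")
master_file = io.StringIO("Duck\nSheep\nCat\nDog\n")

remaining = lines_not_in_master(first_file, master_file)
print(remaining)  # ['Cow']
```

Because the comparison is done on stripped lines, stray trailing spaces in either file do not prevent a match.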

Try this (assuming both of your lists are files):

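# read both files; the with-blocks close them automatically
with open('master.txt', 'r') as master_file:
    master_arr = [line.strip() for line in master_file.read().splitlines()]
with open('file.txt', 'r') as f:
    f_arr = [line.strip() for line in f.read().splitlines()]

# keep every non-empty entry of file.txt that is not in the master record
fin_arr = []
for entry in f_arr:
    if entry and entry not in master_arr:
        fin_arr.append(entry)

# join the surviving entries back into newline-separated text
final = '\n'.join(fin_arr)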
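
The snippet above stops once `final` has been built. If the result should be saved, one possible way is sketched below. The file name `result.txt` and the temporary directory are assumptions made for the demo, and `final` is given a stand-in value so the block runs on its own.

```python
import os
import tempfile

# Stand-in for the `final` string built above (sample data from the question).
final = '\n'.join(['Cow'])

# Hypothetical output location; a temporary directory keeps the demo harmless.
out_path = os.path.join(tempfile.mkdtemp(), 'result.txt')

# Mode 'w' creates the file or overwrites an existing one.
with open(out_path, 'w') as out:
    out.write(final + '\n')

# Read the file back to confirm what was written.
with open(out_path) as check:
    written = check.read()
print(written)
```

Passing the path of the first file instead of `out_path` would overwrite that file with the filtered entries, so it is worth keeping a backup before doing that.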

Note that this does not include the file reading/writing.

The data:

file = """
cow
duck
sheep
"""

master_record = """
duck
sheep
cat
dog
"""

Now for the one-liner list comprehension that no one wants to look at:

print([i for i in [x for x in file.replace('\n', ' ').split(' ') if x in master_record.replace('\n', ' ').split(' ')] if i])

That returns a list of all the words in `file` that also appear in the master record.

Splitting it up:

found = []

# Split the master record into words once, replacing newlines with spaces
master_words = master_record.replace('\n', ' ').split(' ')

# Loop through every word in `file`, replacing newlines with spaces
for word in file.replace('\n', ' ').split(' '):
    # Keep the word if it is non-empty and appears in the master record
    if word and word in master_words:
        found.append(word)

# Print what we found
print(found)
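
The loop above collects the words that appear in both texts. The question asks for the opposite, the entries of the first file that are not in the master record, so a variant for that case might look like the sketch below. The names `master_words` and `remaining` are chosen for illustration, and the data is repeated so the block runs on its own.

```python
file = """
cow
duck
sheep
"""

master_record = """
duck
sheep
cat
dog
"""

# str.split() with no arguments splits on any run of whitespace
# (spaces and newlines alike) and never yields empty strings.
master_words = set(master_record.split())

# Keep the words of `file` that are NOT in the master record.
remaining = [word for word in file.split() if word not in master_words]
print(remaining)  # ['cow']
```

Using `split()` without arguments removes the need for the `replace('\n', ' ')` step and for the empty-string check.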

Hope this helps!

-Coolq
