Search a main text file for each word of another text file and append the words that are not found, using Python

I need help with Python code for the scenario below.

I have two text files: a main file and a list file. The main file contains many words, and I need to update it whenever I find a new word in the list file.

I need to search the main file for each word of the list file. If a word is not found in the main file, I need to append that new word to the main file.

I have code that updates the file if a string is not found, but I need to search for each word from a text file.

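Main_File = "file path"
list_file = "file path"
needle = "word"  # the single word to search for (example value)

with open(Main_File, "r+") as file:  # use the variable, not the string "Main_File"
    for line in file:
        if needle in line:
            break
    else:  # not found: the loop has read to EOF, so the write appends
        file.write(needle + "\n")  # append missing data
# This code appends a specific word if it is not found in the file,
# but I need to search for each word from another file.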
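One direct way to extend the `for`/`else` pattern above to every word of the list file is to rescan the main file once per word. This is a minimal sketch, assuming both files hold whitespace-separated words and the main file ends with a newline (or is empty); the helper name `append_missing_words` is hypothetical. Like the original snippet it uses substring matching (`word in line`), and it rereads the main file for every word, so it is only reasonable for small files.

```python
def append_missing_words(main_path, list_path):
    """Append each word from list_path that does not occur in main_path.

    Hypothetical helper: rescans the main file once per word, mirroring
    the for/else search from the question (substring match per line).
    Assumes the main file ends with a newline or is empty.
    """
    with open(list_path) as list_file:
        words = list_file.read().split()

    added = []
    for word in words:
        # reopen so each search starts at the top and sees earlier appends
        with open(main_path, "r+") as main_file:
            for line in main_file:
                if word in line:
                    break
            else:  # word not found anywhere in the main file
                main_file.seek(0, 2)  # make sure we are at end of file
                main_file.write(word + "\n")
                added.append(word)
    return added
```

Because the main file is reopened for each word, a word that appears twice in the list file is appended only once: the second search finds the copy added by the first.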

If the words in your main file fit in memory, you can load them into a set and check whether each word from the list file is already in the main file, as in the code below:

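main_file_path = "mainFile.txt"  # example paths
list_file_path = "listFile.txt"

# load the words of the main file into a set
with open(main_file_path) as main_file:
    main_file_words = set(main_file.read().split())

# read the list file and append each missing word to the MAIN file
with open(list_file_path) as list_file, open(main_file_path, "a") as main_file:
    for word in list_file.read().split():
        if word not in main_file_words:
            main_file_words.add(word)  # don't append the same word twice
            main_file.write("\n" + word)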
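The set-based idea can also be split into a pure function that works on strings, which makes the logic easy to check without touching the disk. This is a sketch under the assumption that words are separated by whitespace; `find_new_words` and `append_new_words` are hypothetical names. Iterating over the list text (rather than over a set) keeps new words in their original order, and set membership matches whole words, so `cat` is not treated as present just because the main file contains `concatenate`.

```python
def find_new_words(main_text, list_text):
    """Return words from list_text missing from main_text, in first-seen order."""
    known = set(main_text.split())  # whole-word lookup, O(1) on average
    new_words = []
    for word in list_text.split():
        if word not in known:
            known.add(word)  # report a repeated list word only once
            new_words.append(word)
    return new_words


def append_new_words(main_path, list_path):
    """Append the missing words to main_path, one per line; return them."""
    with open(main_path) as f:
        main_text = f.read()
    with open(list_path) as f:
        list_text = f.read()
    new_words = find_new_words(main_text, list_text)
    with open(main_path, "a") as f:
        # start on a fresh line if the file does not already end with one
        if main_text and not main_text.endswith("\n"):
            f.write("\n")
        for word in new_words:
            f.write(word + "\n")
    return new_words
```

For example, `find_new_words("apple banana", "banana cherry apple cherry date")` returns `['cherry', 'date']`.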

You could load the main file with mmap and search it for the words from the list file as follows:

import mmap

mainFilePath = "mainFile.txt"
listFilePath = "listFile.txt"
newWords = []

# open main file with mmap (mmap raises ValueError if the file is empty)
with open(mainFilePath, 'rb') as mainFile:
    with mmap.mmap(mainFile.fileno(), 0, access=mmap.ACCESS_READ) as mainFileMmap:
        # open list file and search for words in main file with mmap.find()
        with open(listFilePath, 'r') as listFile:
            for line in listFile:
                line = line.rstrip("\r\n")  # remove line endings
                # skip blank lines and words already queued for appending
                if line and line not in newWords and mainFileMmap.find(line.encode()) == -1:
                    newWords.append(line)

# append new words to main file, in the order they were found
with open(mainFilePath, 'a') as mainFile:
    for newWord in newWords:
        mainFile.write("\n{}".format(newWord))
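`mmap.find()` performs a plain byte-substring search, so a list word such as `cat` counts as found if the main file contains `concatenate`. If whole-word matching is needed, the `re` module can search an mmap object directly, as the Python `mmap` documentation notes. The sketch below assumes words are whitespace-delimited tokens (matching `str.split()` for ASCII whitespace) and UTF-8 text; `word_in_mmap` and `find_missing_words` are hypothetical helpers. It also checks the file size first, because `mmap.mmap` raises `ValueError` for an empty file.

```python
import mmap
import os
import re


def word_in_mmap(mm, word):
    """Return True if word occurs in mm as a whole whitespace-delimited token."""
    # (?<!\S) / (?!\S): the match must be bounded by whitespace or file edges
    pattern = rb"(?<!\S)" + re.escape(word.encode("utf-8")) + rb"(?!\S)"
    return re.search(pattern, mm) is not None


def find_missing_words(main_path, list_path):
    """Return list-file words that do not occur as whole tokens in main_path."""
    with open(list_path, encoding="utf-8") as lf:
        words = list(dict.fromkeys(lf.read().split()))  # dedupe, keep order
    with open(main_path, "rb") as mf:
        if os.fstat(mf.fileno()).st_size == 0:
            return words  # mmap cannot map an empty file; nothing is present
        with mmap.mmap(mf.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return [w for w in words if not word_in_mmap(mm, w)]
```

The returned list can then be appended to the main file exactly as in the loop above. Compiling one regex per word keeps the sketch simple; for very long list files, the set-based approach avoids rescanning the main file for every word.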
