
Fast way to check if a string is in a huge text file

I'm looking for an easy way to check whether all the strings in a list occur in a huge text file (more than 35,000 words).

self.vierkant = ['BIT', 'ICE', 'TEN']


def geldig(self, file):
    self.file = file
    file = open(self.file, 'r')
    line = file.readline()
    self.file = ''

    while line:
        line = line.strip('\n')
        self.file += line
        line = file.readline()

    return len([woord for woord in self.vierkant if woord.lower() not in self.file]) == 0

I just copy the text file into self.file, then check whether all the words from self.vierkant are in self.file.

The main problem is that it takes a very long time to read in the text file. Is there an easier/faster way to do this?

You can read the entire contents of a file with file.read() instead of calling readline() repeatedly and concatenating the results. Repeated string concatenation in a loop can copy the growing string over and over, which is likely where the time goes:

with open(self.file) as f:
    self.file = f.read()
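
As a minimal sketch of how the check from the question could look with a single read() call: the function name all_substrings_present and the throwaway demo file are illustrative assumptions, not part of the original code. Note that, unlike the original loop, this keeps the newlines in the text, so a word cannot match across two lines.

```python
import os
import tempfile


def all_substrings_present(path, words):
    """Return True if every word (lowercased) occurs somewhere in the file's text."""
    with open(path, encoding="utf-8") as f:
        text = f.read().lower()  # one read instead of many readline() calls
    # all() stops at the first missing word
    return all(word.lower() in text for word in words)


# Tiny demonstration with a throwaway file (hypothetical content)
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("bit\nice\nten\n")
    demo_path = tmp.name

print(all_substrings_present(demo_path, ["BIT", "ICE", "TEN"]))  # True
print(all_substrings_present(demo_path, ["BIT", "XYZ"]))  # False
os.remove(demo_path)
```

This is still a substring test, so 'ice' would also be found inside 'nice'; whether that is acceptable depends on what the file contains.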

If you need to check a lot of words, you could also build a set from the file's contents, which gives O(1) average-case containment checks. This assumes the file holds one word per line, since the set matches whole lines rather than substrings.

with open('a.txt') as f:
    # splitlines() drops the trailing '\n' and returns a list of lines
    s = set(f.read().splitlines())

test_lines = ['bit', 'ice', 'ten']  # example lines to look up
for line in test_lines:
    print(line, line in s)  # O(1) average-case check whether the line is in the set
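
Applied to the question, the set approach might look like the sketch below. It assumes the file has one word per line and that matching should be case-insensitive; build_word_set and all_words_present are hypothetical helper names, and io.StringIO stands in for an open file so the example runs on its own.

```python
import io


def build_word_set(lines):
    """Build a set of lowercased, stripped lines (one word per line is assumed)."""
    return {line.strip().lower() for line in lines if line.strip()}


def all_words_present(words, word_set):
    """Whole-word lookup: each membership check is O(1) on average."""
    return all(word.lower() in word_set for word in words)


# io.StringIO stands in for an open file here; with a real file use
# `with open(path) as f: word_set = build_word_set(f)`.
fake_file = io.StringIO("bit\nice\nten\nnice\n")
word_set = build_word_set(fake_file)

print(all_words_present(["BIT", "ICE", "TEN"], word_set))  # True
print(all_words_present(["BIT", "IC"], word_set))  # False: 'ic' is not a whole line
```

Building the set costs one pass over the file, so it pays off mainly when the same file is queried many times.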
