Fast way to check if a string is in a huge text file

I'm looking for an easy way to check whether all the strings in a list appear in a huge text file (more than 35,000 words).

self.vierkant = ['BIT', 'ICE', 'TEN']


def geldig(self, file):
    self.file = file
    file = open(self.file, 'r')
    line = file.readline()
    self.file = ''

    while line:
        line = line.strip('\n')
        self.file += line
        line = file.readline()

    return len([woord for woord in self.vierkant if woord.lower() not in self.file]) == 0

I just copy the text file into self.file, then check if all words from self.vierkant are in self.file.

The main problem is that it takes a very long time to read in the text file. Is there an easier/faster way to do this?

You can read the entire contents of a file with file.read() instead of calling readline() repeatedly and concatenating the result:

with open(self.file) as f:
    self.file = f.read()
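
Put together, the whole check might look like the sketch below. It assumes the file contains lowercase text, as the question's `woord.lower()` suggests; the function name `all_words_in_text` and the `path` parameter are made up for illustration. Note that `in` on a string is a substring test, so 'bit' would also be found inside 'rabbit'.

```python
def all_words_in_text(words, path):
    """Return True if every word occurs as a substring of the file's text."""
    with open(path) as f:
        text = f.read()  # a single read() call instead of a readline() loop
    # all() stops at the first word that is missing
    return all(word.lower() in text for word in words)
```

Calling `all_words_in_text(['BIT', 'ICE', 'TEN'], 'words.txt')` (with a hypothetical file name) would then replace the body of `geldig` without overwriting `self.file`.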

If you need to check a lot of words, you could also build a set from the file's lines, which gives average-case O(1) membership checks:

with open('a.txt') as f:
    s = set(f.read().splitlines())  # splitlines() drops the trailing '\n' and returns a list of lines
for line in test_lines:  # test_lines: the strings you want to look up
    line in s  # O(1) check whether the line is in the set of lines
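
Applied to the word list from the question, the set approach could be sketched as follows. This assumes the file holds one word per line; `load_word_set` and `all_words_present` are hypothetical helper names, not part of any library. Unlike a substring test on the raw text, this only matches whole lines, so 'bit' is not found just because 'rabbit' is in the file.

```python
def load_word_set(path):
    """Read a file with one word per line into a set of lowercase words."""
    with open(path) as f:
        # strip() removes the newline and surrounding spaces; blank lines are skipped
        return {line.strip().lower() for line in f if line.strip()}


def all_words_present(words, word_set):
    """Return True if every word (compared case-insensitively) is in word_set."""
    return all(word.lower() in word_set for word in words)
```

The set only needs to be built once, for example in `__init__`, and can then be reused for every call such as `all_words_present(self.vierkant, word_set)`.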
