
Fast way to find whether a list of words contains at least one word that starts with certain letters (not "find ALL words"!)

I have a set (not a list) of strings (words), and it is a big one. (It's extracted from images with OpenCV and Tesseract, so there's no reliable way to predict its contents.)

At some point while working with this set, I need to find out whether it contains at least one word that begins with the part I'm currently processing. So it's something like this (NOT actual code):

if exists(word.startswith(word_part) in word_set) then continue else break

There is a very good answer here on how to find all strings in a list that start with something:

result = [s for s in string_list if s.startswith(lookup)]

or

result = filter(lambda s: s.startswith(lookup), string_list)

But those return a list or an iterator of all the matching strings. I only need to know whether any such string exists in the set, not get them all. Performance-wise, it seems wasteful to build a list, take its len, check whether it's greater than zero, and then just throw the list away.
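Worth noting for the filter version: in Python 3, filter() returns a lazy iterator, not a list, so it only does work when items are pulled from it. A minimal sketch, assuming a plain set of strings (the helper name has_word_with_prefix and the sample words are made up for illustration), that pulls at most one item with next() and a sentinel default:

```python
def has_word_with_prefix(word_set, lookup):
    """Return True if any word in word_set starts with lookup (hypothetical helper)."""
    # filter() is lazy in Python 3; next() stops at the first matching word.
    # A unique sentinel avoids confusing a matched "" with "no match".
    sentinel = object()
    return next(filter(lambda s: s.startswith(lookup), word_set), sentinel) is not sentinel


words = {"apple", "banana", "cherry"}
print(has_word_with_prefix(words, "ban"))  # True
print(has_word_with_prefix(words, "x"))    # False
```

No list is built and len() is never called; the scan ends at the first hit.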

Is there a better / faster / cleaner way?

Your pseudocode is very close to real code!

if any(word.startswith(word_part) for word in word_set):
    continue
else:
    break

any() returns as soon as it finds a true element, so it neither builds a list nor checks the remaining words.
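To see the short-circuiting in action, the sketch below counts how many words get checked. The counting wrapper starts_with_counted and the sample data are illustrative assumptions; the point is that the list comprehension checks every word while any() stops at the first hit. A list is used here instead of a set only so that the iteration order, and therefore the count, is predictable.

```python
def starts_with_counted(word, prefix, counter):
    # Record each check so we can compare how much work each approach does.
    counter[0] += 1
    return word.startswith(prefix)


words = ["alpha", "beta", "gamma", "delta", "epsilon"]

# Approach from the question: build the full list, then test its length.
list_checks = [0]
found_by_list = len([w for w in words if starts_with_counted(w, "be", list_checks)]) > 0

# any() with a generator expression stops at the first True.
any_checks = [0]
found_by_any = any(starts_with_counted(w, "be", any_checks) for w in words)

print(found_by_list, list_checks[0])  # True 5
print(found_by_any, any_checks[0])    # True 2
```

Both give the same answer, but any() stopped after "beta" instead of visiting all five words.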

You can use yield:

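def find_word(word_set, letter):
    # Lazily yield each word that starts with the given prefix
    for word in word_set:
        if word.startswith(letter):
            yield word

# next() pulls only the first match; the default None means "no match"
if next(find_word(word_set, letter), None) is not None:
    print('word exists')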

yield produces words lazily, so a single next() call scans the set only until the first match and returns just that one word.
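Both answers still scan the whole set in the worst case, which is O(n) per lookup. If many prefixes are checked against the same large set, a different technique (not taken from either answer) is to sort the words once and binary-search with the standard bisect module: the first word that sorts at or after the prefix is the only one that needs checking. A sketch, assuming the set does not change between lookups; the class name PrefixIndex is hypothetical.

```python
from bisect import bisect_left


class PrefixIndex:
    """Sorted snapshot of a word set for repeated prefix-existence checks."""

    def __init__(self, words):
        # Sorting costs O(n log n) once; each lookup afterwards is O(log n).
        self._sorted = sorted(words)

    def has_prefix(self, prefix):
        # Any word starting with `prefix` sorts at or after `prefix`, and the
        # first word at that position starts with `prefix` if any word does.
        i = bisect_left(self._sorted, prefix)
        return i < len(self._sorted) and self._sorted[i].startswith(prefix)


index = PrefixIndex({"tesseract", "test", "opencv", "image"})
print(index.has_prefix("tes"))  # True
print(index.has_prefix("ima"))  # True
print(index.has_prefix("zz"))   # False
```

If words keep being added between checks, re-sorting becomes the bottleneck, and a trie is the usual structure for that case.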

