
Find the indexes of list items within a string in Python

I'm looking for a fast way to find the positions of all the items (each one or more words long) from a list that occur in a string. I don't need the index within the list; I need the index within the string.

I have a list of words and a string like these:

words = ['must', 'shall', 'may','should','forbidden','car',...]
string= 'you should wash the car every day'

Desired output:
[1, 4]  # should=1, car=4

The list can sometimes contain hundreds of items, and the string's length can exceed tens of thousands.

I need a very fast approach because it is called a thousand times in each iteration.

I know how to implement it with loops that check the items one by one, but that is too slow!

One solution is to make words a set instead of a list and then use a simple list comprehension:

words = {'must', 'shall', 'may','should','forbidden','car'}
string= 'you should wash the car every day'

out = [i for i, w in enumerate(string.split()) if w in words]

print(out)

Prints:

[1, 4]

You need the Aho-Corasick algorithm for this.

Given a set of strings and a text, it finds the occurrences of all the strings from the set in the text in O(len + ans) time, where len is the length of the text and ans is the number of matches. Building the automaton beforehand takes time proportional to the total length of the strings in the set.

It uses an automaton and can be modified to suit your needs.
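As a rough illustration of that idea, here is a minimal pure-Python sketch. It is a word-level variant (the automaton steps over whitespace-separated tokens rather than characters), which is my own adaptation so that it returns word indexes and also handles multi-word items. The names `build_automaton` and `search` are hypothetical, not from any library.

```python
from collections import deque


def build_automaton(patterns):
    """Build Aho-Corasick tables over word tokens (illustrative sketch)."""
    goto = [{}]   # goto[state][token] -> next state
    fail = [0]    # failure link for each state
    out = [[]]    # lengths (in tokens) of the patterns ending at each state
    for pattern in patterns:
        tokens = pattern.split()
        if not tokens:
            continue
        state = 0
        for tok in tokens:
            if tok not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][tok] = len(goto) - 1
            state = goto[state][tok]
        out[state].append(len(tokens))
    # Breadth-first pass: depth-1 states keep fail = 0, deeper ones are derived.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for tok, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and tok not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(tok, 0)
            # Inherit the patterns that end at the failure state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(automaton, text):
    """Return the sorted start indexes (in words) of every pattern match."""
    goto, fail, out = automaton
    hits = []
    state = 0
    for i, tok in enumerate(text.split()):
        while state and tok not in goto[state]:
            state = fail[state]
        state = goto[state].get(tok, 0)
        for length in out[state]:
            hits.append(i - length + 1)
    return sorted(hits)


words = ['must', 'shall', 'may', 'should', 'forbidden', 'car']
automaton = build_automaton(words)  # build once, reuse for every string
print(search(automaton, 'you should wash the car every day'))  # [1, 4]
```

Since the word list stays the same across calls, the automaton can be built once and reused, so each call only pays for the scan over the string.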

You can use a dictionary: looking up a key takes O(1) time on average. Note that a word that appears more than once in the string keeps only its last index here.

string = 'you should wash the car every day'

wordToIndex = {word: index for index, word in enumerate(string.split())}

words = ['must', 'shall', 'may','should','forbidden','car']

result = [wordToIndex[word] for word in words if word in wordToIndex]

# [1,4]
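If the string can contain the same word more than once, one possible variation is to map each word to a list of all its positions. This is only a sketch; the helper name `index_positions` and the sample sentence are made up for illustration.

```python
from collections import defaultdict


def index_positions(text):
    """Map each word of text to the list of indexes where it occurs."""
    positions = defaultdict(list)
    for i, word in enumerate(text.split()):
        positions[word].append(i)
    return positions


words = ['must', 'shall', 'may', 'should', 'forbidden', 'car']
text = 'you should wash the car and the other car'  # hypothetical input

positions = index_positions(text)
# .get() avoids inserting empty lists for words that are absent.
result = sorted(i for w in words for i in positions.get(w, ()))
print(result)  # [1, 4, 8]
```

Building the mapping costs one pass over the string, after which each word lookup is an average O(1) dictionary access.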

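Use a list comprehension:

tokens = string.split()  # split once instead of on every lookup
print([tokens.index(t) for t in tokens if t in words])
# [1, 4]
# Note: list.index() returns the first occurrence, so a repeated word reports the same index each time.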

