Find indexes of items from a list of strings in a string with Python

I'm looking for a fast approach to find all the indexes in a string that match items from a list (each item is one or multiple words). I don't need the index in the list; I need the index in the string.

I have a list of words and a string like these:

words = ['must', 'shall', 'may','should','forbidden','car',...]
string= 'you should wash the car every day'

Desired output:
[1, 4]  # should=1, car=4

The list can sometimes have more than hundreds of items, and the string can be longer than tens of thousands.

I need a very fast approach because it is called a thousand times in each iteration.

I know how to implement it with loops that check all the items one by one, but that is far too slow!

One solution is to make words a set instead of a list and then use a simple list comprehension:

words = {'must', 'shall', 'may','should','forbidden','car'}
string= 'you should wash the car every day'

out = [i for i, w in enumerate(string.split()) if w in words]

print(out)

Prints:

[1, 4]
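The question says an item can be one or multiple words, which a set of single words cannot match on its own. A minimal extension of the same idea, assuming items and text are compared as whitespace-split tokens: store each item as a tuple of words and, at every position, look up the slices whose lengths occur among the items. The function name find_phrase_indexes and the two-word item 'every day' are hypothetical, added only for illustration.

```python
def find_phrase_indexes(text, phrases):
    """Return the word positions in `text` where any phrase starts.

    Phrases may contain several words; both sides are split on whitespace.
    """
    tokens = text.split()
    # Each phrase becomes a tuple of words so membership tests stay O(1) on average.
    targets = {tuple(p.split()) for p in phrases if p.split()}
    lengths = sorted({len(t) for t in targets})
    out = []
    for i in range(len(tokens)):
        for n in lengths:
            if tuple(tokens[i:i + n]) in targets:
                out.append(i)
                break  # report each start position only once
    return out


# 'every day' is a hypothetical multi-word item added for illustration.
words = {'must', 'shall', 'may', 'should', 'forbidden', 'car', 'every day'}
string = 'you should wash the car every day'
print(find_phrase_indexes(string, words))  # [1, 4, 5]
```

Each position costs one set lookup per distinct item length, so when most items are single words this stays close to the one-liner above.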

You need the Aho-Corasick algorithm for this.

Given a set of strings and a text, it finds all occurrences of the strings from the set in the text in O(len + ans), where len is the length of the text and ans is the size of the answer.

It uses an automaton and can be adapted to suit your needs.
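For reference, here is a minimal pure-Python sketch of Aho-Corasick. One adaptation is an assumption rather than part of the answer: the automaton runs over whole words instead of characters, so matches always fall on word boundaries and the reported indexes are word positions, as in the desired output. The names build_automaton and find_indexes are hypothetical.

```python
from collections import deque


def build_automaton(patterns):
    """Build an Aho-Corasick automaton whose "alphabet" is whole words.

    `patterns` is an iterable of tuples of words. Returns (goto, fail, out):
    goto[s] maps a word to the next state, fail[s] is the failure link of
    state s, and out[s] lists the lengths (in words) of the patterns that
    end at state s, including those inherited through failure links.
    """
    goto, fail, out = [{}], [0], [[]]
    # 1. Insert every pattern into a trie keyed by words.
    for pat in patterns:
        s = 0
        for w in pat:
            if w not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][w] = len(goto) - 1
            s = goto[s][w]
        out[s].append(len(pat))
    # 2. Breadth-first pass: compute failure links and merge outputs.
    queue = deque(goto[0].values())  # depth-1 states keep fail = 0 (the root)
    while queue:
        s = queue.popleft()
        for w, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and w not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(w, 0)
            out[t] = out[t] + out[fail[t]]
    return goto, fail, out


def find_indexes(text, automaton):
    """Return the sorted word positions in `text` where any pattern starts."""
    goto, fail, out = automaton
    s, starts = 0, set()
    for i, w in enumerate(text.split()):
        # Follow failure links until the current word can be consumed.
        while s and w not in goto[s]:
            s = fail[s]
        s = goto[s].get(w, 0)
        for n in out[s]:
            starts.add(i - n + 1)  # an n-word pattern that ends at token i
    return sorted(starts)


words = ['must', 'shall', 'may', 'should', 'forbidden', 'car']
string = 'you should wash the car every day'

# Build once (assuming the word list does not change between calls), reuse often.
ac = build_automaton(tuple(p.split()) for p in words if p.split())
print(find_indexes(string, ac))  # [1, 4]
```

Building the automaton is the costly step, so it pays off when the same word list is searched many times, as in the question. In pure Python the per-token overhead may still exceed the set-based one-liner for single-word items; a C-backed package such as pyahocorasick is an option worth measuring.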

You can use a dictionary; looking up a key in a dictionary takes O(1) time on average:

string = 'you should wash the car every day'

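# Maps each word to its position; if a word occurs more than once,
# only its last position survives.
wordToIndex = {word: index for index, word in enumerate(string.split())}

words = ['must', 'shall', 'may','should','forbidden','car']

# Results come out in the order of `words`, not the order in the string.
result = [wordToIndex[word] for word in words if word in wordToIndex]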

# [1,4]
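The dictionary approach keeps only the last position of a repeated word, and its results follow the order of words. A sketch that keeps every position by mapping each word to a list (collections.defaultdict is standard library; the name word_to_indexes is hypothetical):

```python
from collections import defaultdict

string = 'you should wash the car every day'
words = ['must', 'shall', 'may', 'should', 'forbidden', 'car']

# Map each word to *every* position it occupies, not just the last one.
word_to_indexes = defaultdict(list)
for index, word in enumerate(string.split()):
    word_to_indexes[word].append(index)

# Gather positions for each distinct listed word, then sort into text order.
# .get() is used so that missing words do not insert empty lists.
result = sorted(i for word in set(words) for i in word_to_indexes.get(word, []))
print(result)  # [1, 4]
```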

Use a list comprehension:

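tokens = string.split()  # split once rather than on every iteration
# list.index() returns the FIRST position, so a repeated word is reported at its first index each time
print([tokens.index(i) for i in tokens if i in words])
# [1, 4]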
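All of the answers return word positions, which matches the desired output [1,4]. If "index in string" instead means character offsets, one option (a different technique from the answers, shown only as a sketch) is a single compiled regular expression with word boundaries. It assumes multi-word items are separated by exactly one space in the text too; whether it beats the set lookup on your data is something to measure.

```python
import re

words = ['must', 'shall', 'may', 'should', 'forbidden', 'car']
string = 'you should wash the car every day'

# Longest alternatives first so a longer item wins over one that is its prefix;
# re.escape guards against regex metacharacters inside the items.
pattern = re.compile(
    r'\b(?:' + '|'.join(map(re.escape, sorted(words, key=len, reverse=True))) + r')\b'
)

# Compile once and reuse on every call; m.start() is the character offset.
offsets = [m.start() for m in pattern.finditer(string)]
print(offsets)  # [4, 20]
```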

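Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.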

 