How can I find the first occurrence of any string from a list in another string in Python?
I have a list of strings (about 100), and I want to find the first occurrence of any of them in another string, along with the index at which it occurred.
I keep that index, then search again from that index on using a second word list, then switch back to the first list, and so on until I reach the end of the string.
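The alternating procedure described above can be sketched as a driver loop around a first-match helper. This is a minimal sketch under stated assumptions: `first_match` and `alternate_search` are hypothetical names, each new search resumes just after the end of the previous match (the question only says "from that index on"), the loop stops as soon as the current list has no further match, and all words are non-empty.

```python
def first_match(words, text, start):
    """Return (index, word) of the earliest word found at or after start, or (-1, None)."""
    best_index, best_word = -1, None
    for word in words:
        i = text.find(word, start)
        if i != -1 and (best_index == -1 or i < best_index):
            best_index, best_word = i, word
    return best_index, best_word


def alternate_search(list_a, list_b, text):
    """Hypothetical driver: alternate between two word lists across text.

    Assumes non-empty words and that each search resumes just after the
    previous match; stops when the current list has no further match.
    """
    lists = (list_a, list_b)
    turn = 0  # 0 -> search with list_a, 1 -> search with list_b
    pos = 0
    matches = []
    while pos < len(text):
        index, word = first_match(lists[turn], text, pos)
        if index == -1:
            break  # the current list has no more matches
        matches.append((index, word))
        pos = index + len(word)  # resume after the matched word
        turn = 1 - turn  # switch to the other list
    return matches


print(alternate_search(['cat'], ['dog'], 'cat dog cat dog'))
# [(0, 'cat'), (4, 'dog'), (8, 'cat'), (12, 'dog')]
```

Any faster first-match function from the answers below can be dropped in for `first_match`, as long as it returns the matched word too, so the loop knows where to resume.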
My current code (which searches for the first occurrence) looks like this:
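import sys

def findFirstOccurence(wordList, bigString, startIndex):
    # Start from a sentinel larger than any real index (sys.maxsize in Python 3)
    substrIndex = sys.maxsize
    for word in wordList:
        # str.find returns -1 when the word does not occur at or after startIndex
        tempIndex = bigString.find(word, startIndex)
        if tempIndex != -1 and tempIndex < substrIndex:
            substrIndex = tempIndex
    return substrIndex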
This code does the job, but it takes a lot of time: I run it several times with the same word lists, over 100 big strings of about 10K-20K words each.
I'm sure there's a better (and more Pythonic) way to do this.
This seems to work well, and it also tells you which word it found (although that part could be left out):
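from sys import maxsize

words = 'a big red dog car woman mountain are the ditch'.split()
sentence = 'her smooth lips reminded me of the front of a big red car lying in the ditch'

def find(word, sentence):
    # str.index raises ValueError when the word is absent; map that to a
    # sentinel index larger than any real one so min() ignores it
    try:
        return sentence.index(word), word
    except ValueError:
        return maxsize, None

# Tuples compare by index first, so min() picks the earliest (index, word) pair
print(min(find(word, sentence) for word in words))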
A one-liner with a list comprehension would be:
def findFirstOccurence(wordList, bigString, startIndex):
    return min([index for index in [bigString.find(word, startIndex) for word in wordList] if index != -1], default=-1)
But I would argue that it's more readable split into two lines:
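def findFirstOccurence(wordList, bigString, startIndex):
    indexes = [bigString.find(word, startIndex) for word in wordList]
    # Skip -1 (word not found); default=-1 avoids a ValueError when nothing matches
    return min([index for index in indexes if index != -1], default=-1)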
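import re

def findFirstOccurence(wordList, bigString, startIndex=0):
    # re.escape makes characters such as '.' or '+' in the words match literally
    pattern = re.compile('|'.join(map(re.escape, wordList)))
    # Passing pos to search() keeps the index relative to the whole string,
    # unlike searching the slice bigString[startIndex:]
    match = pattern.search(bigString, startIndex)
    return match.start() if match else -1
wordList = ['hello', 'world']
bigString = '1 2 3 world'
print(findFirstOccurence(wordList, bigString))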
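Because the question reuses the same word lists across about 100 strings, the regex only needs to be compiled once per list rather than once per call. The sketch below assumes that; `make_finder` and its inner `find_first` are hypothetical names, the word list is assumed non-empty (an empty alternation would match at every position), and sorting longest-first is an added design choice so that, when two words start at the same position, the longer one is reported. It also returns the matched word, which the alternating search needs in order to know where to resume.

```python
import re


def make_finder(word_list):
    """Build a reusable first-match finder for one (non-empty) word list.

    Regex alternation tries alternatives left to right at each position,
    so sorting longest-first makes the longer word win when two words
    start at the same index.
    """
    alternatives = sorted(word_list, key=len, reverse=True)
    pattern = re.compile('|'.join(map(re.escape, alternatives)))

    def find_first(big_string, start_index=0):
        # Pattern.search(string, pos) starts scanning at pos, but the
        # returned index is still relative to the whole string.
        match = pattern.search(big_string, start_index)
        if match is None:
            return -1, None
        return match.start(), match.group()

    return find_first


finder = make_finder(['world', 'hello', 'hello world'])
for text in ['1 2 3 world', 'say hello world', 'nothing here']:
    print(finder(text))
# (6, 'world')
# (4, 'hello world')
# (-1, None)
```

The search returns the leftmost match, so it still finds the earliest occurrence of any word; unlike the `str.find` loop, it stops at that match instead of letting each missing word's `find` scan to the end of the string. Whether that is faster for a given word list is worth timing on real data.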