How can I find the first occurrence of a string from a list in another string in Python?

I have a list of about 100 strings, and I want to find the first occurrence of any of them in another string, along with the index at which it occurs.

I keep that index, then search again from that index on using a second word list, then switch back to the first list, and so on until I reach the end of the string.

My current code (that searches for the first occurrence) looks like:

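        import sys

        def findFirstOccurence(wordList, bigString, startIndex):
            # sys.maxsize is the Python 3 replacement for sys.maxint; it is
            # returned unchanged when none of the words is found.
            substrIndex = sys.maxsize
            for word in wordList:
                tempIndex = bigString.find(word, startIndex)
                if tempIndex != -1 and tempIndex < substrIndex:
                    substrIndex = tempIndex
            return substrIndex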

This code does the job, but it takes a long time: I run it several times with the same word lists on 100 big strings (about 10K-20K words each).

I am sure there is a better (and more Pythonic) way to do this.

This seems to work well, and it also tells you which word was found (although that could be left out):

import sys

words = 'a big red dog car woman mountain are the ditch'.split()
sentence = 'her smooth lips reminded me of the front of a big red car lying in the ditch'

def find(word, sentence):
    # Return (index, word), or (sys.maxsize, None) so that misses sort last.
    try:
        return sentence.index(word), word
    except ValueError:
        return sys.maxsize, None

# min() compares the tuples by index first, so the earliest match wins.
print(min(find(word, sentence) for word in words))
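The `find` helper above always searches from the beginning of the sentence, while the question needs to resume from a saved index. A minimal sketch of the same idea with a start offset follows; the name `find_first` and the choice to return `None` when nothing matches are assumptions made for illustration, not part of the original answer.

```python
def find_first(words, text, start=0):
    """Return (index, word) for the earliest word found at or after start.

    Returns None when none of the words occurs (an assumed convention).
    """
    # Pair every word with the index str.find reports for it (-1 on a miss).
    hits = ((text.find(word, start), word) for word in words)
    # Drop the misses, then let min() pick the smallest index.
    return min((hit for hit in hits if hit[0] != -1), default=None)
```

Calling it again with `start` set to a position past the previous hit continues the scan from there.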

A one-liner with a list comprehension would be:

return min([index for index in [bigString.find(word, startIndex) for word in wordList] if index != -1])

But I would argue that it is more readable when split into two lines:

indexes = [bigString.find(word, startIndex) for word in wordList]
return min([index for index in indexes if index != -1])

Either way, min() raises ValueError if none of the words is found, so the caller has to handle that case.

import re

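def findFirstOccurence(wordList, bigString, startIndex=0):
    # Escape the words so regex metacharacters in them are matched literally.
    pattern = re.compile('|'.join(re.escape(word) for word in wordList))
    # Searching from startIndex (instead of slicing) keeps the index relative
    # to bigString; -1 is returned when nothing matches.
    match = pattern.search(bigString, startIndex)
    return match.start() if match else -1

wordList = ['hello', 'world']
bigString = '1 2 3 world'

print(findFirstOccurence(wordList, bigString))  # prints 6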
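Since the question reuses the same word lists across many big strings and alternates between two lists, it may help to compile each list into a pattern once and then walk the string with it. The following is only a sketch of that idea: `compile_words` and `alternate_search` are hypothetical names, and resuming just after each matched word (rather than at its start index) is an assumption about the intended behavior.

```python
import re

def compile_words(words):
    """Build one reusable pattern that matches any of the given words."""
    if not words:
        raise ValueError("word list must not be empty")
    # Longer words go first so that, at the same position, the longest one wins.
    ordered = sorted(words, key=len, reverse=True)
    # re.escape makes characters such as '.' or '+' match literally.
    return re.compile('|'.join(re.escape(word) for word in ordered))

def alternate_search(text, patterns, start=0):
    """Yield (index, word) hits, cycling through the compiled patterns."""
    pos = start
    turn = 0
    while True:
        match = patterns[turn % len(patterns)].search(text, pos)
        if match is None:
            return  # the current list has no further occurrence
        yield match.start(), match.group()
        pos = match.end()  # assumption: continue after the matched word
        turn += 1
```

Usage might look like `first = compile_words(list_a)` and `second = compile_words(list_b)` done once up front, followed by `list(alternate_search(big_string, [first, second]))` for each of the big strings. The regex engine scans the string once per search instead of once per word, which is where the speedup would be expected to come from.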
