Python: Is there a shortcut for finding which substring (from a set of substrings) comes first in a string?

Question

Let's say, I have a string:

s = "Hello, stack exchange. Let's solve my query"

And let's say I have 3 substrings

s1 = "solve"
s2 = "stack"
s3 = "Not present"

Is there a shortcut to determine which substring comes first in s?

I know I can write a function that finds the index of each substring, stores the substring-index pairs in a dictionary, and then compares all the non-negative indexes, but is there a shorter or more Pythonic way of doing this?

Answer 1

Another way of getting this using regex is:

import re
s = "Hello, stack exchange. Let's solve my query"
s1 = "solve"
s2 = "stack"
s3 = "Not present"
r1 = re.compile('|'.join([s1,s2,s3]))
r1.findall(s)

This will return a list like this:

['stack', 'solve']

The matches are listed in the order they occur in s, so the first element of the list is the search string that occurred first.
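If the position of each hit matters as well, a minimal sketch along the same lines could use re.finditer instead of findall. The re.escape call and the names substrings, hits and first are additions for illustration, not part of the answer above; escaping makes each substring match literally even if it contains regex metacharacters.

```python
import re

s = "Hello, stack exchange. Let's solve my query"
substrings = ["solve", "stack", "Not present"]

# re.escape is an assumption added here: it makes every substring match literally
pattern = re.compile("|".join(re.escape(sub) for sub in substrings))

# finditer yields match objects from left to right, so hits is ordered by position
hits = [(m.start(), m.group()) for m in pattern.finditer(s)]

# The first hit (if any) is the substring that comes first in s
first = hits[0][1] if hits else None
print(first)
```

Note that the matches are non-overlapping, so a substring that only occurs inside an earlier match would not be reported.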

Answer 2

You could use generators to find all positions, and min() to locate the left-most:

positions = ((s.find(sub), sub) for sub in (s1, s2, s3))
leftmost = min((pos, sub) for pos, sub in positions if pos > -1)[1]

This runs s.find() just once for each substring, filtering out any substring not present. If there are no substring matches at all, min() will throw a ValueError exception; you may want to catch that.
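As an alternative to catching the ValueError, a small sketch could pass default= to min() (available since Python 3.4). The helper name leftmost_substring is hypothetical and only used for illustration.

```python
s = "Hello, stack exchange. Let's solve my query"
substrings = ("solve", "stack", "Not present")

def leftmost_substring(s, substrings):
    """Return the substring that occurs first in s, or None if none occur."""
    positions = ((s.find(sub), sub) for sub in substrings)
    found = [(pos, sub) for pos, sub in positions if pos > -1]
    # default= avoids the ValueError that min() raises on an empty sequence
    return min(found, default=(None, None))[1]

print(leftmost_substring(s, substrings))
print(leftmost_substring("nothing to see", substrings))
```

If two substrings start at the same position, the tuple comparison falls back to comparing the substrings themselves, so the lexicographically smaller one is returned.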

This does scan the string 3 times; if the number of substrings tested is large enough, you'd want to build a trie structure instead, loop over indices into s and test if the characters at that position are present in the trie:

def make_trie(*words):
    root = {}
    for word in words:
        current = root
        for letter in word:
            current = current.setdefault(letter, {})
        # insert sentinel at the end
        current[None] = None
    return root

def find_first(s, trie):
    for i in range(len(s)):
        pos, current, found = i, trie, []
        while pos < len(s) and s[pos] in current:
            found.append(s[pos])
            current = current[s[pos]]
            if None in current:  # whole substring detected
                return ''.join(found)
            pos += 1

leftmost = find_first(s, make_trie(s1, s2, s3))

The trie can be re-used for multiple strings.
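To illustrate the re-use, here is a self-contained sketch that builds the trie once and applies it to several strings. The functions are repeated from above so the block runs on its own; the explicit return None and the texts list are additions made for this example.

```python
def make_trie(*words):
    root = {}
    for word in words:
        current = root
        for letter in word:
            current = current.setdefault(letter, {})
        current[None] = None  # sentinel marking the end of a word
    return root

def find_first(s, trie):
    for i in range(len(s)):
        pos, current, found = i, trie, []
        while pos < len(s) and s[pos] in current:
            found.append(s[pos])
            current = current[s[pos]]
            if None in current:  # whole substring detected
                return ''.join(found)
            pos += 1
    return None  # made explicit here: no substring occurs in s

# Build the trie once ...
trie = make_trie("solve", "stack", "Not present")

# ... and re-use it for several (made-up) input strings
texts = [
    "Hello, stack exchange. Let's solve my query",
    "Let's solve it on stack",
    "No match here",
]
results = [find_first(text, trie) for text in texts]
print(results)
```

Because find_first returns as soon as it reaches a sentinel, it reports the shortest substring that matches at the left-most position.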

Answer 3

This is the shortest way to do this. Create a regex and use re.search, which stops at the first match.

import re
inputs = ['solve','stack','Not present']
s = "Hello, stack exchange. Let's solve my query"
match = re.search('|'.join(inputs), s)
print(match.group())
# prints: stack

Demo: http://codepad.org/qoFtkQys
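One caveat with joining the substrings directly: characters such as ( or + have a special meaning in a regex. A hedged variant of the snippet above, with re.escape added (it is not part of the original answer) and a made-up input string, could look like this:

```python
import re

# Made-up inputs that contain regex metacharacters
inputs = ['(and', 'c++']
s = "I like c++ (and c)"

# re.escape turns each substring into a pattern that matches it literally
pattern = '|'.join(re.escape(sub) for sub in inputs)

match = re.search(pattern, s)
# re.search returns None when nothing matches, so guard before calling group()
first = match.group() if match else None
print(first)
```

The guard on match also covers the case where none of the substrings occur, which would otherwise raise an AttributeError on match.group().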

Answer 4

You can try this:

first_substr = min([(s.find(substr),substr) for substr in [s1, s2, s3] if s.find(substr)!=-1])[1]

Thanks

4 answers

solution1  4  2016-04-21 10:05:18
solution2  2  ACCEPTED  2016-04-21 10:00:30
solution3  1  2016-04-21 10:11:52
solution4  1  2016-04-21 10:14:37
