Let's say I have a string:
s = "Hello, stack exchange. Let's solve my query"
And let's say I have 3 substrings:
s1 = "solve"
s2 = "stack"
s3 = "Not present"
Is there a shortcut to determine which substring comes first in s?
I know I can write a function that finds the index of each substring, stores the substring-index pairs in a dictionary, and then compares all the non-negative indexes, but is there a shorter or more Pythonic way of doing this?
Another way of getting this using regex is:
import re
s = "Hello, stack exchange. Let's solve my query"
s1 = "solve"
s2 = "stack"
s3 = "Not present"
r1 = re.compile('|'.join([s1,s2,s3]))
r1.findall(s)
This will return a list like this:
['stack', 'solve']
The matches are listed in the order they occur in s, so the first element of the list is the search string that occurred first.
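One caveat with joining the substrings directly: if any of them contains regex metacharacters (such as "." or "+"), the pattern will not match them literally. A minimal sketch of a safer variant, assuming the substrings are meant as plain text; the names substrings, matches and first are only illustrative:

```python
import re

s = "Hello, stack exchange. Let's solve my query"
substrings = ["solve", "stack", "Not present"]

# re.escape() makes each substring match literally, even if it
# contains characters such as '.' or '+'.
pattern = re.compile("|".join(re.escape(sub) for sub in substrings))

matches = pattern.findall(s)             # matches in order of occurrence
first = matches[0] if matches else None  # None when nothing matches
print(first)  # prints: stack
```

Note that findall() still scans the whole string, even though only the first match is needed here.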
You could use generators to find all positions, and min() to locate the left-most:
positions = ((s.find(sub), sub) for sub in (s1, s2, s3))
leftmost = min((pos, sub) for pos, sub in positions if pos > -1)[1]
This runs s.find() just once for each substring and filters out any substring that is not present. If there are no substring matches at all, min() will raise a ValueError exception; you may want to catch that.
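For what it's worth, catching that exception could look something like the sketch below. The helper name leftmost_substring is made up for illustration, and returning None when nothing matches is just one possible choice:

```python
s = "Hello, stack exchange. Let's solve my query"
substrings = ("solve", "stack", "Not present")

def leftmost_substring(text, subs):
    """Return the substring that occurs earliest in text, or None."""
    positions = ((text.find(sub), sub) for sub in subs)
    try:
        # find() returns -1 for a missing substring, so skip those.
        return min((pos, sub) for pos, sub in positions if pos > -1)[1]
    except ValueError:
        # min() received an empty sequence: no substring is present.
        return None

print(leftmost_substring(s, substrings))  # prints: stack
```

On Python 3.4 and later, passing default= to min() is an alternative to the try/except.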
This does scan the string 3 times, once per substring; if the number of substrings tested is large enough, you'd want to build a trie structure instead, loop over the indices into s, and test whether the characters at each position are present in the trie:
def make_trie(*words):
    root = {}
    for word in words:
        current = root
        for letter in word:
            current = current.setdefault(letter, {})
        # insert sentinel at the end
        current[None] = None
    return root

def find_first(s, trie):
    for i in range(len(s)):
        pos, current, found = i, trie, []
        while pos < len(s) and s[pos] in current:
            found.append(s[pos])
            current = current[s[pos]]
            if None in current:  # whole substring detected
                return ''.join(found)
            pos += 1

leftmost = find_first(s, make_trie(s1, s2, s3))
The trie can be re-used for multiple strings.
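As a rough illustration of that re-use, the sketch below builds the trie once and runs it against several strings. The two functions are repeated so the snippet runs on its own (with an explicit return None added for clarity); the texts list is made-up sample data:

```python
def make_trie(*words):
    root = {}
    for word in words:
        current = root
        for letter in word:
            current = current.setdefault(letter, {})
        current[None] = None  # sentinel: a word ends here
    return root

def find_first(s, trie):
    for i in range(len(s)):
        pos, current, found = i, trie, []
        while pos < len(s) and s[pos] in current:
            found.append(s[pos])
            current = current[s[pos]]
            if None in current:  # whole substring detected
                return ''.join(found)
            pos += 1
    return None  # no substring found anywhere in s

# Build the trie once, then query it for each string.
trie = make_trie("solve", "stack", "Not present")
texts = [
    "Hello, stack exchange. Let's solve my query",
    "Let's solve it before asking on stack",
    "nothing relevant here",
]
results = [find_first(text, trie) for text in texts]
print(results)  # prints: ['stack', 'solve', None]
```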
This is probably the shortest way to do this: create a regex and use re.search, which stops at the first match.
import re
inputs = ['solve','stack','Not present']
s = "Hello, stack exchange. Let's solve my query"
match = re.search(re.compile('|'.join(inputs)),s)
print(match.group())
# prints 'stack'
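Be aware that re.search returns None when none of the inputs occur in s, so match.group() would raise an AttributeError in that case. Here is a small sketch that guards against that and also reports where the match was found. The helper name first_match is hypothetical, and it assumes the inputs are plain text (hence re.escape) and that the list is not empty:

```python
import re

def first_match(text, substrings):
    """Return (substring, index) of the leftmost match, or None."""
    # Assumes substrings is non-empty; an empty pattern would match at index 0.
    pattern = "|".join(re.escape(sub) for sub in substrings)
    match = re.search(pattern, text)
    if match is None:
        return None
    return match.group(), match.start()

inputs = ['solve', 'stack', 'Not present']
s = "Hello, stack exchange. Let's solve my query"
print(first_match(s, inputs))      # prints: ('stack', 7)
print(first_match("abc", inputs))  # prints: None
```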
You can try this:
first_substr = min([(s.find(substr),substr) for substr in [s1, s2, s3] if s.find(substr)!=-1])[1]
Thanks