
Regex find greedy and lazy matches and all in-between

I have a sequence like '01 02 09 02 09 02 03 05 09 08 09 ', and I want to find the subsequences that start with 01 and end with 09, with one to nine two-digit numbers (such as 02, 03, 04 etc.) in between. This is what I have tried so far.

I'm using \w{2}\s (\w{2} matches the two digits and \s matches the whitespace). This can occur one to nine times, which gives (\w{2}\s){1,9}. The whole regex becomes (01\s(\w{2}\s){1,9}09\s). This returns the following result:

<regex.Match object; span=(0, 33), match='01 02 09 02 09 02 03 05 09 08 09 '>

If I make the quantifier lazy with ? ({1,9}?), it returns the following result:

<regex.Match object; span=(0, 9), match='01 02 09 '>
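A minimal reproduction of both results, assuming the standard-library re module instead of the third-party regex module used in the question (it should report the same spans for this pattern):

```python
import re

s = '01 02 09 02 09 02 03 05 09 08 09 '

# Greedy {1,9}: take as many "xx " chunks as possible, giving some back only if "09 " cannot follow
greedy = re.search(r'(01\s(\w{2}\s){1,9}09\s)', s)

# Lazy {1,9}?: take as few chunks as possible, stopping at the first "09 " that fits
lazy = re.search(r'(01\s(\w{2}\s){1,9}?09\s)', s)

print(greedy.span(), repr(greedy.group()))
print(lazy.span(), repr(lazy.group()))
```

Either way re.search reports a single match per call, which is why neither quantifier alone yields the lengths in between.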

How can I obtain the results in between too? The desired result would include all of the following:

<regex.Match object; span=(0, 9), match='01 02 09 '>
<regex.Match object; span=(0, 15), match='01 02 09 02 09 '>
<regex.Match object; span=(0, 27), match='01 02 09 02 09 02 03 05 09 '>
<regex.Match object; span=(0, 33), match='01 02 09 02 09 02 03 05 09 08 09 '>

You can extract these strings using

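import re
s = "01 02 09 02 09 02 03 05 09 08 09 "
# Step 1: greedy match from 01 up to the last 09
m = re.search(r'01(?:\s\w{2})+\s09', s)
if m:
    # Step 2: reverse the match, collect every overlapping "90 ... 10" run with a lookahead,
    # then reverse each run back into reading order
    print([x[::-1] for x in re.findall(r'(?=\b(90.*?10$))', m.group()[::-1])])
# => ['01 02 09 02 09 02 03 05 09 08 09', '01 02 09 02 09 02 03 05 09', '01 02 09 02 09', '01 02 09']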


With the 01(?:\s\w{2})+\s09 pattern and re.search, you extract the substring from 01 to the last 09, with any number of space-separated two-character word chunks in between.
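The first step on its own, as a small sketch: the greedy + first runs all the way to the end of the string, then backtracks one chunk so that \s09 can match the final 09.

```python
import re

s = "01 02 09 02 09 02 03 05 09 08 09 "

# (?:\s\w{2})+ swallows every " xx" chunk, then gives back the last " 09" so \s09 can match it
m = re.search(r'01(?:\s\w{2})+\s09', s)
print(m.span())   # (0, 32)
print(m.group())  # 01 02 09 02 09 02 03 05 09 08 09
```

Unlike the question's pattern, this one has no trailing \s, so the match stops right after the last 09.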

The second step, [x[::-1] for x in re.findall(r'(?=\b(90.*?10$))', m.group()[::-1])], reverses both the string and the pattern so that a lookahead can collect all overlapping matches from 09 back to 01; each match is then reversed again to get the final strings.
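Why the reversal and the lookahead are both needed: all the wanted substrings share the same start (01), and a lookahead yields at most one capture per start position. Reversing the string turns the varying 09 ends into distinct start positions (90 in reversed form). A plain, consuming findall would still return just one result, because the first match eats the whole string; wrapping the capture in (?=...) makes each match zero-width, so the engine retries at every position. The $ anchor forces each capture to run to the end of the reversed string, i.e. back to the original 01. The sketch below assumes the output of step 1 as its input.

```python
import re

# Output of step 1, reversed character by character
rev = "01 02 09 02 09 02 03 05 09 08 09"[::-1]
print(rev)  # 90 80 90 50 30 20 90 20 90 20 10

# Consuming match: the first match covers everything, so only one result
print(re.findall(r'\b90.*?10$', rev))

# Zero-width lookahead with a capture group: one result per "90" at a word start
overlapping = re.findall(r'(?=\b(90.*?10$))', rev)
print(overlapping)

# Reverse each piece back into the original reading direction
print([x[::-1] for x in overlapping])
```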

To get the shortest match first, reverse the final list by adding [::-1] after the list comprehension: print([x[::-1] for x in re.findall(r'(?=\b(90.*?10$))', m.group()[::-1])][::-1]).
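A quick check of that ordering, as a sketch: the reversed list runs from shortest to longest, as in the question. Sorting with sorted(..., key=len) is shown only as an equivalent alternative for this input; it is not part of the original answer.

```python
import re

s = "01 02 09 02 09 02 03 05 09 08 09 "
m = re.search(r'01(?:\s\w{2})+\s09', s)
if m:
    found = [x[::-1] for x in re.findall(r'(?=\b(90.*?10$))', m.group()[::-1])]
    # Shortest first: reverse the list...
    ascending = found[::-1]
    # ...or, equivalently for this input, sort by length
    assert ascending == sorted(found, key=len)
    print(ascending)
```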

Here is a non-regex answer that post-processes the split elements:

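tokens = '01 02 09 02 09 02 03 05 09 08 09 '.split()  # split() with no argument also drops the trailing space
# Validate: starts with 01, ends with 09, 1-9 items in between, every item a two-digit 0x number
assert (tokens[0] == '01'
        and tokens[-1] == '09'
        and 3 <= len(tokens) <= 11
        and all(len(elem) == 2 and elem.isdigit() and elem[0] == '0' for elem in tokens))
# For every i >= 2, take the index of the first '09' at or after i; the set drops duplicates
ends = sorted({tokens.index('09', i) for i in range(2, len(tokens))})
print([tokens[:end + 1] for end in ends])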
# [
#    ['01', '02', '09'], 
#    ['01', '02', '09', '02', '09'], 
#    ['01', '02', '09', '02', '09', '02', '03', '05', '09'],
#    ['01', '02', '09', '02', '09', '02', '03', '05', '09', '08', '09']
# ]
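The regex answer above replaces {1,9} with +, so the regex itself no longer enforces the upper bound of nine items. If that bound should stay in the pattern, one different, brute-force technique is to try the question's bounded pattern with Pattern.fullmatch(string, pos, endpos) on every start/end pair. This is a sketch that assumes short inputs (it makes O(n²) match attempts); the helper name all_bounded_matches is hypothetical.

```python
import re

s = "01 02 09 02 09 02 03 05 09 08 09 "

# The question's own pattern, keeping the 1-9 bound on the in-between items
pattern = re.compile(r'01\s(?:\w{2}\s){1,9}09\s')

def all_bounded_matches(text):
    """Hypothetical helper: every substring text[start:end] that the pattern matches in full."""
    matches = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            # fullmatch with pos/endpos only succeeds if the match spans exactly start..end
            m = pattern.fullmatch(text, start, end)
            if m:
                matches.append(m)
    return matches

for m in all_bounded_matches(s):
    print(m.span(), repr(m.group()))
```

The spans include the trailing space, like the desired output in the question.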

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM