Regex: find greedy and lazy matches and all in-between
I have a sequence like '01 02 09 02 09 02 03 05 09 08 09 ', and I want to find a sequence that starts with 01 and ends with 09, with one to nine two-digit numbers in between, such as 02, 03, 04, etc. This is what I have tried so far.
I'm using \w{2}\s (\w{2} for matching the two digits, and \s for the whitespace). This can occur one to nine times, which leads to (\w{2}\s){1,9}. The whole regex becomes (01\s(\w{2}\s){1,9}09\s). This returns the following result:
<regex.Match object; span=(0, 33), match='01 02 09 02 09 02 03 05 09 08 09 '>
If I make the quantifier lazy with ?, i.e. {1,9}?, it returns the following result:
<regex.Match object; span=(0, 9), match='01 02 09 '>
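A minimal sketch reproducing both attempts with Python's standard-library re module. The output above shows regex.Match, which suggests the third-party regex package; with re the repr reads re.Match, but the spans are the same. What it illustrates: re.search returns a single match per call, so the greedy {1,9} stops at the last 09 that fits and the lazy {1,9}? stops at the first one. Neither can report the matches in between.

```python
import re

s = '01 02 09 02 09 02 03 05 09 08 09 '

# Greedy: {1,9} first tries to take nine two-character chunks, then
# backtracks only if "09 " cannot follow. Here nine chunks already work.
greedy = re.search(r'(01\s(\w{2}\s){1,9}09\s)', s)

# Lazy: {1,9}? takes as few chunks as possible (here just one).
lazy = re.search(r'(01\s(\w{2}\s){1,9}?09\s)', s)

print(greedy.span(), repr(greedy.group()))
print(lazy.span(), repr(lazy.group()))
```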
How can I obtain the results in between too? The desired result would include all of the following:
<regex.Match object; span=(0, 9), match='01 02 09 '>
<regex.Match object; span=(0, 15), match='01 02 09 02 09 '>
<regex.Match object; span=(0, 27), match='01 02 09 02 09 02 03 05 09 '>
<regex.Match object; span=(0, 33), match='01 02 09 02 09 02 03 05 09 08 09 '>
You can extract these strings using
import re
s = "01 02 09 02 09 02 03 05 09 08 09 "
m = re.search(r'01(?:\s\w{2})+\s09', s)
if m:
    print( [x[::-1] for x in re.findall(r'(?=\b(90.*?10$))', m.group()[::-1])] )
# => ['01 02 09 02 09 02 03 05 09 08 09', '01 02 09 02 09 02 03 05 09', '01 02 09 02 09', '01 02 09']
See the Python demo.
With the 01(?:\s\w{2})+\s09 pattern and re.search, you can extract the substring from 01 to the last 09 (with any number of space-separated two-character chunks in between).
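A small sketch of this first step on its own. Two details are easy to miss: the match ends right after the final 09, so it lacks the trailing space that the question's matches include, and (?:\s\w{2})+ is unbounded. Replacing + with {1,9} (an assumption added here, not part of the answer) would restore the question's one-to-nine limit; on this input both versions return the same substring.

```python
import re

s = "01 02 09 02 09 02 03 05 09 08 09 "

# Step 1 of the answer: one greedy match from "01" to the last "09".
m = re.search(r'01(?:\s\w{2})+\s09', s)

# Variant (not from the answer): cap the in-between chunks at nine.
m_capped = re.search(r'01(?:\s\w{2}){1,9}\s09', s)

print(repr(m.group()))                  # note: no trailing space
print(m.group() == m_capped.group())    # same substring on this input
```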
The second step, [x[::-1] for x in re.findall(r'(?=\b(90.*?10$))', m.group()[::-1])], reverses the string and writes the pattern reversed, so that a lookahead can collect all overlapping matches from 09 to 01, and then reverses each match back to get the final strings.
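The reversal trick can be traced step by step. This sketch only splits the answer's one-liner into named intermediate values (rev, rev_matches and results are names introduced here for illustration). Reversing turns every 09 into 90 and the leading 01 into 10. The lookahead is zero-width, so re.findall tries it at every position and the captured substrings may overlap, while $ forces each capture to run to the final 10.

```python
import re

s = "01 02 09 02 09 02 03 05 09 08 09 "
m = re.search(r'01(?:\s\w{2})+\s09', s)

# Reverse the matched text: "09" becomes "90", the leading "01" becomes "10".
rev = m.group()[::-1]
print(rev)  # 90 80 90 50 30 20 90 20 90 20 10

# Zero-width lookahead: findall tests every position, and each capture
# starts at a "90" (an original "09") and runs to the final "10".
rev_matches = re.findall(r'(?=\b(90.*?10$))', rev)
print(rev_matches)

# Reverse each capture back to the original reading order.
results = [x[::-1] for x in rev_matches]
print(results)
```

The reversal is needed because all wanted matches share the same start (01) but end at different 09s. re.findall advances over start positions, so flipping the string turns the varying end points into varying start points.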
You may also reverse the final list, so that the shortest match comes first, by adding [::-1] after the list comprehension:
print( [x[::-1] for x in re.findall(r'(?=\b(90.*?10$))', m.group()[::-1])][::-1] )
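A different technique, not taken from the answers: since every wanted match starts at the same 01, one can keep the question's own pattern and retry it once for each exact repetition count from 1 to 9 with re.match. This is a sketch assuming the sequence begins at position 0 of the string; all_prefix_matches is a hypothetical helper name. Unlike the reversal trick, it keeps the {1,9} limit and the trailing space, and it returns match objects with the spans listed in the question.

```python
import re


def all_prefix_matches(s, lo=1, hi=9):
    """Hypothetical helper: return every match of '01 ', then exactly n
    two-character chunks (lo <= n <= hi), then '09 ', anchored at the
    start of s, shortest first.

    One regex call yields only one match, so this simply retries the
    question's pattern with each exact repetition count.
    """
    matches = []
    for n in range(lo, hi + 1):
        # In the f-string, {{2}} is a literal {2} and {{{n}}} becomes {n}.
        m = re.match(rf'01\s(?:\w{{2}}\s){{{n}}}09\s', s)
        if m:
            matches.append(m)
    return matches


s = '01 02 09 02 09 02 03 05 09 08 09 '
for m in all_prefix_matches(s):
    print(m.span(), repr(m.group()))
```

If the 01 can occur later in the string, a compiled pattern's match(string, pos) method could try each starting position instead, keeping the spans relative to the full string.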
Here is a non-regex answer that post-processes the matching elements:
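# Split into two-digit tokens (Python strings have .strip(), not .trim())
s = '01 02 09 02 09 02 03 05 09 08 09 '.strip().split()
# Sanity checks: starts with 01, ends with 09, 1 to 9 tokens in between,
# and every token is two digits starting with 0
assert s[0] == '01' \
    and s[-1] == '09' \
    and (3 <= len(s) <= 11) \
    and len(s) == len([elem for elem in s if len(elem) == 2 and elem.isdigit() and elem[0] == '0'])
# One prefix of s per distinct position of the next '09' at or after index 2
[s[:i+1] for i in sorted({s.index('09', i) for i in range(2,len(s))})]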
# [
# ['01', '02', '09'],
# ['01', '02', '09', '02', '09'],
# ['01', '02', '09', '02', '09', '02', '03', '05', '09'],
# ['01', '02', '09', '02', '09', '02', '03', '05', '09', '08', '09']
# ]
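A short trace of the final comprehension, as a sketch that reuses the answer's token list (with .strip()); next_09, ends and prefixes are illustrative names. list.index('09', i) returns the position of the first '09' at or after i. Starting at index 2 guarantees at least one element between 01 and 09, and the set removes duplicate end positions. It relies on the assert that s[-1] == '09'; otherwise index would raise ValueError for the last few values of i.

```python
s = '01 02 09 02 09 02 03 05 09 08 09 '.strip().split()

# Position of the next '09' at or after each index i >= 2.
next_09 = {i: s.index('09', i) for i in range(2, len(s))}
print(next_09)

# The distinct end positions, in ascending order.
ends = sorted(set(next_09.values()))
print(ends)  # [2, 4, 8, 10]

# Each end position yields one prefix of s, i.e. one wanted match.
prefixes = [s[:end + 1] for end in ends]
print([' '.join(p) for p in prefixes])
```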