Here is what I am trying:
import re

r = re.compile(r'(?P<label>(?:[^_]+)+)(_r(?P<repeat_num>\d+))?')

def main():
    s1 = 'abc_123'
    s2 = 'abc_123_r1'
    m1 = r.match(s1)
    m2 = r.match(s2)
    print(m1.groups())
    print(m2.groups())

if __name__ == "__main__":
    main()
I am expecting the first string, s1, to match 'abc_123' for the label group and nothing for repeat_num.
And I am expecting the second string, s2, to match 'abc_123' for the label group and '1' for repeat_num.
The actual result stops at 'abc' in both cases.
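A minimal reproduction, assuming Python 3 and the question's pattern unchanged, shows what groups() returns for each input. The loop is just a compact version of main():

```python
import re

# The pattern from the question, unchanged.
r = re.compile(r'(?P<label>(?:[^_]+)+)(_r(?P<repeat_num>\d+))?')

for s in ('abc_123', 'abc_123_r1'):
    m = r.match(s)
    # groups() is (label, the whole optional "_rN" group, repeat_num)
    print(s, '->', m.groups())

# abc_123 -> ('abc', None, None)
# abc_123_r1 -> ('abc', None, None)
```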
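It looks like it's partially due to the [^_] bit, which matches "any character except underscore", so the label group stops at the first underscore. The optional _r group then can't match _123, and because re.match doesn't have to consume the whole string, the match succeeds with just abc.
I couldn't immediately figure out a solution that would properly capture these tokens; I highly recommend using RegExr to experiment with your regular expression and work out how to match each piece correctly.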
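You can also run a quick experiment with the [^_] class without leaving Python. This is only a sketch using the standard re module and the strings from the question:

```python
import re

# [^_] is a negated character class: any single character except "_".
# With +, it matches a run of non-underscore characters, so it never
# crosses an underscore.
print(re.findall(r'[^_]+', 'abc_123_r1'))  # ['abc', '123', 'r1']

# re.match only anchors at the start, so it succeeds on the prefix "abc".
print(re.match(r'[^_]+', 'abc_123').group())  # abc

# re.fullmatch requires the whole string, so the same class fails here.
print(re.fullmatch(r'[^_]+', 'abc_123'))  # None
```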
Your pattern is not matching the _ between the abc and 123 pieces of your input strings. You need to modify your first capturing group so that it can handle those.
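For example, the most obvious modification is to let the label repeat underscore-separated chunks greedily. The sketch below (the name naive is illustrative, not something from the answer) shows that this fixes s1 but also swallows the _r1 suffix of s2:

```python
import re

# Illustrative "direct" fix: the label is one or more underscore-separated
# chunks, repeated greedily with *.
naive = re.compile(r'(?P<label>[^_]+(?:_[^_]+)*)(?:_r(?P<repeat_num>\d+))?')

m = naive.match('abc_123')
print(m.group('label'), m.group('repeat_num'))  # abc_123 None

m = naive.match('abc_123_r1')
# The greedy (?:_[^_]+)* also consumes "_r1" ("r1" has no underscore),
# so nothing is left for the repeat_num group.
print(m.group('label'), m.group('repeat_num'))  # abc_123_r1 None
```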
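A direct translation may run into difficulties, though, because it's hard to distinguish the final _r1 block from an ordinary extra block like _123: if the label is simply allowed to contain more underscore-separated chunks, it swallows _r1 as well. I think the pattern below does it correctly, but you should double-check that it always does what you expect:
(?P<label>[^_]+(?:_[^_]+)*?)(?:_r(?P<repeat_num>\d+))?
Because the label part is lazy (*?), the pattern has to be anchored at the end of the string: use re.fullmatch, or append $ and keep using re.match. Otherwise the lazy label stops at abc again.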
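Here is a sketch of how that pattern might be used in the question's code, assuming Python 3.4+ (for re.fullmatch). The parse helper is a hypothetical name for illustration:

```python
import re

# The answer's pattern. The label is lazy (*?), so it only grows by one
# "_chunk" at a time until the rest of the pattern can match.
r = re.compile(r'(?P<label>[^_]+(?:_[^_]+)*?)(?:_r(?P<repeat_num>\d+))?')

def parse(s):
    # fullmatch forces the match to reach the end of the string; with
    # r.match the lazy label would stop at "abc".
    m = r.fullmatch(s)
    return (m.group('label'), m.group('repeat_num')) if m else None

print(parse('abc_123'))     # ('abc_123', None)
print(parse('abc_123_r1'))  # ('abc_123', '1')
```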
If you always require at least two underscore-separated groups in the first part of the text (e.g. abc_123, but never just abc or 123 by itself), you should replace the *? with +?.
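A short check of the +? variant, again using re.fullmatch (the name strict is illustrative): it still parses both inputs from the question but rejects a single chunk such as abc.

```python
import re

# Variant requiring at least two underscore-separated chunks in the label.
strict = re.compile(r'(?P<label>[^_]+(?:_[^_]+)+?)(?:_r(?P<repeat_num>\d+))?')

for s in ('abc_123', 'abc_123_r1', 'abc'):
    m = strict.fullmatch(s)
    print(s, '->', (m.group('label'), m.group('repeat_num')) if m else None)

# abc_123 -> ('abc_123', None)
# abc_123_r1 -> ('abc_123', '1')
# abc -> None
```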