
What am I doing wrong with this Python regex that is supposed to match repeats of a pattern, followed by an optional pattern?

Here is what I am trying:

import re

r = re.compile(r'(?P<label>(?:[^_]+)+)(_r(?P<repeat_num>\d+))?')

def main():
    s1 = 'abc_123'
    s2 = 'abc_123_r1'

    m1 = r.match(s1)
    m2 = r.match(s2)

    print(m1.groups())
    print(m2.groups())

if __name__ == "__main__":
    main()

I am expecting the first string, s1, to match abc_123 for the label group and nothing for repeat_num.

And I am expecting the second string, s2, to match abc_123 for the label group and '1' for repeat_num.

The actual result stops at abc in both cases.

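It looks like it's partially due to the [^_] bit, which matches "any character except underscore". Because of that, the label group can never consume the _ in abc_123, so it stops after abc. The rest of the pattern is optional, so the match still succeeds at that point.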
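As a minimal sketch of what happens, the snippet below runs the question's pattern against the second string and inspects where the match ends. The variable names are illustrative and not from the original post.

```python
import re

# The pattern from the question, unchanged.
pattern = re.compile(r'(?P<label>(?:[^_]+)+)(_r(?P<repeat_num>\d+))?')

m = pattern.match('abc_123_r1')
matched_text = m.group(0)  # the text consumed by the whole match
match_end = m.end()        # the index where the match stopped

print(matched_text)   # abc
print(match_end)      # 3
print(m.groupdict())  # {'label': 'abc', 'repeat_num': None}
```

The match ends at index 3, just before the first underscore. Since re.match only anchors at the start of the string, nothing forces the pattern to look any further.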

I couldn't immediately figure out a solution that would properly capture these tokens, but I highly recommend using RegExr to experiment with your regular expression and see how each piece matches.

Your pattern is not matching the _ between the abc and 123 pieces of your input strings. You need to modify your first capturing group in order to be able to handle those.

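A direct translation may run into difficulties, though, because it is hard to distinguish the final _r1 block from an ordinary extra block like _123. I think the pattern below does it correctly, but you should double-check that it always does what you expect. Note the trailing $: the label group uses a lazy quantifier, so without an anchor at the end (or re.fullmatch in place of re.match) the match would still stop after abc.

(?P<label>[^_]+(?:_[^_]+)*?)(?:_r(?P<repeat_num>\d+))?$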

If you always require at least two underscore-separated groups in the first part of the text (e.g. abc_123, but never just abc or 123 by itself), you should replace the *? with +?.
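Here is a rough sketch for checking both variants against the strings from the question. The helper parse and the constant names are hypothetical, introduced only for illustration. The patterns are applied with re.fullmatch, which forces the whole string to be consumed; that is an assumption of this sketch, and a trailing $ with re.match would serve the same purpose.

```python
import re

# Zero or more extra "_segment" blocks in the label (the *? form).
LABEL_RE = re.compile(
    r'(?P<label>[^_]+(?:_[^_]+)*?)(?:_r(?P<repeat_num>\d+))?'
)
# At least one extra "_segment" block in the label (the +? form).
STRICT_LABEL_RE = re.compile(
    r'(?P<label>[^_]+(?:_[^_]+)+?)(?:_r(?P<repeat_num>\d+))?'
)

def parse(s, pattern=LABEL_RE):
    """Return (label, repeat_num), or None if s does not match entirely."""
    # fullmatch makes the lazy label group expand until the rest fits.
    m = pattern.fullmatch(s)
    if m is None:
        return None
    return m.group('label'), m.group('repeat_num')

print(parse('abc_123'))               # ('abc_123', None)
print(parse('abc_123_r1'))            # ('abc_123', '1')
print(parse('abc'))                   # ('abc', None)
print(parse('abc', STRICT_LABEL_RE))  # None
```

With the lazy quantifier, the label grows one block at a time and stops as soon as the remainder is either empty or a valid _r<digits> suffix, which is why a trailing _r1 ends up in repeat_num instead of in the label.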

