How to match alternatives with python regex

Question

Given string 1:

'''TOM likes to go swimming MARY loves to go to the playground JANE likes going shopping'''

I want to capture the text between only 2 names. Either Tom and Mary or Tom and Jane. If Mary appears before Jane, I would like to capture the text between Tom and Mary. However, if Jane appears first, I would like to capture the text between Tom and Jane.

I have written the following code:

text = re.compile(r'''(
            TOM\s*
            ([\w\W]+)\s*
            JANE|MARY
            )''', re.VERBOSE)

text_out = text.search(string).group(1)

However, this code would give me the text between Tom and Jane, even though Mary appears first. I understand that this is because the pipe function reads from left to right and therefore will match Jane first. Is there a way to code this such that it depends on who appears first in the text?

For example, in string2:

'''TOM likes to go swimming JANE likes going shopping MARY loves to go to the playground '''

I would like to capture the text between Tom and Jane for string2.
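To make the behaviour described above easy to reproduce, here is a minimal sketch that runs the pattern from the question against string 1. The variable names string1, original and m are assumptions added for illustration; they are not in the original snippet.

```python
import re

# string 1 from the question
string1 = "TOM likes to go swimming MARY loves to go to the playground JANE likes going shopping"

# The pattern exactly as written in the question. Because the pipe is not
# grouped, the two alternatives are "TOM ... JANE" and a bare "MARY".
original = re.compile(r'''(
            TOM\s*
            ([\w\W]+)\s*
            JANE|MARY
            )''', re.VERBOSE)

m = original.search(string1)

# Group 1 is the outer group (the whole match); group 2 is the greedy
# [\w\W]+ part, which runs past MARY all the way up to JANE.
print(repr(m.group(1)))
print(repr(m.group(2)))
```

Both printed values contain MARY, which shows that the match did not stop at the first name.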

Answer 1

You need to fix your alternation: it must be enclosed in a non-capturing group, (?:JANE|MARY), and [\w\W] needs a lazy quantifier (I would replace [\w\W] with . and use the re.DOTALL modifier so that the dot also matches line breaks):

(?s)TOM\s*(.+?)\s*(?:JANE|MARY)

Without the (?:...|...), the alternation split your whole pattern in two. The regex matched either TOM, then any 1+ chars as many as possible (that is, it grabbed the whole string and then backtracked to the last occurrence of the subsequent subpattern, JANE), and then JANE, or just the MARY substring on its own. Now, the fixed regex matches:

(?s) - DOTALL inline modifier
TOM - a literal char sequence
\s* - 0+ whitespaces
(.+?) - Group 1 (capturing): any 1+ chars, as few as possible, up to the first occurrence of the subsequent subpatterns
\s* - 0+ whitespaces
(?:JANE|MARY) - either JANE or MARY substring.
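
As a quick check of the pattern above, the following sketch applies it to both strings from the question. The helper name text_after_tom and the variable names are hypothetical, chosen only for this example.

```python
import re

# Pattern from the answer; (?s) is the inline DOTALL flag, so "." also
# matches line breaks.
pattern = re.compile(r"(?s)TOM\s*(.+?)\s*(?:JANE|MARY)")

string1 = "TOM likes to go swimming MARY loves to go to the playground JANE likes going shopping"
string2 = "TOM likes to go swimming JANE likes going shopping MARY loves to go to the playground "


def text_after_tom(text):
    """Return the text between TOM and the first JANE or MARY, or None.

    Hypothetical helper, not part of the original answer.
    """
    match = pattern.search(text)
    return match.group(1) if match else None


print(text_after_tom(string1))  # stops at MARY, which comes first here
print(text_after_tom(string2))  # stops at JANE, which comes first here
```

Because the quantifier is lazy, the capture ends at whichever name appears first, regardless of the order in which the names are listed inside the group.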

solution1
4 ACCEPTED 2017-03-18 19:18:49
