
How to match alternatives with python regex

given string 1:

'''TOM likes to go swimming MARY loves to go to the playground JANE likes going shopping'''

I want to capture the text between only 2 names. Either Tom and Mary or Tom and Jane. If Mary appears before Jane, I would like to capture the text between Tom and Mary. However, if Jane appears first, I would like to capture the text between Tom and Jane.

I have written the following code:

text = re.compile(r'''(
            TOM\s*
            ([\w\W]+)\s*
            JANE|MARY
            )''', re.VERBOSE)

text_out = text.search(string).group(1)

However, this code would give me the text between Tom and Jane, even though Mary appears first. I understand that this is because the pipe function reads from left to right and therefore will match Jane first. Is there a way to code this such that it depends on who appears first in the text?

For example, in string 2:

'''TOM likes to go swimming JANE likes going shopping MARY loves to go to the playground'''

I would like to capture the text between Tom and Jane for string2.

You need to fix your alternation: it must be enclosed in a non-capturing group, (?:JANE|MARY). You also need a lazy quantifier on [\w\W] (which I would replace with . combined with the re.DOTALL modifier, so that the dot also matches line breaks):

(?s)TOM\s*(.+?)\s*(?:JANE|MARY)

See the regex demo
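As a minimal sketch of how this pattern could be used from Python, the snippet below runs it against the two strings from the question. The helper name between_tom_and_first_name is hypothetical, and re.DOTALL is passed as a flag instead of the inline (?s), which is assumed to be equivalent here. Because both sample strings happen to have the same text after TOM, the helper also returns the full match so you can see which name ended it.

```python
import re

# Lazy capture between TOM and whichever of JANE / MARY appears first.
# re.DOTALL plays the role of the inline (?s) modifier.
pattern = re.compile(r"TOM\s*(.+?)\s*(?:JANE|MARY)", re.DOTALL)

string1 = ("TOM likes to go swimming MARY loves to go to the playground "
           "JANE likes going shopping")
string2 = ("TOM likes to go swimming JANE likes going shopping "
           "MARY loves to go to the playground")


def between_tom_and_first_name(text):
    """Hypothetical helper: return (captured text, full match), or None."""
    match = pattern.search(text)
    if match is None:
        return None
    return match.group(1), match.group(0)


for sample in (string1, string2):
    captured, full = between_tom_and_first_name(sample)
    print(repr(captured), "| full match:", repr(full))
```

Checking for None before calling group() avoids an AttributeError when the text contains no TOM ... JANE/MARY sequence at all.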

Without the (?:...|...), the alternation split the contents of your outer group into two branches: either TOM\s*([\w\W]+)\s*JANE, or just MARY. In the first branch, the regex matched TOM, then any 1+ chars, as many as possible (that is, it grabbed the whole rest of the string and then backtracked to match the last occurrence of the subsequent subpattern, JANE). Now, the fixed regex matches:

  • (?s) - DOTALL inline modifier
  • TOM - a literal char sequence
  • \s* - 0+ whitespaces
  • (.+?) - Group 1 (capturing): any 1+ chars, as few as possible, up to the first occurrence of the subsequent subpatterns
  • \s* - 0+ whitespaces
  • (?:JANE|MARY) - either JANE or MARY substring.
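To see the two problems separately, here is a small comparison, offered as an illustration and not as the asker's exact code: the outer capturing group and re.VERBOSE from the question are dropped so that group 1 is always the text after TOM. The names ungrouped, greedy and lazy are made up for this sketch.

```python
import re

string1 = ("TOM likes to go swimming MARY loves to go to the playground "
           "JANE likes going shopping")

# Ungrouped alternation: parsed as  TOM\s*([\w\W]+)\s*JANE  OR  MARY
ungrouped = re.compile(r"TOM\s*([\w\W]+)\s*JANE|MARY")
# Alternation grouped, but the quantifier is still greedy
greedy = re.compile(r"TOM\s*([\w\W]+)\s*(?:JANE|MARY)")
# Alternation grouped and the quantifier made lazy (the fix)
lazy = re.compile(r"TOM\s*([\w\W]+?)\s*(?:JANE|MARY)")

# Both greedy variants run past MARY and stop at the last name in the text.
print(repr(ungrouped.search(string1).group(1)))
print(repr(greedy.search(string1).group(1)))
# The lazy variant stops at the first name it meets.
print(repr(lazy.search(string1).group(1)))

# The second branch of the ungrouped pattern is a bare MARY:
# it matches on its own and leaves group 1 unset.
lonely = ungrouped.search("only MARY here")
print(repr(lonely.group(0)), lonely.group(1))
```

Grouping the alternation alone is not enough on this input; only the lazy quantifier makes the match end at whichever name comes first.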
