Given string 1:
'''TOM likes to go swimming MARY loves to go to the playground JANE likes going shopping'''
I want to capture the text between only 2 names. Either Tom and Mary or Tom and Jane. If Mary appears before Jane, I would like to capture the text between Tom and Mary. However, if Jane appears first, I would like to capture the text between Tom and Jane.
I have written the following code:
text = re.compile(r'''(
TOM\s*
([\w\W]+)\s*
JANE|MARY
)''', re.VERBOSE)
text_out = text.search(string).group(1)
However, this code would give me the text between Tom and Jane, even though Mary appears first. I understand that this is because the pipe function reads from left to right and therefore will match Jane first. Is there a way to code this such that it depends on who appears first in the text?
For example, in string 2: '''TOM likes to go swimming JANE likes going shopping MARY loves to go to the playground '''
I would like to capture the text between Tom and Jane for string2.
You need to fix your alternation: it must be enclosed in a non-capturing group, (?:JANE|MARY), and you need a lazy quantifier with [\w\W] (which I would replace with .* together with the re.DOTALL modifier, so that the dot also matches line breaks):
(?s)TOM\s*(.+?)\s*(?:JANE|MARY)
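As a minimal sketch of how this pattern could be used from Python, the snippet below applies it to the two strings from the question. The helper name text_after_tom is hypothetical and only there for illustration. Both strings happen to have the same text after TOM, so the full match is printed as well to show which name ended it.

```python
import re

# (?s) is the inline form of re.DOTALL; (?:JANE|MARY) groups the alternation
# so that it applies only to the closing name, not to the whole pattern.
pattern = re.compile(r"(?s)TOM\s*(.+?)\s*(?:JANE|MARY)")

string1 = "TOM likes to go swimming MARY loves to go to the playground JANE likes going shopping"
string2 = "TOM likes to go swimming JANE likes going shopping MARY loves to go to the playground "


def text_after_tom(s):
    """Hypothetical helper: return the text between TOM and the first JANE or MARY, or None."""
    m = pattern.search(s)
    return m.group(1) if m else None


for s in (string1, string2):
    m = pattern.search(s)
    # group(1) is the captured text; group(0) is the whole match, ending in the first name found
    print(repr(m.group(1)), "| full match:", repr(m.group(0)))
```

Checking the search result for None before calling group(1), as the helper does, avoids an AttributeError when neither name follows TOM.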
Without the (?:...|...) group, your regex matched TOM, then any 1+ chars, as many as possible (that is, the regex grabbed the whole string and then backtracked to match the last occurrence of the subsequent subpattern, JANE), and then JANE; or, as a separate alternative, just a MARY substring. Now, the fixed regex matches:
(?s) - DOTALL inline modifier
TOM - a literal char sequence
\s* - 0+ whitespaces
(.+?) - Group 1 (capturing): any 1+ chars, as few as possible, up to the first occurrence of the subsequent subpatterns
\s* - 0+ whitespaces
(?:JANE|MARY) - either a JANE or a MARY substring.
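To make the difference concrete, here is a rough comparison of the two behaviours described above. The ungrouped pattern is a simplified one-line version of the one in the question (outer group and re.VERBOSE dropped, which is an assumption made for brevity); the fixed pattern is the one from this answer.

```python
import re

string1 = "TOM likes to go swimming MARY loves to go to the playground JANE likes going shopping"

# Simplified form of the question's pattern: greedy, and the | splits the WHOLE pattern
# into "TOM ... JANE" or just "MARY".
ungrouped = re.compile(r"TOM\s*([\w\W]+)\s*JANE|MARY")

# Fixed pattern: lazy quantifier, alternation limited to the closing name.
fixed = re.compile(r"(?s)TOM\s*(.+?)\s*(?:JANE|MARY)")

# Greedy version runs past MARY and stops only at the last JANE.
print(repr(ungrouped.search(string1).group(1)))

# Lazy version stops at whichever name comes first.
print(repr(fixed.search(string1).group(1)))

# Because the alternation is not grouped, a bare MARY matches on its own
# and group 1 does not participate in the match.
m = ungrouped.search("no first name here, only MARY")
print(repr(m.group(0)), m.group(1))
```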