How to match alternatives with a Python regular expression

Question

Given string 1:

'''TOM likes to go swimming MARY loves to go to the playground JANE likes going shopping'''

I want to capture the text between only two names: either Tom and Mary, or Tom and Jane. If Mary appears before Jane, I would like to capture the text between Tom and Mary. However, if Jane appears first, I would like to capture the text between Tom and Jane.

I have written the following code:

import re

text = re.compile(r'''(
            TOM\s*
            ([\w\W]+)\s*
            JANE|MARY
            )''', re.VERBOSE)

text_out = text.search(string).group(1)

However, this code gives me the text between Tom and Jane, even though Mary appears first. I understand that this is because the pipe operator reads from left to right and therefore matches Jane first. Is there a way to code this so that it depends on who appears first in the text?

For example, in string 2: '''TOM likes to go swimming JANE likes going shopping MARY loves to go to the playground '''

I would like to capture the text between Tom and Jane for string 2.

Answer 1

You need to fix your alternation: it must be enclosed in a non-capturing group, (?:JANE|MARY), and [\w\W] needs a lazy quantifier (I would replace [\w\W] with . and use the re.DOTALL modifier so that the dot also matches line breaks):

(?s)TOM\s*(.+?)\s*(?:JANE|MARY)

See the regex demo.
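As a rough check of the pattern above, here is a minimal Python sketch that runs it against the two strings from the question. The helper name `capture_after_tom` is hypothetical and only for illustration; it returns group 1 together with the whole match, so you can see which name ended the capture.

```python
import re

# The pattern from the answer; (?s) is the inline form of re.DOTALL.
pattern = re.compile(r"(?s)TOM\s*(.+?)\s*(?:JANE|MARY)")

string1 = ("TOM likes to go swimming MARY loves to go to the playground "
           "JANE likes going shopping")
string2 = ("TOM likes to go swimming JANE likes going shopping "
           "MARY loves to go to the playground ")


def capture_after_tom(s):
    """Hypothetical helper: return (captured text, whole match) or None."""
    m = pattern.search(s)
    if m is None:
        return None
    return m.group(1), m.group(0)


print(capture_after_tom(string1))
print(capture_after_tom(string2))
```

In both strings the captured text is the same, but the whole match ends at MARY for string 1 and at JANE for string 2, that is, at whichever name appears first.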

Without the (?:...|...), your regex matched either TOM, then any 1+ chars as many as possible (that is, the regex grabbed the whole string and then backtracked to match the last occurrence of the subsequent subpattern, JANE) and JANE, or just the MARY substring. Now, the fixed regex matches:

(?s) - the DOTALL inline modifier
TOM - the literal char sequence TOM
\s* - 0+ whitespaces
(.+?) - Group 1 (capturing): any 1+ chars, as few as possible, up to the first occurrence of the subsequent subpatterns
\s* - 0+ whitespaces
(?:JANE|MARY) - either the JANE or the MARY substring.
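The breakdown above can also be written with re.VERBOSE, as in the question, with one comment per component. The sketch below does that and, for contrast, compiles a greedy variant (`.+` instead of `.+?`) to show why the lazy quantifier matters. The names `fixed` and `greedy` are assumptions made for this illustration.

```python
import re

# Fixed pattern, spelled out in verbose form.
fixed = re.compile(r"""
    TOM\s*          # literal TOM, then 0+ whitespace
    (.+?)           # group 1: any 1+ chars, as few as possible
    \s*             # 0+ whitespace before the closing name
    (?:JANE|MARY)   # either name; the group limits what | applies to
""", re.VERBOSE | re.DOTALL)

# Same pattern with a greedy quantifier, for comparison only.
greedy = re.compile(r"TOM\s*(.+)\s*(?:JANE|MARY)", re.DOTALL)

s = ("TOM likes to go swimming MARY loves to go to the playground "
     "JANE likes going shopping")

lazy_text = fixed.search(s).group(1)
greedy_text = greedy.search(s).group(1)

print(repr(lazy_text))    # stops at the first name, MARY
print(repr(greedy_text))  # runs on to the last name, JANE
```

The greedy version backtracks from the end of the string, so it stops at the last name rather than the first, which is the behavior the question describes.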

Answer 1
4, accepted, 2017-03-18 19:18:49
