Python regex: match one of multiple regular expressions

Question

I have a string and three patterns that I want to match, using Python's re package. Specifically, if any one of the patterns is found, the output should be "Dislikes"; otherwise, it should be "Likes". Brief info about the three patterns:

pattern 1: check whether every character in the string is an uppercase letter

pattern 2: check whether two consecutive characters are the same, for example AA, BB, ...

pattern 3: check whether the pattern XYXY exists; X and Y can be the same, and the letters in this pattern do not need to be next to each other.

When I write the patterns separately, the program runs as expected. But when I combine the three patterns using alternation (|), the result is wrong. I have checked other Stack Overflow posts, for example here and here. The solutions provided there do not work for me.

Here is the original code, which works fine:

import sys
import re

if __name__ == "__main__":
    pattern1 = re.compile(r"[^A-Z]+")
    pattern2 = re.compile(r"([A-Z])\1")
    pattern3 = re.compile(r"([A-Z])[A-Z]*([A-Z])[A-Z]*\1[A-Z]*\2")

    word = sys.stdin.readline()
    word = word.rstrip('\n')
    if pattern1.search(word) or pattern2.search(word) or pattern3.search(word):
        print("Dislikes")
    else:
        print("Likes")

If I combine the three patterns into one using the following code, something goes wrong:

import sys
import re

if __name__ == "__main__":

    pattern = r"([A-Z])[A-Z]*([A-Z])[A-Z]*\1[A-Z]*\2|([A-Z])\1|[^A-Z]+"

    word = sys.stdin.readline()

    word = word.rstrip('\n')
    if re.search(word, pattern):
        print("Dislikes")
    else:
        print("Likes")

If we call the three patterns p1, p2, and p3, I also tried the following combinations:

pattern = r"(p1|p2|p3)"
pattern = r"(p1)|(p2)|(p3)"

But they do not work as expected either. What is the correct way to combine them?

Test cases:

"Likes": ABC, ABCD, A, ABCBA
"Dislikes": ABBC (pattern 2), THETXH (pattern 3), ABACADA (pattern 3), AbCD (pattern 1)

Answer 1

Here is a single pattern that combines yours:

([^A-Z]+|([A-Z])\2|([A-Z])[A-Z]*([A-Z])[A-Z]*\3[A-Z]*\4)

So, why does it work?

It consists of a simple (p1|p2|p3) pattern, where p1, p2 and p3 are the ones you defined before:

[^A-Z]+
([A-Z])\1
([A-Z])[A-Z]*([A-Z])[A-Z]*\1[A-Z]*\2

It can be decomposed as:

(
  [^A-Z]+
 |([A-Z])\2
 |([A-Z])[A-Z]*([A-Z])[A-Z]*\3[A-Z]*\4
)

The problem you were encountering is the numbering of the groups.

First, when you combine p2 and p3, both refer to \1, but \1 stands for a different group in each pattern. Therefore, p3 should become ...\2...\3, since there is an additional group before it.

Furthermore, the groups referred to by \number are indexed in the order in which they are opened. As a consequence, the very first parenthesis, which opens the outer (...|...|...), counts as the first group, and \1 refers to it. Of course, this is not what you want. In addition, it raises an error, because \1 then refers to a group that has not been closed yet and is therefore not defined.

Therefore, the indices should be shifted by one, becoming \2, \3 and \4.
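To make the numbering rule concrete, here is a minimal sketch (the toy patterns are only for illustration and are not part of the original code). It shows that groups are numbered by the position of their opening parenthesis, and that a backreference to a group that is still open is rejected when the pattern is compiled:

```python
import re

# Groups are numbered by where their opening parenthesis appears:
# the outer group is 1, then (A) is 2, then (B) is 3.
match = re.search(r"((A)(B))", "AB")
groups = match.groups()
print(groups)

# Wrapping p2 in an outer group without renumbering makes \1 point
# at the outer group, which is still open at that point.
try:
    re.compile(r"(([A-Z])\1)")
    open_group_error = None
except re.error as exc:
    open_group_error = str(exc)
print(open_group_error)

# After shifting the index by one, the pattern compiles and behaves like p2.
shifted = re.compile(r"(([A-Z])\2)")
print(shifted.search("ABBC").group())
```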

Such A|B regexes are usually wrapped in parentheses, but the outer ones can actually be dropped, with the indices shifted back by one:

[^A-Z]+|([A-Z])\1|([A-Z])[A-Z]*([A-Z])[A-Z]*\2[A-Z]*\3
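As an alternative that avoids renumbering altogether, the groups can be given names. This is only a sketch of a different technique, not what the answer above uses; the group names dup, x and y are made up for illustration. Since each name is unique, the three parts can be joined with | in any order without touching the backreferences:

```python
import re

# Each part uses named groups, so no index has to be shifted when joining.
P1 = r"[^A-Z]+"
P2 = r"(?P<dup>[A-Z])(?P=dup)"
P3 = r"(?P<x>[A-Z])[A-Z]*(?P<y>[A-Z])[A-Z]*(?P=x)[A-Z]*(?P=y)"

combined = re.compile("|".join([P1, P2, P3]))

for word in ["ABC", "ABBC", "ABACADA", "AbCD"]:
    print(word, "Dislikes" if combined.search(word) else "Likes")
```

The names must be distinct across the whole combined pattern; reusing one name in two parts makes re.compile raise an error.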

Here is a small demonstration of this pattern:

import sys
import re

if __name__ == "__main__":
    pattern1 = re.compile(r"[^A-Z]+")
    pattern2 = re.compile(r"([A-Z])\1")
    pattern3 = re.compile(r"([A-Z])[A-Z]*([A-Z])[A-Z]*\1[A-Z]*\2")    
    pattern = re.compile(r"([^A-Z]+|([A-Z])\2|([A-Z])[A-Z]*([A-Z])[A-Z]*\3[A-Z]*\4)")

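    while True:
        try:
            word = input("> ")
        except EOFError:
            break  # end of input: leave the loop instead of retrying forever
        # Print the result of each separate pattern, then of the combined one.
        print(pattern1.search(word))
        print(pattern2.search(word))
        print(pattern3.search(word))
        print(pattern.search(word))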

Interactive session:

> ABC    # Matches no pattern
None
None
None
None

> ABCBA  # Matches no pattern
None
None
None
None

> ABBC   # Matches p2
None
<_sre.SRE_Match object; span=(1, 3), match='BB'> # p2 is matched
None
<_sre.SRE_Match object; span=(1, 3), match='BB'> # The combined pattern gives the same match

> ABACADA # Matches p3
None
None
<_sre.SRE_Match object; span=(0, 7), match='ABACADA'> # p3 is matched
<_sre.SRE_Match object; span=(0, 7), match='ABACADA'> # The combined pattern gives the same match
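The session above can also be turned into a repeatable check against the test cases listed in the question. The following is a minimal sketch; the helper name verdict is hypothetical. Note that re.search expects the pattern first and the string second, whereas the combined snippet in the question passes them in the opposite order:

```python
import re

# Combined pattern without the outer parentheses, as given in the answer.
COMBINED = re.compile(r"[^A-Z]+|([A-Z])\1|([A-Z])[A-Z]*([A-Z])[A-Z]*\2[A-Z]*\3")


def verdict(word):
    """Return "Dislikes" if any of the three patterns is found, else "Likes"."""
    return "Dislikes" if COMBINED.search(word) else "Likes"


# Test cases copied from the question.
expected = {
    "ABC": "Likes",
    "ABCD": "Likes",
    "A": "Likes",
    "ABCBA": "Likes",
    "ABBC": "Dislikes",
    "THETXH": "Dislikes",
    "ABACADA": "Dislikes",
    "AbCD": "Dislikes",
}

for word, want in expected.items():
    got = verdict(word)
    print(word, got, "ok" if got == want else "MISMATCH")
```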

Answer 1: 5, accepted, 2017-09-04 15:05:50