Python regular expression: match any one of several regular expressions
I have a string and three patterns that I want to match, and I use the Python re package. Specifically, if any one of the patterns is found, output "Dislikes"; otherwise, output "Likes". Brief info about the three patterns:
pattern 1: check whether all characters in the string are uppercase letters (the regex used for this flags any character that is not one)
pattern 2: check whether two consecutive characters are the same, for example AA, BB, ...
pattern 3: check whether the pattern XYXY exists; X and Y can be the same, and the letters in this pattern do not need to be next to each other.
When I write the patterns separately, the program runs as expected. But when I combine the three patterns using alternation |, the result is wrong. I have checked Stack Overflow posts, for example, here and here. The solutions provided there do not work for me.
Here is the original code that works fine:
import sys
import re

if __name__ == "__main__":
    pattern1 = re.compile(r"[^A-Z]+")
    pattern2 = re.compile(r"([A-Z])\1")
    pattern3 = re.compile(r"([A-Z])[A-Z]*([A-Z])[A-Z]*\1[A-Z]*\2")
    word = sys.stdin.readline()
    word = word.rstrip('\n')
    if pattern1.search(word) or pattern2.search(word) or pattern3.search(word):
        print("Dislikes")
    else:
        print("Likes")
If I combine the three patterns into one using the following code, something goes wrong:
import sys
import re

if __name__ == "__main__":
    pattern = r"([A-Z])[A-Z]*([A-Z])[A-Z]*\1[A-Z]*\2|([A-Z])\1|[^A-Z]+"
    word = sys.stdin.readline()
    word = word.rstrip('\n')
    if re.search(word, pattern):
        print("Dislikes")
    else:
        print("Likes")
If we call the three patterns p1, p2, and p3, I also tried the following combinations:
pattern = r"(p1|p2|p3)"
pattern = r"(p1)|(p2)|(p3)"
But they also do not work as expected. What is the correct way to combine them?
"Likes": ABC, ABCD, A, ABCBA
"Dislikes": ABBC (pattern2), THETXH (pattern3), ABACADA (pattern3), AbCD (pattern1)

Here is a single pattern that joins yours:
([^A-Z]+|([A-Z])\2|([A-Z])[A-Z]*([A-Z])[A-Z]*\3[A-Z]*\4)
So, why does it work? 那么,为什么行得通呢?
It consists of a simple (p1|p2|p3) pattern, where p1, p2 and p3 are those you defined before:
[^A-Z]+
([A-Z])\1
([A-Z])[A-Z]*([A-Z])[A-Z]*\1[A-Z]*\2
It can be decomposed as:
(
    [^A-Z]+
    |([A-Z])\2
    |([A-Z])[A-Z]*([A-Z])[A-Z]*\3[A-Z]*\4
)
The problem you were encountering is the numbering of the groups.
First off, when you combine p2 and p3, both refer to \1, but that backreference points at a different group in each of the two patterns. Therefore, p3 should become ...\2...\3, since there is now an additional group before it.
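To make the collision concrete, here is a small sketch (the variable names naive and fixed are mine, not from the question). It joins p2 and p3 with | once without renumbering and once with p3's backreferences shifted. In the naive version, p3's \1 points at p2's group, which takes no part in a match of the p3 branch, so as far as I can tell that branch can never succeed.

```python
import re

p2 = r"([A-Z])\1"
p3 = r"([A-Z])[A-Z]*([A-Z])[A-Z]*\1[A-Z]*\2"

# Naive join: p3's groups are now numbered 2 and 3,
# but its backreferences still say \1 and \2.
naive = re.compile(p2 + "|" + p3)

# Renumbered by hand so that p3 refers to its own groups.
fixed = re.compile(r"([A-Z])\1|([A-Z])[A-Z]*([A-Z])[A-Z]*\2[A-Z]*\3")

print(naive.groups)                   # total number of capturing groups
print(bool(naive.search("THETXH")))  # p3 branch is broken here
print(bool(fixed.search("THETXH")))  # p3 branch works after renumbering
```

Both versions compile without complaint, which is why this kind of mistake is easy to miss.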
Furthermore, the groups referred to by \number are numbered in the order in which they are opened. As a consequence, the very first parenthesis, corresponding to the opening of the outer (...|...|...), is counted as the first group, and \1 will refer to it. Of course, this is not what you want. In addition, this gives you an error, because \1 then refers to a group that has not been closed yet, and is thus not defined.
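You can check the "group not closed yet" error for yourself with a tiny sketch like the one below. The helper compiles is a hypothetical name I made up for illustration; it only wraps re.compile and reports whether the pattern was accepted.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if re.compile accepts the pattern (hypothetical helper)."""
    try:
        re.compile(pattern)
    except re.error as exc:
        print(f"{pattern!r} rejected: {exc}")
        return False
    return True

# \1 sits inside group 1 itself, so group 1 is still open at that point.
print(compiles(r"(([A-Z])\1)"))

# \2 points at the inner group, which is already closed.
print(compiles(r"(([A-Z])\2)"))
```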
Therefore, the indices should be shifted by one, becoming \2, \3 and \4.
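If you want to see which number each group ended up with, a quick sketch like this prints the captured text per group for the wrapped pattern (the name wrapped is mine):

```python
import re

wrapped = re.compile(r"([^A-Z]+|([A-Z])\2|([A-Z])[A-Z]*([A-Z])[A-Z]*\3[A-Z]*\4)")

print(wrapped.groups)  # number of capturing groups in the pattern

m = wrapped.search("ABACADA")
# Group 1 is the outer wrapper, group 2 belongs to p2,
# and groups 3 and 4 belong to p3.
for number in range(1, wrapped.groups + 1):
    print(number, repr(m.group(number)))
```

Groups that belong to a branch that did not take part in the match come back as None.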
Such A|B regexes are usually wrapped in parentheses, but the outer ones can actually be dropped, with the indices shifted back by one:
[^A-Z]+|([A-Z])\1|([A-Z])[A-Z]*([A-Z])[A-Z]*\2[A-Z]*\3
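For completeness, here is a minimal sketch of the original task using that pattern. The names COMBINED and verdict are my own. One more thing I believe matters, although it is separate from the group numbering: the module-level function is re.search(pattern, string), whereas the combined code in the question calls re.search(word, pattern). Compiling the pattern first and calling its search method sidesteps the argument order entirely.

```python
import re

# Combined pattern without the outer parentheses, as given above.
COMBINED = re.compile(r"[^A-Z]+|([A-Z])\1|([A-Z])[A-Z]*([A-Z])[A-Z]*\2[A-Z]*\3")

def verdict(word: str) -> str:
    """Return "Dislikes" if any of the three patterns is found, else "Likes"."""
    # A compiled pattern's search() takes only the string to scan.
    return "Dislikes" if COMBINED.search(word) else "Likes"

if __name__ == "__main__":
    # Sample words taken from the question.
    for word in ["ABC", "ABCD", "A", "ABCBA", "ABBC", "THETXH", "ABACADA", "AbCD"]:
        print(word, verdict(word))
```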
Here is a small demonstration of the combined pattern, compared against the three separate ones:
import re

if __name__ == "__main__":
    pattern1 = re.compile(r"[^A-Z]+")
    pattern2 = re.compile(r"([A-Z])\1")
    pattern3 = re.compile(r"([A-Z])[A-Z]*([A-Z])[A-Z]*\1[A-Z]*\2")
    pattern = re.compile(r"([^A-Z]+|([A-Z])\2|([A-Z])[A-Z]*([A-Z])[A-Z]*\3[A-Z]*\4)")
    while True:
        try:
            word = input("> ")
            print(pattern1.search(word))
            print(pattern2.search(word))
            print(pattern3.search(word))
            print(pattern.search(word))
        except EOFError:
            # Stop when input ends instead of printing the error forever.
            break
        except Exception as error:
            print(error)
Interactive session:
> ABC # Matches no pattern
None
None
None
None
> ABCBA # Matches no pattern
None
None
None
None
> ABBC # Matches p2
None
<_sre.SRE_Match object; span=(1, 3), match='BB'> # p2 is matched
None
<_sre.SRE_Match object; span=(1, 3), match='BB'> # The combined pattern gives the same match
> ABACADA # Matches p3
None
None
<_sre.SRE_Match object; span=(0, 7), match='ABACADA'> # p3 is matched
<_sre.SRE_Match object; span=(0, 7), match='ABACADA'> # The combined pattern gives the same match
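As an alternative to renumbering by hand, you could use named groups, which Python's re supports through (?P<name>...) and (?P=name). This is a sketch of a different technique from the one above, not a correction of it. The group names dup, x and y and the helper reason are my own choices.

```python
import re

# Same three branches, but backreferences go by name, so adding or
# removing parentheses elsewhere does not change what they refer to.
NAMED = re.compile(
    r"[^A-Z]+"
    r"|(?P<dup>[A-Z])(?P=dup)"
    r"|(?P<x>[A-Z])[A-Z]*(?P<y>[A-Z])[A-Z]*(?P=x)[A-Z]*(?P=y)"
)

def reason(word):
    """Return which branch produced the match, or None if nothing matched."""
    m = NAMED.search(word)
    if m is None:
        return None
    if m.group("dup") is not None:
        return "pattern2"
    if m.group("x") is not None:
        return "pattern3"
    return "pattern1"

for word in ["ABC", "ABBC", "THETXH", "ABACADA", "AbCD"]:
    print(word, reason(word))
```

Note that a word may satisfy more than one pattern; search reports only the leftmost match, so reason names the branch that produced that particular match.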