How to return whole non-Latin strings matching a reduplication pattern, such as AAB or ABB

I am working with strings of non-Latin characters. I want to match strings with reduplication patterns, such as AAB, ABB, ABAB, etc. I tried the following code:

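import re

rawtext = 'abc 哈哈笑 def'  # sample input; replace with your own text

patternAAB = re.compile(r'\b(\w)\1\w\b')
# With one capturing group, findall returns only that group's text
match = patternAAB.findall(rawtext)
print(match)  # ['哈']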

However, it returns only the first character of each matched string. I know this happens because of the capturing parentheses around the first \w.

I tried adding capturing parentheses around the whole matched block, but Python gives:

error: cannot refer to an open group at position 7

I also found this method, but it didn't work for me:

patternAAB = re.compile(r'\b(\w)\1\w\b')
match = patternAAB.search(rawtext)
if match:
    print(match.group(1))

How can I match the pattern and return the whole matching string?

# Ex. 哈哈笑 
# string matches AAB pattern so my code returns 哈 
# but not the entire string

The message:

error: cannot refer to an open group at position 7

is telling you that \1 refers to the outer group, the one wrapping the whole pattern, because groups are numbered by the position of their opening parenthesis and that group is still open where \1 appears. The group you want to backreference is number 2, so this code works:

import re

rawtext = 'abc 哈哈笑 def'

patternAAB = re.compile(r'\b((\w)\2\w)\b')
match = patternAAB.findall(rawtext)
print(match)

Each item in match is a tuple containing both groups:

[('哈哈笑', '哈')]
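If you only want the whole strings from those tuples, one option is to keep just the first element of each. This is a minimal sketch, not the only way to do it; the sample text and the names pattern_aab and whole_matches are made up for illustration:

```python
import re

# Hypothetical sample text with two AAB-style words
rawtext = 'abc 哈哈笑 def 看看书'

# Group 1 wraps the whole match; group 2 is the repeated character
pattern_aab = re.compile(r'\b((\w)\2\w)\b')

# findall yields (group 1, group 2) tuples; keep only group 1
whole_matches = [whole for whole, _first in pattern_aab.findall(rawtext)]
print(whole_matches)  # ['哈哈笑', '看看书']
```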

I also found this method, but didn't work for me:

You were close here as well. You can use match.group(0) to get the full match, not just a parenthesized group. So this code works:

import re

rawtext = 'abc 哈哈笑 def'

patternAAB = re.compile(r'\b(\w)\1\w\b')
match = patternAAB.search(rawtext)
if match:
    print(match.group(0))   # 哈哈笑
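Since search stops at the first hit, you would probably want finditer plus group(0) to collect every whole match, and the same idea extends to the ABB and ABAB shapes from the question. A rough sketch, assuming words are separated by spaces or other non-word characters so that \b applies; the PATTERNS table and find_reduplications are hypothetical names, not part of re:

```python
import re

# Hypothetical pattern table; the keys follow the names used in the question
PATTERNS = {
    'AAB': re.compile(r'\b(\w)\1\w\b'),
    'ABB': re.compile(r'\b\w(\w)\1\b'),
    'ABAB': re.compile(r'\b(\w)(\w)\1\2\b'),
}


def find_reduplications(text):
    """Return a dict mapping each pattern name to its whole matched strings."""
    return {
        # group(0) is always the entire match, regardless of capturing groups
        name: [m.group(0) for m in pattern.finditer(text)]
        for name, pattern in PATTERNS.items()
    }


rawtext = '哈哈笑 亮晶晶 研究研究 abc'
print(find_reduplications(rawtext))
# {'AAB': ['哈哈笑'], 'ABB': ['亮晶晶'], 'ABAB': ['研究研究']}
```

Note that these patterns do not require A and B to differ, so a string like 哈哈哈 would count as both AAB and ABB.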
