

Matching an object and a specific regex with Python

Given a text, I need to check for each char whether it has exactly (edited) 3 capital letters on both sides and, if it does, add it to a string of such characters that is returned.

I wrote the following: m = re.match("[A-Z]{3}.[A-Z]{3}", text) (let's say text="AAAbAAAcAAA")

I expected to get two groups in the match object: "AAAbAAA" and "AAAcAAA".

Now, when I invoke m.group(0) I get "AAAbAAA", which is right. Yet when invoking m.group(1), I find that there is no such group, meaning "AAAcAAA" wasn't a match. Why?

Also, when invoking m.groups(), I get an empty tuple, although I should get a tuple of the matches, meaning that in my case I should have gotten a tuple with "AAAbAAA". Why doesn't that work?

You don't have any groups in your pattern. To capture something in a group, you have to surround that part of the pattern with parentheses.
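As a minimal sketch of the difference (the snippet that originally followed is not preserved here, so the variable names and the choice to wrap only the middle character are illustrative assumptions), compare the question's pattern with and without parentheses:

```python
import re

text = "AAAbAAAcAAA"

# No parentheses: the pattern matches, but it defines no capture groups.
m = re.match(r"[A-Z]{3}.[A-Z]{3}", text)
plain_whole = m.group(0)    # the entire match
plain_groups = m.groups()   # empty tuple, because there are no groups

# Parentheses around the middle character create group 1.
m2 = re.match(r"[A-Z]{3}(.)[A-Z]{3}", text)
grouped_whole = m2.group(0)  # still the entire match
middle = m2.group(1)         # only the parenthesized part
grouped_groups = m2.groups()
```

Note that re.match only tries to match at the start of the string and returns a single match object, so neither version would report "AAAcAAA".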


The exception is m.group(0), which will always contain the entire match.

Looking over your question, it sounds like you aren't actually looking for capture groups, but rather overlapping matches. In regex, a group means a smaller part of the match that is set aside for later use. For example, if you're trying to match phone numbers with a pattern that wraps the area code in one pair of parentheses and the local part in another, then the area code would be in group(1), the local part in group(2), and the entire thing would be in group(0).
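The phone number pattern itself is not preserved in this text, so the one below is a hypothetical stand-in (a US-style "(555) 123-4567" format is assumed). It is only meant to show how the group numbers line up with the parentheses:

```python
import re

# Hypothetical pattern: group 1 is the three-digit area code,
# group 2 is the seven-digit local part.
phone_pattern = re.compile(r"\((\d{3})\) (\d{3}-\d{4})")

m = phone_pattern.match("(555) 123-4567")
entire = m.group(0)      # the whole matched text
area_code = m.group(1)   # first parenthesized group
local_part = m.group(2)  # second parenthesized group
```

Groups are numbered by the position of their opening parenthesis, counting from 1; group 0 is reserved for the whole match.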

What you want is to find overlapping matches. Here's a Stack Overflow answer that explains how to do overlapping matches in Python regex, and here's my favorite reference for capture groups and regex in general.
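One common way to get overlapping matches with Python's re module is to put the whole pattern inside a lookahead that contains a capture group, so that each match consumes no characters. The following is a sketch of that general technique applied to the question's text, and not necessarily the approach taken in the linked answer:

```python
import re

text = "AAAbAAAcAAA"

# Ordinary findall: matches cannot overlap, so the capitals consumed by
# the first match are no longer available to the second one.
non_overlapping = re.findall(r"[A-Z]{3}.[A-Z]{3}", text)

# Lookahead with a capture group: every match is zero-width, so the
# scan can restart inside text that an earlier match "looked at".
overlapping = re.findall(r"(?=([A-Z]{3}.[A-Z]{3}))", text)
```

Here non_overlapping holds only "AAAbAAA", while overlapping holds both "AAAbAAA" and "AAAcAAA".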

One, you are using match when it looks like you want findall. It won't grab the enclosing capital triplets, but re.findall('[A-Z]{3}([a-z])(?=[A-Z]{3})', search_string) will get you all single lower case characters surrounded on both sides by 3 caps.
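A short sketch of that findall call follows, joined into the single string the question asks for. The function names are made up for illustration, and the second function is an added variant, not part of the answer above: it assumes "exactly 3" means a fourth capital on either side should disqualify the character, and enforces that with negative lookarounds.

```python
import re


def surrounded_chars(search_string):
    """Lowercase chars with at least 3 capitals on each side (the answer's pattern)."""
    return "".join(re.findall(r"[A-Z]{3}([a-z])(?=[A-Z]{3})", search_string))


def surrounded_chars_exact(search_string):
    """Variant (an assumption): reject runs longer than 3 capitals on either side."""
    pattern = r"(?<![A-Z])[A-Z]{3}([a-z])(?=[A-Z]{3}(?![A-Z]))"
    return "".join(re.findall(pattern, search_string))


loose = surrounded_chars("AAAbAAAcAAA")
exact = surrounded_chars_exact("AAAbAAAcAAA")
loose_four = surrounded_chars("AAAAbAAA")
exact_four = surrounded_chars_exact("AAAAbAAA")
```

Because the right-hand capitals are only checked in a lookahead, they are not consumed and can serve as the left-hand capitals of the next match, which is why both "b" and "c" are found in the question's example.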
