
Matching an object and a specific regex with Python

Given a text, I need to check for each char whether it has exactly (edited) 3 capital letters on both sides and, if it does, add it to a string of such characters that is returned.

I wrote the following: m = re.match("[A-Z]{3}.[A-Z]{3}", text) (let's say text = "AAAbAAAcAAA").

I expected to get two groups in the match object: "AAAbAAA" and "AAAcAAA".

Now, when I invoke m.group(0) I get "AAAbAAA", which is right. Yet when I invoke m.group(1), I find that there is no such group, meaning "AAAcAAA" wasn't a match. Why?

Also, when invoking m.groups(), I get an empty tuple, although I expected a tuple of the matches; in my case, a tuple containing "AAAbAAA". Why doesn't that work?

You don't have any groups in your pattern. To capture something in a group, you have to surround it with parentheses:

([A-Z]{3}).[A-Z]{3}

The exception is m.group(0), which will always contain the entire match.
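
To make the difference concrete, here is a minimal sketch that runs the pattern from the question and the parenthesized pattern above on the same sample text. The variable names plain and grouped are only illustrative.

```python
import re

text = "AAAbAAAcAAA"

# Pattern from the question: no parentheses, so no capture groups.
plain = re.match(r"[A-Z]{3}.[A-Z]{3}", text)
print(plain.group(0))   # the whole match
print(plain.groups())   # empty tuple, because nothing was captured

# Asking for group 1 on a pattern without groups raises IndexError.
try:
    plain.group(1)
except IndexError as exc:
    print("no group 1:", exc)

# Parenthesized pattern: the first three capitals become group 1.
grouped = re.match(r"([A-Z]{3}).[A-Z]{3}", text)
print(grouped.group(0))  # still the whole match
print(grouped.group(1))  # only the parenthesized part
print(grouped.groups())  # tuple of all captured groups
```

Note that re.match only ever reports one match, anchored at the start of the string, so "AAAcAAA" never shows up in either case.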

Looking over your question, it sounds like you aren't actually looking for capture groups, but rather overlapping matches. In regex, a group means a smaller part of the match that is set aside for later use. For example, if you're trying to match phone numbers with something like

([0-9]{3})-([0-9]{3}-[0-9]{4})

then the area code would be in group(1), the local part in group(2), and the entire thing would be in group(0).
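
As a quick illustration of the phone number pattern, the sketch below matches a made-up number (the sample string is an assumption, not from the question) and prints each group.

```python
import re

# Made-up sample number, used only for illustration.
sample = "555-867-5309"

phone = re.match(r"([0-9]{3})-([0-9]{3}-[0-9]{4})", sample)

print(phone.group(0))  # entire match
print(phone.group(1))  # area code
print(phone.group(2))  # local part
print(phone.groups())  # both captured groups as a tuple
```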

What you want is to find overlapping matches. Here's a Stack Overflow answer that explains how to do overlapping matches in Python regex, and here's my favorite reference for capture groups and regex in general.
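
One common way to get overlapping matches with Python's re module is to wrap the pattern in a lookahead that contains a capture group. This may or may not be the exact approach the linked answer takes, so treat the following as a sketch under that assumption.

```python
import re

text = "AAAbAAAcAAA"

# A normal search consumes the text it matches, so the second
# candidate (which shares "AAA" with the first) is never found.
non_overlapping = re.findall(r"[A-Z]{3}.[A-Z]{3}", text)
print(non_overlapping)

# A lookahead matches zero characters, so the scan advances one
# position at a time; the group inside it still captures the text.
overlapping = re.findall(r"(?=([A-Z]{3}.[A-Z]{3}))", text)
print(overlapping)
```

Because the lookahead itself is zero-width, findall returns the contents of the capture group at every position where the pattern fits.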

One, you are using match when it looks like you want findall. It won't grab the enclosing capital triplets, but re.findall('[A-Z]{3}([a-z])(?=[A-Z]{3})', search_string) will get you all single lower case characters surrounded on both sides by 3 caps.
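
Here is a small sketch of that findall call, joined into the single string the question asks for. The pattern as written accepts three or more capitals on each side; the EXACT pattern below is a hypothetical stricter variant (not part of the answer) that adds a negative lookbehind and a negative lookahead to require exactly three.

```python
import re

search_string = "AAAbAAAcAAA"

# Pattern from the answer: a lowercase letter with (at least) three
# capitals before it and three capitals after it.
LOOSE = re.compile(r"[A-Z]{3}([a-z])(?=[A-Z]{3})")
result = "".join(LOOSE.findall(search_string))
print(result)

# Hypothetical stricter variant: reject a fourth capital on either side.
EXACT = re.compile(r"(?<![A-Z])[A-Z]{3}([a-z])(?=[A-Z]{3}(?![A-Z]))")
exact_result = "".join(EXACT.findall(search_string))
print(exact_result)

# The two patterns differ when one side has four capitals.
print(LOOSE.findall("AAAAbAAA"))
print(EXACT.findall("AAAAbAAA"))
```

The trailing three capitals sit inside a lookahead, so they are not consumed and can serve as the leading triplet of the next match.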
