
Regular expression group capture with multiple matches

Quick regular expression question.
I'm trying to capture multiple instances of a capture group in Python (I don't think it's Python-specific), but each subsequent capture seems to overwrite the previous one.

In this over-simplified example, I'm essentially trying to split a string:

import re

x = 'abcdef'
r = re.compile(r'(\w){6}')
m = r.match(x)
m.groups()     # = ('f',) ?!?
I want to get ('a', 'b', 'c', 'd', 'e', 'f'), but because the regex overwrites the earlier captures with each later one, I get ('f',).

Is this how regex is supposed to behave? Is there a way to do what I want without having to repeat the syntax six times?

Thanks in advance!
Andrew

You can't use groups for this, I'm afraid. A group keeps only its last match, and I believe most regex engines work this way. A possible solution is to use findall() or similar.

r = re.compile(r'\w')
r.findall(x)
# ['a', 'b', 'c', 'd', 'e', 'f']
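
To make the difference concrete, here is a minimal sketch using only the standard re module. It reuses the string from the question; the variable names last_only and all_letters are just illustrative.

```python
import re

x = 'abcdef'

# A repeated group is overwritten on every iteration,
# so only the final capture is kept.
m = re.match(r'(\w){6}', x)
last_only = m.groups()

# findall scans the string for non-overlapping matches
# and returns every one of them.
all_letters = re.findall(r'\w', x)

print(last_only)    # ('f',)
print(all_letters)  # ['a', 'b', 'c', 'd', 'e', 'f']
```

If a tuple is needed rather than a list, wrapping the result in tuple() should be enough.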

The regex module can do this.

>>> import regex
>>> m = regex.match(r'(\w){6}', "abcdef")
>>> m.captures(1)
['a', 'b', 'c', 'd', 'e', 'f']

It also works with named captures:

>>> m = regex.match(r'(?P<letter>\w){6}', "abcdef")
>>> m.capturesdict()
{'letter': ['a', 'b', 'c', 'd', 'e', 'f']}

The regex module is a third-party package (installable from PyPI) that was proposed as a replacement for the 're' module - it is designed as a backwards-compatible drop-in replacement, except it has many more features and capabilities.

To find all matches in a given string, use re.findall(regex, string). Also, if you want to obtain every letter here, your regex should be either '(\\w){1}' or just '(\\w)'.

See:

r = re.compile(r'(\w)')
l = re.findall(r, x)

l == ['a', 'b', 'c', 'd', 'e', 'f']
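
One thing to keep in mind is that the shape of what findall returns depends on how many groups the pattern has. A small sketch, again with the string from the question (the names no_group, one_group and two_groups are only for illustration):

```python
import re

x = 'abcdef'

# No group: findall returns the whole matches.
no_group = re.findall(r'\w', x)

# Exactly one group: findall returns the contents of that group.
one_group = re.findall(r'(\w)', x)

# Several groups: findall returns one tuple of group contents per match.
two_groups = re.findall(r'(\w)(\w)', x)

print(no_group)    # ['a', 'b', 'c', 'd', 'e', 'f']
print(one_group)   # ['a', 'b', 'c', 'd', 'e', 'f']
print(two_groups)  # [('a', 'b'), ('c', 'd'), ('e', 'f')]
```

So for single letters the parentheses are optional, since both forms give the same list.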

I suppose your question is a simplified presentation of your need.

So here is a slightly more complex example:

import re

pat = re.compile('[UI][bd][ae]')

ch = 'UbaUdeIbaIbeIdaIdeUdeUdaUdeUbeIda'

print([mat.group() for mat in pat.finditer(ch)])

Result:

['Uba', 'Ude', 'Iba', 'Ibe', 'Ida', 'Ide', 'Ude', 'Uda', 'Ude', 'Ube', 'Ida']
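
If the individual pieces of each match are needed as well, one option is to put groups in the pattern and read them from every match object that finditer yields. This is only a sketch on a shortened input; the grouped pattern and the name pieces are my own additions, not part of the example above.

```python
import re

# Same character classes as above, but each one wrapped in its own group.
pat = re.compile(r'([UI])([bd])([ae])')

ch = 'UbaIde'

# Each match object carries its own groups, so nothing is overwritten
# from one match to the next.
pieces = [(mat.start(), mat.groups()) for mat in pat.finditer(ch)]

print(pieces)  # [(0, ('U', 'b', 'a')), (3, ('I', 'd', 'e'))]
```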
