How do I repeat a parenthesized group n times using a Python regular expression?

I'm trying to get pairs of |<digit><whitespace> out of a string with many of them. I'm using the regex (\|\d+\s+){2} to do this, i.e.:

>>> import re
>>> s = '|11 |22    |\n|33  |444 |\n'
>>> re.findall(r'(\|\d+\s+){2}', s)
['|22    ', '|444 ']

What I expected instead is:

['|11 |22    |', '|33  |444 |']

because () should define a group and {2} should repeat it twice. Why doesn't it do that, and what's a better way of doing it?

Turn the capturing group into a non-capturing group and add a \| at the end of your regex, since your expected output includes the closing |. If the pattern contains a capturing group, re.findall returns the captured text; otherwise it returns the whole match. Because your regex contains a single repeated capturing group, it matches every repetition but captures only the last one.

>>> s = '|11 |22    |\n|33  |444 |\n'
>>> re.findall(r'(?:\|\d+\s+){2}\|', s)
['|11 |22    |', '|33  |444 |']
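
To see the difference between what is matched and what is captured, the sketch below uses re.finditer to print the whole match (group(0)) next to the capturing group (group(1)). It also wraps the non-capturing pattern in a small helper for an arbitrary repeat count; the name repeated_pairs and its n parameter are hypothetical and only serve as illustration.

```python
import re

s = '|11 |22    |\n|33  |444 |\n'

def repeated_pairs(text, n=2):
    # Hypothetical helper: n is substituted into the {n} quantifier.
    # The group is non-capturing, so findall returns the whole match.
    pattern = re.compile(r'(?:\|\d+\s+){%d}\|' % n)
    return pattern.findall(text)

# With a capturing group, group(0) is everything that matched,
# while group(1) holds only the last repetition of the group.
for m in re.finditer(r'(\|\d+\s+){2}', s):
    print(repr(m.group(0)), repr(m.group(1)))

print(repeated_pairs(s))
```

An alternative, if you want to keep a capturing group, is to wrap the whole repetition in one outer group, as in r'((?:\|\d+\s+){2}\|)', so that the captured text and the matched text coincide.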
