Python regex: match a pattern once or twice

Question

I have a bunch of lines in a file with either one or two occurrences of the same pattern (id=):

Linetype1 : ...id=1234...id=4321...value=5678... # "..." means whatever
Linetype2 : ...id=7890...value=8765

I thought I could write a regex like this to grab all my ids and their associated values:

>>> import re
>>> l = "...id=1234...id=4321...value=5678...\n...id=7890...value=8765\n"
>>> ret = re.findall('(id=[0-9]+).*?(id=[0-9]+)*.*?(value=[0-9]+)', l)
>>> ret
[('id=1234', '', 'value=5678'), ('id=7890', '', 'value=8765')]

I can't get the second "id=4321" part. This looks very strange to me, since I use the non-greedy .*? between the first id=[0-9]+ and the second.

Answer 1

The middle of your regex is:

(id=[0-9]+)*

The empty string matches this, since the group is under the Kleene star * (zero or more repetitions). So the regex engine proceeds through the string as follows:

1. find the first id=[0-9]+ group
2. expand the first .*? to the empty string, since a lazy quantifier tries the shortest match first
3. match (id=[0-9]+)* zero times, since no id= starts at this position and the empty string is an acceptable match
4. expand the second .*? just far enough to reach value=[0-9]+, consuming the second id along the way
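
The steps above can be checked on a single line of input. This is only a small sketch: the variable names line, pattern, and m are chosen here for illustration, and the sample line is the first line of the string from the question.

```python
import re

line = "...id=1234...id=4321...value=5678..."
pattern = re.compile(r'(id=[0-9]+).*?(id=[0-9]+)*.*?(value=[0-9]+)')

m = pattern.search(line)

# Group 1 is the first id, as expected.
print(m.group(1))

# Group 2 is None: the starred group matched zero times, which
# re.findall reports as an empty string.
print(m.group(2))

# The overall match still covers the second id; it was consumed by
# the second lazy .*? rather than captured by group 2.
print(m.group(0))

# The starred group on its own accepts the empty string.
print(re.fullmatch(r'(id=[0-9]+)*', '') is not None)
```

Note that m.group(2) is None here, while re.findall shows '' for the same group; both mean the group did not take part in the match.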

If you replace the middle group's quantifier with + , or just remove it entirely, then the second id is captured. Note that this makes the second id mandatory, so a line with only one id will no longer match.
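
As a sketch of the difference, the snippet below runs the pattern with the quantifier removed against the string from the question. It also tries one possible variant that is not part of the answer above: wrapping the lazy gap and the second id together in an optional non-capturing group, so that lines with a single id keep matching. The names text, required, and optional are illustrative.

```python
import re

text = "...id=1234...id=4321...value=5678...\n...id=7890...value=8765\n"

# Quantifier removed: the second id is now required, so only the
# line that actually contains two ids produces a match.
required = re.findall(r'(id=[0-9]+).*?(id=[0-9]+).*?(value=[0-9]+)', text)
print(required)

# Assumed alternative: make "gap + second id" optional as one unit.
# The engine tries to find a second id first and only skips the
# group when none exists on the line.
optional = re.findall(
    r'(id=[0-9]+)(?:.*?(id=[0-9]+))?.*?(value=[0-9]+)', text
)
print(optional)
```

With the optional group, the single-id line yields an empty string in the second slot of its tuple, which makes it easy to tell the two line types apart.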
