Python regular expression: matching a pattern that occurs once or twice
I have a bunch of lines in a file with either one or two occurrences of the same pattern (id=):
Linetype1 : ...id=1234...id=4321...value=5678... # "..." means whatever
Linetype2 : ...id=7890...value=8765
I thought I could write a regex like this to grab all my ids and their associated values:
>>> import re
>>> l = "...id=1234...id=4321...value=5678...\n...id=7890...value=8765\n"
>>> ret = re.findall(r'(id=[0-9]+).*?(id=[0-9]+)*.*?(value=[0-9]+)', l)
>>> ret
[('id=1234', '', 'value=5678'), ('id=7890', '', 'value=8765')]
I can't get the second "id=4321" part. This looks very strange to me, since I use the non-greedy .*? between the first id=[0-9]+ and the second.
The middle of your regex has (id=[0-9]+)*. The empty string matches this, since it is under the Kleene star *. So the regex engine proceeds through the string as follows:
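1. Match the first group (id=[0-9]+) to id=1234.
2. Match the first .*? to the empty string, since it is non-greedy and the empty string matches.
3. Match (id=[0-9]+)* to the empty string, since zero repetitions are allowed and no id= starts at that position.
4. Expand the second .*? across the rest of the text, including id=4321, until (value=[0-9]+) matches.
If you replace the middle group's quantifier with +, or just remove it entirely, then the second id is captured.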
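To see the behaviour described above, here is a minimal sketch that runs the original pattern, the + variant suggested in this answer, and a third variant on the sample text from the question. The third pattern is an assumption of mine and does not appear in the answer: it wraps the gap and the second id in one optional non-capturing group, so the lazy .*? only has to stretch when a second id really exists. The names text, patterns and results are illustrative only.

```python
import re

# Sample text taken from the question.
text = "...id=1234...id=4321...value=5678...\n...id=7890...value=8765\n"

patterns = {
    # Original: the starred group matches zero times right after the first id,
    # so the last .*? swallows the second id.
    "original": r'(id=[0-9]+).*?(id=[0-9]+)*.*?(value=[0-9]+)',
    # Fix from the answer: the middle group must match at least once.
    "plus": r'(id=[0-9]+).*?(id=[0-9]+)+.*?(value=[0-9]+)',
    # Assumed alternative (not from the answer): the gap and the second id
    # are optional together, so one-id lines still match.
    "optional": r'(id=[0-9]+)(?:.*?(id=[0-9]+))?.*?(value=[0-9]+)',
}

results = {name: re.findall(pattern, text) for name, pattern in patterns.items()}

for name, matches in results.items():
    print(name, matches)
```

Note that with the + variant the second line of the sample no longer matches at all, because . does not cross the newline and that line has only one id. If lines with a single id must also be reported, something like the optional variant may be a better fit; re.findall returns '' for the group that did not take part in the match.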