
Python Regular Expression: Matching a pattern once or twice

I have a bunch of lines in a file with either one or two occurrences of the same pattern (id=):

Linetype1 : ...id=1234...id=4321...value=5678... # "..." means whatever
Linetype2 : ...id=7890...value=8765

I thought I could write a regex like this to grep all my ids and their associated values:

>>> import re
>>> l = "...id=1234...id=4321...value=5678...\n...id=7890...value=8765\n"
>>> re.findall(r'(id=[0-9]+).*?(id=[0-9]+)*.*?(value=[0-9]+)', l)
[('id=1234', '', 'value=5678'), ('id=7890', '', 'value=8765')]

I can't get the second "id=4321" part. This looks very strange to me, since I use the non-greedy .*? between the first id=[0-9]+ and the second.

The middle of your regex has

(id=[0-9]+)*

The empty string matches this, since it is under the Kleene star *. So the regex engine proceeds through the string as follows:

  • match the first (id=[0-9]+) group
  • expand the first .*? to the empty string, since that matches
  • expand (id=[0-9]+)* to the empty string, since that matches too
  • expand the second .*? just far enough (through "...id=4321...") for (value=[0-9]+) to match
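To see these steps directly, here is a small sketch (not from the original answer) that takes the question's pattern and wraps the two lazy .*? parts in extra capturing groups, so you can print what each piece consumed. It uses only Python's standard re module; the variable names are illustrative.

```python
import re

# The question's pattern, with both lazy .*? parts wrapped in capturing
# groups (2 and 4) so we can inspect what each one actually matched.
pattern = re.compile(r'(id=[0-9]+)(.*?)(id=[0-9]+)*(.*?)(value=[0-9]+)')

line = "...id=1234...id=4321...value=5678..."
m = pattern.search(line)

print(repr(m.group(1)))  # 'id=1234'
print(repr(m.group(2)))  # '' : the first lazy .*? matched nothing
print(repr(m.group(3)))  # None : the starred group repeated zero times
print(repr(m.group(4)))  # '...id=4321...' : the second .*? swallowed the second id
print(repr(m.group(5)))  # 'value=5678'

# The starred group on its own matches the empty string:
print(re.fullmatch(r'(id=[0-9]+)*', '') is not None)  # True
```

A group that repeats zero times does not participate in the match, so m.group(3) is None; re.findall reports such a group as an empty string, which is the '' in the question's output.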

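Because this first attempt already succeeds, the engine never goes back to try a longer first .*?. Lazy means "as short as possible while the whole pattern still matches", not "as short as possible while still filling the optional group". If you replace the middle group's quantifier with +, or just remove it entirely, the second id is captured on lines that have one. Note, however, that lines with only one id (like Linetype2) then no longer match at all.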
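The following sketch is not part of the original answer. It compares two variants on the question's sample data using Python's standard re module: the pattern with the middle quantifier removed (a + gives the same result on this sample), and an alternative that makes the filler and the second id optional together with a non-capturing group (?:...)?. The alternative assumes the goal is to capture the second id when present and leave it empty otherwise.

```python
import re

data = "...id=1234...id=4321...value=5678...\n...id=7890...value=8765\n"

# Middle group required exactly once: Linetype2 (only one id) is lost.
exactly_two = re.findall(r'(id=[0-9]+).*?(id=[0-9]+).*?(value=[0-9]+)', data)
print(exactly_two)  # [('id=1234', 'id=4321', 'value=5678')]

# Make the filler *and* the second id optional together, so trying the
# group forces the lazy .*? inside it to search for "id=" before giving up.
optional_second = re.findall(
    r'(id=[0-9]+)(?:.*?(id=[0-9]+))?.*?(value=[0-9]+)', data)
print(optional_second)
# [('id=1234', 'id=4321', 'value=5678'), ('id=7890', '', 'value=8765')]
```

The trailing ? is greedy, so the engine tries the group before skipping it. On a line without a second id, the inner .*? cannot find "id=" before the newline (. does not match \n without re.DOTALL), the group is skipped, and findall reports it as an empty string.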

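Notice: The technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.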

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM