Python non-greedy regex behavior

Question

I'm using Python 3.4.1 and I don't understand the result of the following regular expression:

import re
print(re.match(r"\[{E=(.*?),Q=(.*?)}\]","[{E=KT,Q=P1.p01},{E=KT2,Q=P2.p02}]").groups())
('KT', 'P1.p01},{E=KT2,Q=P2.p02')

I would expect the result to be

('KT', 'P1.p01')

but apparently the second .*? 'eats' all characters until '}]' at the end. I would expect it to stop at the first '}' character.

If I leave out the '[' and ']' characters, the behavior is as I expect:

print(re.match("{E=(.*?),Q=(.*?)}","{E=KT,Q=P1.p01},{E=KT2,Q=P2.p02}").groups())

('KT', 'P1.p01')

Answer 1

The \] forces a square bracket to be present in the match, and there is only one, at the end of the string. The regex engine has no other option to match: a non-greedy quantifier tries the shortest expansion first, but it keeps expanding until the rest of the pattern can match. If you remove the \] or make it optional ( \]? ), the match stops at the closest } .
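The difference can be checked side by side. This is a minimal sketch that reuses the question's pattern and input; the variable names `required` and `optional` are only illustrative, and raw strings are used so the backslashes reach the regex engine unchanged.

```python
import re

s = "[{E=KT,Q=P1.p01},{E=KT2,Q=P2.p02}]"

# Mandatory \]: the lazy second group has to keep growing until a "}"
# that is directly followed by "]" is found, which only happens at the end.
required = re.match(r"\[{E=(.*?),Q=(.*?)}\]", s).groups()
print(required)  # ('KT', 'P1.p01},{E=KT2,Q=P2.p02')

# Optional \]?: the pattern can already succeed at the first "}",
# so the lazy group stops there.
optional = re.match(r"\[{E=(.*?),Q=(.*?)}\]?", s).groups()
print(optional)  # ('KT', 'P1.p01')
```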

Answer 2

What you seem to want is everything between '{E=' and the next comma ',' , then everything between 'Q=' and the next closing brace '}' . One expression to do this would be:

{E=([^,]*),Q=([^}]*)}

Here, for example, [^,]* means "as many non-comma characters as possible".

Example usage:

>>> import re
>>> re.findall("{E=([^,]*),Q=([^}]*)}", 
               "{E=KT,Q=P1.p01},{E=KT2,Q=P2.p02}")
[('KT', 'P1.p01'), ('KT2', 'P2.p02')]

You can see the full explanation in this regex101 demo.
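The same negated-character-class pattern also works on the original bracketed string from the question, because re.finditer scans for matches anywhere in the text. Below is a small sketch, assuming you want one dict per record; the helper name `parse_pairs`, the compiled pattern `PAIR` and the group names `E` and `Q` are hypothetical, chosen for illustration.

```python
import re

# Hypothetical compiled pattern: braces are escaped for clarity,
# and named groups label the two captured fields.
PAIR = re.compile(r"\{E=(?P<E>[^,]*),Q=(?P<Q>[^}]*)\}")

def parse_pairs(text):
    """Return one dict per {E=...,Q=...} record found in text."""
    return [m.groupdict() for m in PAIR.finditer(text)]

pairs = parse_pairs("[{E=KT,Q=P1.p01},{E=KT2,Q=P2.p02}]")
print(pairs)
# [{'E': 'KT', 'Q': 'P1.p01'}, {'E': 'KT2', 'Q': 'P2.p02'}]
```

Because each class excludes its own terminator, the result does not depend on greedy versus non-greedy matching or on what follows the closing brace.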

Answer 1
4 2014-06-23 14:20:20

Answer 2
2 2014-06-23 14:30:57
