Behaviour of Python non-greedy regular expression

Question

I'm using Python 3.4.1 and I don't understand the result of the following regular expression:

import re
print(re.match(r"\[{E=(.*?),Q=(.*?)}\]","[{E=KT,Q=P1.p01},{E=KT2,Q=P2.p02}]").groups())
('KT', 'P1.p01},{E=KT2,Q=P2.p02')

I would expect the result to be

('KT', 'P1.p01')

but apparently the second .*? 'eats' all characters until '}]' at the end. I would expect it to stop at the first '}' character.

If I leave out the '[' and ']' characters the behavior is as I expect:

print(re.match("{E=(.*?),Q=(.*?)}","{E=KT,Q=P1.p01},{E=KT2,Q=P2.p02}").groups())

('KT', 'P1.p01')

Answer 1

The \] forces a closing square bracket to be present in the match, and the only one is at the end of the string. The regex engine has no other option: the non-greedy group keeps expanding until the rest of the pattern can match. If you remove it or make it optional (\]?), the match stops at the closest }.
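
A minimal sketch of this, reusing the string from the question (the variable names are only for illustration):

```python
import re

text = "[{E=KT,Q=P1.p01},{E=KT2,Q=P2.p02}]"

# Mandatory \]: the lazy group has to keep growing until "}]" can match,
# and the only "}]" is at the very end of the string.
strict = re.match(r"\[{E=(.*?),Q=(.*?)}\]", text).groups()

# Optional \]?: the pattern can already succeed at the first "}",
# so the lazy group stops there.
optional = re.match(r"\[{E=(.*?),Q=(.*?)}\]?", text).groups()

print(strict)
print(optional)
```

Non-greedy only means "try the shortest expansion first"; it does not stop the group from growing when that is the only way for the whole pattern to match.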

Answer 2

What you seem to want is everything between '{E=' and the next comma ',', then everything between 'Q=' and the next closing brace '}'. One expression to do this would be:

{E=([^,]*),Q=([^}]*)}

Here, for example, [^,]* means "as many non-comma characters as possible".

Example usage:

>>> import re
>>> re.findall("{E=([^,]*),Q=([^}]*)}", 
               "{E=KT,Q=P1.p01},{E=KT2,Q=P2.p02}")
[('KT', 'P1.p01'), ('KT2', 'P2.p02')]

You can see the full explanation in this regex101 demo.
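
As a rough sketch of how the negated classes behave on the bracketed string from the question (variable names are illustrative only): because [^}]* can never cross a closing brace, a pattern that still demands \] right after the first group does not match at all, while findall picks up every pair.

```python
import re

text = "[{E=KT,Q=P1.p01},{E=KT2,Q=P2.p02}]"

# [^}]* cannot run past the first "}", so requiring "]" right after it
# makes the anchored match fail instead of swallowing the second group.
with_bracket = re.match(r"\[{E=([^,]*),Q=([^}]*)}\]", text)

# Without the trailing \] the first pair is matched as expected.
first_pair = re.match(r"\[{E=([^,]*),Q=([^}]*)}", text).groups()

# findall scans the whole string and returns one tuple per {E=...,Q=...} group.
all_pairs = re.findall(r"{E=([^,]*),Q=([^}]*)}", text)

print(with_bracket)
print(first_pair)
print(all_pairs)
```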
