Behaviour of Python non-greedy regular expression

I'm using Python 3.4.1 and I don't understand the result of the following regular expression:

import re
print(re.match("\[{E=(.*?),Q=(.*?)}\]","[{E=KT,Q=P1.p01},{E=KT2,Q=P2.p02}]").groups())
('KT', 'P1.p01},{E=KT2,Q=P2.p02')

I would expect the result to be

('KT', 'P1.p01')

but apparently the second .*? 'eats' all characters until the '}]' at the end. I would expect it to stop at the first '}' character.

If I leave out the '[' and ']' characters the behavior is as I expect:

print(re.match("{E=(.*?),Q=(.*?)}","{E=KT,Q=P1.p01},{E=KT2,Q=P2.p02}").groups())

('KT', 'P1.p01')

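The \] forces a closing square bracket to be present in the match, and there is only one, at the end of the string. A non-greedy quantifier matches as little as possible, but only as little as still lets the rest of the pattern succeed, so the regex engine has no other option: the second .*? has to keep expanding until it reaches the final '}]'. If you remove the \] or make it optional ( \]? ), the match stops at the closest } .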
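As a minimal sketch of that difference, the two variants can be compared side by side on the string from the question. The variable names s, strict and relaxed are only illustrative, and the braces are escaped here for readability; Python also accepts them unescaped in this pattern.

```python
import re

s = "[{E=KT,Q=P1.p01},{E=KT2,Q=P2.p02}]"

# Mandatory \]: the second lazy group must keep growing until "}]" can match,
# which only happens at the very end of the string.
strict = re.match(r"\[\{E=(.*?),Q=(.*?)\}\]", s)
print(strict.groups())   # ('KT', 'P1.p01},{E=KT2,Q=P2.p02')

# Optional \]?: the pattern can already succeed at the first "}",
# so the lazy group stops there.
relaxed = re.match(r"\[\{E=(.*?),Q=(.*?)\}\]?", s)
print(relaxed.groups())  # ('KT', 'P1.p01')
```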

What you seem to want is everything between '{E=' and the next comma ',', then everything between 'Q=' and the next closing brace '}'. One expression to do this would be:

{E=([^,]*),Q=([^}]*)}

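Here, for example, [^,]* means "as many non-comma characters as possible". Because a negated character class can never run past its delimiter, the result no longer depends on what follows in the pattern.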

Example usage:

>>> import re
>>> re.findall("{E=([^,]*),Q=([^}]*)}", 
               "{E=KT,Q=P1.p01},{E=KT2,Q=P2.p02}")
[('KT', 'P1.p01'), ('KT2', 'P2.p02')]
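The same expression should also work on the original string with the surrounding square brackets, since re.findall and re.finditer scan for matches anywhere in the input. The following is a sketch only; the group names E and Q and the variable names are assumptions chosen for illustration, not part of the answer above.

```python
import re

s = "[{E=KT,Q=P1.p01},{E=KT2,Q=P2.p02}]"

# Named groups make each record self-describing; the negated classes
# stop at the first ',' and the first '}' respectively.
pattern = re.compile(r"\{E=(?P<E>[^,]*),Q=(?P<Q>[^}]*)\}")

# One dict per {E=...,Q=...} record found in the string.
records = [m.groupdict() for m in pattern.finditer(s)]

for record in records:
    print(record["E"], record["Q"])
```

With the sample input this yields one record for KT / P1.p01 and one for KT2 / P2.p02.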

You can see the full explanation in this regex101 demo.
