I'm using Python 3.4.1 and I don't understand the result of the following regular expression:
import re
print(re.match(r"\[{E=(.*?),Q=(.*?)}\]", "[{E=KT,Q=P1.p01},{E=KT2,Q=P2.p02}]").groups())
('KT', 'P1.p01},{E=KT2,Q=P2.p02')
I would expect the result to be
('KT', 'P1.p01')
but apparently the second .*? 'eats' all characters up to the '}]' at the end. I would expect it to stop at the first '}' character.
If I leave out the '[' and ']' characters the behavior is as I expect:
print(re.match("{E=(.*?),Q=(.*?)}","{E=KT,Q=P1.p01},{E=KT2,Q=P2.p02}").groups())
('KT', 'P1.p01')
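The \] at the end forces a closing square bracket to be part of the match, and the only one is at the very end of the string. A lazy .*? does not guarantee the shortest possible group; it only makes the engine try shorter expansions first. After P1.p01 the } matches, but the \] then fails against the following comma, so the engine backtracks and lets the second group grow until }] can match. The regex engine has no other option to find a match. If you remove the \] or make it optional (\]?), it stops at the closest }.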
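A minimal sketch comparing the question's pattern with and without a required trailing bracket makes the backtracking visible. The input string is the one from the question; the variable names are only for illustration.

```python
import re

text = "[{E=KT,Q=P1.p01},{E=KT2,Q=P2.p02}]"

# Original pattern: the trailing \] must match, so the lazy second group
# keeps growing until "}]" is found, which is only at the end of the string.
strict = re.match(r"\[{E=(.*?),Q=(.*?)}\]", text)
print(strict.groups())   # ('KT', 'P1.p01},{E=KT2,Q=P2.p02')

# Optional bracket: the first "}" already lets the rest of the pattern
# succeed (\]? matches the empty string), so the lazy group stops there.
relaxed = re.match(r"\[{E=(.*?),Q=(.*?)}\]?", text)
print(relaxed.groups())  # ('KT', 'P1.p01')
```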
What you seem to want is everything between '{E=' and the next comma ',', then everything between 'Q=' and the next closing brace '}'. One expression to do this would be:
{E=([^,]*),Q=([^}]*)}
Here, for example, [^,]* means "as many non-comma characters as possible". Because the class excludes the delimiter, the group can never run past it.
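A small sketch of that property, assuming you still want to anchor on the leading [ of the question's input. With negated classes there is nothing for the engine to stretch across a delimiter, so a required trailing \] makes the match fail instead of over-matching.

```python
import re

text = "[{E=KT,Q=P1.p01},{E=KT2,Q=P2.p02}]"

# Negated character classes stop at the first "," and "}" respectively.
m = re.match(r"\[{E=([^,]*),Q=([^}]*)}", text)
print(m.groups())  # ('KT', 'P1.p01')

# Requiring \] right after the first entry now fails outright,
# because [^}]* cannot swallow the "}" to reach the final "]".
strict = re.match(r"\[{E=([^,]*),Q=([^}]*)}\]", text)
print(strict)      # None
```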
Example usage:
>>> import re
>>> re.findall("{E=([^,]*),Q=([^}]*)}",
...            "{E=KT,Q=P1.p01},{E=KT2,Q=P2.p02}")
[('KT', 'P1.p01'), ('KT2', 'P2.p02')]
You can see the full explanation in this regex101 demo.
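The same findall call should also work on the question's original, bracketed string, since findall scans for every non-overlapping match and skips the characters in between. A sketch, which also shows that the question's lazy groups behave as expected once no trailing \] has to be satisfied:

```python
import re

text = "[{E=KT,Q=P1.p01},{E=KT2,Q=P2.p02}]"

# Negated classes: the leading "[" and the "," between entries are skipped.
negated = re.findall(r"{E=([^,]*),Q=([^}]*)}", text)
print(negated)  # [('KT', 'P1.p01'), ('KT2', 'P2.p02')]

# Lazy groups without the trailing \]: each match can end at the first "}".
lazy = re.findall(r"{E=(.*?),Q=(.*?)}", text)
print(lazy)     # [('KT', 'P1.p01'), ('KT2', 'P2.p02')]
```

The negated-class version is the safer choice if a value could ever contain text like ",Q=", since it states the delimiters explicitly instead of relying on where backtracking happens to stop.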