
How to use {} in regex pattern with findall + Python

I'm creating a regex as below:

import re
asd = re.compile(r"(blah){2}")
mo = asd.search("blahblahblahblahblahblah ll2l 21HeHeHeHeHeHe lllo")
mo1 = asd.findall("blahblahblahblahblahblah")
print(mo.group())
print("findall output: ", mo1)

This prints:

blahblah
findall output:  ['blah', 'blah', 'blah']

- Why does findall return 'blah' three times when the pattern specifies only {2}?

If I change the quantifier to {4}, findall returns a single item:

asd = re.compile(r"(blah){4}")
findall output:  ['blah']

- How is {m} treated by re.search and re.findall?

Thanks a lot.

If you want to capture the whole (blah){2} match (both repetitions of blah), wrap it in an outer group:

asd = re.compile(r"((?:blah){2})")

Note that I made the inner group non-capturing with ?: so that the inner blah is not captured separately.

>>> asd = re.compile(r"((?:blah){2})")
>>> mo = asd.search("blahblahblahblahblahblah ll2l 21HeHeHeHeHeHe lllo")
>>> mo1 = asd.findall("blahblahblahblahblahblah")
>>> print(mo.group())
blahblah
>>> print("findall output: ", mo1)
findall output:  ['blahblah', 'blahblah', 'blahblah']

Exactly the same goes for the {4} version. The regex matches all four repetitions, but the group captures only the last one. If you want to capture the whole match, wrap it in an outer group in the same way.
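As a minimal sketch of the {4} case, the snippet below compares the original pattern with the wrapped one on the string from the question. The variable names inner_only and whole_match are only illustrative.

```python
import re

text = "blahblahblahblahblahblah"

# One capturing group, repeated four times: findall returns the group,
# and a repeated group keeps only its last repetition.
inner_only = re.findall(r"(blah){4}", text)

# Outer capturing group around a non-capturing repeated group:
# findall now returns the full four-repetition match.
whole_match = re.findall(r"((?:blah){4})", text)

print(inner_only)   # ['blah']
print(whole_match)  # ['blahblahblahblah']
```

Dropping the outer parentheses and using only (?:blah){4} should give the same result, since findall returns the whole match when the pattern has no capturing groups.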

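(blah){2} matches and consumes the string blahblah, but a repeated capturing group keeps only its last repetition, so the group holds just the last blah. When a pattern contains exactly one capturing group, findall returns that group's content for each match rather than the whole match. Since your string contains three non-overlapping blahblah matches, it outputs ['blah', 'blah', 'blah'].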

(blah){4} needs 16 characters and your string has only 24, so it can match just once, which gives you ['blah'].
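One way to see this for yourself is to iterate over the match objects with re.finditer and print both the whole match and the group. This is only an illustrative sketch; the names matches, whole and last are mine.

```python
import re

text = "blahblahblahblahblahblah"
pattern = re.compile(r"(blah){2}")

# For each non-overlapping match, record where it is, the whole match
# (group 0), and what the capturing group ended up holding (group 1).
matches = [(m.span(), m.group(0), m.group(1)) for m in pattern.finditer(text)]

for span, whole, last in matches:
    print(span, whole, last)
# (0, 8) blahblah blah
# (8, 16) blahblah blah
# (16, 24) blahblah blah
```

Each match spans eight characters, but group 1 only ever contains the final blah, which is what findall reports.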

