Regex pattern confusion

I am learning regex using Python and am a little confused by this tutorial I am following. Here is the example:

import re

rand_str_2 = "doctor doctors doctor's"

# Match doctor, doctors or doctor's
regex = re.compile("[doctor]+['s]*")
matches = re.findall(regex, rand_str_2)
print("Matches :", len(matches))

I get 3 matches

When I do the same thing but replace the * with a ?, I still get three matches:

regex = re.compile("[doctor]+['s]?")

When I look into the documentation, I see that * matches 0 or more repetitions and ? matches 0 or 1.

My understanding of this is that it would not return "3 matches" because it is only looking for 0 or 1.

Can someone offer a better understanding of what I should expect out of these two Quantifiers?

Thank you

You are correct about the behavior of the two quantifiers. What they limit is how many characters each match can take from the ['s] class, not how many matches findall returns. With *, the three matches are "doctor", "doctors" and "doctor's". With ?, the three matches are "doctor", "doctors" and "doctor'". The * tries to match a character from the character class (' or s) 0 or more times, and because it is greedy it takes as many as it can, so the final match includes both the ' and the s. The ?, however, matches at most one character from the character class, so the final match stops after the '. The leftover s is not in [doctor], so it cannot start a fourth match.
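Printing the matches themselves, rather than only their count, makes the difference visible. This is a small sketch that reuses the string from the question; the names star_matches and question_matches are just illustrative.

```python
import re

rand_str_2 = "doctor doctors doctor's"

# Same character classes, different quantifier on the ['s] part.
star_matches = re.findall(r"[doctor]+['s]*", rand_str_2)
question_matches = re.findall(r"[doctor]+['s]?", rand_str_2)

print(star_matches)      # ['doctor', 'doctors', "doctor's"]
print(question_matches)  # ['doctor', 'doctors', "doctor'"]
```

Both lists have three entries, so len() alone cannot tell the two patterns apart; only the last entry differs.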

This happens because of the character classes in that specific expression. Square brackets tell the regex engine to "match any single character in this list". This means ['s] is looking for either a ' or an s to satisfy the expression, and likewise [doctor] matches any one of d, o, c, t or r rather than the literal word "doctor".
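To see that square brackets describe a set of characters rather than a word, here is a short sketch. The test strings and the second, stricter pattern are my own assumptions about what the tutorial may have intended, not something taken from it.

```python
import re

# [doctor] is the set {d, o, c, t, r}, so any run of those letters matches.
char_class = re.compile(r"[doctor]+")
print(char_class.findall("rotcod odd cord"))  # ['rotcod', 'odd', 'cord']

# Hypothetical stricter alternative: the literal word "doctor",
# optionally followed by 's or s (non-capturing group keeps findall simple).
literal = re.compile(r"doctor(?:'s|s)?")
print(literal.findall("doctor doctors doctor's rotcod"))  # ['doctor', 'doctors', "doctor's"]
```

If the goal is really to match only those three words, a literal pattern along these lines is probably closer to it than the character-class version.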

Now you can see how the quantifier affects this. Doing ['s]? is telling the pattern to "match ' or s between 0 and 1 times, as many times as possible", so it matches the ' and stops right before the s.

Doing ['s]* on the other hand is telling it to "match ' or s between 0 and infinity, as many times as possible". In this case it will match both the ' and the s because they're both in the list of characters it's trying to match.
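The two quantifiers can also be compared in isolation by applying only the ['s] part to the suffix. This is a minimal sketch; the variable names are illustrative.

```python
import re

suffix = "'s"

optional = re.match(r"['s]?", suffix).group()  # takes at most one character
repeated = re.match(r"['s]*", suffix).group()  # takes as many as it can

print(repr(optional))  # "'"
print(repr(repeated))  # "'s"

# Both quantifiers also accept zero characters, which is why a bare
# "doctor" with no suffix still matches under either pattern.
print(repr(re.match(r"['s]?", "x").group()))  # ''
```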

I hope this makes sense. If not, feel free to leave a comment and I'll try my best to clarify it.
