
Why can't I match the last part of my regular expression in Python?

I want to match a sentence with an optional ending 'other (\w+)'. For example, the regular expression should match both of the following sentences and, for the first one, extract the word 'things':

  • The apple and other things.
  • The apple is big.

I wrote the regular expression below. However, I get the result (None,). If I remove the last ?, I get the right answer. Why?

>>> re.search('\w+(?: other (\\w+))?', 'A and other things').groups()
(None,)
>>> re.search('\w+(?: other (\\w+))', 'A and other things').groups()
('things',)

If you use:

re.search(r'\w+(?: other (\w+))?', 'A and other things').group()

You will see what is happening. Since everything after \w+ is optional, your search matches only the first word, A.

As per the official documentation:

.groups()

Return a tuple containing all the subgroups of the match, from 1 up to however many groups are in the pattern.

Your search succeeds without the optional group taking part in the match, so that subgroup is reported as None and you get:

>>> re.search(r'\w+(?: other (\w+))?', 'A and other things').groups()
(None,)
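
To confirm this, you can inspect the match object directly. The snippet below is a small sketch using only the standard re module; the variable names are illustrative.

```python
import re

text = 'A and other things'
m = re.search(r'\w+(?: other (\w+))?', text)

# The overall match is only the first word; the optional group was skipped.
print(m.span())    # (0, 1)
print(m.group(0))  # A
print(m.group(1))  # None
print(m.groups())  # (None,)
```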

To solve your problem, you can use this alternation-based regex:

r'\w+(?: other (\w+)|$)'

Examples:

>>> re.search(r'\w+(?: other (\w+)|$)', 'A and other things').group()
'and other things'
>>> re.search(r'\w+(?: other (\w+)|$)', 'The apple is big').group()
'big'
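
To pull out the captured word rather than the whole match, read group 1. The following is a minimal sketch; the helper name extract_other_word is hypothetical and the pattern is the alternation-based regex shown above.

```python
import re

# Hypothetical helper; the pattern is the alternation-based regex from the answer.
PATTERN = re.compile(r'\w+(?: other (\w+)|$)')

def extract_other_word(sentence):
    """Return the word after 'other', or None if there is no such ending."""
    m = PATTERN.search(sentence)
    return m.group(1) if m else None

print(extract_other_word('The apple and other things'))  # things
print(extract_other_word('The apple is big'))            # None
```

Because the second alternative is anchored with $, a sentence that ends in punctuation and has no 'other' ending produces no match at all; the helper returns None in that case too.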

The rule for regular expression searches is that they return the leftmost match. Quantifiers are greedy, so the engine tries to make the match at that position as long as possible, but most importantly, as soon as it finds a successful match at the leftmost possible position, it stops looking further.

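In the first regular expression, the leftmost point where \w+ matches is A. The optional portion cannot match there (the text that follows is " and", not " other"), but because it is optional the match still succeeds with just A, so the search is done.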

In the second regular expression, the parenthesized expression is mandatory, so A is not a match. Therefore, the engine continues looking. The first \w+ matches and, then the second \w+ matches things.
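
The left-to-right scan can be imitated by trying each start position in turn. This is only an illustrative sketch (the function name first_matching_start is made up), not how the re module is implemented internally.

```python
import re

text = 'A and other things'
optional = re.compile(r'\w+(?: other (\w+))?')
mandatory = re.compile(r'\w+(?: other (\w+))')

def first_matching_start(pattern, s):
    """Try each start position from left to right and stop at the first success."""
    for pos in range(len(s) + 1):
        m = pattern.match(s, pos)  # anchored attempt at this position only
        if m:
            return pos, m.group(0)
    return None

print(first_matching_start(optional, text))   # (0, 'A')
print(first_matching_start(mandatory, text))  # (2, 'and other things')
```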


Note that for regular expressions in Python, especially those containing backslashes, it's a good idea to write them using r'raw strings'.
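
A short sketch of why this matters, using \b as the example: in a normal string literal it is a backspace character, while in a raw string it reaches the regex engine as a word boundary. The variable names are illustrative.

```python
import re

# A doubled backslash and a raw string spell the same pattern text.
same = '\\w+' == r'\w+'
print(same)  # True

plain = '\bcat\b'   # '\b' here is a backspace character (length 1)
raw = r'\bcat\b'    # r'\b' is backslash + b, a regex word boundary

print(len('\b'), len(r'\b'))           # 1 2
print(re.search(plain, 'a cat'))       # None
print(re.search(raw, 'a cat').group()) # cat
```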


 