
Ignore a specific character in a Python regex match

I've been trying to extract values from strings like '5 bucks', and also from '5bucks', while ignoring the word bucks when it appears alone without a number in front of it. I've been trying this regex:

(\d*)(?:\s?)(?=bucks|dollars)

and testing it on https://regex101.com/. It gives me two matches instead of one for the very same string. Why is that? This is what I'm getting:

Match 1:

Full match: 5

Group 1: 5

Match 2:

Full match:

Group 1:

In the second match, both the full match and the group are empty. Is there a way to stop my regex from finding these zero-length matches, or some way to handle them?

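You get those matches because both the digits \d* and the whitespace char \s? are optional, so the pattern can match the empty string at any position where the positive lookahead assertion is true, i.e. where bucks or dollars is directly to the right. After the first match consumes the digit and the space, the engine tries again right before bucks, and a zero-length match succeeds there.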
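To see this in Python, here is a minimal sketch that lists every match of the original pattern together with its start position. The helper name `find_matches` is made up for illustration; only the pattern comes from the question.

```python
import re

# The pattern from the question: every consuming part is optional.
pattern = re.compile(r"(\d*)(?:\s?)(?=bucks|dollars)")

def find_matches(text):
    """Return (start, full_match, group_1) for each match in text."""
    return [(m.start(), m.group(0), m.group(1)) for m in pattern.finditer(text)]

# The first match consumes "5 "; the second is empty, right before "bucks".
print(find_matches("5 bucks"))
# A bare "bucks" still yields one zero-length match at position 0.
print(find_matches("bucks"))
```

Note that the first full match includes the trailing space, which is easy to miss in the regex101 output.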

To get both variations, you could make the digits mandatory with \d+ and match the words with an alternation | inside a non-capturing group. To prevent the words from being part of a larger word, you could add word boundaries \b:

\b\d+ ?(?:bucks|dollars)\b
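A possible way to use this from Python, assuming you want only the number back: the sketch below wraps \d+ in a capturing group, which is an addition to the pattern above, and the function name `extract_amounts` is hypothetical.

```python
import re

# Same pattern as above, with \d+ wrapped in a capturing group (an assumption:
# the goal is to return only the number, not the currency word).
amount_re = re.compile(r"\b(\d+) ?(?:bucks|dollars)\b")

def extract_amounts(text):
    """Return the numbers that are directly followed by 'bucks' or 'dollars'."""
    # With exactly one capturing group, findall returns that group's strings.
    return amount_re.findall(text)

for sample in ["5 bucks", "5bucks", "bucks", "I paid 12 dollars and 5bucks", "5 buckskin"]:
    print(repr(sample), extract_amounts(sample))
```

Because \d+ requires at least one digit, a bare bucks no longer produces an empty match, and the trailing \b rejects longer words such as buckskin.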


'(\d+)\s*(bucks|dollars)?'

Then pick the first item of the match, which is the captured number.
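As a rough sketch of how that might look in Python (the helper name `first_amount` is invented here, and using `re.search` is one reading of "pick the first item"):

```python
import re

# Pattern from the answer above; the currency word is optional here.
loose_re = re.compile(r"(\d+)\s*(bucks|dollars)?")

def first_amount(text):
    """Return group 1 (the digits) of the first match, or None if there is none."""
    m = loose_re.search(text)
    return m.group(1) if m else None

for sample in ["5 bucks", "5bucks", "bucks", "room 42"]:
    print(repr(sample), first_amount(sample))
```

Since the currency group is optional, this version also returns numbers that are not followed by bucks or dollars at all, as the last sample shows. If that is not wanted, the stricter pattern with a mandatory word is likely the better fit.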
