
Ignore a specific character in a Python regex match

I've been trying to extract values from strings like '5 bucks', and also from '5bucks', while ignoring the word bucks when it appears on its own without a number in front of it. I've been trying this regex:

(\d*)(?:\s?)(?=bucks|dollars)

and testing it on https://regex101.com/ . It gives me two matches instead of one for the very same string. Why is that? This is what I'm getting:

Match 1:

Full match: 5

Group 1: 5

Match 2:

Full match:

Group 1:

In the second match, both the full match and the group appear to be empty. Is there a way to prevent my regex from finding these zero-length matches? Or is there some way I could handle them?

You get those matches because both the digits \d* and the whitespace char \s? are optional, so the pattern can match an empty string at any position where the positive lookahead assertion is true, that is, wherever bucks or dollars is directly to the right.
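The behaviour described above can be reproduced in Python. This is a minimal sketch: the helper name find_matches and the sample strings are assumptions made for illustration, and only the pattern comes from the question. It assumes Python 3.7 or later.

```python
import re

# The pattern from the question: both the digits and the whitespace are optional.
pattern = re.compile(r"(\d*)(?:\s?)(?=bucks|dollars)")


def find_matches(text):
    """Return (start, end, full match, group 1) for every match in text.

    find_matches is a hypothetical helper, not part of the original post.
    """
    return [(m.start(), m.end(), m.group(0), m.group(1))
            for m in pattern.finditer(text)]


for sample in ("5 bucks", "5bucks", "bucks"):
    print(repr(sample), find_matches(sample))
```

For '5 bucks' the first match consumes '5 ' and the second is a zero-length match at the position just before bucks, where the lookahead still succeeds. The bare word 'bucks' also produces a zero-length match, which is the case the question wants to avoid.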

To get both variations, you could require at least one digit with \d+, follow it with an optional space, and use an alternation | in a non-capturing group for the words. To prevent the words from being part of a larger word, you could add word boundaries \b:

\b\d+ ?(?:bucks|dollars)\b
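A short sketch of how this pattern might be used from Python with re.findall. The sample strings are made up for illustration, and amount_pattern, which adds a capturing group around the digits to return only the number, is an assumed variant that does not appear in the answer.

```python
import re

# The pattern from the answer: at least one digit, an optional space, then the unit.
pattern = re.compile(r"\b\d+ ?(?:bucks|dollars)\b")

# Assumed variant: the same pattern with a capturing group around the digits,
# so that findall returns only the number.
amount_pattern = re.compile(r"\b(\d+) ?(?:bucks|dollars)\b")

samples = [
    "5 bucks",
    "5bucks",
    "bucks",                          # no number in front, so no match
    "I paid 20 dollars, not 5bucks",
    "5 buckskin",                     # rejected by the trailing word boundary
]

for text in samples:
    print(repr(text), pattern.findall(text), amount_pattern.findall(text))
```

Because \d+ requires at least one digit, the pattern can no longer match an empty string, so the zero-length matches disappear.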


'(\d+)\s*(bucks|dollars)?'

And then pick the first item of the match, that is, the first capture group, which holds the number.
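One way to read this suggestion, sketched in Python. The helper name first_amount and the sample strings are assumptions, and "the first item matched" is interpreted here as capture group 1 of the first match.

```python
import re

# The pattern from the answer: the unit group is optional.
pattern = re.compile(r"(\d+)\s*(bucks|dollars)?")


def first_amount(text):
    """Return the digits of the first match as a string, or None if there is none.

    first_amount is a hypothetical helper, not part of the original post.
    """
    match = pattern.search(text)
    return match.group(1) if match else None


for sample in ("5 bucks", "5bucks", "bucks", "room 12"):
    print(repr(sample), first_amount(sample))
```

Note that because the unit is optional in this pattern, any number matches, including one that is not followed by bucks or dollars (as in 'room 12'). If the unit must be present, the trailing ? would need to be dropped.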
