简体   繁体   English

为什么我的Python正则表达式无法捕获整数?

[英]Why does my regular expression in Python not capture integers?

I am using a regular expression to find sequences of numbers that aren't refixed with 0x . 我正在使用正则表达式查找未用0x的数字序列。

eg. 例如。

0
50505
20201
0012

My regular expression is (?:[^(0x)]\\d+) , which (in my head) translates to match any sequence of digits not started with 0x . 我的正则表达式是(?:[^(0x)]\\d+) ,该表达式(在我的脑海中)转换为匹配不是以0x任何数字序列。 But this doesn't work - what assumption am I incorrectly making? 但这不起作用-我错误地做出了什么假设?

[^(0x)] in your regular expression match any character that is not ( , 0 , x , ) . 正则表达式中的[^(0x)]匹配非(0x)任何字符。

Use negative lookbehind: 在后面使用负数:

>>> re.findall(r'(?<!0x)\d+\b', '0 0x111 50505 20201 0012')
['0', '11', '50505', '20201', '0012']

From http://docs.python.org/2/library/re.html 来自http://docs.python.org/2/library/re.html

(?<!...) (?<!...)

Matches if the current position in the string is not preceded by a match for .... This is called a negative lookbehind assertion. 如果字符串中的当前位置之前没有...的匹配项,则匹配。这称为否定性后向断言。 Similar to positive lookbehind assertions, the contained pattern must only match strings of some fixed length. 类似于肯定的后置断言,所包含的模式必须仅匹配某个固定长度的字符串。 Patterns which start with negative lookbehind assertions may match at the beginning of the string being searched. 以否定的后向断言开头的模式可以在要搜索的字符串的开头匹配。


UPDATE UPDATE

Use following regular expression: 使用以下正则表达式:

>>> re.findall(r'\b\d+\b', '0 0x111 50505 20201 0012')
['0', '50505', '20201', '0012']

That looks like a non-capture group for anything that doesn't start with [0 or x]. 对于任何不以[0或x]开头的内容,它看起来像是一个非捕获组。

Did you try a negative look-behind? 您是否尝试过负面观察?

(r'(?<!0x)\d+')

(Took me a helluva long time to type this on the phone and now I see someone's already sorted it. Lesson learned: no regex on the phone.) (花了我很长时间在电话上键入此消息,现在我看到有人已经对其进行了排序。经验教训:电话上没有正则表达式。)

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM