Why does my regular expression in Python not capture integers?

I am using a regular expression to find sequences of digits that aren't prefixed with 0x.

For example:

0
50505
20201
0012

My regular expression is (?:[^(0x)]\\d+), which (in my head) translates to "match any sequence of digits that does not start with 0x". But this doesn't work. What incorrect assumption am I making?

[^(0x)] in your regular expression is a negated character class: it matches any single character that is not (, 0, x, or ). It does not mean "not the sequence 0x".
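To see the character-class behaviour directly, here is a small sketch. The variable names (`negated`, `excluded`, `accepted`, `matches`) are made up for illustration, and the sample string is the one used elsewhere in this thread.

```python
import re

# Inside [...] the parentheses are literal characters, not a group,
# so [^(0x)] excludes exactly these four characters: ( 0 x )
negated = re.compile(r'[^(0x)]')

excluded = [c for c in '(0x)' if negated.fullmatch(c) is None]
accepted = [c for c in ' 15a' if negated.fullmatch(c)]
# excluded → ['(', '0', 'x', ')']
# accepted → [' ', '1', '5', 'a']

# The question's pattern therefore means: one character from the
# accepted set, followed by one or more digits.
matches = re.findall(r'(?:[^(0x)]\d+)', '0 0x111 50505 20201 0012')
# → [' 0', '111', ' 50505', ' 20201', ' 0012']
```

Note that the leading 0 is lost (it is an excluded character with nothing before it), the spaces are swallowed into the matches, and the 111 from 0x111 is still picked up.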

Use negative lookbehind:

>>> re.findall(r'(?<!0x)\d+\b', '0 0x111 50505 20201 0012')
['0', '11', '50505', '20201', '0012']

From http://docs.python.org/2/library/re.html

(?<!...)

Matches if the current position in the string is not preceded by a match for .... This is called a negative lookbehind assertion. Similar to positive lookbehind assertions, the contained pattern must only match strings of some fixed length. Patterns which start with negative lookbehind assertions may match at the beginning of the string being searched.
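Two points from that documentation excerpt can be checked quickly. This is only a sketch; the names `at_start` and `fixed_width_enforced` are illustrative, and the variable-width pattern `0x+` is deliberately invalid to trigger the error.

```python
import re

# A pattern starting with a negative lookbehind can still match at the
# very beginning of the string: nothing precedes position 0, so the
# assertion "not preceded by 0x" holds.
at_start = re.match(r'(?<!0x)\d+', '42')

# The lookbehind body must match strings of a fixed length. '0x+' can
# match 2 or more characters, so compiling it raises re.error.
try:
    re.compile(r'(?<!0x+)\d+')
    fixed_width_enforced = False
except re.error:
    fixed_width_enforced = True
```

Here `at_start.group()` is '42' and `fixed_width_enforced` ends up True.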


UPDATE

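The lookbehind version above still returns 11 from inside 0x111. Use the following regular expression, with a word boundary on both sides, instead: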

>>> re.findall(r'\b\d+\b', '0 0x111 50505 20201 0012')
['0', '50505', '20201', '0012']
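If you want this as a reusable helper, something along these lines should do. The function name `find_plain_integers` and the constant `PLAIN_INT` are hypothetical, not from any library.

```python
import re

# \b on both sides means the digits must form a whole "word": in
# '0x111' there is no boundary between '0' and 'x', or between 'x'
# and '1', so no part of it can match.
PLAIN_INT = re.compile(r'\b\d+\b')


def find_plain_integers(text):
    """Return digit runs that are not part of a larger word such as 0x111."""
    return PLAIN_INT.findall(text)


result = find_plain_integers('0 0x111 50505 20201 0012')
# → ['0', '50505', '20201', '0012']
```

One caveat: \b treats letters, digits and underscores all as word characters, so digits glued to any letter (for example a1) are skipped as well, not just those after 0x.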

That looks like a non-capturing group for anything that doesn't start with 0 or x.

Did you try a negative look-behind?

(r'(?<!0x)\d+')
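For what it's worth, running that lookbehind on its own (no word boundaries) against the sample string from the other answer shows where it falls short. This is just a sketch; `text` and `hits` are illustrative names.

```python
import re

text = '0 0x111 50505 20201 0012'

# Record where each match starts so the stray ones are easy to spot.
hits = [(m.start(), m.group()) for m in re.finditer(r'(?<!0x)\d+', text)]
# → [(0, '0'), (2, '0'), (5, '11'), (8, '50505'), (14, '20201'), (20, '0012')]
```

The match at offset 2 is the 0 of 0x111, and the one at offset 5 is the 11 left over once the first 1 (which is preceded by 0x) has been rejected. The lookbehind only checks the two characters immediately before the match start, so the scan simply resumes one digit later.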

(Took me a helluva long time to type this on the phone and now I see someone's already sorted it. Lesson learned: no regex on the phone.)


 