
Python REGEX How to extract particular numbers from variable

I have the following problem:

var a = ' 15159970 (30.12.2015), 15615115 (01.01.1970), 11224455, 55441123 '

I'd like a regex to extract only the numbers: 15159970, 15615115, 11224455, 55441123

What I have so far:

re.findall(r'(\d+\s)\(', a)

which only extracts the first 2 numbers: 15159970, 15615115

I also have a second var b = 15159970, 15615115, 11224455, 55441126. I would like to compare the two vars and, if they differ, print("vars are different!").

Thanks!

You may extract all chunks of digits that are not preceded by a digit or a digit + dot, and not followed by a digit or a dot + digit:

(?<!\d)(?<!\d\.)\d+(?!\.?\d)

See the regex demo

Details

  • (?<!\d) - a negative lookbehind that fails the match at a location immediately preceded by a digit
  • (?<!\d\.) - a negative lookbehind that fails the match at a location immediately preceded by a digit and a dot
  • \d+ - one or more digits
  • (?!\.?\d) - a negative lookahead that fails the match at a location immediately followed by a digit or by a dot + a digit.

Python demo :

import re
a = ' 15159970 (30.12.2015), 15615115 (01.01.1970), 11224455, 55441123 '
print( re.findall(r'(?<!\d)(?<!\d\.)\d+(?!\.?\d)', a) )
# => ['15159970', '15615115', '11224455', '55441123']
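The question also asks how to compare the result with a second variable. A minimal sketch of one way to do that with the same pattern, assuming b is a plain string holding the numbers (the question does not show its exact type), and using the illustrative names PATTERN, nums_a and nums_b:

```python
import re

# Same pattern as above, compiled once so it can be reused for both strings.
PATTERN = re.compile(r'(?<!\d)(?<!\d\.)\d+(?!\.?\d)')

a = ' 15159970 (30.12.2015), 15615115 (01.01.1970), 11224455, 55441123 '
# Assumption: b is a string; the question does not show how it is defined.
b = '15159970, 15615115, 11224455, 55441126'

nums_a = PATTERN.findall(a)
nums_b = PATTERN.findall(b)

# List comparison is order-sensitive; compare set(nums_a) and set(nums_b)
# instead if the order of the numbers should not matter.
if nums_a != nums_b:
    print("vars are different!")
```

Here the last number differs (55441123 vs 55441126), so the message is printed.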

Another solution: only extract the digit chunks outside of parentheses.

See this Python demo :

import re
text = "15159970 (30.12.2015), 15615115 (01.01.1970), 11224455, 55441123 (28.11.2014 12:43:14)"
print( list(filter(None, re.findall(r'\([^()]+\)|(\d+)', text))) )
# => ['15159970', '15615115', '11224455', '55441123']

Here, \([^()]+\)|(\d+) matches

  • \([^()]+\) - a ( , then one or more chars other than ( and ) , and then a )
  • | - or
  • (\d+) - one or more digits, captured into Group 1 ( re.findall only returns the captured substrings if there is a capturing group in the pattern).

Empty items appear in the result whenever the parenthesized alternative matches, because Group 1 does not participate in that match. Thus, we need to remove them (either with list(filter(None, results)) or with [x for x in results if x] ).
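To make the empty items visible, here is a small sketch that prints the raw re.findall output before filtering and then applies the list-comprehension variant (the names raw and numbers are illustrative only):

```python
import re

text = "15159970 (30.12.2015), 15615115 (01.01.1970), 11224455, 55441123 (28.11.2014 12:43:14)"

# Each (...) chunk is matched by the first alternative, so Group 1 is empty
# for that match and findall reports it as ''.
raw = re.findall(r'\([^()]+\)|(\d+)', text)
print(raw)
# => ['15159970', '', '15615115', '', '11224455', '55441123', '']

# Drop the empty strings left behind by the parenthesized chunks.
numbers = [x for x in raw if x]
print(numbers)
# => ['15159970', '15615115', '11224455', '55441123']
```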
