
re pattern to match all n-digit numbers in front of non-digit text

I want to construct a regular expression for this task in Python 3.7.5. The input text looks like the following:

alkdj flajf
123 adlf ja;ld fj 999
423 234 2359 kalfji lkja;lkd999

My goal is to retrieve all the numbers in leading positions, where each number is followed by a space character, and get one list per line like the following:

[]
[123]
[423, 234, 2359]

Any advice is appreciated!

import re

data = '''
alkdj flajf
123 adlf ja;ld fj 999
423 234 2359 kalfji lkja;lkd999
'''
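# Capture a run of digits and spaces that is followed by a space and a word character.
# The pattern is not anchored to the start of a line, and the trailing lazy .*? matches nothing.
pattern = re.compile(r'([0-9 ]+) \w.*?')

print(pattern.findall(data))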

Outputs:

['123', '423 234 2359']
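
If you need the per-line lists shown in the question rather than one string per line, here is a minimal sketch that uses only the standard re module. The helper name leading_numbers and the choice to convert the tokens to int are my own assumptions, not part of the answer above. It matches a run of "number followed by a space" tokens at the start of each line and splits that run:

```python
import re

data = '''
alkdj flajf
123 adlf ja;ld fj 999
423 234 2359 kalfji lkja;lkd999
'''

# One or more "digits followed by a single space" tokens.
# Pattern.match() only matches at the start of the string it is given.
leading = re.compile(r'(?:\d+ )+')

def leading_numbers(line):
    """Return the numbers at the start of line, each followed by a space."""
    m = leading.match(line)
    if m is None:
        return []
    return [int(tok) for tok in m.group().split()]

result = [leading_numbers(line) for line in data.strip().splitlines()]
print(result)  # [[], [123], [423, 234, 2359]]
```

Because every token must end with a space, a line such as "123abc" gives an empty list, which is how I read the requirement "a space character after each number".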

If you want to capture the numbers separately, we could use the fancy \G continue operator:

import regex as re
rgx = r"(?|^(\d+)|\G \K(\d+))"
test_str = ("alkdj flajf\n"
    "123 adlf ja;ld fj 999\n"
    "423 234 2359 kalfji lkja;lkd999")

matches = re.finditer(rgx, test_str, re.MULTILINE)
for match in matches:
    print(match.group(1))

Demo (the demo requires PCRE; Python's built-in re module does not support \G, \K or branch reset, which is why I import the alternative regex module)

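I also use a Branch Reset (?|) so that both alternatives store the number in group 1, and the \K discard operator to drop the space that precedes each continued number from the reported match.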
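
If installing the third-party regex module is not an option, the "continue from where the last match ended" idea can be approximated with the standard re module by passing a start position to Pattern.match(). This is only a sketch of the same idea, not a drop-in replacement for the pattern above; the function name contiguous_leading_numbers is hypothetical:

```python
import re

number = re.compile(r'\d+')

def contiguous_leading_numbers(line):
    """Collect numbers at the start of line that are separated by single spaces."""
    found = []
    pos = 0
    while True:
        # match(line, pos) is anchored at pos, which plays the role of \G here.
        m = number.match(line, pos)
        if m is None:
            break
        found.append(m.group())
        # Continue only when exactly the next character is a space.
        if m.end() < len(line) and line[m.end()] == ' ':
            pos = m.end() + 1
        else:
            break
    return found

test_str = ("alkdj flajf\n"
            "123 adlf ja;ld fj 999\n"
            "423 234 2359 kalfji lkja;lkd999")

for line in test_str.splitlines():
    print(contiguous_leading_numbers(line))
```

Like the ^(\d+) branch of the pattern, this accepts a leading number even when no space follows it, and it stops at the first token that is not a number.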

