Python | Regex | get numbers from the text

Question

I have text of the form

Refer to Annex 1.1, 1.2 and 2.0 containing information etc,

or

Refer to Annex 1.0.1, 1.1.1 containing information etc,

I need to extract the annex numbers being referred to. I have tried a lookbehind regex:

import re

m = re.search(r"(?<=Annex)\s*[\d+.\d+,]+", text)
print(m)
# <re.Match object; span=(11, 15), match=' 1.1'>

I only get 1.1 as output, not the remaining numbers. How do I get all the numbers that follow the keyword Annex?
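As a quick illustration (not part of the original question), running the attempted pattern on the first sample sentence as written above shows two separate issues. First, [\d+.\d+,] is a single character class (any digit, '+', '.' or ','), not a "number.number" pattern. Second, re.search returns only the first match, and even re.findall cannot find more, because the lookbehind requires "Annex" immediately before each hit.

```python
import re

text = 'Refer to Annex 1.1, 1.2 and 2.0 containing information etc,'
# [\d+.\d+,] is one character class: any digit, '+', '.' or ','
pattern = r"(?<=Annex)\s*[\d+.\d+,]+"

# re.search returns only the first match; the space before "1.2" ends it
print(repr(re.search(pattern, text).group()))  # ' 1.1,'
# findall does not help: the lookbehind needs "Annex" right before each hit
print(re.findall(pattern, text))               # [' 1.1,']
```

Note the trailing comma in the match: it is captured because ',' is a member of the character class.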

Answer 1

You can use the following two-step solution:

import re

texts = [
    'Refer to Annex 1.1, 1.2 and 2.0 containing information etc,',
    'Refer to Annex 1.0.1, 1.1.1 containing information etc,',
]
# Step 1: capture the run of numbers and separators right after "Annex"
rx = re.compile(r'Annex\s*(\d+(?:(?:\W|and)+\d)*)')
for text in texts:
    match = rx.search(text)
    if match:
        # Step 2: pull the dot-separated numbers out of the captured run
        print(re.findall(r'\d+(?:\.\d+)*', match.group(1)))

The output is:

['1.1', '1.2', '2.0']
['1.0.1', '1.1.1']

The Annex\s*(\d+(?:(?:\W|and)+\d)*) regex matches:

Annex - the string Annex
\s* - zero or more whitespace characters
(\d+(?:(?:\W|and)+\d)*) - Group 1: one or more digits, followed by zero or more occurrences of one or more non-word characters or "and" strings and then a digit.
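To see what step 1 hands over to step 2, this small sketch prints Group 1 for both sample sentences, using the answer's regex unchanged:

```python
import re

rx = re.compile(r'Annex\s*(\d+(?:(?:\W|and)+\d)*)')
texts = ['Refer to Annex 1.1, 1.2 and 2.0 containing information etc,',
         'Refer to Annex 1.0.1, 1.1.1 containing information etc,']
for text in texts:
    # group(1) is the raw run of numbers and separators that step 2 later splits
    print(repr(rx.search(text).group(1)))
# '1.1, 1.2 and 2.0'
# '1.0.1, 1.1.1'
```

The run stops before " containing" because "containing" is neither a non-word character nor "and", so no further digit can be reached.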

Then, once a match is found, all dot-separated digit sequences are extracted from Group 1 with \d+(?:\.\d+)*.
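One caveat worth checking (not mentioned in the answer): each repetition inside Group 1 ends in a single \d, so a part with more than one digit after a separator, such as 1.10 or 12.0, cuts the run short. The sketch below compares the answer's regex with an assumed variant that uses \d+ in that position. The helper name annex_numbers and the test sentence are hypothetical, chosen only for illustration.

```python
import re

# Regex from the answer: each repeated item ends in a single \d
original = re.compile(r'Annex\s*(\d+(?:(?:\W|and)+\d)*)')
# Assumed variant: \d+ lets multi-digit parts such as "10" or "12" through
variant = re.compile(r'Annex\s*(\d+(?:(?:\W|and)+\d+)*)')
# Step 2 regex from the answer: one dot-separated number like 1, 1.1 or 1.0.1
number = re.compile(r'\d+(?:\.\d+)*')


def annex_numbers(rx, text):
    """Hypothetical helper: run step 1 with rx, then step 2 on Group 1."""
    match = rx.search(text)
    return number.findall(match.group(1)) if match else []


text = 'Refer to Annex 1.10 and 12.0 containing information etc,'
print(annex_numbers(original, text))  # ['1.1']
print(annex_numbers(variant, text))   # ['1.10', '12.0']
```

On the two sample sentences from the question both patterns return the same lists; they only differ once multi-digit parts appear. Since \W, "and" and \d+ cannot match the same characters, the nested quantifiers in the variant should not lead to heavy backtracking.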

solution1
1 ACCEPTED 2022-07-05 21:01:54