regex unicode multiline problem in Python

Question

I have some strings containing Unicode characters, like the ones below:

رده سنی مجاز : 
 10.2-15.3
 8.71-9.13
 25.08 - 31.2

زده های سنی غیرمجاز:
 16.5-18.4
 9.15 - 10.02
 20.02-21.30

I want to match the number ranges in the first group, like below:

10.2-15.3
8.71-9.13
25.08-31.2

and I'm using the following code:

print(re.findall('رده سنی مجاز :.*(.*\d+.\d+-\d+.\d+.*)', string, re.DOTALL))

but it returns:

['25.08-31.2']

Answer 1

I suggest extracting all the text after the fixed header up to the first blank line, and then splitting the extracted part into separate lines:

import re
 
p = r"رده سنی مجاز :\s*\n(.+(?:\n.+)*)"
text = "رده سنی مجاز : \n 10.2-15.3\n 8.71-9.13\n 25.08 - 31.2\n\nزده های سنی غیرمجاز:\n 16.5-18.4\n 9.15 - 10.02\n 20.02-21.30"
m = re.search(p, text)
if m:
    print([x.strip() for x in m.group(1).splitlines()])

# => ['10.2-15.3', '8.71-9.13', '25.08 - 31.2']

See the Python demo and the regex demo.

Details:

رده سنی مجاز : - a fixed string (note the space before the colon, as in the input)
\s* - zero or more whitespace characters
\n - a newline
(.+(?:\n.+)*) - one or more non-empty lines captured into Group 1.
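
Note that the output above keeps the spaces inside '25.08 - 31.2', while the question asks for '25.08-31.2'. One way to get that is a second pass over the captured block. The following is only a minimal sketch: the helper name extract_ranges and the range_pattern regex are my own assumptions, not part of the original code, and it assumes every range is written as two decimal numbers separated by a hyphen with optional spaces or tabs around it.

```python
import re

text = (
    "رده سنی مجاز : \n 10.2-15.3\n 8.71-9.13\n 25.08 - 31.2\n\n"
    "زده های سنی غیرمجاز:\n 16.5-18.4\n 9.15 - 10.02\n 20.02-21.30"
)

# Same block pattern as in the answer: fixed header, optional whitespace,
# a newline, then consecutive non-empty lines captured into group 1.
block_pattern = r"رده سنی مجاز :\s*\n(.+(?:\n.+)*)"

# Assumed format of one range: decimal, hyphen, decimal,
# with optional spaces or tabs around the hyphen.
range_pattern = r"(\d+\.\d+)[ \t]*-[ \t]*(\d+\.\d+)"


def extract_ranges(s):
    """Return the ranges of the first block as 'low-high' strings (hypothetical helper)."""
    m = re.search(block_pattern, s)
    if not m:
        return []
    # findall yields (low, high) tuples; rejoin them without spaces.
    return [f"{low}-{high}" for low, high in re.findall(range_pattern, m.group(1))]


print(extract_ranges(text))
# => ['10.2-15.3', '8.71-9.13', '25.08-31.2']
```

Because the block is captured first, ranges under the second header are never seen by the second regex. If the numbers can be integers or use another separator, range_pattern would need adjusting.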
