
regex unicode multiline problem in Python

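I have some strings containing Unicode (Persian) characters, like the one below. The first header means roughly "permitted age range" and the second "non-permitted age ranges":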

رده سنی مجاز : 
 10.2-15.3
 8.71-9.13
 25.08 - 31.2

زده های سنی غیرمجاز:
 16.5-18.4
 9.15 - 10.02
 20.02-21.30

I want to match only the number ranges in the first block, like below:

10.2-15.3
8.71-9.13
25.08-31.2

and I'm using the following code:

print(re.findall('رده سنی مجاز :.*(.*\d+.\d+-\d+.\d+.*)', string, re.DOTALL))

but it returns:

['25.08-31.2']
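A minimal reproduction of the single-match result. It assumes the sample text is stored in a variable named text (the question's string variable is not shown), and it writes the question's pattern as a raw string so that \d reaches re unchanged.

```python
import re

# Sample data from the question as one string (assumed layout: one range per line,
# a blank line between the two blocks).
text = ("رده سنی مجاز : \n 10.2-15.3\n 8.71-9.13\n 25.08 - 31.2\n\n"
        "زده های سنی غیرمجاز:\n 16.5-18.4\n 9.15 - 10.02\n 20.02-21.30")

# The question's pattern, unchanged apart from the r'' prefix.
# re.DOTALL lets "." match newlines too.
matches = re.findall(r'رده سنی مجاز :.*(.*\d+.\d+-\d+.\d+.*)', text, re.DOTALL)

print(len(matches))  # 1
print(matches)       # a single capture taken from the end of the string
```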

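Answer:

With re.DOTALL, . also matches newlines, so the leading greedy .* runs all the way to the end of the string and backtracks only far enough for a single match; that is why findall returns just one item. Instead, I suggest extracting everything after the fixed text up to the first blank line, and then splitting the extracted part into separate lines: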

import re
 
p = r"رده سنی مجاز :\s*\n(.+(?:\n.+)*)"
text = "رده سنی مجاز : \n 10.2-15.3\n 8.71-9.13\n 25.08 - 31.2\n\nزده های سنی غیرمجاز:\n 16.5-18.4\n 9.15 - 10.02\n 20.02-21.30"
m = re.search(p, text)
if m:
    print([x.strip() for x in m.group(1).splitlines()])

# => ['10.2-15.3', '8.71-9.13', '25.08 - 31.2']
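A variation on the code above, in case you want the ranges without the inner spaces, as in the desired output in the question (25.08-31.2 rather than 25.08 - 31.2). This is a sketch that assumes every range is two plain decimal numbers joined by a hyphen with optional spaces around it; the helper name extract_ranges is hypothetical.

```python
import re

# Same block-extraction pattern as above.
HEADER_BLOCK = re.compile(r"رده سنی مجاز :\s*\n(.+(?:\n.+)*)")
# Two decimal numbers separated by a hyphen, optionally padded with spaces.
RANGE = re.compile(r"(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)")

def extract_ranges(text):
    """Hypothetical helper: return the ranges under the first header as 'low-high' strings."""
    m = HEADER_BLOCK.search(text)
    if not m:
        return []
    # With two groups, findall returns one (low, high) tuple per range.
    return [f"{lo}-{hi}" for lo, hi in RANGE.findall(m.group(1))]

text = ("رده سنی مجاز : \n 10.2-15.3\n 8.71-9.13\n 25.08 - 31.2\n\n"
        "زده های سنی غیرمجاز:\n 16.5-18.4\n 9.15 - 10.02\n 20.02-21.30")
print(extract_ranges(text))  # ['10.2-15.3', '8.71-9.13', '25.08-31.2']
```

Keeping the two numbers in separate groups also makes it easy to convert them with float(lo) and float(hi) if numeric values are needed.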


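Details of the pattern p:

  • رده سنی مجاز : - a fixed string (the header text, including the space before the colon)
  • \s* - zero or more whitespace characters
  • \n - a newline
  • (.+(?:\n.+)*) - Group 1: one non-empty line followed by any number of further non-empty lines; since . does not match a newline without re.DOTALL, the match stops at the first blank line.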
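The same pattern can be written with re.VERBOSE so that each piece listed above carries its own comment. This is only a sketch of an alternative layout; note that in verbose mode unescaped whitespace in the pattern is ignored, so the spaces inside the fixed Persian header must be escaped to keep matching literally.

```python
import re

# The answer's pattern, one piece per line. Under re.VERBOSE, unescaped
# whitespace and "#" comments are ignored, so the literal spaces in the
# header are escaped with a backslash.
pattern = re.compile(r"""
    رده\ سنی\ مجاز\ :   # the fixed header text
    \s*                 # zero or more whitespace characters
    \n                  # a newline
    (.+(?:\n.+)*)       # Group 1: consecutive non-empty lines
""", re.VERBOSE)

text = ("رده سنی مجاز : \n 10.2-15.3\n 8.71-9.13\n 25.08 - 31.2\n\n"
        "زده های سنی غیرمجاز:\n 16.5-18.4\n 9.15 - 10.02\n 20.02-21.30")

m = pattern.search(text)
lines = [x.strip() for x in m.group(1).splitlines()] if m else []
print(lines)  # ['10.2-15.3', '8.71-9.13', '25.08 - 31.2']
```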

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM