
How to match a multi-line string using Python regex?

I have the two lines below:

/begin MEASUREMENT ANYNAME1 "Unterstützungskraft Softwaremodul "

SWORD ANYNAME2 1 100 - Randomdigits1 Randomdigits2

and I want to match ANYNAME1, ANYNAME2, Randomdigits1 and Randomdigits2.

So far I am able to match ANYNAME1 in the first line using the regex below:

_regex_struct = re.compile(r'/begin MEASUREMENT (.*)(.*)\n')

but I am not able to get to the second line. How can I match the expression on the second line?

I am just making an assumption about your input. You may check the regex demo.

import re

inputstr = '''/begin MEASUREMENT ANYNAME1 "Unterstützungskraft Softwaremodul "
SWORD ANYNAME2 1 100 -2342342523 2432343535654
'''
_regex_struct = re.compile(r'/begin\s+MEASUREMENT\s+(?P<name1>[\w.]+)\W.*\nSWORD\s+(?P<name2>[\w.]+)\W.+\s+(?P<digit1>-\d.+|\d.+)\s+(?P<digit2>-\d.+|\d.+)')
_regex_struct.findall(inputstr)

Output:

[('ANYNAME1', 'ANYNAME2', '-2342342523', '2432343535654')]

Explanation of the expression:

\s = any whitespace character

(?P<name>...) = a named capturing group, one for each expected value

\w = any word character

\W = any non-word character

\d = any digit

+ = one or more of the preceding token
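
The named groups can also be read back by name instead of by position. The following is a minimal sketch, not the answer's exact pattern: it assumes the two trailing values are plain integers and uses -?\d+ in place of the (-\d.+|\d.+) alternation. The variable names text, pattern and fields are only illustrative.

```python
import re

# Sample input modeled on the question; the numeric values are made up.
text = (
    '/begin MEASUREMENT ANYNAME1 "Unterstützungskraft Softwaremodul "\n'
    'SWORD ANYNAME2 1 100 -2342342523 2432343535654\n'
)

# Assumption: the last two fields are optionally signed integers.
pattern = re.compile(
    r'/begin\s+MEASUREMENT\s+(?P<name1>[\w.]+).*\n'
    r'SWORD\s+(?P<name2>[\w.]+)\s+\d+\s+\d+\s+'
    r'(?P<digit1>-?\d+)\s+(?P<digit2>-?\d+)'
)

m = pattern.search(text)
# groupdict() maps each group name to the text it captured.
fields = m.groupdict() if m else {}
print(fields)
```

Reading the values from fields['name1'], fields['digit1'] and so on keeps the calling code independent of the order of the groups in the pattern.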

In [20]: s = '''/begin MEASUREMENT ANYNAME1 "Unterstützungskraft Softwaremodul "
    ...: SWORD ANYNAME2 1 100 -Randomdigits1 Randomdigits2'''

In [31]: re_struct = re.compile(r'/begin MEASUREMENT (\w+)[\s\S]*?SWORD (\w+).*?100 -(\w+) (\w+)')

In [32]: m = re_struct.search(s)

In [33]: m.group(1), m.group(2), m.group(3), m.group(4)
Out[33]: ('ANYNAME1', 'ANYNAME2', 'Randomdigits1', 'Randomdigits2')
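
In the pattern above, [\s\S]*? is what lets the match cross the line break, because a plain . does not match a newline by default. As a hedged alternative sketch, the same effect can be had with the re.DOTALL flag, which makes . match newlines too. The name re_dotall is only illustrative, and the pattern still assumes the literal "100 -" appears before the last two values.

```python
import re

s = '''/begin MEASUREMENT ANYNAME1 "Unterstützungskraft Softwaremodul "
SWORD ANYNAME2 1 100 -Randomdigits1 Randomdigits2'''

# With re.DOTALL, "." also matches "\n", so the lazy ".*?" can span
# both lines in the same way "[\s\S]*?" does.
re_dotall = re.compile(
    r'/begin MEASUREMENT (\w+).*?SWORD (\w+).*?100 -(\w+) (\w+)',
    re.DOTALL,
)

m = re_dotall.search(s)
print(m.groups())
```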

You could match ANYNAME1 in a capturing group in the first line, then use .* to get to the end of the line and \n to match the newline and get to the second line. There you could match and capture the remaining values using 3 more groups.

/begin MEASUREMENT ([\w.]+).*\nSWORD ([\w.]+) \d+ \d+ (-?\d+(?:\.\d+)?) (-?\d+(?:\.\d+)?)


Explanation

  • /begin MEASUREMENT Match literally, followed by a space
  • ([\w.]+).*\n Capture 1+ word chars or dots in group 1 and match until the end of the line. Then match a newline
  • SWORD ([\w.]+) Match SWORD and a space, then capture 1+ word chars or dots in group 2
  • \d+ \d+ Match a space, 1+ digits, a space, 1+ digits and a space
  • (-?\d+(?:\.\d+)?) (-?\d+(?:\.\d+)?) Capture in groups 3 and 4 an optional minus sign, 1+ digits and an optional decimal part, with a space in between

For example:

import re

regex = r"/begin MEASUREMENT ([\w.]+).*\nSWORD ([\w.]+) \d+ \d+ (-?\d+(?:\.\d+)?) (-?\d+(?:\.\d+)?)"
test_str = ("/begin MEASUREMENT ANY.NAME1 \"Unterstützungskraft Softwaremodul \"\n"
    "SWORD ANYN.AME2 1 100 -2342342523 -14.29")
print(re.findall(regex, test_str))

# [('ANY.NAME1', 'ANYN.AME2', '-2342342523', '-14.29')]
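
If the pattern needs to stay readable, the explanation can live next to the regex itself. This is a sketch of the same expression written with re.VERBOSE; in that mode unescaped whitespace in the pattern is ignored, so each literal space is written as [ ] here. The layout and comments are one possible arrangement, not part of the original answer.

```python
import re

pattern = re.compile(
    r"""
    /begin[ ]MEASUREMENT[ ]   # literal text, followed by a space
    ([\w.]+)                  # group 1: first name
    .*\n                      # rest of the first line, then the newline
    SWORD[ ]([\w.]+)          # group 2: second name
    [ ]\d+[ ]\d+[ ]           # two integer fields that are not captured
    (-?\d+(?:\.\d+)?)         # group 3: optionally signed number
    [ ]
    (-?\d+(?:\.\d+)?)         # group 4: optionally signed number
    """,
    re.VERBOSE,
)

test_str = ("/begin MEASUREMENT ANY.NAME1 \"Unterstützungskraft Softwaremodul \"\n"
    "SWORD ANYN.AME2 1 100 -2342342523 -14.29")
result = pattern.findall(test_str)
print(result)
```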

