
How to match this regular expression in Python?

I have the following string s = "~ VERSION 11 11 11.1 222 22 22.222"

I want to extract it into the following variables:

string Variable1 = "11 11 11.1"
string Variable2 = "222 22 22.222"

How do I extract these with a regular expression? Or is there a better alternative? (Note: there may be variable spacing between the tokens I want to extract, and the leading character may be something other than ~, but it will definitely be a symbol.)

For example, the line could be:

~   VERSION   11 11 11.1  222 22 22.222
$   VERSION 11 11 11.1      222 22 22.222
@      VERSION    11 11 11.1          222 22 22.222

If a regular expression does not make sense for this, or if there is a better way, please recommend it. How do I perform the extraction into those two variables in Python?

Try this:

import re

test_lines = """
~   VERSION   11 11 11.1  222 22 22.222
$   VERSION 11 11 11.1      222 22 22.222
@      VERSION    11 11 11.1          222 22 22.222
"""

version_pattern = re.compile(r"""
[~!@#$%^&*()]               # Starting symbol
\s+                         # Some amount of whitespace
VERSION                     # the specific word "VERSION"
\s+                         # Some amount of whitespace
(\d+\s+\d+\s+\d+\.\d+)      # First capture group
\s+                         # Some amount of whitespace
(\d+\s+\d+\s+\d+\.\d+)      # Second capture group
""", re.VERBOSE)

lines = test_lines.split('\n')

for line in lines:
    m = version_pattern.match(line)  # call match() on the compiled pattern
    if m:
        print(line)
        print(m.groups())

which gives the output:

~   VERSION   11 11 11.1  222 22 22.222
('11 11 11.1', '222 22 22.222')
$   VERSION 11 11 11.1      222 22 22.222
('11 11 11.1', '222 22 22.222')
@      VERSION    11 11 11.1          222 22 22.222
('11 11 11.1', '222 22 22.222')

Note the use of a verbose regular expression (re.VERBOSE), which allows whitespace and comments inside the pattern.
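The character class `[~!@#$%^&*()]` only accepts the symbols listed in it. If "symbol" should mean any character that is neither a word character (letter, digit, underscore) nor whitespace, a negated class `[^\w\s]` is one way to express that. The following is a minimal sketch under that assumption; `extract_versions` is a hypothetical helper that returns the two strings (or `None`) so they can be unpacked directly into the two variables from the question.

```python
import re

# Assumption: a "symbol" is any single character that is neither a word
# character (letter, digit, underscore) nor whitespace.
VERSION_RE = re.compile(r"""
    ^\s*                        # optional leading whitespace
    [^\w\s]                     # leading symbol (any non-word, non-space char)
    \s+VERSION\s+               # the word VERSION surrounded by whitespace
    (\d+\s+\d+\s+\d+\.\d+)      # first version group
    \s+
    (\d+\s+\d+\s+\d+\.\d+)      # second version group
    \s*$                        # nothing else on the line
""", re.VERBOSE)


def extract_versions(line):
    """Hypothetical helper: return (variable1, variable2), or None if no match."""
    m = VERSION_RE.match(line)
    return m.groups() if m else None


for line in ["~   VERSION   11 11 11.1  222 22 22.222",
             "%   VERSION 11 11 11.1      222 22 22.222",
             "no symbol VERSION 1 2 3.4 5 6 7.8"]:
    result = extract_versions(line)
    if result:
        variable1, variable2 = result
        print(repr(variable1), repr(variable2))
    else:
        print("no match:", line)
```

Anchoring with `^` and `$` rejects lines that have extra text around the two version groups; drop them if partial matches are acceptable.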

To convert the extracted version numbers to their numeric representation (i.e. int, float), use the regex in @Preet Kukreti's answer and convert with int() or float() as suggested.
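If you already have a captured string such as "11 11 11.1", a second regex is not strictly needed: splitting on whitespace and converting each token also works. This sketch assumes, as in the question's examples, that the first two tokens are integers and the third is a decimal; `to_numbers` is a hypothetical helper name.

```python
def to_numbers(version):
    """Hypothetical helper: '11 11 11.1' -> (11, 11, 11.1).

    Assumes exactly three whitespace-separated tokens: two integers
    followed by a decimal number.
    """
    first, second, third = version.split()  # split() handles any run of whitespace
    return int(first), int(second), float(third)


variable1, variable2 = "11 11 11.1", "222 22 22.222"
print(to_numbers(variable1))
print(to_numbers(variable2))
```

Keep in mind that float() drops formatting details such as trailing zeros ("11.10" becomes 11.1), so keep the original strings if you need to reproduce them exactly.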

You can use the split method of str.

v1 = "~ VERSION 11 11 11.1 222 22 22.222"
# split() with no argument splits on runs of whitespace, so variable spacing is handled
res_arr = v1.split()  # ['~', 'VERSION', '11', '11', '11.1', '222', '22', '22.222']

Then use elements 2-4 and 5-7 (0-based indices) as you need them.
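To get the two strings the question asks for, the slices can be joined back together. This sketch assumes every line has exactly this token layout (a symbol, the word VERSION, then six numeric tokens) and does no validation. It uses split() without an argument because, unlike split(' '), it does not produce empty strings when the spacing varies.

```python
s = "$   VERSION 11 11 11.1      222 22 22.222"

# split() with no argument drops empty strings caused by repeated spaces.
parts = s.split()
# parts == ['$', 'VERSION', '11', '11', '11.1', '222', '22', '22.222']

variable1 = " ".join(parts[2:5])  # elements 2-4
variable2 = " ".join(parts[5:8])  # elements 5-7
print(variable1)  # 11 11 11.1
print(variable2)  # 222 22 22.222
```

Joining with a single space also normalizes the spacing inside each result.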

import re
pattern_string = r"(\d+)\s+(\d+)\s+([\d\.]+)"  # the regex you are probably after
m = re.match(pattern_string, "222 22 22.222")
groups = None
if m:
    groups = m.groups()
    # groups is ('222', '22', '22.222')

After that, you could use int() and float() to convert the strings to numeric types if needed. For performance-sensitive code, you might want to precompile the regex with re.compile(...) and call match(...) or search(...) on the resulting compiled pattern object.
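A minimal sketch of the precompile-and-convert approach described above. `parse_version` is a hypothetical helper, and the third group is written as `\d+\.\d+` rather than `[\d\.]+`, an assumption that makes the captured text always parseable by float() (for example, `[\d\.]+` would also accept "1.2.3").

```python
import re

# Compile once and reuse for every string. The re module also caches
# recently used patterns, so the saving from precompiling is usually modest.
# The third group (digits, dot, digits) is stricter than [\d\.]+ and is
# always a valid float literal.
VERSION_PART = re.compile(r"(\d+)\s+(\d+)\s+(\d+\.\d+)")


def parse_version(text):
    """Hypothetical helper: '222 22 22.222' -> (222, 22, 22.222), or None."""
    m = VERSION_PART.match(text)
    if m is None:
        return None
    first, second, third = m.groups()
    return int(first), int(second), float(third)


for text in ["11 11 11.1", "222 22 22.222", "not a version"]:
    print(text, "->", parse_version(text))
```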

This is straightforward with a regular expression. Here is one way to do it:

>>> import re
>>> st = "~ VERSION 11 11 11.1 222 22 22.222 333 33 33.3333"
>>> re.findall(r"(\d+[ ]+\d+[ ]+\d+\.\d+)", st)
['11 11 11.1', '222 22 22.222', '333 33 33.3333']

Once you have the results in a list, you can index into it to get the individual strings.
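Applied to the question's input, the findall() result can be unpacked into the two variables. This sketch assumes the first two matches are the ones wanted, checks the count first so the unpacking cannot raise ValueError, and uses `\s+` instead of `[ ]+` so that tabs between the numbers would also be accepted.

```python
import re

s = "@      VERSION    11 11 11.1          222 22 22.222"

# Without a capture group, findall() returns the full text of each match.
versions = re.findall(r"\d+\s+\d+\s+\d+\.\d+", s)

if len(versions) >= 2:
    variable1, variable2 = versions[:2]
    print(variable1)  # 11 11 11.1
    print(variable2)  # 222 22 22.222
```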

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please credit the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 