
How to match this regular expression in Python?

I have the following string:

s = "~ VERSION 11 11 11.1 222 22 22.222"

I want to extract the following into these variables:

string Variable1 = "11 11 11.1"
string Variable2 = "222 22 22.222"

How do I extract this with a regular expression? Or is there a better alternative? (Note: there may be variable spacing between the tokens I want to extract, and the leading character may be something other than ~, but it will always be a symbol.)

For example, the input could be:

~   VERSION   11 11 11.1  222 22 22.222
$   VERSION 11 11 11.1      222 22 22.222
@      VERSION    11 11 11.1          222 22 22.222

If a regular expression does not make sense for this, or if there is a better way, please recommend it. How do I perform the extraction into those two variables in Python?

Try this:

import re

test_lines = """
~   VERSION   11 11 11.1  222 22 22.222
$   VERSION 11 11 11.1      222 22 22.222
@      VERSION    11 11 11.1          222 22 22.222
"""

version_pattern = re.compile(r"""
[~!@#$%^&*()]               # Starting symbol
\s+                         # Some amount of whitespace
VERSION                     # the specific word "VERSION"
\s+                         # Some amount of whitespace
(\d+\s+\d+\s+\d+\.\d+)      # First capture group
\s+                         # Some amount of whitespace
(\d+\s+\d+\s+\d+\.\d+)      # Second capture group
""", re.VERBOSE)

lines = test_lines.split('\n')

for line in lines:
    # Call match() on the compiled pattern; blank lines simply do not match
    m = version_pattern.match(line)
    if m:
        print(line)
        print(m.groups())

which gives the output:

~   VERSION   11 11 11.1  222 22 22.222
('11 11 11.1', '222 22 22.222')
$   VERSION 11 11 11.1      222 22 22.222
('11 11 11.1', '222 22 22.222')
@      VERSION    11 11 11.1          222 22 22.222
('11 11 11.1', '222 22 22.222')

Note the use of a verbose regular expression (re.VERBOSE) with comments.

To convert the extracted version numbers to their numeric representation (i.e. int, float), use the regexp in @Preet Kukreti's answer and convert using int() or float() as suggested.

You can use the split method of str.

v1 = "~ VERSION 11 11 11.1 222 22 22.222"
res_arr = v1.split() # no argument, so runs of spaces are handled; get ['~', 'VERSION', '11', '11', '11.1', '222', '22', '22.222']

and then use the elements at indices 2-4 and 5-7 as needed.
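A minimal sketch of that idea, assuming every line has exactly eight whitespace-separated tokens (the symbol, the word VERSION, then two groups of three numbers). The helper name split_versions is hypothetical and not part of the original answer.

```python
def split_versions(line):
    """Return the two version strings from a line such as
    '~ VERSION 11 11 11.1 222 22 22.222'."""
    tokens = line.split()  # no argument: splits on any run of whitespace
    # Assumption: symbol + VERSION + 6 numeric tokens = 8 tokens in total
    if len(tokens) != 8 or tokens[1] != "VERSION":
        raise ValueError("unexpected line format: %r" % line)
    variable1 = " ".join(tokens[2:5])  # tokens at indices 2, 3, 4
    variable2 = " ".join(tokens[5:8])  # tokens at indices 5, 6, 7
    return variable1, variable2


variable1, variable2 = split_versions(
    "@      VERSION    11 11 11.1          222 22 22.222")
print(variable1)  # 11 11 11.1
print(variable2)  # 222 22 22.222
```

Rejoining with a single space normalizes the spacing inside each group. Unlike the regex approach, this does not check that the tokens are actually numeric.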

import re
pattern_string = r"(\d+)\s+(\d+)\s+([\d\.]+)"  # the regex you are probably after
m = re.match(pattern_string, "222 22 22.222")
groups = None
if m:
    groups = m.groups()
    # groups is ('222', '22', '22.222')

After that, you can use int() and float() to convert to primitive numeric types if needed. For performance-sensitive code, you may want to precompile the regex with re.compile(...) and call match(...) or search(...) on the resulting compiled pattern object.
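The precompile-and-convert suggestion could look roughly like the sketch below. The names TRIPLE and parse_triple are made up for illustration, and the last group is written here as \d+\.\d+ (stricter than [\d\.]+) on the assumption that the third number always contains a decimal point.

```python
import re

# Compiled once at import time and reused for every call.
TRIPLE = re.compile(r"(\d+)\s+(\d+)\s+(\d+\.\d+)")


def parse_triple(text):
    """Convert a string such as '222 22 22.222' to (int, int, float).

    Returns None if the text does not start with such a triple.
    """
    m = TRIPLE.match(text)
    if m is None:
        return None
    first, second, third = m.groups()
    return int(first), int(second), float(third)


print(parse_triple("222 22 22.222"))  # (222, 22, 22.222)
print(parse_triple("11 11 11.1"))     # (11, 11, 11.1)
```

Keep in mind that float() discards trailing zeros, so if "11.10" and "11.1" must stay distinct, it is safer to keep the third field as a string.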

This is definitely easy with a regular expression. Here is one way to do it:

>>> import re
>>> st="~ VERSION 11 11 11.1 222 22 22.222 333 33 33.3333"
>>> re.findall(r"(\d+[ ]+\d+[ ]+\d+\.\d+)",st)
['11 11 11.1', '222 22 22.222', '333 33 33.3333']

Once you have the result(s) in a list, you can index into it to get the individual strings.
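For the two-variable case in the question, the list can also be unpacked directly. This is only a sketch: it assumes exactly two version strings per line and uses \s+ instead of [ ]+ so that tabs are accepted as well.

```python
import re

st = "$   VERSION 11 11 11.1      222 22 22.222"

# No capture group is needed here: findall then returns the whole matches.
matches = re.findall(r"\d+\s+\d+\s+\d+\.\d+", st)

if len(matches) == 2:
    variable1, variable2 = matches
    print(variable1)  # 11 11 11.1
    print(variable2)  # 222 22 22.222
else:
    print("expected 2 version strings, found %d" % len(matches))
```

Checking the length before unpacking avoids a ValueError on lines that contain more or fewer version strings than expected.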

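Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce this content, please cite this site's URL or the original source. For any questions, contact: yoyou2525@163.com.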

 