
Python: how to match a regular expression pattern

I want to parse a log and find the line below using a regex pattern, e.g.

r"(  *)"C-R-A-R(  *).( *)"

which doesn't work. How should I write this regex pattern? The key is to find CRAR and then some numbers (should be a substring) separated by spaces. Please note that the values are separated by several spaces, not just one.

[0]:      C-R-A-R              4                 1              85.4        86.1        90.8        76.1        92.3          0.000       0.000" 

If we consider this test data:

text = """C-R-A-R              4                 1              85.4        86.1        90.8        76.1        92.3          0.000       0.000
B-D-D-D 0                    0  1 1 2
"""

You want to extract the first line, but not the second, because the second doesn't start with CRAR (have I understood correctly?).

Try this regular expression:

import re

pattern = re.compile(r'( *)(C-R-A-R)(?P<digits>[ \d\.]+)')
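
To make the pieces of that expression easier to see, here is a sketch of the same pattern written with re.VERBOSE. The name verbose_pattern and the shortened sample line are only for illustration and are not part of the original answer. The sample also suggests that, because search() scans the whole line, a leading prefix such as "[0]:" does not prevent a match.

```python
import re

# Same expression as above, spelled out with re.VERBOSE.
# In verbose mode, whitespace outside a character class is ignored,
# so the literal leading space has to be written as [ ].
verbose_pattern = re.compile(r"""
    ([ ]*)                 # optional leading spaces
    (C-R-A-R)              # the literal marker
    (?P<digits>[ \d.]+)    # the spaces, digits and dots that follow
""", re.VERBOSE)

# Shortened, made-up sample in the same shape as the log line.
sample = "[0]:      C-R-A-R              4                 1              85.4"
m = verbose_pattern.search(sample)
print(m.group("digits").strip())
```
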

Apply the pattern on each line:

matches = [pattern.search(line) for line in text.split('\n')]

Keep only the lines that matched:

matched_lines = [m for m in matches if m is not None]

You get:

print(matched_lines)
>>> [<re.Match object; span=(0, 133), match='C-R-A-R              4                 1         >]

You can then extract the number part of the string for processing if needed, using the group name digits (defined with the (?P<digits>...) syntax):

digits = matched_lines[0].group('digits').strip()

print(digits)

>>> '4                 1              85.4        86.1        90.8        76.1        92.3          0.000       0.000'
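
If the numbers are needed as values rather than as one string, a possible next step is to split the captured text on whitespace and convert each token. This is a minimal sketch that assumes every token is a valid number. parse_crar_line is a hypothetical helper name, not something from the original answer.

```python
import re

pattern = re.compile(r'( *)(C-R-A-R)(?P<digits>[ \d\.]+)')

def parse_crar_line(line):
    """Return the numbers after C-R-A-R as floats, or None if the line doesn't match."""
    m = pattern.search(line)
    if m is None:
        return None
    # str.split() with no argument splits on runs of whitespace,
    # so it doesn't matter how many spaces separate the numbers.
    # Assumption: each token is a well-formed number; float() raises ValueError otherwise.
    return [float(tok) for tok in m.group('digits').split()]

line = "[0]:      C-R-A-R              4                 1              85.4        86.1        90.8        76.1        92.3          0.000       0.000"
print(parse_crar_line(line))
print(parse_crar_line("B-D-D-D 0                    0  1 1 2"))
```

Returning None for non-matching lines keeps the helper easy to use in a list comprehension that filters the log line by line.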


