
Python multiline regex: ignore n lines in a string

I am having trouble writing a correct regex. Maybe someone can help me?

I have output from two network devices:

1

VRF NAME1 (VRF Id = 2); default RD 9200:1; default VPNID <not set>
Old CLI format, supports IPv4 only
Flags: 0xC
Interfaces:
Gi1/1/1                 Gi1/1/4

2

VRF NAME2 (VRF Id = 2); default RD 101:2; default VPNID <not set>
Interfaces:
Gi0/0/3                  Gi0/0/4                  Gi0/1/4

I need to extract the interface names from both.

I have this regex:

 import re

 rx = re.compile(r"""
              VRF\s(.+?)\s\(.*RD\s(.*);.*[\n\r]
              ^.*$[\n\r]
              ^.*$[\n\r]
              ^.*$[\n\r]
              (^.*)
              """, re.MULTILINE | re.VERBOSE)

But it only works for the first text: it skips 4 lines, and the 5th line is exactly what I need. However, many routers return output like 2. The question is how to ignore an unknown number of lines and, for example, find the line containing the word "Interfaces" and extract the line after "Interfaces:".

Positive lookbehind

(?<=...) Ensures that the given pattern will match, ending at the current position in the expression. The pattern must have a fixed width. Does not consume any characters.

From https://regex101.com/

The regex (?<=Interfaces:\n).+ matches the whole line after each "Interfaces:" line.

I tested it on regex101.com and it worked perfectly with both of your examples.
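
As a rough sketch of how the lookbehind could be used from Python: the sample strings below are copied from the question, while the helper name interface_names is hypothetical. It assumes lines end with a plain "\n", because the lookbehind must have a fixed width and so cannot also allow "\r\n".

```python
import re

# Sample outputs copied from the question (two devices).
device1 = """\
VRF NAME1 (VRF Id = 2); default RD 9200:1; default VPNID <not set>
Old CLI format, supports IPv4 only
Flags: 0xC
Interfaces:
Gi1/1/1                 Gi1/1/4
"""

device2 = """\
VRF NAME2 (VRF Id = 2); default RD 101:2; default VPNID <not set>
Interfaces:
Gi0/0/3                  Gi0/0/4                  Gi0/1/4
"""

# The lookbehind is fixed width: "Interfaces:" followed by exactly one "\n".
after_interfaces = re.compile(r"(?<=Interfaces:\n).+")


def interface_names(output):
    """Return the names on the line after 'Interfaces:' (hypothetical helper)."""
    m = after_interfaces.search(output)
    # Without re.DOTALL, ".+" stops at the end of the line.
    return m.group(0).split() if m else []


print(interface_names(device1))
print(interface_names(device2))
```

Splitting the matched line on whitespace gives the individual interface names regardless of how many lines precede "Interfaces:".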

EDIT: the answer has been corrected after more input was provided.

There are many ways to solve this. Look at regex101. The regex

(?s)VRF\s([^\s]+)\s.*?(?:RD\s([\d.]+:\d|<not\sset>));.*?Interfaces:(?:\r*\n)\s*(.*?)(?:\r*\n)

reads in a complete record and captures the name, the RD value, and the line following "Interfaces:".

Explanation:

(?s)                           # single line mode: make "." read anything,
                               # including line breaks
VRF                            # every record starts with VRF
\s                             # read " "
([^\s]+)                       # group 1: capture the VRF name
\s                             # read " "
.*?                            # lazy read anything
(?:                            # start non-capture group
 RD\s                          # read "RD "
(                              # group 2
  [\d.]+:\d                    # number or ip, followed by ":" and a digit
  |                            # OR
  <not\sset>                   # value "<not set>"
)                              # group 2 end
)                              # non-capture group end
;                              # read ";"
.*?                            # lazy read anything
Interfaces:                    # read "Interfaces:"
(?:\r*\n)                      # read newline
\s*                            # read spaces
(.*?)                          # group 3: read line after "Interfaces:"
(?:\r*\n)                      # read newline
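
The annotated breakdown above can also be kept inside the pattern itself with re.VERBOSE, as in this sketch. The name record_rx and the sample record are illustrative only, and re.DOTALL stands in for the inline (?s) flag.

```python
import re

# Same pattern as above, written with re.VERBOSE so the comments live in the regex.
record_rx = re.compile(r"""
    VRF \s ([^\s]+) \s          # group 1: VRF name
    .*?                         # lazily skip to the RD field
    (?: RD \s
        ( [\d.]+:\d             # group 2: RD value such as 9200:1
        | <not\sset> )          #          or the literal <not set>
    ) ;
    .*?                         # lazily skip any number of lines
    Interfaces: (?:\r*\n)       # the Interfaces: line and its newline
    \s* (.*?) (?:\r*\n)         # group 3: the line after Interfaces:
""", re.DOTALL | re.VERBOSE)

# Second sample record from the question.
record = (
    "VRF NAME2 (VRF Id = 2); default RD 101:2; default VPNID <not set>\n"
    "Interfaces:\n"
    "Gi0/0/3                  Gi0/0/4                  Gi0/1/4\n"
)

m = record_rx.match(record)
if m:
    print(m.group(1), m.group(2), m.group(3).split())
```

In verbose mode whitespace outside character classes is ignored, which is why the literal space in "<not set>" is still written as \s.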

Let's look at a test script. I've cut down on the length of the records in the script a bit, but the point still stands.

$ cat test.py
import re

pattern = r"(?s)VRF\s([^\s]+)\s.*?(?:RD\s([\d.]+:\d|<not\sset>));.*?Interfaces:(?:\r*\n)\s*(.*?)(?:\r*\n)"

text = '''\
VRF BLA1 (VRF Id = 2); default RD 9200:1; default VPNID <not set>
Old CLI format, supports IPv4 only
Flags: 0xC
Interfaces:
  Gi1/1/1.451              Gi1/1/4.2019
Address family ipv4 unicast (Table ID = 0x2):
  VRF label allocation mode: per-prefix
Address family ipv6 unicast not active
Address family ipv4 multicast not active

VRF BLA2 (VRF Id = 1); default RD <not set>; default VPNID <not set>
New CLI format, supports multiple address-families
Flags: 0x1808
Interfaces:
  Gi0
Address family ipv4 unicast (Table ID = 0x1):
  Flags: 0x0
Address family ipv6 unicast (Table ID = 0x1E000001):
  Flags: 0x0
Address family ipv4 multicast not active\
'''

# Records are separated by a blank line. The literal above always uses "\n",
# so split on "\n\n" rather than os.linesep (which is "\r\n" on Windows).
for rec in text.split("\n\n"):
    m = re.match(pattern, rec)
    if m:
        print("%s\tRD: %s\tInterfaces: %s" % (m.group(1), m.group(2), m.group(3)))

which results in:

$ python test.py
BLA1    RD: 9200:1  Interfaces: Gi1/1/1.451              Gi1/1/4.2019
BLA2    RD: <not set>   Interfaces: Gi0

There are multiple options, but the one closest to your initial attempt uses optional non-capturing lines:

import re

rx = re.compile(r"""
VRF\s(.+?)\s\(.*RD\s(.*);.*[\n\r]
(?:^.*$[\n\r])?
(?:^.*$[\n\r])?
Interfaces:[\n\r]
(.*)""", re.MULTILINE | re.VERBOSE)

However, the first line also looks strange to me and does not compile (missing closing brace), but the optional (?:^.*$[\n\r])? groups work in your application.
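
If the number of lines before "Interfaces:" is not known in advance, one possible variation (a sketch, not part of the original answer) replaces the two optional groups with a lazily repeated one. The names rx_any and text are illustrative. It assumes "\n" line endings and that every record contains an "Interfaces:" line, since otherwise the lazy skip could run on into the next record.

```python
import re

# Variation on the pattern above: skip any number of whole lines, lazily.
rx_any = re.compile(r"""
    ^VRF\s(\S+)\s\(.*RD\s(.*?);.*\n   # group 1: VRF name, group 2: RD value
    (?:.*\n)*?                        # lazily skip any number of whole lines
    Interfaces:\n
    (.*)                              # group 3: the line after Interfaces:
""", re.MULTILINE | re.VERBOSE)

# Both sample outputs from the question, separated by a blank line.
text = """\
VRF NAME1 (VRF Id = 2); default RD 9200:1; default VPNID <not set>
Old CLI format, supports IPv4 only
Flags: 0xC
Interfaces:
Gi1/1/1                 Gi1/1/4

VRF NAME2 (VRF Id = 2); default RD 101:2; default VPNID <not set>
Interfaces:
Gi0/0/3                  Gi0/0/4                  Gi0/1/4
"""

results = [
    (m.group(1), m.group(2), m.group(3).split())
    for m in rx_any.finditer(text)
]
for name, rd, interfaces in results:
    print(name, rd, interfaces)
```

Because the repeated group is lazy, it stops at the first "Interfaces:" line it can reach, so the same pattern covers both the long and the short output format.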
