python - regex putting repeating patterns in a single group

I'm trying to parse a string with a regex and am 99% there.

My test string is:

 1
  1234 1111 5555 88945
    172.255.255.255 from 172.255.255.255 (1.1.1.1)
      Origin IGP, localpref 300, valid, external, best
      rx pathid: 0, tx pathid: 0x0

My current regex pattern is:

(?P<as_path>(\d{4,10}\s){1,20})\s+(?P<peer_addr>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3}).*\((?P<peer_rid>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3})\)\s+.*localpref\s(?P<local_pref>\d+),\s(?P<attribs>\S+,\s{0,4})

I'm using regex101 to test, and the test is linked here: https://regex101.com/r/iGM8ye/1

First, I currently get a group 2 that I don't want. Could someone tell me why I'm getting this group and how to remove it?

Second, in the attributes I want to match the words "valid, external, best", but my pattern currently only matches "valid,". I thought adding the repeat inside the group would have matched all three, but it hasn't.

How would I match the repeat of "string, string, string," (string, comma, space) into one group?

Thanks

EDIT

Desired output

as_path : 1234 1111 5555 88945
peer_addr : 172.255.255.255
peer_rid : 1.1.1.1
local_pref : 300
attribs : valid, external, best 

attribs may also be just "valid, external," or just "external," or another entry in the same format (string, comma, space).

Try Regex: (?P<as_path>(?:\d{4,10}\s){1,20})\s+(?P<peer_addr>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3}).*\((?P<peer_rid>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3})\)\s+.*localpref\s(?P<local_pref>\d+),\s(?P<attribs>[\S]+,(?: [\S]+,?)*){0,4}

Demo

The regex in the question had a capturing group (Group 2) for (\d{4,10}\s). It is now changed to a non-capturing group: (?:\d{4,10}\s)
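To see the difference outside regex101, here is a minimal sketch using Python's `re` module. It only exercises the as_path part of the pattern; the sample `line` is taken from the test string in the question, and the variable names are just for illustration.

```python
import re

# Second line of the question's test string, trailing newline included.
line = "1234 1111 5555 88945\n"

# Original form: the inner (...) is a capturing group, so it becomes group 2.
capturing = re.match(r"(?P<as_path>(\d{4,10}\s){1,20})", line)

# Fixed form: (?:...) groups without capturing, so only as_path remains.
non_capturing = re.match(r"(?P<as_path>(?:\d{4,10}\s){1,20})", line)

# A repeated capturing group keeps only its last iteration.
print(capturing.groups())
print(non_capturing.groups())
```

The first call reports two groups (as_path plus the last repetition of the inner group), while the second reports only as_path.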

See regex in use here.

(?P<as_path>(?:\d{4,10}\s){1,20})\s+(?P<peer_addr>\d{0,3}(?:\.\d{0,3}){3}).*\((?P<peer_rid>\d{0,3}(?:\.\d{0,3}){3})\)\s+.*localpref\s(?P<local_pref>\d+),\s+(?P<attribs>\S+(?:,\s+\S+){2})
  1. You were getting group 2 because your as_path group contained a capturing group. I changed that to a non-capturing group.
  2. I changed attribs to \S+(?:,\s+\S+){2}
    • This matches one or more non-whitespace characters (\S+), followed by exactly two repetitions of the following:
      • ,\s+\S+ : a comma, then one or more whitespace characters, then one or more non-whitespace characters
  3. I changed peer_addr and peer_rid to \d{0,3}(?:\.\d{0,3}){3} instead of \d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3}. This is a matter of preference, but it shortens the expression.
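As a rough check of the pattern above in Python, the sketch below runs it against the test string from the question and collects the named groups. `TEXT`, `PATTERN` and `fields` are illustrative names, and the `.strip()` call is an addition here, needed because as_path captures its trailing whitespace.

```python
import re

# Test string from the question.
TEXT = """ 1
  1234 1111 5555 88945
    172.255.255.255 from 172.255.255.255 (1.1.1.1)
      Origin IGP, localpref 300, valid, external, best
      rx pathid: 0, tx pathid: 0x0
"""

# Same pattern as above, split across adjacent raw strings for readability.
PATTERN = re.compile(
    r"(?P<as_path>(?:\d{4,10}\s){1,20})\s+"
    r"(?P<peer_addr>\d{0,3}(?:\.\d{0,3}){3}).*"
    r"\((?P<peer_rid>\d{0,3}(?:\.\d{0,3}){3})\)\s+"
    r".*localpref\s(?P<local_pref>\d+),\s+"
    r"(?P<attribs>\S+(?:,\s+\S+){2})"
)

match = PATTERN.search(TEXT)

# as_path ends with the newline matched by \s, so strip each value.
fields = {name: value.strip() for name, value in match.groupdict().items()}

for name, value in fields.items():
    print(f"{name} : {value}")
```

Note that the `{2}` quantifier requires exactly three attributes, so a line with only one or two would not match this version.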

Without that last modification, you can use the following regex (it performs slightly better anyway, as seen here):

(?P<as_path>(?:\d{4,10}\s){1,20})\s+(?P<peer_addr>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3}).*\((?P<peer_rid>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3})\)\s+.*localpref\s(?P<local_pref>\d+),\s+(?P<attribs>\S+(?:,\s+\S+){2})

You can also improve the performance by using more specific tokens, as the following suggests (notice I also added the x modifier to make it more legible), and as seen here:

(?P<as_path>\d{4,10}(?:\s\d{4,10}){0,19})\s+
(?P<peer_addr>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3})[^)]*
\((?P<peer_rid>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3})\)\s+
.*localpref\s(?P<local_pref>\d+),\s+
(?P<attribs>\w+(?:,\s+\w+){2})
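In Python, the x modifier corresponds to the `re.VERBOSE` flag. The following sketch compiles the multi-line pattern above with that flag; the inline `#` comments and the names `TEXT`, `VERBOSE_PATTERN` and `m` are additions for illustration, not part of the original answer.

```python
import re

# Test string from the question.
TEXT = """ 1
  1234 1111 5555 88945
    172.255.255.255 from 172.255.255.255 (1.1.1.1)
      Origin IGP, localpref 300, valid, external, best
      rx pathid: 0, tx pathid: 0x0
"""

# With re.VERBOSE, whitespace and "#" comments inside the pattern are ignored,
# so literal whitespace has to be written as \s (as it already is here).
VERBOSE_PATTERN = re.compile(
    r"""
    (?P<as_path>\d{4,10}(?:\s\d{4,10}){0,19})\s+                 # AS numbers
    (?P<peer_addr>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3})[^)]*       # peer address
    \((?P<peer_rid>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3})\)\s+      # router id
    .*localpref\s(?P<local_pref>\d+),\s+                         # local pref
    (?P<attribs>\w+(?:,\s+\w+){2})                               # attributes
    """,
    re.VERBOSE,
)

m = VERBOSE_PATTERN.search(TEXT)
print(m.groupdict())
```

Because as_path is written here as a number followed by optional "whitespace plus number" repeats, it no longer includes trailing whitespace.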

You get that separate group because you are repeating a capturing group, in which case the group holds only the last iteration, here 88945. You could make it a non-capturing group instead, using (?:

For the second part, you could use an alternation to match exactly one of the options: (?:valid|external|best)

Your pattern might look like:

(?P<as_path>(?:\d{4,10}\s){1,20})\s+(?P<peer_addr>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3}).*\((?P<peer_rid>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3})\)\s+.*localpref\s(?P<local_pref>\d+),\s(?P<attribs>(?:valid|external|best)(?:,\s{0,4}(?:valid|external|best))+)

regex101 demo
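A small Python sketch of the alternation idea, limited to the attribs part of the line. Two things here are assumptions rather than part of the answer: the trailing quantifier is `*` instead of `+`, so that a line with a single attribute also matches (the question mentions that case), and `parse_attribs` is a hypothetical helper name.

```python
import re

# One allowed attribute word, as in the alternation above.
ATTRIB = r"(?:valid|external|best)"

# Assumption: '*' replaces the answer's '+', so one attribute is enough.
ATTRIBS_RE = re.compile(
    r"localpref\s\d+,\s(?P<attribs>" + ATTRIB + r"(?:,\s{0,4}" + ATTRIB + r")*)"
)


def parse_attribs(line):
    """Return the attribute words found after 'localpref N, ' as a list."""
    m = ATTRIBS_RE.search(line)
    if m is None:
        return []
    # Split the single captured string on "comma plus optional whitespace".
    return re.split(r",\s*", m.group("attribs"))


print(parse_attribs("Origin IGP, localpref 300, valid, external, best"))
print(parse_attribs("Origin IGP, localpref 300, external"))
```

Capturing the whole list in one group and splitting it afterwards avoids the problem that a repeated capturing group keeps only its last iteration.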
