python - regex putting repeating patterns in a single group
I'm trying to parse a string with a regex and am 99% there.
My test string is:
1234 1111 5555 88945
172.255.255.255 from 172.255.255.255 (1.1.1.1)
Origin IGP, localpref 300, valid, external, best
rx pathid: 0, tx pathid: 0x0
My current regex pattern is:
(?P<as_path>(\d{4,10}\s){1,20})\s+(?P<peer_addr>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3}).*\((?P<peer_rid>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3})\)\s+.*localpref\s(?P<local_pref>\d+),\s(?P<attribs>\S+,\s{0,4})
I'm using regex101 to test, and the test is linked here: https://regex101.com/r/iGM8ye/1
First, I currently get a Group 2 that I don't want. Could someone tell me why I'm getting this group and how to remove it?
Second, in the attributes I want to match the words "valid, external, best", but my pattern currently only matches "valid,". I thought adding the repeat within the group would have matched all three, but it hasn't.
How would I match the repeat of "string, string, string," (string, comma, space) into one group?
Thanks
EDIT
Desired output:
as_path : 1234 1111 5555 88945
peer_addr : 172.255.255.255
peer_rid : 1.1.1.1
local_pref : 300
attribs : valid, external, best
attribs may also be just "valid, external", or just "external", or another entry in the same format (string, comma, space).
Try this regex:
(?P<as_path>(?:\d{4,10}\s){1,20})\s+(?P<peer_addr>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3}).*\((?P<peer_rid>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3})\)\s+.*localpref\s(?P<local_pref>\d+),\s(?P<attribs>[\S]+,(?: [\S]+,?)*){0,4}
The regex in the question had a capturing group (Group 2) for (\d{4,10}\s). It is now changed to a non-capturing group: (?:\d{4,10}\s)
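To illustrate why the extra group shows up, here is a minimal Python sketch. It assumes the as_path part of the test string is followed by a single whitespace character, and the variable names are only illustrative. A repeated capturing group gets its own number and keeps only the text of its last iteration, while the non-capturing form leaves only the named group.

```python
import re

# Assumed sample: the AS path followed by one whitespace character.
as_path_text = "1234 1111 5555 88945 "

# Inner group is capturing: it becomes group 2 and holds only
# the text matched by its final repetition.
capturing = re.match(r"(?P<as_path>(\d{4,10}\s){1,20})", as_path_text)
print(capturing.groups())

# Inner group is non-capturing: only the named group is reported.
non_capturing = re.match(r"(?P<as_path>(?:\d{4,10}\s){1,20})", as_path_text)
print(non_capturing.groups())
```

The first call reports two groups (the whole path and the last repetition), the second reports just the whole path.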
See regex in use here.
(?P<as_path>(?:\d{4,10}\s){1,20})\s+(?P<peer_addr>\d{0,3}(?:\.\d{0,3}){3}).*\((?P<peer_rid>\d{0,3}(?:\.\d{0,3}){3})\)\s+.*localpref\s(?P<local_pref>\d+),\s+(?P<attribs>\S+(?:,\s+\S+){2})
The as_path group contained a capturing group. I changed that to a non-capturing group.
attribs is changed to \S+(?:,\s+\S+){2}, that is, \S+ followed by the following exactly twice: ,\s+\S+ (the comma character, followed by a whitespace character one or more times, followed by any non-whitespace character one or more times).
peer_addr and peer_rid are changed to \d{0,3}(?:\.\d{0,3}){3} instead of \d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3}. This is a preference, but it shortens the expression.
Without that last modification, you can use the following regex (it performs slightly better anyway, as seen here):
(?P<as_path>(?:\d{4,10}\s){1,20})\s+(?P<peer_addr>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3}).*\((?P<peer_rid>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3})\)\s+.*localpref\s(?P<local_pref>\d+),\s+(?P<attribs>\S+(?:,\s+\S+){2})
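As a rough check of the attribs sub-pattern in isolation, the Python sketch below compares the {2} form from this answer with a hypothetical variant that is not part of the answer: a `*` quantifier with `\w+` instead of `\S+`. The variant is only an assumption about how one might handle the "one, two, or three attributes" case from the question. `\w+` is used there because a greedy `\S+` would swallow the trailing comma when the pattern is used with search.

```python
import re

# From the answer: one attribute followed by exactly two ", attribute" repeats.
exactly_three = re.compile(r"\S+(?:,\s+\S+){2}")

# Hypothetical variant (an assumption, not from the answer):
# one attribute followed by zero or more ", attribute" repeats.
one_or_more = re.compile(r"\w+(?:,\s+\w+)*")

print(exactly_three.fullmatch("valid, external, best") is not None)
print(exactly_three.fullmatch("valid, external") is not None)
print(one_or_more.fullmatch("valid, external") is not None)
print(one_or_more.fullmatch("external") is not None)
```

With {2}, only the three-attribute string matches in full; the variant accepts the shorter lists as well.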
You can also improve performance by using more specific tokens, as the following suggests (notice I also added the x modifier to make it more legible), and as seen here:
(?P<as_path>\d{4,10}(?:\s\d{4,10}){0,19})\s+
(?P<peer_addr>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3})[^)]*
\((?P<peer_rid>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3})\)\s+
.*localpref\s(?P<local_pref>\d+),\s+
(?P<attribs>\w+(?:,\s+\w+){2})
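In Python, the x modifier corresponds to the re.VERBOSE flag. The sketch below runs the pattern above against the test string from the question. It assumes the test string has no leading indentation, and the names TEXT and PATTERN are only illustrative.

```python
import re

# Test string from the question (assumed to have no leading indentation).
TEXT = """\
1234 1111 5555 88945
172.255.255.255 from 172.255.255.255 (1.1.1.1)
Origin IGP, localpref 300, valid, external, best
rx pathid: 0, tx pathid: 0x0
"""

# re.VERBOSE ignores unescaped whitespace in the pattern, so it can span lines.
PATTERN = re.compile(
    r"""
    (?P<as_path>\d{4,10}(?:\s\d{4,10}){0,19})\s+
    (?P<peer_addr>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3})[^)]*
    \((?P<peer_rid>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3})\)\s+
    .*localpref\s(?P<local_pref>\d+),\s+
    (?P<attribs>\w+(?:,\s+\w+){2})
    """,
    re.VERBOSE,
)

match = PATTERN.search(TEXT)
if match:
    # Print each named group in the "name : value" layout from the question.
    for name, value in match.groupdict().items():
        print(f"{name} : {value}")
```

match.groupdict() returns only the named groups, so no numbered helper groups appear in the result.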
You get that separate group because you are repeating a capturing group, where the last iteration is what the group ends up holding, in this case 88945.
You could make it non-capturing instead by using (?:
For the second part, you could use an alternation to match exactly one of the options: (?:valid|external|best)
Your pattern might look like:
(?P<as_path>(?:\d{4,10}\s){1,20})\s+(?P<peer_addr>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3}).*\((?P<peer_rid>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3})\)\s+.*localpref\s(?P<local_pref>\d+),\s(?P<attribs>(?:valid|external|best)(?:,\s{0,4}(?:valid|external|best))+)
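A small Python sketch of the alternation idea on its own, using only the attribs part of the pattern. Note that the `+` in the pattern above requires at least two attributes. The `*` variant below is a hypothetical adjustment (an assumption, not part of this answer) for the case where a single attribute such as "external" should also match. The names ATTR, at_least_two and at_least_one are illustrative.

```python
import re

# One allowed attribute word, as in the answer's alternation.
ATTR = r"(?:valid|external|best)"

# As in the answer: first attribute, then one or more ", attribute" repeats.
at_least_two = re.compile(rf"(?P<attribs>{ATTR}(?:,\s{{0,4}}{ATTR})+)")

# Hypothetical variant (assumption): '*' also accepts a single attribute.
at_least_one = re.compile(rf"(?P<attribs>{ATTR}(?:,\s{{0,4}}{ATTR})*)")

line = "Origin IGP, localpref 300, valid, external, best"
full = at_least_two.search(line)
print(full.group("attribs"))
# Split the single captured string into individual attributes.
print(full.group("attribs").split(", "))

single = "Origin IGP, localpref 300, external"
print(at_least_two.search(single))
print(at_least_one.search(single).group("attribs"))
```

Capturing the whole list in one group and splitting it afterwards avoids the "last iteration only" behaviour of a repeated capturing group.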