python - Regex: put a repeated pattern into a single group

Question

I'm trying to parse a string with a regex and am 99% of the way there.

My test string is:

 1
  1234 1111 5555 88945
    172.255.255.255 from 172.255.255.255 (1.1.1.1)
      Origin IGP, localpref 300, valid, external, best
      rx pathid: 0, tx pathid: 0x0

My current regex pattern is:

(?P<as_path>(\d{4,10}\s){1,20})\s+(?P<peer_addr>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3}).*\((?P<peer_rid>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3})\)\s+.*localpref\s(?P<local_pref>\d+),\s(?P<attribs>\S+,\s{0,4})

I'm using regex101 to test, and here is a link to the test: https://regex101.com/r/iGM8ye/1

So currently I get a group 2 that I don't want. Could someone tell me why I'm getting this group and how to remove it?

Second, in attribs I want to match the words "valid, external, best", but currently my pattern only matches "valid,". I thought adding a repeat within the group would match all three of those, but it hasn't.

How would I match a repeat of "string, string, string," (string, comma, space) into one group?

Thanks

EDIT

Desired output:

as_path : 1234 1111 5555 88945
peer_addr : 172.255.255.255
peer_rid : 1.1.1.1
local_pref : 300
attribs : valid, external, best

attribs may also be just "valid, external", or just "external", or another entry in the format (string comma space).
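Both problems can be reproduced outside regex101 with Python's re module (an assumption: the question only shows regex101). A minimal sketch; TEST and QUESTION_PATTERN are names introduced here, and the pattern is the question's own, split across lines with implicit string concatenation:

```python
import re

# Test string copied from the question (TEST is a name chosen for this sketch).
TEST = """ 1
  1234 1111 5555 88945
    172.255.255.255 from 172.255.255.255 (1.1.1.1)
      Origin IGP, localpref 300, valid, external, best
      rx pathid: 0, tx pathid: 0x0
"""

# The pattern from the question, unchanged apart from the line splitting.
QUESTION_PATTERN = (
    r"(?P<as_path>(\d{4,10}\s){1,20})\s+"
    r"(?P<peer_addr>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3}).*"
    r"\((?P<peer_rid>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3})\)\s+"
    r".*localpref\s(?P<local_pref>\d+),\s(?P<attribs>\S+,\s{0,4})"
)

m = re.search(QUESTION_PATTERN, TEST)

# The unnamed inner group is the extra "group 2"; it keeps only the last repetition.
print(repr(m.group(2)))          # '88945\n'
# {0,4} applies only to \s, so "\S+," matches a single word plus its comma.
print(repr(m.group("attribs")))  # 'valid, '
```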

Answer 1

Try this regex:

(?P<as_path>(?:\d{4,10}\s){1,20})\s+(?P<peer_addr>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3}).*\((?P<peer_rid>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3})\)\s+.*localpref\s(?P<local_pref>\d+),\s(?P<attribs>[\S]+,(?: [\S]+,?)*){0,4}

Demo

The regex in the question had a capturing group (group 2) for (\d{4,10}\s). It is now changed to a non-capturing group, (?:\d{4,10}\s).
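A short sketch of why that change removes group 2, assuming Python's re module and using only the as_path part of the pattern (TEST is a name introduced here): a capturing group inside a repetition adds its own numbered group that holds only the last iteration, while the non-capturing version adds nothing.

```python
import re

# Test string from the question.
TEST = """ 1
  1234 1111 5555 88945
    172.255.255.255 from 172.255.255.255 (1.1.1.1)
      Origin IGP, localpref 300, valid, external, best
      rx pathid: 0, tx pathid: 0x0
"""

capturing = re.search(r"(?P<as_path>(\d{4,10}\s){1,20})", TEST)
non_capturing = re.search(r"(?P<as_path>(?:\d{4,10}\s){1,20})", TEST)

# Two groups: as_path plus the unwanted inner group (last repetition only).
print(capturing.groups())      # ('1234 1111 5555 88945\n', '88945\n')
# One group: only as_path.
print(non_capturing.groups())  # ('1234 1111 5555 88945\n',)
```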

Answer 2

See the regex in use here.

(?P<as_path>(?:\d{4,10}\s){1,20})\s+(?P<peer_addr>\d{0,3}(?:\.\d{0,3}){3}).*\((?P<peer_rid>\d{0,3}(?:\.\d{0,3}){3})\)\s+.*localpref\s(?P<local_pref>\d+),\s+(?P<attribs>\S+(?:,\s+\S+){2})

- You were getting group 2 because your as_path group contained a capturing group. I changed that to a non-capturing group.
- I changed attribs to \S+(?:,\s+\S+){2}
  - This matches any non-whitespace character one or more times (\S+), followed by the following exactly twice:
    - ,\s+\S+ : the comma character, followed by one or more whitespace characters, followed by one or more non-whitespace characters
- I changed peer_addr and peer_rid to \d{0,3}(?:\.\d{0,3}){3} instead of \d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3}. This is a matter of preference, but it shortens the expression.
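The subpatterns described above can be checked in isolation with Python's re module (an assumption; ATTRIBS, IP_LONG, IP_SHORT, and the helper attribs are names chosen for this sketch). Because of {2}, the attribs subpattern needs three comma-separated items, so a line with only "valid, external", which the question's EDIT says can occur, does not match it.

```python
import re

# Subpatterns from this answer.
ATTRIBS = r"\S+(?:,\s+\S+){2}"
IP_LONG = r"\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3}"
IP_SHORT = r"\d{0,3}(?:\.\d{0,3}){3}"

def attribs(line):
    """Return the attribs text after 'localpref N, ', or None if it does not match."""
    m = re.search(r"localpref\s\d+,\s+(?P<attribs>" + ATTRIBS + ")", line)
    return m.group("attribs") if m else None

print(attribs("Origin IGP, localpref 300, valid, external, best"))  # valid, external, best
# {2} needs two more ", item" pieces after the first item, so two items fail.
print(attribs("Origin IGP, localpref 300, valid, external"))        # None

# The long and the shortened address subpatterns accept the same address.
for pattern in (IP_LONG, IP_SHORT):
    print(re.fullmatch(pattern, "172.255.255.255") is not None)     # True
```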

Without that last modification, you can use the following regex (it performs slightly better anyway, as seen here):

(?P<as_path>(?:\d{4,10}\s){1,20})\s+(?P<peer_addr>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3}).*\((?P<peer_rid>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3})\)\s+.*localpref\s(?P<local_pref>\d+),\s+(?P<attribs>\S+(?:,\s+\S+){2})

You can also improve performance by using more specific tokens, as the following suggests and as seen here (notice I also added the x modifier to make it more legible):

(?P<as_path>\d{4,10}(?:\s\d{4,10}){0,19})\s+
(?P<peer_addr>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3})[^)]*
\((?P<peer_rid>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3})\)\s+
.*localpref\s(?P<local_pref>\d+),\s+
(?P<attribs>\w+(?:,\s+\w+){2})
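In Python, the x modifier corresponds to the re.VERBOSE flag, under which unescaped whitespace and line breaks in the pattern are ignored. A minimal sketch applying the pattern above to the question's test string (TEST and PATTERN are names introduced here); it prints the five fields in the same "name : value" layout as the question's desired output.

```python
import re

# Test string from the question.
TEST = """ 1
  1234 1111 5555 88945
    172.255.255.255 from 172.255.255.255 (1.1.1.1)
      Origin IGP, localpref 300, valid, external, best
      rx pathid: 0, tx pathid: 0x0
"""

# The verbose pattern from above; re.VERBOSE ignores the layout whitespace.
PATTERN = r"""
(?P<as_path>\d{4,10}(?:\s\d{4,10}){0,19})\s+
(?P<peer_addr>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3})[^)]*
\((?P<peer_rid>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3})\)\s+
.*localpref\s(?P<local_pref>\d+),\s+
(?P<attribs>\w+(?:,\s+\w+){2})
"""

m = re.search(PATTERN, TEST, re.VERBOSE)
for name in ("as_path", "peer_addr", "peer_rid", "local_pref", "attribs"):
    print(f"{name} : {m.group(name)}")
```

Only named groups remain, so m.groups() has exactly five entries.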

Answer 3

You get that separate group because you are repeating a capturing group, and a repeated capturing group keeps only the value of the last iteration, in this case 88945. You could make it non-capturing instead by writing (?: instead of (.

For the second part, you could use an alternation to match exactly one of the options: (?:valid|external|best)

Your pattern might look like:

(?P<as_path>(?:\d{4,10}\s){1,20})\s+(?P<peer_addr>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3}).*\((?P<peer_rid>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3})\)\s+.*localpref\s(?P<local_pref>\d+),\s(?P<attribs>(?:valid|external|best)(?:,\s{0,4}(?:valid|external|best))+)
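A small sketch of the attribs part of this pattern, assuming Python's re module (PATTERN and attribs_of are names introduced here). Because the repetition ends in +, it needs at least two of the listed words, and it only accepts the three words named in the alternation, so a lone "external" or an unlisted word does not match.

```python
import re

# attribs part of this answer's pattern, preceded by the same localpref prefix.
PATTERN = (
    r"localpref\s(?P<local_pref>\d+),\s"
    r"(?P<attribs>(?:valid|external|best)(?:,\s{0,4}(?:valid|external|best))+)"
)

def attribs_of(line):
    """Return the attribs group, or None if the line does not match."""
    m = re.search(PATTERN, line)
    return m.group("attribs") if m else None

print(attribs_of("Origin IGP, localpref 300, valid, external, best"))  # valid, external, best
print(attribs_of("Origin IGP, localpref 300, valid, external"))        # valid, external
# The trailing + needs at least one ", word" after the first word.
print(attribs_of("Origin IGP, localpref 300, external"))               # None
```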

regex101 demo

3 answers

Answer 1: score 2, 2019-04-03 16:24:28

Answer 2: score 1, accepted, 2019-04-03 16:25:21

Answer 3: score 1, 2019-04-03 16:25:51
