Create a list of dictionaries using regex with positive lookbehind

Question

I am trying to create a list of dictionaries using a regex positive lookbehind. I tried two different versions of the code:

Variation 1

import re

string = '146.204.224.152 - lubo233'

for item in re.finditer(r"(?P<host>[0-9]*[.][0-9]*[.][0-9]*[.][0-9]*)(?P<user_name>(?<= - )[a-z]*[0-9]*)", string):
    print(item.groupdict())

Variation 2

import re

string = '146.204.224.152 - lubo233'
for item in re.finditer(r"(?P<host>[0-9]*[.][0-9]*[.][0-9]*[.][0-9]*)(?<= - )(?P<user_name>[a-z]*[0-9]*)", string):
    print(item.groupdict())

Desired output

{'host': '146.204.224.152', 'user_name': 'lubo233'}

Question/Issue

In both cases, I am unable to get the pattern past the substring " - ": no match is produced at all.

Using the positive lookbehind (?<= - ) is what breaks my code.

Can anyone help me identify my mistake? Thanks.
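Running both variations on the sample string shows the symptom concretely. In the sketch below, the names variation_1 and variation_2 are just labels added for illustration; each pattern's matches are collected into a list, and both lists come out empty because the lookbehind can never succeed right after the IP address.

```python
import re

string = '146.204.224.152 - lubo233'

# The two patterns from the question, unchanged.
variation_1 = r"(?P<host>[0-9]*[.][0-9]*[.][0-9]*[.][0-9]*)(?P<user_name>(?<= - )[a-z]*[0-9]*)"
variation_2 = r"(?P<host>[0-9]*[.][0-9]*[.][0-9]*[.][0-9]*)(?<= - )(?P<user_name>[a-z]*[0-9]*)"

for pattern in (variation_1, variation_2):
    # Collect every match as a dict; an empty list means no match at all.
    matches = [m.groupdict() for m in re.finditer(pattern, string)]
    print(matches)  # [] for both patterns
```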

Answer 1

I'd suggest you remove the positive lookbehind and simply match the separator normally, between each part.

Also some improvements:

\. instead of [.]
[0-9]{,3} instead of [0-9]*
(?:\.[0-9]{,3}){3} instead of \.[0-9]{,3}\.[0-9]{,3}\.[0-9]{,3}
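To sanity-check that the compact form describes the same IP subpattern as the spelled-out one, both can be compared with fullmatch on a few inputs. The sample strings are made up for illustration. Note that Python reads {,3} as {0,3} (zero to three digits), so even '...' is accepted; a stricter variant such as {1,3} would reject it.

```python
import re

# Spelled-out and compact forms of the same IPv4-like subpattern.
expanded = re.compile(r"[0-9]{,3}\.[0-9]{,3}\.[0-9]{,3}\.[0-9]{,3}")
compact = re.compile(r"[0-9]{,3}(?:\.[0-9]{,3}){3}")

# Hypothetical test inputs.
samples = ['146.204.224.152', '10.0.0.1', '1.2.3', '...']

for s in samples:
    # Both forms should accept or reject exactly the same strings.
    print(s, expanded.fullmatch(s) is not None, compact.fullmatch(s) is not None)
```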

Add a .* before the " - " to absorb any words that could appear between the IP address and the separator.

import re

rgx = re.compile(r"(?P<host>[0-9]{,3}(?:\.[0-9]{,3}){3}).* - (?P<user_name>[a-z]*[0-9]*)")

vals = ['146.204.224.152 aw0123 abc - lubo233',
        '146.204.224.152 as003443af - lubo233',
        '146.204.224.152 - lubo233']

for val in vals:
    for item in rgx.finditer(val):
        print(item.groupdict())

# Gives
{'host': '146.204.224.152', 'user_name': 'lubo233'}
{'host': '146.204.224.152', 'user_name': 'lubo233'}
{'host': '146.204.224.152', 'user_name': 'lubo233'}
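Since the goal is a list of dictionaries rather than printed output, the same compiled pattern can feed a list comprehension. A minimal sketch, assuming the input is a multi-line log with one entry per line (the log text below is invented for illustration):

```python
import re

rgx = re.compile(r"(?P<host>[0-9]{,3}(?:\.[0-9]{,3}){3}).* - (?P<user_name>[a-z]*[0-9]*)")

# Hypothetical log: one entry per line.
log = """146.204.224.152 aw0123 abc - lubo233
10.0.0.1 - alice42
192.168.1.20 - bob7"""

# '.' does not match '\n' by default, so each match stays within one line.
records = [m.groupdict() for m in rgx.finditer(log)]
print(records)
```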

Answer 2

The reason that the positive lookbehind is not working is that you are trying to match:

(?P<host>[0-9]*[.][0-9]*[.][0-9]*[.][0-9]*), an IP address,
immediately followed by a user name pattern, (?P<user_name>(?<= - )[a-z]*[0-9]*), that must be preceded by " - " because of (?<= - ).

So once the regex engine has consumed the IP address pattern, you are telling it to match a user name pattern preceded by " - ", but what actually precedes the current position is the IP address. In other words, once the IP pattern has been matched, the remaining string is:

- lubo233

The pattern that must then match immediately at that position, as with re.match, is:

(?P<user_name>(?<= - )[a-z]*[0-9]*)

which obviously does not match. To illustrate my point, note that this pattern, which consumes " - " explicitly before the lookbehind, does work:

import re

string = '146.204.224.152 - lubo233'
for item in re.finditer(r"((?P<host>[0-9]*[.][0-9]*[.][0-9]*[.][0-9]*)( - ))(?P<user_name>(?<= - )[a-z]*[0-9]*)", string):
    print(item.groupdict())

Output

{'host': '146.204.224.152', 'user_name': 'lubo233'}
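The underlying point is that a lookbehind is zero-width: it inspects the characters before the current position but never consumes them. The sketch below makes this visible. Applied with re.match to the leftover text, the user name pattern fails at position 0 because nothing precedes it; re.search on the full string succeeds only at the position right after " - ".

```python
import re

string = '146.204.224.152 - lubo233'
user_pattern = r"(?P<user_name>(?<= - )[a-z]*[0-9]*)"

# What is left once the IP address has been consumed.
remainder = ' - lubo233'

# re.match only tries position 0, where there is nothing behind to look at.
print(re.match(user_pattern, remainder))  # None

# re.search tries every position; the lookbehind succeeds right after ' - '.
m = re.search(user_pattern, string)
print(m.group('user_name'), m.start())  # lubo233 18
```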

If you need to match an arbitrary number of characters between the two patterns, you could do:

import re

string = '146.204.224.152 adfadfa - lubo233'
# \d{1,3}(?:[.]\d{1,3}){3} matches all four octets of the IP address
for item in re.finditer(r"((?P<host>\d{1,3}(?:[.]\d{1,3}){3})(.* - ))(?P<user_name>(?<= - )[a-z]*[0-9]*)", string):
    print(item.groupdict())

Output

{'host': '146.204.224.152', 'user_name': 'lubo233'}
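In these working patterns the lookbehind no longer does any work: the ( - ) group has already consumed the separator, so (?<= - ) always succeeds at that point. As a sketch, dropping it (and matching " - " as plain text so no helper groups are captured) yields the same dictionary, which is essentially what the first answer suggests. The variable names are illustrative.

```python
import re

string = '146.204.224.152 - lubo233'

# Working pattern from above: the lookbehind is redundant after ( - ).
with_lookbehind = r"((?P<host>[0-9]*[.][0-9]*[.][0-9]*[.][0-9]*)( - ))(?P<user_name>(?<= - )[a-z]*[0-9]*)"
# Same match, lookbehind removed, separator matched without a capturing group.
without_lookbehind = r"(?P<host>[0-9]*[.][0-9]*[.][0-9]*[.][0-9]*) - (?P<user_name>[a-z]*[0-9]*)"

a = re.search(with_lookbehind, string)
b = re.search(without_lookbehind, string)

print(a.groupdict() == b.groupdict())    # True
print(len(a.groups()), len(b.groups()))  # 4 2
```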


Answer 1: votes 2, posted 2020-11-28 10:11:38
Answer 2: votes 1, posted 2020-11-28 10:28:08
