Using regex to create a list of dictionaries with positive lookbehind
I am trying to create a list of dictionaries using a regex positive lookbehind. I tried two different versions of the code:
Variation 1
import re

string = '146.204.224.152 - lubo233'
for item in re.finditer(r"(?P<host>[0-9]*[.][0-9]*[.][0-9]*[.][0-9]*)(?P<user_name>(?<= - )[a-z]*[0-9]*)", string):
    print(item.groupdict())
Variation 2
import re

string = '146.204.224.152 - lubo233'
for item in re.finditer(r"(?P<host>[0-9]*[.][0-9]*[.][0-9]*[.][0-9]*)(?<= - )(?P<user_name>[a-z]*[0-9]*)", string):
    print(item.groupdict())
Desired output
{'host': '146.204.224.152', 'user_name': 'lubo233'}
Question/Issue
In both cases, I am unable to get rid of the substring " - ". Using the positive lookbehind (?<= - ) makes my code fail.
Can anyone help me identify my mistake? Thanks.
I'd suggest removing the positive lookbehind and simply writing the separator " - " literally between the two parts.
Some further improvements:
- \. instead of [.]
- [0-9]{,3} instead of [0-9]*, so each octet is limited to at most three digits
- (?:\.[0-9]{,3}){3} instead of \.[0-9]{,3}\.[0-9]{,3}\.[0-9]{,3}
- Add a .* before the " - " to handle any word that could appear between the IP address and the separator
import re

rgx = re.compile(r"(?P<host>[0-9]{,3}(?:\.[0-9]{,3}){3}).* - (?P<user_name>[a-z]*[0-9]*)")
vals = ['146.204.224.152 aw0123 abc - lubo233',
        '146.204.224.152 as003443af - lubo233',
        '146.204.224.152 - lubo233']
for val in vals:
    for item in rgx.finditer(val):
        print(item.groupdict())

# Gives
# {'host': '146.204.224.152', 'user_name': 'lubo233'}
# {'host': '146.204.224.152', 'user_name': 'lubo233'}
# {'host': '146.204.224.152', 'user_name': 'lubo233'}
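The title asks for a list of dictionaries rather than printed output. A minimal sketch, reusing the answer's pattern and sample strings, collects every match's groupdict() with a list comprehension; the name records is only an illustrative choice.

```python
import re

# Pattern from the answer above: IP address, anything, " - ", then the user name.
rgx = re.compile(r"(?P<host>[0-9]{,3}(?:\.[0-9]{,3}){3}).* - (?P<user_name>[a-z]*[0-9]*)")

vals = ['146.204.224.152 aw0123 abc - lubo233',
        '146.204.224.152 as003443af - lubo233',
        '146.204.224.152 - lubo233']

# Gather the named groups of every match into one list instead of printing them.
records = [m.groupdict() for val in vals for m in rgx.finditer(val)]
print(records)
```

Each element of records is a dictionary like {'host': '146.204.224.152', 'user_name': 'lubo233'}, one per matched line.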
The reason the positive lookbehind is not working is that you are trying to match an IP address, (?P<host>[0-9]*[.][0-9]*[.][0-9]*[.][0-9]*), immediately followed by a user name, (?P<user_name>(?<= - )[a-z]*[0-9]*), that should be preceded by " - ".
So once the regex engine has consumed the IP address pattern, you are telling it to match a user name pattern preceded by " - ", but what actually precedes that position is the IP address. A lookbehind only checks the characters before the current position; it does not consume anything, so nothing in your pattern ever matches the " - " itself. In other terms, once the IP pattern has been matched, the string left is:
' - lubo233'
The pattern that must match immediately at that point, as with re.match, is:
(?P<user_name>(?<= - )[a-z]*[0-9]*)
which obviously does not match. To illustrate the point, note that this pattern works:
import re

string = '146.204.224.152 - lubo233'
for item in re.finditer(r"((?P<host>[0-9]*[.][0-9]*[.][0-9]*[.][0-9]*)( - ))(?P<user_name>(?<= - )[a-z]*[0-9]*)", string):
    print(item.groupdict())
Output
{'host': '146.204.224.152', 'user_name': 'lubo233'}
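To see the failure side by side, here is a small check, assuming Python 3's re module: both of the question's variations find no match at all on the sample string, while the lookbehind used on its own does succeed, because it only inspects the text before the current position and the " - " is never required to be consumed by another part of the pattern.

```python
import re

string = '146.204.224.152 - lubo233'

# The two patterns from the question. In both, the lookbehind is evaluated
# right after the host, where the preceding characters are digits, never " - ".
variations = [
    r"(?P<host>[0-9]*[.][0-9]*[.][0-9]*[.][0-9]*)(?P<user_name>(?<= - )[a-z]*[0-9]*)",
    r"(?P<host>[0-9]*[.][0-9]*[.][0-9]*[.][0-9]*)(?<= - )(?P<user_name>[a-z]*[0-9]*)",
]
for pattern in variations:
    print(re.search(pattern, string))  # None for both

# The lookbehind alone: the engine advances to the first position preceded
# by " - " and matches from there, without consuming the separator.
alone = re.search(r"(?<= - )[a-z]*[0-9]*", string)
print(alone.group(), alone.start())  # lubo233 18
```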
If you need to match an arbitrary number of characters between the two patterns, you could do:
import re

string = '146.204.224.152 adfadfa - lubo233'
# Host: four octets of one to three digits each, so the whole address is captured.
for item in re.finditer(r"((?P<host>\d{1,3}[.]\d{1,3}[.]\d{1,3}[.]\d{1,3})(.* - ))(?P<user_name>(?<= - )[a-z]*[0-9]*)", string):
    print(item.groupdict())
Output
{'host': '146.204.224.152', 'user_name': 'lubo233'}
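Once ".* - " has consumed the separator, the lookbehind (?<= - ) is always satisfied at that position, so it becomes redundant; this is also why the first answer can drop it. A minimal check, assuming a four-octet \d{1,3} host pattern, compares the pattern with and without the lookbehind.

```python
import re

string = '146.204.224.152 adfadfa - lubo233'

# ".* - " consumes the separator; the lookbehind then re-checks text already matched.
with_lookbehind = re.compile(r"(?P<host>\d{1,3}(?:[.]\d{1,3}){3}).* - (?P<user_name>(?<= - )[a-z]*[0-9]*)")
# Identical pattern with the lookbehind removed.
without_lookbehind = re.compile(r"(?P<host>\d{1,3}(?:[.]\d{1,3}){3}).* - (?P<user_name>[a-z]*[0-9]*)")

a = with_lookbehind.search(string).groupdict()
b = without_lookbehind.search(string).groupdict()
print(a == b, b)  # True {'host': '146.204.224.152', 'user_name': 'lubo233'}
```

Note that .* is greedy, so if the line contains several " - " separators the user name is taken after the last one.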