Merge multiple dictionaries with Python
I have a list of strings from which I want to extract all relevant information using regex. I have written a pattern to extract the information I need. The pattern is as follows:
import re

pattern1 = r"(?P<host>\d*\.\d*\.\d*\.\d*[^0-9]*) - (?P<user_name>\w*\d*)|(?P<time>\d*/\w*/\d*:\d*:\d*:\d* -\d*)|\"(?P<request>[A-Z]* (/[a-z+]*)+ [A-Z]*/\d\.\d)"
result = [item.groupdict() for item in re.finditer(pattern1, logdata)]
Multiple dictionaries are generated, as follows. This is close to the answer I am looking for:
[{'host': '146.204.224.152',
'user_name': 'feest6811',
'time': None,
'request': None},
{'host': None,
'user_name': None,
'time': '21/Jun/2019:15:45:24 -0700',
'request': None},
{'host': None,
'user_name': None,
'time': None,
'request': 'POST /incentivize HTTP/1.1'},
...
]
In the output, 3 dictionaries are formed, each containing one piece of the required information. I want a single dictionary with all the information, as follows:
{
'host': '146.204.224.152',
'user_name': 'feest6811',
'time': '21/Jun/2019:15:45:24 -0700',
'request': 'POST /incentivize HTTP/1.1'
}
I am not sure what I am doing wrong. Is there any way to merge the dictionaries as they are formed?
These are a few samples from logdata:
'146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622',
'197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554',
'156.127.178.177 - okuneva5222[21/Jun/2019:15:45:27 -0700] "DELETE /interactive/transparent/niches/revolutionize HTTP/1.1" 416 14701',
This is one approach using groupdict(): match all four named groups in sequence instead of as | alternatives, so that each line produces a single dictionary.
Ex:
import re
from pprint import pprint

logdata = ['146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622',
           '197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554',
           '156.127.178.177 - okuneva5222[21/Jun/2019:15:45:27 -0700] "DELETE /interactive/transparent/niches/revolutionize HTTP/1.1" 416 14701']
result = []
# host has no trailing [^0-9]*, which would otherwise capture the space before "-"
pattern1 = re.compile(r'(?P<host>\d*\.\d*\.\d*\.\d*)\s*-\s*(?P<user_name>\w*\d*)\s*\[(?P<time>\d*/\w*/\d*:\d*:\d*:\d* -\d*)\]\s*"(?P<request>[A-Z]* (/[a-z+]*)+ [A-Z]*/\d\.\d)')
for log in logdata:
    m = pattern1.match(log)
    if m:
        result.append(m.groupdict())
pprint(result)  # pprint sorts the dict keys
Output:
[{'host': '146.204.224.152',
'request': 'POST /incentivize HTTP/1.1',
'time': '21/Jun/2019:15:45:24 -0700',
'user_name': 'feest6811'},
{'host': '197.109.77.178',
'request': 'DELETE /virtual/solutions/target/web+services HTTP/2.0',
'time': '21/Jun/2019:15:45:25 -0700',
'user_name': 'kertzmann3129'},
{'host': '156.127.178.177',
'request': 'DELETE /interactive/transparent/niches/revolutionize HTTP/1.1',
'time': '21/Jun/2019:15:45:27 -0700',
'user_name': 'okuneva5222'}]
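If you would rather keep the original alternation pattern from the question, the partial dictionaries can also be merged as they are formed, by keeping only the non-None values from each match. This is a minimal sketch, assuming each log line is processed on its own; the helper name merge_matches is hypothetical.

```python
import re

# The question's original alternation pattern, split across raw strings (same regex).
pattern1 = re.compile(
    r"(?P<host>\d*\.\d*\.\d*\.\d*[^0-9]*) - (?P<user_name>\w*\d*)"
    r"|(?P<time>\d*/\w*/\d*:\d*:\d*:\d* -\d*)"
    r'|"(?P<request>[A-Z]* (/[a-z+]*)+ [A-Z]*/\d\.\d)'
)


def merge_matches(line):
    """Hypothetical helper: fold the partial groupdicts of one line into one dict."""
    merged = {}
    for m in pattern1.finditer(line):
        # Only the groups of the alternative that matched are set; the others are None.
        merged.update({k: v for k, v in m.groupdict().items() if v is not None})
    return merged


logdata = [
    '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622',
    '197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554',
    '156.127.178.177 - okuneva5222[21/Jun/2019:15:45:27 -0700] "DELETE /interactive/transparent/niches/revolutionize HTTP/1.1" 416 14701',
]
result = [merge_matches(log) for log in logdata]
print(result)
```

This works because finditer scans each line left to right, so the host/user, time and request pieces arrive one after another and update() collects them into one dictionary. Matching everything in one sequence, as above, is still the simpler fix.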
You can modify your regex pattern slightly so that it matches all groups at the same time, instead of matching any one of the 3 alternatives.
pattern1 = r"(?P<host>\d*\.\d*\.\d*\.\d*[^0-9]*) - (?P<user_name>\w*\d*) ?\[(?P<time>\d*/\w*/\d*:\d*:\d*:\d* -\d*)\] \"(?P<request>[A-Z]* (/[a-z+]*)+ [A-Z]*/\d\.\d)"
After changing your pattern like this, it seems to work:
import re
# modified matching pattern: explicit " ?\[" and "\] " instead of greedy ".*", which swallowed the day digits
pattern1 = r"(?P<host>\d*\.\d*\.\d*\.\d*[^0-9]*) - (?P<user_name>\w*\d*) ?\[(?P<time>\d*/\w*/\d*:\d*:\d*:\d* -\d*)\] \"(?P<request>[A-Z]* (/[a-z+]*)+ [A-Z]*/\d\.\d)"
logdata = ['146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622', '197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554', '156.127.178.177 - okuneva5222[21/Jun/2019:15:45:27 -0700] "DELETE /interactive/transparent/niches/revolutionize HTTP/1.1" 416 14701']
result = []
for log in logdata:
    m = re.search(pattern1, log)
    if m:
        result.append(m.groupdict())
        print(result[-1])
gives
{'host': '146.204.224.152', 'user_name': 'feest6811', 'time': '21/Jun/2019:15:45:24 -0700', 'request': 'POST /incentivize HTTP/1.1'}
{'host': '197.109.77.178', 'user_name': 'kertzmann3129', 'time': '21/Jun/2019:15:45:25 -0700', 'request': 'DELETE /virtual/solutions/target/web+services HTTP/2.0'}
{'host': '156.127.178.177', 'user_name': 'okuneva5222', 'time': '21/Jun/2019:15:45:27 -0700', 'request': 'DELETE /interactive/transparent/niches/revolutionize HTTP/1.1'}
You can make the pattern a bit more specific by using {n} and {m,n} quantifiers instead of mostly *, and by keeping only the 4 named capture groups:
(?P<host>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s*-\s*(?P<user_name>\w+)\s*\[(?P<time>\d+/[a-zA-Z]+/\d{4}:\d{2}:\d{2}:\d{2} -\d{4})\]\s*"(?P<request>[A-Z]+ /[^"]*)"
import re
pattern1 = r"(?P<host>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s*-\s*(?P<user_name>\w+)\s*\[(?P<time>\d+/[a-zA-Z]+/\d{4}:\d{2}:\d{2}:\d{2} -\d{4})\]\s*\"(?P<request>[A-Z]+ /[^\"]*)\""
logdata = ("146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] \"POST /incentivize HTTP/1.1\" 302 4622\n"
"197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] \"DELETE /virtual/solutions/target/web+services HTTP/2.0\" 203 26554\n"
"156.127.178.177 - okuneva5222[21/Jun/2019:15:45:27 -0700] \"DELETE /interactive/transparent/niches/revolutionize HTTP/1.1\" 416 14701")
result = [item.groupdict() for item in re.finditer(pattern1, logdata)]
print(result)
Output:
[
{'host': '146.204.224.152',
'user_name': 'feest6811',
'time': '21/Jun/2019:15:45:24 -0700',
'request': 'POST /incentivize HTTP/1.1'
},
{
'host': '197.109.77.178',
'user_name': 'kertzmann3129',
'time': '21/Jun/2019:15:45:25 -0700',
'request': 'DELETE /virtual/solutions/target/web+services HTTP/2.0'
},
{
'host': '156.127.178.177',
'user_name': 'okuneva5222',
'time': '21/Jun/2019:15:45:27 -0700',
'request': 'DELETE /interactive/transparent/niches/revolutionize HTTP/1.1'
}
]
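For readability, the same pattern can also be written with the re.VERBOSE flag so that each part carries its own comment. This is a sketch; the names LOG_PATTERN and parse_logs are only illustrative. In verbose mode unescaped whitespace in the pattern is ignored, so the two literal spaces are written as [ ].

```python
import re

# Same regex as pattern1 above, laid out with re.VERBOSE.
# Unescaped whitespace is ignored in verbose mode, so literal spaces are "[ ]".
LOG_PATTERN = re.compile(r"""
    (?P<host>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})   # dotted IPv4-style host
    \s*-\s*                                        # separator between host and user
    (?P<user_name>\w+)                             # user name
    \s*\[                                          # opening bracket
    (?P<time>\d+/[a-zA-Z]+/\d{4}:\d{2}:\d{2}:\d{2}[ ]-\d{4})  # timestamp
    \]\s*"                                         # closing bracket, opening quote
    (?P<request>[A-Z]+[ ]/[^"]*)                   # method and path up to the quote
    "                                              # closing quote
    """, re.VERBOSE)


def parse_logs(text):
    """Illustrative helper: one dict per log entry found in text."""
    return [m.groupdict() for m in LOG_PATTERN.finditer(text)]


logdata = "\n".join([
    '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622',
    '156.127.178.177 - okuneva5222[21/Jun/2019:15:45:27 -0700] "DELETE /interactive/transparent/niches/revolutionize HTTP/1.1" 416 14701',
])
for entry in parse_logs(logdata):
    print(entry)
```

The compiled regex is the same as the one-line version; only the layout differs, which makes later changes to an individual group easier to spot.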