
Merge multiple dictionaries with Python

I have a list of strings from which I want to extract all relevant information using regex. I have written a pattern to extract the information I need. The pattern is as follows:

import re

pattern1 = r"(?P<host>\d*\.\d*\.\d*\.\d*[^0-9]*) - (?P<user_name>\w*\d*)|(?P<time>\d*/\w*/\d*:\d*:\d*:\d* -\d*)|\"(?P<request>[A-Z]* (/[a-z+]*)+ [A-Z]*/\d\.\d)"

result = [item.groupdict() for item in re.finditer(pattern1, logdata)]

Multiple dictionaries are generated, as follows. This is roughly the information I am looking for:

[{'host': '146.204.224.152',
  'user_name': 'feest6811',
  'time': None,
  'request': None},
 {'host': None,
  'user_name': None,
  'time': '21/Jun/2019:15:45:24 -0700',
  'request': None},
 {'host': None,
  'user_name': None,
  'time': None,
  'request': 'POST /incentivize HTTP/1.1'},

 ...

]

In the output, three dictionaries are formed, each containing one piece of the required information. I want a single dictionary with all the information, as follows:

{
  'host': '146.204.224.152',
  'user_name': 'feest6811',
  'time': '21/Jun/2019:15:45:24 -0700',
  'request': 'POST /incentivize HTTP/1.1'
}

I am not sure what I am doing wrong. Is there any way to merge the dictionaries as they are formed?

These are a few samples from logdata:

'146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622',
 '197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554',
 '156.127.178.177 - okuneva5222[21/Jun/2019:15:45:27 -0700] "DELETE /interactive/transparent/niches/revolutionize HTTP/1.1" 416 14701',
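If you would rather keep the original alternation pattern, the partial dictionaries can be merged as they are produced. This is only a sketch under two assumptions: logdata is a single string with one entry per line, and every entry yields its host/user_name match before its time and request matches. merge_partial_dicts is a hypothetical helper name, not part of the re module.

```python
import re

# The question's pattern: because of the | alternation, each match fills only one branch.
pattern1 = r"(?P<host>\d*\.\d*\.\d*\.\d*[^0-9]*) - (?P<user_name>\w*\d*)|(?P<time>\d*/\w*/\d*:\d*:\d*:\d* -\d*)|\"(?P<request>[A-Z]* (/[a-z+]*)+ [A-Z]*/\d\.\d)"

# Assumption: logdata is one string, one log entry per line.
logdata = (
    '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622\n'
    '197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554\n'
    '156.127.178.177 - okuneva5222[21/Jun/2019:15:45:27 -0700] "DELETE /interactive/transparent/niches/revolutionize HTTP/1.1" 416 14701'
)


def merge_partial_dicts(matches):
    """Hypothetical helper: fold the partial dicts into one dict per log entry.

    Assumes a match containing 'host' always starts a new entry.
    """
    records = []
    for item in matches:
        # Keep only the groups this alternative actually captured.
        partial = {k: v for k, v in item.groupdict().items() if v is not None}
        if "host" in partial or not records:
            records.append({})
        records[-1].update(partial)
    return records


result = merge_partial_dicts(re.finditer(pattern1, logdata))
for record in result:
    print(record)
```

This works around the alternation rather than fixing it; if an entry is ever missing one of the fields, its values can silently end up in the neighbouring record, so a single pattern that matches all fields in sequence is the more robust option.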

This is one approach, using groupdict:

Example:

import re
from pprint import pprint

logdata = ['146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622',
 '197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554',
 '156.127.178.177 - okuneva5222[21/Jun/2019:15:45:27 -0700] "DELETE /interactive/transparent/niches/revolutionize HTTP/1.1" 416 14701']

result = []
# All four named groups in one sequence, so each successful match yields one complete dict.
# The host group no longer ends in [^0-9]*, which would capture the trailing space.
pattern1 = re.compile(r'(?P<host>\d*\.\d*\.\d*\.\d*)\s*-\s*(?P<user_name>\w*\d*)\s*\[(?P<time>\d*/\w*/\d*:\d*:\d*:\d* -\d*)\]\s*"(?P<request>[A-Z]* (/[a-z+]*)+ [A-Z]*/\d\.\d)')
for log in logdata:
    m = pattern1.match(log)  # match() anchors at the start of each line
    if m:
        result.append(m.groupdict())

pprint(result)  # pprint sorts the keys: host, request, time, user_name

Output:

[{'host': '146.204.224.152',
  'request': 'POST /incentivize HTTP/1.1',
  'time': '21/Jun/2019:15:45:24 -0700',
  'user_name': 'feest6811'},
 {'host': '197.109.77.178',
  'request': 'DELETE /virtual/solutions/target/web+services HTTP/2.0',
  'time': '21/Jun/2019:15:45:25 -0700',
  'user_name': 'kertzmann3129'},
 {'host': '156.127.178.177',
  'request': 'DELETE /interactive/transparent/niches/revolutionize HTTP/1.1',
  'time': '21/Jun/2019:15:45:27 -0700',
  'user_name': 'okuneva5222'}]

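Your pattern uses | alternation, so each match found by re.finditer satisfies only one of the three alternatives, and the named groups of the other alternatives come back as None. You can modify the pattern slightly so that it matches all groups at the same time, in sequence, instead of matching any one of the three alternatives.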

pattern1 = r"(?P<host>\d*\.\d*\.\d*\.\d*[^0-9]*) - (?P<user_name>\w*\d*) ?\[(?P<time>\d*/\w*/\d*:\d*:\d*:\d* -\d*)\] \"(?P<request>[A-Z]* (/[a-z+]*)+ [A-Z]*/\d\.\d)"

After changing your pattern like this, it seems to work:

import re

# modified matching pattern: the groups follow each other (no | alternation),
# so one match per log line captures all four fields
pattern1 = r"(?P<host>\d*\.\d*\.\d*\.\d*[^0-9]*) - (?P<user_name>\w*\d*) ?\[(?P<time>\d*/\w*/\d*:\d*:\d*:\d* -\d*)\] \"(?P<request>[A-Z]* (/[a-z+]*)+ [A-Z]*/\d\.\d)"
logdata = ['146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622', '197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554', '156.127.178.177 - okuneva5222[21/Jun/2019:15:45:27 -0700] "DELETE /interactive/transparent/niches/revolutionize HTTP/1.1" 416 14701']

result = []

for log in logdata:
    m = re.search(pattern1, log)  # the first (and only) match in this line
    if m:
        result.append(m.groupdict())

for item in result:
    print(item)

gives

{'host': '146.204.224.152', 'user_name': 'feest6811', 'time': '21/Jun/2019:15:45:24 -0700', 'request': 'POST /incentivize HTTP/1.1'}
{'host': '197.109.77.178', 'user_name': 'kertzmann3129', 'time': '21/Jun/2019:15:45:25 -0700', 'request': 'DELETE /virtual/solutions/target/web+services HTTP/2.0'}
{'host': '156.127.178.177', 'user_name': 'okuneva5222', 'time': '21/Jun/2019:15:45:27 -0700', 'request': 'DELETE /interactive/transparent/niches/revolutionize HTTP/1.1'}
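To see why the original alternation pattern produced several dictionaries per log line, here is a toy comparison. The input string "alice 42" and the group names name and age are made up purely for illustration.

```python
import re

line = "alice 42"  # hypothetical toy input

# Alternation: each match satisfies only ONE branch, so the other named group is None.
alternation = r"(?P<name>[a-z]+)|(?P<age>\d+)"
alt_dicts = [m.groupdict() for m in re.finditer(alternation, line)]
print(alt_dicts)  # [{'name': 'alice', 'age': None}, {'name': None, 'age': '42'}]

# Sequence: a single match must satisfy every group, so one dict holds everything.
sequence = r"(?P<name>[a-z]+) (?P<age>\d+)"
seq_dicts = [m.groupdict() for m in re.finditer(sequence, line)]
print(seq_dicts)  # [{'name': 'alice', 'age': '42'}]
```

The same thing happens with the log pattern: three alternatives give three partial dicts per line, while one sequential pattern gives one complete dict.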

You can make the pattern a bit more specific by using {n} or {m,n} quantifiers instead of mostly *, and by keeping only the 4 named capture groups:

(?P<host>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s*-\s*(?P<user_name>\w+)\s*\[(?P<time>\d+/[a-zA-Z]+/\d{4}:\d{2}:\d{2}:\d{2} -\d{4})\]\s*"(?P<request>[A-Z]+ /[^"]*)"


import re
pattern1 = r"(?P<host>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s*-\s*(?P<user_name>\w+)\s*\[(?P<time>\d+/[a-zA-Z]+/\d{4}:\d{2}:\d{2}:\d{2} -\d{4})\]\s*\"(?P<request>[A-Z]+ /[^\"]*)\""
logdata = ("146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] \"POST /incentivize HTTP/1.1\" 302 4622\n"
            "197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] \"DELETE /virtual/solutions/target/web+services HTTP/2.0\" 203 26554\n"
            "156.127.178.177 - okuneva5222[21/Jun/2019:15:45:27 -0700] \"DELETE /interactive/transparent/niches/revolutionize HTTP/1.1\" 416 14701")
result = [item.groupdict() for item in re.finditer(pattern1, logdata)]
print(result)

Output

[{'host': '146.204.224.152',
  'user_name': 'feest6811',
  'time': '21/Jun/2019:15:45:24 -0700',
  'request': 'POST /incentivize HTTP/1.1'},
 {'host': '197.109.77.178',
  'user_name': 'kertzmann3129',
  'time': '21/Jun/2019:15:45:25 -0700',
  'request': 'DELETE /virtual/solutions/target/web+services HTTP/2.0'},
 {'host': '156.127.178.177',
  'user_name': 'okuneva5222',
  'time': '21/Jun/2019:15:45:27 -0700',
  'request': 'DELETE /interactive/transparent/niches/revolutionize HTTP/1.1'}]
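A quick way to see what the {1,3} quantifiers buy you over *: the sketch below compares only the host part of the two patterns using re.fullmatch. The candidate strings are made-up examples, not taken from the log data.

```python
import re

loose = re.compile(r"\d*\.\d*\.\d*\.\d*")                   # * allows zero digits per part
strict = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")  # 1 to 3 digits per part

for candidate in ["146.204.224.152", "...", "1.2.3."]:
    print(candidate, bool(loose.fullmatch(candidate)), bool(strict.fullmatch(candidate)))
# 146.204.224.152 True True
# ... True False
# 1.2.3. True False
```

Note that {1,3} only limits the number of digits; it does not check octet values, so a string like 999.999.999.999 still matches the stricter pattern.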

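Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.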

 
© 2020-2024 STACKOOM.COM