Merging multiple dictionaries with Python

Question

I have a list of strings from which I want to extract all relevant information using regex. I have written a pattern to extract the information I need. The pattern is as follows:

pattern1 = "(?P<host>\d*\.\d*\.\d*\.\d*[^0-9]*) - (?P<user_name>\w*\d*)|(?P<time>\d*/\w*/\d*:\d*:\d*:\d* -\d*)|\"(?P<request>[A-Z]* (/[a-z+]*)+ [A-Z]*/\d\.\d)"

result = [item.groupdict() for item in re.finditer(pattern1,logdata)]

Multiple dictionaries are generated as follows. This is close to the answer I am looking for:

[{'host': '146.204.224.152',
  'user_name': 'feest6811',
  'time': None,
  'request': None},
 {'host': None,
  'user_name': None,
  'time': '21/Jun/2019:15:45:24 -0700',
  'request': None},
 {'host': None,
  'user_name': None,
  'time': None,
  'request': 'POST /incentivize HTTP/1.1'},

 ...

]

In the output, 3 dictionaries are formed, each containing one piece of the required information. I want a single dictionary with all the information, as follows:

{
  'host': '146.204.224.152',
  'user_name': 'feest6811',
  'time': '21/Jun/2019:15:45:24 -0700',
  'request': 'POST /incentivize HTTP/1.1'
}

I am not sure what I am doing wrong. Is there any way to merge the dictionaries as they are formed?

These are a few samples from logdata:

'146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622',
 '197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554',
 '156.127.178.177 - okuneva5222[21/Jun/2019:15:45:27 -0700] "DELETE /interactive/transparent/niches/revolutionize HTTP/1.1" 416 14701',

Answer 1

This is one approach using groupdict.

Ex:

import re

logdata = ['146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622',
 '197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554',
 '156.127.178.177 - okuneva5222[21/Jun/2019:15:45:27 -0700] "DELETE /interactive/transparent/niches/revolutionize HTTP/1.1" 416 14701']

result = []
# Raw string, so the backslashes reach the regex engine untouched.
# The host group no longer ends in [^0-9]*: combined with \s*-\s* it would capture a trailing space.
pattern1 = re.compile(r'(?P<host>\d*\.\d*\.\d*\.\d*)\s*-\s*(?P<user_name>\w*\d*)\s*\[(?P<time>\d*/\w*/\d*:\d*:\d*:\d* -\d*)\]\s*"(?P<request>[A-Z]* (/[a-z+]*)+ [A-Z]*/\d\.\d)')
for log in logdata:
    # match() anchors at the start of each line, so one line yields one dict
    m = pattern1.match(log)
    if m:
        result.append(m.groupdict())

print(result)

Output:

[{'host': '146.204.224.152',
  'request': 'POST /incentivize HTTP/1.1',
  'time': '21/Jun/2019:15:45:24 -0700',
  'user_name': 'feest6811'},
 {'host': '197.109.77.178',
  'request': 'DELETE /virtual/solutions/target/web+services HTTP/2.0',
  'time': '21/Jun/2019:15:45:25 -0700',
  'user_name': 'kertzmann3129'},
 {'host': '156.127.178.177',
  'request': 'DELETE /interactive/transparent/niches/revolutionize HTTP/1.1',
  'time': '21/Jun/2019:15:45:27 -0700',
  'user_name': 'okuneva5222'}]

Answer 2

You can just modify your regex pattern slightly, so that it matches all groups at the same time instead of matching any one of the 3 alternatives separated by |.

pattern1 = r"(?P<host>\d*\.\d*\.\d*\.\d*[^0-9]*) - (?P<user_name>\w*\d*) ?\[(?P<time>\d*/\w*/\d*:\d*:\d*:\d* -\d*)\] \"(?P<request>[A-Z]* (/[a-z+]*)+ [A-Z]*/\d\.\d)"
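
A minimal sketch of how this sequential pattern might be applied to the sample lines from the question, one line at a time. The pattern is split over several raw-string pieces only for readability, and the loop variable names are illustrative assumptions rather than part of the answer above.

```python
import re

# Sample lines copied from the question.
logdata = [
    '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622',
    '197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554',
    '156.127.178.177 - okuneva5222[21/Jun/2019:15:45:27 -0700] "DELETE /interactive/transparent/niches/revolutionize HTTP/1.1" 416 14701',
]

# Same groups as in the question, but joined in sequence instead of with "|",
# so a single match has to contain all four pieces.
pattern1 = re.compile(
    r'(?P<host>\d*\.\d*\.\d*\.\d*[^0-9]*) - (?P<user_name>\w*\d*) ?\['
    r'(?P<time>\d*/\w*/\d*:\d*:\d*:\d* -\d*)\] '
    r'"(?P<request>[A-Z]* (/[a-z+]*)+ [A-Z]*/\d\.\d)'
)

result = []
for line in logdata:
    m = pattern1.search(line)
    if m:  # skip lines that do not fit the expected layout
        result.append(m.groupdict())

for entry in result:
    print(entry)
```

Because every group is now required, a line that lacks any one of the four parts produces no dictionary at all rather than a partial one.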

Answer 3

After changing your pattern like this, it seems to work. Note the lazy .*? before the time group: with a greedy .* the day digits are swallowed and time starts at '/Jun'.

import re

# modified matching pattern
pattern1 = r"(?P<host>\d*\.\d*\.\d*\.\d*[^0-9]*) - (?P<user_name>\w*\d*).*?(?P<time>\d*/\w*/\d*:\d*:\d*:\d* -\d*).*\"(?P<request>[A-Z]* (/[a-z+]*)+ [A-Z]*/\d\.\d)"
logdata = ['146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622', '197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554', '156.127.178.177 - okuneva5222[21/Jun/2019:15:45:27 -0700] "DELETE /interactive/transparent/niches/revolutionize HTTP/1.1" 416 14701']

result = []

for log in logdata:
    # each line yields a single match, so take the first dict
    result.append([item.groupdict() for item in re.finditer(pattern1, log)][0])

for item in result:
    print(item)

gives

{'host': '146.204.224.152', 'user_name': 'feest6811', 'time': '21/Jun/2019:15:45:24 -0700', 'request': 'POST /incentivize HTTP/1.1'}
{'host': '197.109.77.178', 'user_name': 'kertzmann3129', 'time': '21/Jun/2019:15:45:25 -0700', 'request': 'DELETE /virtual/solutions/target/web+services HTTP/2.0'}
{'host': '156.127.178.177', 'user_name': 'okuneva5222', 'time': '21/Jun/2019:15:45:27 -0700', 'request': 'DELETE /interactive/transparent/niches/revolutionize HTTP/1.1'}

Answer 4

You can make the pattern a bit more specific by using {n} as a quantifier instead of mostly *, keeping only the 4 named capture groups.

(?P<host>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s*-\s*(?P<user_name>\w+)\s*\[(?P<time>\d+/[a-zA-Z]+/\d{4}:\d{2}:\d{2}:\d{2} -\d{4})\]\s*"(?P<request>[A-Z]+ /[^"]*)"

Regex demo

import re
pattern1 = r"(?P<host>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s*-\s*(?P<user_name>\w+)\s*\[(?P<time>\d+/[a-zA-Z]+/\d{4}:\d{2}:\d{2}:\d{2} -\d{4})\]\s*\"(?P<request>[A-Z]+ /[^\"]*)\""
logdata = ("146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] \"POST /incentivize HTTP/1.1\" 302 4622\n"
            "197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] \"DELETE /virtual/solutions/target/web+services HTTP/2.0\" 203 26554\n"
            "156.127.178.177 - okuneva5222[21/Jun/2019:15:45:27 -0700] \"DELETE /interactive/transparent/niches/revolutionize HTTP/1.1\" 416 14701")
result = [item.groupdict() for item in re.finditer(pattern1, logdata)]
print(result)

Output

[
{'host': '146.204.224.152',
'user_name': 'feest6811',
'time': '21/Jun/2019:15:45:24 -0700',
'request': 'POST /incentivize HTTP/1.1'
},
{
'host': '197.109.77.178',
'user_name': 'kertzmann3129',
'time': '21/Jun/2019:15:45:25 -0700',
'request': 'DELETE /virtual/solutions/target/web+services HTTP/2.0'
},
{
'host': '156.127.178.177',
'user_name': 'okuneva5222',
'time': '21/Jun/2019:15:45:27 -0700',
'request': 'DELETE /interactive/transparent/niches/revolutionize HTTP/1.1'
}
]
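
If the alternation pattern from the question has to stay as it is, the partial dictionaries could also be merged literally, one log line at a time. The following is only a sketch under that assumption: `merge_matches` is a hypothetical helper name, and it assumes each named group matches at most once per line (a later match would otherwise overwrite an earlier one).

```python
import re

# The alternation pattern from the question, unchanged apart from the
# raw-string prefix and being split over three pieces for readability.
pattern1 = re.compile(
    r'(?P<host>\d*\.\d*\.\d*\.\d*[^0-9]*) - (?P<user_name>\w*\d*)'
    r'|(?P<time>\d*/\w*/\d*:\d*:\d*:\d* -\d*)'
    r'|"(?P<request>[A-Z]* (/[a-z+]*)+ [A-Z]*/\d\.\d)'
)


def merge_matches(line):
    """Hypothetical helper: fold every partial match on one line into one dict."""
    merged = {}
    for m in pattern1.finditer(line):
        # Keep only the groups that took part in this particular match.
        merged.update({k: v for k, v in m.groupdict().items() if v is not None})
    return merged


# Sample lines copied from the question.
logdata = [
    '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622',
    '197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554',
    '156.127.178.177 - okuneva5222[21/Jun/2019:15:45:27 -0700] "DELETE /interactive/transparent/niches/revolutionize HTTP/1.1" 416 14701',
]

result = [merge_matches(line) for line in logdata]
print(result[0])
```

Scanning line by line is what keeps the pieces of different log entries apart; running finditer over the whole log at once would give no obvious point at which one merged dictionary ends and the next begins.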
