How to parse multiple dictionary values that have the same key
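I have many rows of data (which I cannot modify by hand), each represented as a dictionary of key/value pairs. The problem is that one dictionary key can appear several times (an undefined number of times: two, three, ten, and so on), each time with a different value.
I need to extract all of these values.
Here is a simple record in which the key Key-Word appears twice:
{"Date": "Fri, 19 Apr 2019 00:54:46 GMT", "Vary": "Host,Accept-Encoding", "Key-Word": "00a", "Cache-Control": "private", "Key-Word": "xn"}
I wrote this Python script to extract the value from a record.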
import ast
import re
import json

inFile = open("sample.txt", "r", errors="replace")
cP = 0  # key found flag
cV = 0  # hold the key's value
try:
    myDict = {"Date": "Fri, 19 Apr 2019 00:54:46 GMT", "Vary": "Host,Accept-Encoding", "Key-Word": "00a", "Cache-Control": "private", "Key-Word": "xn"}
    smallmyDict = {}
except (ValueError, SyntaxError) as E:
    cV = "error"
except Exception as E:
    cV = "error"
# convert the header's keys to lower case
for key, value in myDict.items():
    smallmyDict[key.lower()] = value
# store all keys
smallmyDictKeys = smallmyDict.keys()
# search for a specific key
if 'key-word' in smallmyDictKeys:
    cP = 1
    cV = smallmyDict['key-word']
    print("Found!")
    print(cV)  # print the key's value
else:
    print("NOT Found!")
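The output I get is:
Found!
xn
The problem is that it only prints the value of the last occurrence of the key.
If the key I am looking for appears several times, how can I make my code iterate over it and print each value separately, instead of overwriting it with the last one?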
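You can use json to parse your data, and use the object_pairs_hook parameter of json.loads to customize how the key/value pairs are handled. In the example below, I group the different values of the same key in a list (and, as requested in your comment, join them into a single string):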
import json
from collections import Counter, defaultdict

data = """{"Date": "Fri, 19 Apr 2019 00:54:46 GMT", "Vary": "Host,Accept-Encoding", "Key-Word": "00a", "Cache-Control": "private", "Key-Word": "xn"}
"""

def duplicate_keys(pairs):
    out = {}
    dups = defaultdict(list)
    key_count = Counter(key for key, value in pairs)
    for key, value in pairs:
        if key_count[key] == 1:
            out[key] = value
        else:
            dups[key].append(value)
    # Concatenate the lists of values in a string, enclosed in {} and separated by ';'
    # rather than in a list:
    dups = {key: ';'.join('{' + v + '}' for v in values) for key, values in dups.items()}
    out.update(dups)
    return out

decoded = json.loads(data, object_pairs_hook=duplicate_keys)
print(decoded)
# {'Date': 'Fri, 19 Apr 2019 00:54:46 GMT',
#  'Vary': 'Host,Accept-Encoding',
#  'Cache-Control': 'private',
#  'Key-Word': '{00a};{xn}'}
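If you would rather keep the grouped values as a list instead of a joined string, and look keys up case-insensitively as the script in the question does, the hook can probably be reduced to a few lines. This is only a minimal sketch: group_pairs is a hypothetical name (not part of the json module), and it assumes every key in the record is a string.

```python
import json

def group_pairs(pairs):
    """Hypothetical hook: lower-case each key and collect all of its values in a list."""
    grouped = {}
    for key, value in pairs:
        grouped.setdefault(key.lower(), []).append(value)
    return grouped

record = ('{"Date": "Fri, 19 Apr 2019 00:54:46 GMT", "Vary": "Host,Accept-Encoding", '
          '"Key-Word": "00a", "Cache-Control": "private", "Key-Word": "xn"}')

headers = json.loads(record, object_pairs_hook=group_pairs)

# Print each value of the repeated key on its own line.
for value in headers.get("key-word", []):
    print(value)
# 00a
# xn
```

Every value ends up in a list here, including those whose key occurs only once, which keeps the lookup code uniform at the cost of an extra [0] for single values.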
You can parse the string and store the values in the dictionary as lists:
import ast
from pprint import pprint

def parse_dict_multikey(s):
    p = ast.parse(s)
    exp_dict = p.body[0].value
    keys = list(map(ast.literal_eval, exp_dict.keys))
    values = list(map(ast.literal_eval, exp_dict.values))
    d = {}
    for k, v in zip(keys, values):
        d.setdefault(k, []).append(v)
    return d

s = ('{"Date": "Fri, 19 Apr 2019 00:54:46 GMT",'
     ' "Vary": "Host,Accept-Encoding",'
     ' "Key-Word": "00a",'
     ' "Cache-Control": "private",'
     ' "Key-Word": "xn"}')
pprint(parse_dict_multikey(s))
# {'Cache-Control': ['private'],
#  'Date': ['Fri, 19 Apr 2019 00:54:46 GMT'],
#  'Key-Word': ['00a', 'xn'],
#  'Vary': ['Host,Accept-Encoding']}
However, this turns every value into a list, not only the values whose key is duplicated. You can avoid that by using Counter, as Thierry Lathuille suggested:
import ast
from collections import Counter

def parse_dict_multikey(s):
    p = ast.parse(s)
    exp_dict = p.body[0].value
    keys = list(map(ast.literal_eval, exp_dict.keys))
    values = list(map(ast.literal_eval, exp_dict.values))
    c = Counter(keys)  # how many times each key occurs
    d = {}
    for k, v in zip(keys, values):
        if c[k] > 1:
            # duplicated key: collect all of its values in a list
            d.setdefault(k, []).append(v)
        else:
            d[k] = v
    return d
This gives you:
{'Cache-Control': 'private',
'Date': 'Fri, 19 Apr 2019 00:54:46 GMT',
'Key-Word': ['00a', 'xn'],
'Vary': 'Host,Accept-Encoding'}
You could also look into something more advanced, such as multidict.
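A dictionary cannot have two keys with the same name. One overwrites the other. At runtime, only one pair with that key exists (the last entry).
https://www.python-course.eu/dictionaries.php is a good resource for reading about dictionaries.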
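The overwriting behaviour is easy to confirm with a trimmed-down version of the record from the question. The sketch below shows it for both a dict literal and the default behaviour of json.loads; the variable names are only illustrative.

```python
import json

# A dict literal with a repeated key silently keeps only the last value.
literal = {"Key-Word": "00a", "Key-Word": "xn"}
print(literal)  # {'Key-Word': 'xn'}

# json.loads behaves the same way by default: the last pair wins.
parsed = json.loads('{"Key-Word": "00a", "Key-Word": "xn"}')
print(parsed)  # {'Key-Word': 'xn'}
```

This is why the earlier value has to be captured while the text is still being parsed; once the dict exists, "00a" is already gone.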
Since loading your data directly with json keeps only the last value of a duplicated key, try splitting the string yourself:
from collections import defaultdict

string = '{"Date": "Fri, 19 Apr 2019 00:54:46 GMT", "Vary": "Host,Accept-Encoding", "Key-Word": "00a", "Cache-Control": "private", "Key-Word": "xn"}'
data = defaultdict(list)  # every key maps to a list of its values
pieces = string.split('",')
for each_piece in pieces:
    key, value = each_piece.split(':', maxsplit=1)
    # strip surrounding spaces, quotes and braces
    actual_key = key.strip(' {"')
    actual_value = value.strip(' "}')
    data[actual_key].append(actual_value)
print(dict(data))
Output:
{'Date': ['Fri, 19 Apr 2019 00:54:46 GMT'],
 'Vary': ['Host,Accept-Encoding'],
 'Key-Word': ['00a', 'xn'],
 'Cache-Control': ['private']}
When you define the dict myDict = {"Date": "Fri, 19 Apr 2019 00:54:46 GMT", "Vary": "Host,Accept-Encoding", "Key-Word": "00a", "Cache-Control": "private", "Key-Word": "xn"}, the values 00a and xn need to have different keys.
Alternatively, you can use (or convert the record to) a string: some_str = '{"Date": "Fri, 19 Apr 2019 00:54:46 GMT", "Vary": "Host,Accept-Encoding", "Key-Word": "00a", "Cache-Control": "private", "Key-Word": "xn"}'.
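Keeping each record as a string means the duplicate pairs are still available at parse time. One possible way to use that, sketched below, is to pass object_pairs_hook=list to json.loads so that a flat record comes back as a list of (key, value) tuples instead of a dict. The lines list is a hypothetical stand-in for the contents of sample.txt, and values_for is an illustrative helper name, not an existing API.

```python
import json

# Hypothetical stand-in for the lines of sample.txt, one record per line.
lines = [
    '{"Date": "Fri, 19 Apr 2019 00:54:46 GMT", "Key-Word": "00a", "Key-Word": "xn"}',
    '{"Vary": "Host,Accept-Encoding", "Cache-Control": "private"}',
]

def values_for(line, wanted):
    """Return every value whose key matches `wanted`, ignoring case."""
    # With object_pairs_hook=list, a flat JSON object is returned as a
    # list of (key, value) tuples, so repeated keys are preserved.
    pairs = json.loads(line, object_pairs_hook=list)
    return [value for key, value in pairs if key.lower() == wanted.lower()]

for line in lines:
    print(values_for(line, "key-word"))
# ['00a', 'xn']
# []
```

An empty list plays the role of the "NOT Found!" branch in the original script, so no separate flag is needed.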