Most efficient way to build a new dict from a list of dicts with 2 common keys?
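I'm trying to figure out the best way to build a new dict from a list of dicts. Every dict in the list has 2 common keys: the value of one common key becomes a key of the new dict, and the value of the other common key becomes an element of the list of values stored under that key in the new dict.
I came up with 4 different solutions. Here is my example: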
list_of_dicts = [
    {'key_1': 'v2', 'key_2': 'some data 1', 'key_3': 'some random data'},
    {'key_1': 'v1', 'key_2': 'some data 2'},
    {'key_1': 'v1', 'key_2': 'some data 1'},
    {'key_1': 'v2', 'key_2': 'some data 2'}]
Solution 1, using collections.defaultdict:
from collections import defaultdict
group_by_key_1 = defaultdict(list)
for d in list_of_dicts:
    group_by_key_1[d['key_1']].append(d['key_2'])
group_by_key_1
Output 1:
defaultdict(list,
            {'v2': ['some data 1', 'some data 2'],
             'v1': ['some data 2', 'some data 1']})
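One detail of solution 1 that may matter when comparing it with the others: the result is a defaultdict, so looking up an unknown key later inserts an empty list instead of raising KeyError. A minimal sketch, assuming a plain dict is wanted once grouping is finished (the name `plain` is only illustrative):

```python
from collections import defaultdict

list_of_dicts = [
    {'key_1': 'v2', 'key_2': 'some data 1', 'key_3': 'some random data'},
    {'key_1': 'v1', 'key_2': 'some data 2'},
    {'key_1': 'v1', 'key_2': 'some data 1'},
    {'key_1': 'v2', 'key_2': 'some data 2'}]

group_by_key_1 = defaultdict(list)
for d in list_of_dicts:
    group_by_key_1[d['key_1']].append(d['key_2'])

# Copy into a plain dict so that later lookups of unknown keys
# raise KeyError instead of silently adding empty lists.
plain = dict(group_by_key_1)
print(plain)
# {'v2': ['some data 1', 'some data 2'], 'v1': ['some data 2', 'some data 1']}
```

The copy is shallow, so the value lists are shared with the defaultdict rather than duplicated.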
Solution 2, using dict.setdefault:
group_by_key_1 = {}
for d in list_of_dicts:
    group_by_key_1.setdefault(d['key_1'], []).append(d['key_2'])
group_by_key_1
Output 2:
{'v2': ['some data 1', 'some data 2'], 'v1': ['some data 2', 'some data 1']}
Solution 3, appending if the key already exists and otherwise creating a list with the first element:
group_by_key_1 = {}
for d in list_of_dicts:
    if d['key_1'] not in group_by_key_1:
        group_by_key_1[d['key_1']] = [d['key_2']]
    else:
        group_by_key_1[d['key_1']].append(d['key_2'])
group_by_key_1
Output 3:
{'v2': ['some data 1', 'some data 2'], 'v1': ['some data 2', 'some data 1']}
Solution 4, using itertools.groupby:
from itertools import groupby
from operator import itemgetter
list_of_dicts.sort(key=itemgetter('key_1'))
group = groupby(list_of_dicts, key=itemgetter('key_1'))
group_by_key_1 = dict((k, [e['key_2'] for e in v]) for k, v in group)
group_by_key_1
Output 4:
{'v1': ['some data 2', 'some data 1'], 'v2': ['some data 1', 'some data 2']}
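Solution 4 depends on the sort: itertools.groupby only merges consecutive items with the same key, and list.sort also reorders the caller's list in place. The sketch below shows what happens without sorting, plus a variant that sorts a copy with sorted(); the helper name `group_sorted` is hypothetical, not part of the original solutions.

```python
from itertools import groupby
from operator import itemgetter

list_of_dicts = [
    {'key_1': 'v2', 'key_2': 'some data 1', 'key_3': 'some random data'},
    {'key_1': 'v1', 'key_2': 'some data 2'},
    {'key_1': 'v1', 'key_2': 'some data 1'},
    {'key_1': 'v2', 'key_2': 'some data 2'}]

# Without sorting, 'v2' shows up in two separate runs and the second run
# overwrites the first when the pairs are collected into a dict.
unsorted_result = {k: [e['key_2'] for e in g]
                   for k, g in groupby(list_of_dicts, key=itemgetter('key_1'))}
print(unsorted_result)
# {'v2': ['some data 2'], 'v1': ['some data 2', 'some data 1']}


def group_sorted(dicts):
    """Group key_2 values by key_1 without reordering the input list."""
    ordered = sorted(dicts, key=itemgetter('key_1'))  # sorted() returns a new list
    return {k: [e['key_2'] for e in g]
            for k, g in groupby(ordered, key=itemgetter('key_1'))}


print(group_sorted(list_of_dicts))
# {'v1': ['some data 2', 'some data 1'], 'v2': ['some data 1', 'some data 2']}
```

Either way the sort costs O(n log n), which the single-pass solutions 1 to 3 avoid.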
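I usually use solution 1, but solutions 2 and 3 also seem fine. Which solution is the most efficient, though? Or is there perhaps another, better solution?
I want to apply one of the above solutions to millions of list_of_dicts, where a single dict in a list_of_dicts can have 10 to 1000 keys.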
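For that use case, a rough sketch of how solution 1 could be wrapped and applied to a stream of lists. The names `group_values` and `group_many` and their parameters are hypothetical. Each dict is only touched by two key lookups, which take constant time on average, so the number of other keys in a dict should have little effect on the grouping itself.

```python
from collections import defaultdict


def group_values(list_of_dicts, group_key='key_1', value_key='key_2'):
    """Group the value_key values of each dict under its group_key value."""
    grouped = defaultdict(list)
    for d in list_of_dicts:
        grouped[d[group_key]].append(d[value_key])
    return dict(grouped)  # plain dict, so missing keys raise KeyError


def group_many(lists_of_dicts):
    """Lazily yield one grouped dict per input list."""
    for list_of_dicts in lists_of_dicts:
        yield group_values(list_of_dicts)


sample = [
    {'key_1': 'v2', 'key_2': 'some data 1', 'key_3': 'some random data'},
    {'key_1': 'v1', 'key_2': 'some data 2'},
    {'key_1': 'v1', 'key_2': 'some data 1'},
    {'key_1': 'v2', 'key_2': 'some data 2'}]

for result in group_many([sample, sample[:2]]):
    print(result)
```

Because `group_many` is a generator, only one grouped result is held in memory at a time unless the caller stores them.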
Benchmarking the solutions for list_of_dicts sizes between 10 and 100_000, solution 1 turns out to be the most efficient:
from collections import defaultdict
from itertools import groupby
from operator import itemgetter
from simple_benchmark import BenchmarkBuilder
from random import randrange
b = BenchmarkBuilder()

@b.add_function()
def sol_1(list_of_dicts):
    group_by_key_1 = defaultdict(list)
    for d in list_of_dicts:
        group_by_key_1[d['key_1']].append(d['key_2'])
    return group_by_key_1

@b.add_function()
def sol_2(list_of_dicts):
    group_by_key_1 = {}
    for d in list_of_dicts:
        group_by_key_1.setdefault(d['key_1'], []).append(d['key_2'])
    return group_by_key_1

@b.add_function()
def sol_3(list_of_dicts):
    group_by_key_1 = {}
    for d in list_of_dicts:
        if d['key_1'] not in group_by_key_1:
            group_by_key_1[d['key_1']] = [d['key_2']]
        else:
            group_by_key_1[d['key_1']].append(d['key_2'])
    return group_by_key_1

@b.add_function()
def sol_4(list_of_dicts):
    list_of_dicts.sort(key=itemgetter('key_1'))
    group = groupby(list_of_dicts, key=itemgetter('key_1'))
    group_by_key_1 = dict((k, [e['key_2'] for e in v]) for k, v in group)
    return group_by_key_1

@b.add_arguments('Size of list, number of keys')
def argument_provider():
    for exp in range(2, 5):
        size = 10**exp
        keys_count = 1000
        list_of_dicts = [{f'key_{i}': f'v{i}' for i in range(keys_count)} for _ in range(size)]
        yield size, list_of_dicts

r = b.run()
r.plot()
Output: [plot of the benchmark timings for sol_1 through sol_4 by list size]
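In the benchmark data above every dict has the same 'key_1' value ('v1'), so each solution only ever builds a single group. As a possible cross-check, here is a standard-library sketch that uses timeit with random group values. The helper `make_data`, the choice of 100 distinct groups, and the sizes are assumptions for illustration, and the timings will vary by machine.

```python
import timeit
from collections import defaultdict
from random import randrange, seed


def sol_1(list_of_dicts):
    group_by_key_1 = defaultdict(list)
    for d in list_of_dicts:
        group_by_key_1[d['key_1']].append(d['key_2'])
    return group_by_key_1


def sol_2(list_of_dicts):
    group_by_key_1 = {}
    for d in list_of_dicts:
        group_by_key_1.setdefault(d['key_1'], []).append(d['key_2'])
    return group_by_key_1


def make_data(size, distinct_groups=100):
    """Hypothetical generator of test data with varied key_1 values."""
    return [{'key_1': f'v{randrange(distinct_groups)}', 'key_2': f'some data {i}'}
            for i in range(size)]


seed(0)  # fixed seed so repeated runs use the same data
data = make_data(10_000)

# Both solutions should agree before their timings are compared.
assert sol_1(data) == sol_2(data)

for func in (sol_1, sol_2):
    best = min(timeit.repeat(lambda: func(data), number=20, repeat=3))
    print(f'{func.__name__}: {best:.4f} s for 20 runs')
```

Solutions 3 and 4 can be added to the loop in the same way. Solution 4 sorts its argument in place, so passing it a copy of the data keeps the comparison fair.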