python: how to merge dict in list of dicts based on value
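I have a list of dicts where each dict consists of 3 keys: name, url, and location.
Across the dicts, only the value of "name" can repeat; "url" and "location" always have distinct values throughout the list.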
Example:
[
{"name":"A1", "url":"B1", "location":"C1"},
{"name":"A1", "url":"B2", "location":"C2"},
{"name":"A2", "url":"B3", "location":"C3"},
{"name":"A2", "url":"B4", "location":"C4"}, ...
]
Then I want to group them by the value of "name", as shown below.
Expected:
[
{"name":"A1", "url":"B1, B2", "location":"C1, C2"},
{"name":"A2", "url":"B3, B4", "location":"C3, C4"},
]
(The actual list contains more than 2,000 dicts.)
I'd be glad to get this solved.
Any suggestions or answers would be appreciated.
Thanks in advance.
Using an auxiliary grouping dict (requires Python 3.5+ for the ** unpacking inside dict literals):
data = [
    {"name":"A1", "url":"B1", "location":"C1"},
    {"name":"A1", "url":"B2", "location":"C2"},
    {"name":"A2", "url":"B3", "location":"C3"},
    {"name":"A2", "url":"B4", "location":"C4"}
]
groups = {}  # name -> {'url': ..., 'location': ...}
for d in data:
    if d['name'] not in groups:
        # First time we see this name: start a new group
        groups[d['name']] = {'url': d['url'], 'location': d['location']}
    else:
        # Name already seen: append to the existing comma-separated strings
        groups[d['name']]['url'] += ', ' + d['url']
        groups[d['name']]['location'] += ', ' + d['location']
result = [{**{'name': k}, **v} for k, v in groups.items()]
print(result)
Output:
[{'name': 'A1', 'url': 'B1, B2', 'location': 'C1, C2'}, {'name': 'A2', 'url': 'B3, B4', 'location': 'C3, C4'}]
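The version above hard-codes url and location and rebuilds the strings with +=. A minimal sketch that instead collects values in lists and joins them once at the end, so it keeps working if more fields are added. It assumes every dict has a "name" key; merge_by_name and its parameters are hypothetical names:

```python
def merge_by_name(records, key="name", sep=", "):
    """Group dicts by record[key]; join every other field's values with sep."""
    groups = {}  # name -> {field: [values, ...]}; dicts keep insertion order (3.7+)
    for rec in records:
        fields = groups.setdefault(rec[key], {})
        for field, value in rec.items():
            if field != key:
                fields.setdefault(field, []).append(value)
    # Join each collected list once, with the grouping key first in every dict
    return [
        {key: name, **{field: sep.join(values) for field, values in fields.items()}}
        for name, fields in groups.items()
    ]


data = [
    {"name": "A1", "url": "B1", "location": "C1"},
    {"name": "A1", "url": "B2", "location": "C2"},
    {"name": "A2", "url": "B3", "location": "C3"},
    {"name": "A2", "url": "B4", "location": "C4"},
]
print(merge_by_name(data))
```

Joining once per group avoids creating a new intermediate string on every +=, which can matter if a name repeats many times.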
Since your dataset is relatively small, I suppose time complexity doesn't matter much here, so you could consider the following code.
from collections import defaultdict
given_data = [
    {"name":"A1", "url":"B1", "location":"C1"},
    {"name":"A1", "url":"B2", "location":"C2"},
    {"name":"A2", "url":"B3", "location":"C3"},
    {"name":"A2", "url":"B4", "location":"C4"},
]
# Bucket the dicts by their "name" value
D = defaultdict(list)
for item in given_data:
    D[item['name']].append(item)
# Merge each bucket into one dict, joining values with ", " as in the expected output
result = []
for x in D:
    urls = ", ".join(pp['url'] for pp in D[x])
    locations = ", ".join(pp['location'] for pp in D[x])
    result.append({'name': x, 'url': urls, 'location': locations})
print(result)
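Since the question mentions 2,000+ dicts, here is a quick sanity check of the same defaultdict bucketing on a synthetic list of that size. The make_records helper and the data it produces are purely hypothetical; the grouping is a single pass over the list, so its cost grows linearly with the number of dicts:

```python
from collections import defaultdict


def group_by_name(records):
    """Single pass: bucket dicts by 'name', then join each field with ', '."""
    buckets = defaultdict(list)
    for item in records:
        buckets[item["name"]].append(item)
    return [
        {
            "name": name,
            "url": ", ".join(item["url"] for item in items),
            "location": ", ".join(item["location"] for item in items),
        }
        for name, items in buckets.items()
    ]


def make_records(n_names, per_name):
    """Hypothetical test data: n_names * per_name dicts with unique url/location."""
    return [
        {"name": f"A{i}", "url": f"B{i}_{j}", "location": f"C{i}_{j}"}
        for i in range(n_names)
        for j in range(per_name)
    ]


records = make_records(1000, 2)  # 2,000 dicts, 1,000 distinct names
merged = group_by_name(records)
print(len(records), len(merged))  # 2000 1000
print(merged[0])
```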
Given res as:
[{'location': 'C1', 'name': 'A1', 'url': 'B1'},
{'location': 'C2', 'name': 'A1', 'url': 'B2'},
{'location': 'C3', 'name': 'A2', 'url': 'B3'},
{'location': 'C4', 'name': 'A2', 'url': 'B4'}]
You can use a defaultdict to collect the data and then unpack the defaultdict into a list comprehension:
from collections import defaultdict
# name -> {"location": [...], "url": [...]}
result = defaultdict(lambda: defaultdict(list))
for items in res:
    result[items['name']]['location'].append(items['location'])
    result[items['name']]['url'].append(items['url'])
# Flatten: one dict per name, each inner list joined into a space-separated string
final = [
    {'name': name, **{inner_names: ' '.join(inner_values) for inner_names, inner_values in values.items()}}
    for name, values in result.items()
]
Evaluated in IPython, final is:
In [57]: final
Out[57]:
[{'location': 'C1 C2', 'name': 'A1', 'url': 'B1 B2'},
{'location': 'C3 C4', 'name': 'A2', 'url': 'B3 B4'}]
Building on @Yaroslav Surzhikov's comment, here is a solution using itertools.groupby:
from itertools import groupby
from operator import itemgetter
dicts = [
    {"name":"A1", "url":"B1", "location":"C1"},
    {"name":"A1", "url":"B2", "location":"C2"},
    {"name":"A2", "url":"B3", "location":"C3"},
    {"name":"A2", "url":"B4", "location":"C4"},
]
def merge(dicts):
    new_list = []
    by_name = itemgetter('name')
    # groupby only merges adjacent items, so sort by name first
    for key, group in groupby(sorted(dicts, key=by_name), key=by_name):
        new_item = {'name': key, 'url': [], 'location': []}
        for item in group:
            new_item['url'].append(item.get('url', ''))
            new_item['location'].append(item.get('location', ''))
        # Collapse the collected lists into comma-separated strings
        new_item['url'] = ', '.join(new_item['url'])
        new_item['location'] = ', '.join(new_item['location'])
        new_list.append(new_item)
    return new_list
print(merge(dicts))
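One caveat: itertools.groupby only groups consecutive items that share a key, so the input has to be ordered by name first (the sample data happens to be). A minimal sketch of the difference, assuming a hypothetical input in which the A1 entries are not adjacent:

```python
from itertools import groupby
from operator import itemgetter

unsorted = [
    {"name": "A1", "url": "B1", "location": "C1"},
    {"name": "A2", "url": "B3", "location": "C3"},
    {"name": "A1", "url": "B2", "location": "C2"},
]

by_name = itemgetter("name")

# Without sorting, groupby starts a new group every time the key changes.
naive = [key for key, _ in groupby(unsorted, key=by_name)]
print(naive)  # ['A1', 'A2', 'A1']

# Sorting by the same key first makes equal names adjacent.
# sorted() is stable, so entries keep their original relative order within a name.
fixed = {
    key: [d["url"] for d in group]
    for key, group in groupby(sorted(unsorted, key=by_name), key=by_name)
}
print(fixed)  # {'A1': ['B1', 'B2'], 'A2': ['B3']}
```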
Something like this? One small deviation: I prefer to store the URLs and locations as lists inside resDict rather than appending them to a string.
myDict = [
    {"name":"A1", "url":"B1", "location":"C1"},
    {"name":"A1", "url":"B2", "location":"C2"},
    {"name":"A2", "url":"B3", "location":"C3"},
    {"name":"A2", "url":"B4", "location":"C4"}
]
resDict = []
def getKeys(d):
    arr = []
    for row in d:
        arr.append(row["name"])
    ret = list(set(arr))
    return ret
def filteredDict(d, k):
    arr = []
    for row in d:
        if row["name"] == k:
            arr.append(row)
    return arr
def compressedDictRow(rowArr):
    urls = []
    locations = []
    name = rowArr[0]['name']
    for row in rowArr:
        urls.append(row['url'])
        locations.append(row['location'])
    return {"name":name,"urls":urls, "locations":locations}
keys = getKeys(myDict)
for key in keys:
    rowArr = filteredDict(myDict,key)
    row = compressedDictRow(rowArr)
    resDict.append(row)
print(resDict)
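Output (printed on one line, reformatted here; the order of the names can vary between runs because the keys come from a set):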
[
{'name': 'A2', 'urls': ['B3', 'B4'], 'locations': ['C3', 'C4']},
{'name': 'A1', 'urls': ['B1', 'B2'], 'locations': ['C1', 'C2']}
]
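The helpers above rescan the whole list once per distinct name (filteredDict is called for every key), so the work grows with the number of names times the list length. A single-pass sketch producing the same urls/locations list shape, assuming the same field names; compress_rows is a hypothetical name. Unlike list(set(...)), it keeps names in first-seen order:

```python
def compress_rows(rows):
    """Single pass: collect urls/locations per name, in first-seen order."""
    merged = {}  # name -> output row; dicts keep insertion order (Python 3.7+)
    for row in rows:
        out = merged.setdefault(
            row["name"], {"name": row["name"], "urls": [], "locations": []}
        )
        out["urls"].append(row["url"])
        out["locations"].append(row["location"])
    return list(merged.values())


myDict = [
    {"name": "A1", "url": "B1", "location": "C1"},
    {"name": "A1", "url": "B2", "location": "C2"},
    {"name": "A2", "url": "B3", "location": "C3"},
    {"name": "A2", "url": "B4", "location": "C4"},
]
print(compress_rows(myDict))
```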
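Here is a variant (it's hard to follow, and it feels like scratching the right side of my head with my left hand, but at this point I don't know how to make it shorter) using:
itertools.groupby
itertools.accumulate
list comprehensions and dict comprehensions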
>>> import itertools
>>> import pprint
>>> pprint.pprint(initial_list)
[{'location': 'C1', 'name': 'A1', 'url': 'B1'},
 {'location': 'C2', 'name': 'A1', 'url': 'B2'},
 {'location': 'C3', 'name': 'A2', 'url': 'B3'},
 {'location': 'C4', 'name': 'A2', 'url': 'B4'}]
>>>
>>> NAME_KEY = "name"
>>>
>>> final_list = [list(itertools.accumulate(group_list, func=lambda x, y: {key: x[key] if key == NAME_KEY else " ".join([x[key], y[key]]) for key in x}))[-1] \
... for group_list in [list(group[1]) for group in itertools.groupby(sorted(initial_list, key=lambda x: x[NAME_KEY]), key=lambda x: x[NAME_KEY])]]
>>>
>>> pprint.pprint(final_list)
[{'location': 'C1 C2', 'name': 'A1', 'url': 'B1 B2'},
 {'location': 'C3 C4', 'name': 'A2', 'url': 'B3 B4'}]
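Rationale:
The dicts in the initial list are grouped by their "name" value (itertools.groupby), after being sorted by that same value (sorted), because groupby only groups adjacent items
Each group is folded with itertools.accumulate, whose func argument "sums" 2 dicts based on the key:
  if the key is "name", keep the first dict's value (it is the same across the group)
  otherwise, join the two values with a space
The last accumulated dict ([-1]) of each group is the merged result
Notes:
func is a lambda, which doesn't help readability
Not sure how it performs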
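Since only the last value produced by itertools.accumulate is kept ([-1]), functools.reduce performs the same fold without building the intermediate list. A sketch of that swap, with the lambda replaced by a named function to make it easier to read (merge_two is a hypothetical name); it still sorts and groups the same way:

```python
import functools
import itertools
from operator import itemgetter

NAME_KEY = "name"


def merge_two(x, y):
    """'Sum' two dicts: keep the shared name, join every other field with a space."""
    return {key: x[key] if key == NAME_KEY else " ".join([x[key], y[key]]) for key in x}


initial_list = [
    {"name": "A1", "url": "B1", "location": "C1"},
    {"name": "A1", "url": "B2", "location": "C2"},
    {"name": "A2", "url": "B3", "location": "C3"},
    {"name": "A2", "url": "B4", "location": "C4"},
]

by_name = itemgetter(NAME_KEY)
# reduce(f, group) returns what accumulate(group, f) would yield last;
# a single-element group is returned unchanged.
final_list = [
    functools.reduce(merge_two, group)
    for _, group in itertools.groupby(sorted(initial_list, key=by_name), key=by_name)
]
print(final_list)
```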