Python: how to merge dictionaries in a list of dictionaries based on a value

Question

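I have a list of dictionaries, where each dictionary consists of 3 keys: name, url, and location.
Across the dictionaries, only the value of "name" can repeat; "url" and "location" are always different values throughout the whole list.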

Example:

[
{"name":"A1", "url":"B1", "location":"C1"}, 
{"name":"A1", "url":"B2", "location":"C2"}, 
{"name":"A2", "url":"B3", "location":"C3"},
{"name":"A2", "url":"B4", "location":"C4"}, ...
]

Then I want to group them based on the value of "name", as shown below.

Expected:

[
{"name":"A1", "url":"B1, B2", "location":"C1, C2"},
{"name":"A2", "url":"B3, B4", "location":"C3, C4"},
]

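(The actual list contains more than 2,000 dictionaries.)

I would be glad to get this sorted out.
Any suggestions or answers would be much appreciated.

Thanks in advance.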

Answer 1

Using an auxiliary grouping dictionary (for Python 3.5 and later):

data = [
    {"name":"A1", "url":"B1", "location":"C1"}, 
    {"name":"A1", "url":"B2", "location":"C2"}, 
    {"name":"A2", "url":"B3", "location":"C3"},
    {"name":"A2", "url":"B4", "location":"C4"}
]

groups = {}
for d in data:
    if d['name'] not in groups:
        groups[d['name']] = {'url': d['url'], 'location': d['location']}
    else:
        groups[d['name']]['url'] += ', ' + d['url']
        groups[d['name']]['location'] += ', ' + d['location']
result = [{**{'name': k}, **v} for k, v in groups.items()]

print(result)

Output:

[{'name': 'A1', 'url': 'B1, B2', 'location': 'C1, C2'}, {'name': 'A2', 'url': 'B3, B4', 'location': 'C3, C4'}]
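A possible variation on the same idea, offered as a sketch rather than as the answerer's code: collect the values in lists and join them once at the end, which avoids repeated string concatenation. The helper name `merge_by_name` and the `sep` parameter are made up for this illustration, and the output order relies on dictionaries keeping insertion order (Python 3.7+).

```python
def merge_by_name(records, sep=", "):
    """Group records by 'name' and join the other two fields with sep."""
    groups = {}  # name -> {'url': [...], 'location': [...]}
    for d in records:
        # setdefault creates the per-name entry the first time a name is seen
        entry = groups.setdefault(d["name"], {"url": [], "location": []})
        entry["url"].append(d["url"])
        entry["location"].append(d["location"])
    return [
        {"name": name, "url": sep.join(v["url"]), "location": sep.join(v["location"])}
        for name, v in groups.items()
    ]


data = [
    {"name": "A1", "url": "B1", "location": "C1"},
    {"name": "A1", "url": "B2", "location": "C2"},
    {"name": "A2", "url": "B3", "location": "C3"},
    {"name": "A2", "url": "B4", "location": "C4"},
]
merged = merge_by_name(data)
print(merged)
```

Passing a different `sep` (for example `" "`) gives the space-separated form used in some of the other answers.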

Answer 2

Since your dataset is relatively small, I assume time complexity is not a concern here, so you could consider the following code.

from collections import defaultdict
given_data = [
    {"name":"A1", "url":"B1", "location":"C1"}, 
    {"name":"A1", "url":"B2", "location":"C2"}, 
    {"name":"A2", "url":"B3", "location":"C3"},
    {"name":"A2", "url":"B4", "location":"C4"},
] 
D = defaultdict(list)
for item in given_data:
    D[item['name']].append(item)
result = []
for name, items in D.items():
    # Join with ", " so the result matches the format asked for in the question.
    urls = ", ".join(item['url'] for item in items)
    locations = ", ".join(item['location'] for item in items)
    result.append({'name': name, 'url': urls, 'location': locations})
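To get a feel for the size mentioned in the question, the sketch below runs the same kind of defaultdict grouping over 2,000 generated records, joining values with ", " as in the question's expected output. The data is synthetic and the names `make_records` and `group_records` are invented for this example; the grouping itself is a single pass over the list.

```python
from collections import defaultdict


def make_records(n_names=500, per_name=4):
    """Build synthetic test records (not real data): per_name rows per name."""
    records = []
    for i in range(n_names):
        for j in range(per_name):
            k = i * per_name + j
            records.append({"name": f"A{i}", "url": f"B{k}", "location": f"C{k}"})
    return records


def group_records(records):
    """One pass to bucket by name, then one join per field per bucket."""
    buckets = defaultdict(list)
    for item in records:
        buckets[item["name"]].append(item)
    return [
        {
            "name": name,
            "url": ", ".join(item["url"] for item in items),
            "location": ", ".join(item["location"] for item in items),
        }
        for name, items in buckets.items()
    ]


records = make_records()
grouped = group_records(records)
print(len(records), "records ->", len(grouped), "groups")
print(grouped[0])
```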

Answer 3

Where res is:

[{'location': 'C1', 'name': 'A1', 'url': 'B1'},
 {'location': 'C2', 'name': 'A1', 'url': 'B2'},
 {'location': 'C3', 'name': 'A2', 'url': 'B3'},
 {'location': 'C4', 'name': 'A2', 'url': 'B4'}]

You can use a defaultdict to collect the data and then unpack the result in a list comprehension:

from collections import defaultdict

result = defaultdict(lambda: defaultdict(list))

for items in res:
    result[items['name']]['location'].append(items['location'])
    result[items['name']]['url'].append(items['url'])

final = [
    {'name': name, **{inner_names: ' '.join(inner_values) for inner_names, inner_values in values.items()}}
    for name, values in result.items()
]

final is:

In [57]: final
Out[57]:
[{'location': 'C1 C2', 'name': 'A1', 'url': 'B1 B2'},
 {'location': 'C3 C4', 'name': 'A2', 'url': 'B3 B4'}]
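For readers unfamiliar with the nested defaultdict used above, here is a small sketch of how it behaves: the outer dictionary creates an inner `defaultdict(list)` the first time a name is accessed, and the inner one creates an empty list the first time a field is accessed. The variable name `nested` and the key `"A9"` are only for illustration.

```python
from collections import defaultdict

nested = defaultdict(lambda: defaultdict(list))

# No explicit initialisation is needed: both levels are created on first access.
nested["A1"]["url"].append("B1")
nested["A1"]["url"].append("B2")
nested["A1"]["location"].append("C1")

print(dict(nested["A1"]))

# Caveat: merely reading a missing key also inserts it.
_ = nested["A9"]
print(sorted(nested))
```

The caveat in the last lines is worth keeping in mind when the structure is later iterated, since accidental lookups leave empty entries behind.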

Answer 4

Building on @Yaroslav Surzhikov's comment, here is a solution using itertools.groupby:

from itertools import groupby

dicts = [
    {"name":"A1", "url":"B1", "location":"C1"},
    {"name":"A1", "url":"B2", "location":"C2"},
    {"name":"A2", "url":"B3", "location":"C3"},
    {"name":"A2", "url":"B4", "location":"C4"},
]

def merge(dicts):
    new_list = []
    by_name = lambda x: x['name']
    # groupby only merges adjacent items, so sort by name first.
    for key, group in groupby(sorted(dicts, key=by_name), by_name):
        urls = []
        locations = []
        for item in group:
            urls.append(item.get('url', ''))
            locations.append(item.get('location', ''))
        new_list.append({
            'name': key,
            'url': ', '.join(urls),
            'location': ', '.join(locations),
        })
    return new_list

print(merge(dicts))
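One caveat with itertools.groupby is that it groups only consecutive items with equal keys, so input that is not already ordered by name can yield the same name more than once. A small sketch of the difference, using made-up rows and a hypothetical `by_name` helper:

```python
from itertools import groupby

unsorted_rows = [
    {"name": "A1", "url": "B1"},
    {"name": "A2", "url": "B3"},
    {"name": "A1", "url": "B2"},
]


def by_name(row):
    return row["name"]


# Without sorting, "A1" shows up as two separate groups.
raw_keys = [key for key, _ in groupby(unsorted_rows, by_name)]
print(raw_keys)

# Sorting by the same key first makes equal names adjacent.
sorted_keys = [key for key, _ in groupby(sorted(unsorted_rows, key=by_name), by_name)]
print(sorted_keys)
```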

Answer 5

Something like this? One small deviation: I prefer to store the URLs and locations as lists inside resDict rather than appending them to a str.

myDict = [
{"name":"A1", "url":"B1", "location":"C1"}, 
{"name":"A1", "url":"B2", "location":"C2"}, 
{"name":"A2", "url":"B3", "location":"C3"},
{"name":"A2", "url":"B4", "location":"C4"}
]

resDict = []

def getKeys(d):
    arr = []
    for row in d:
        arr.append(row["name"])
    ret = list(set(arr))
    return ret

def filteredDict(d, k):
    arr = []
    for row in d:
        if row["name"] == k:
            arr.append(row)
    return arr

def compressedDictRow(rowArr):
    urls = []
    locations = []
    name = rowArr[0]['name']

    for row in rowArr:
       urls.append(row['url'])
       locations.append(row['location'])
    return {"name":name,"urls":urls, "locations":locations}

keys = getKeys(myDict)

for key in keys:
    rowArr = filteredDict(myDict,key)
    row = compressedDictRow(rowArr)
    resDict.append(row)
print(resDict)

Output (printed on a single line; the order of the groups may vary because getKeys uses a set):

[
    {'name': 'A2', 'urls': ['B3', 'B4'], 'locations': ['C3', 'C4']}, 
    {'name': 'A1', 'urls': ['B1', 'B2'], 'locations': ['C1', 'C2']}
]
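Because `getKeys` goes through a `set`, the order of the groups in the output is not guaranteed to match the input. If order matters, one option (a sketch, not part of the original answer) is `dict.fromkeys`, which removes duplicates while keeping first-seen order on Python 3.7+. The function name `ordered_names` is hypothetical.

```python
def ordered_names(rows):
    """Return the distinct 'name' values in first-seen order."""
    # dict keys are unique and, since Python 3.7, keep insertion order.
    return list(dict.fromkeys(row["name"] for row in rows))


rows = [
    {"name": "A1", "url": "B1", "location": "C1"},
    {"name": "A1", "url": "B2", "location": "C2"},
    {"name": "A2", "url": "B3", "location": "C3"},
    {"name": "A2", "url": "B4", "location": "C4"},
]
names = ordered_names(rows)
print(names)
```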

Answer 6

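Here is a variant (it is hard to follow, and it feels like scratching the right side of my head with my left hand, but at this point I don't know how to make it shorter) that uses:

[Python]: itertools - Functions creating iterators for efficient looping
- groupby
- accumulate
Comprehensions (list and dict)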

>>> pprint.pprint(initial_list)
[{'location': 'C1', 'name': 'A1', 'url': 'B1'},
 {'location': 'C2', 'name': 'A1', 'url': 'B2'},
 {'location': 'C3', 'name': 'A2', 'url': 'B3'},
 {'location': 'C4', 'name': 'A2', 'url': 'B4'}]
>>>
>>> NAME_KEY = "name"
>>>
>>> final_list = [list(itertools.accumulate(group_list, func=lambda x, y: {key: x[key] if key == NAME_KEY else " ".join([x[key], y[key]]) for key in x}))[-1] \
...     for group_list in [list(group[1]) for group in itertools.groupby(sorted(initial_list, key=lambda x: x[NAME_KEY]), key=lambda x: x[NAME_KEY])]]
>>>
>>> pprint.pprint(final_list)
[{'location': 'C1 C2', 'name': 'A1', 'url': 'B1 B2'},
 {'location': 'C3 C4', 'name': 'A2', 'url': 'B3 B4'}]

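Rationale (from outer to inner):

Group the dictionaries in the initial list by the value of the name key (itertools.groupby)
- For this to work properly, an auxiliary step is to sort the list by that same value before grouping (sorted)
For each such group of dictionaries, compute its "sum" (itertools.accumulate)
- The func argument "sums" 2 dictionaries, key by key:
  - If the key is the name key, just take the value from the 1st dictionary (it is the same in both dictionaries anyway)
  - Otherwise, just join the 2 values (strings) with a space in between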
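To make the "sum" step concrete, the sketch below applies `itertools.accumulate` to a single group and prints every intermediate result; the one-liner above keeps only the last one. The function name `add_dicts` and the third row (B5, C5) are made up for this illustration.

```python
from itertools import accumulate

NAME_KEY = "name"


def add_dicts(x, y):
    """'Sum' two dictionaries: keep the name, join every other value with a space."""
    return {
        key: x[key] if key == NAME_KEY else " ".join([x[key], y[key]])
        for key in x
    }


group = [
    {"name": "A1", "url": "B1", "location": "C1"},
    {"name": "A1", "url": "B2", "location": "C2"},
    {"name": "A1", "url": "B5", "location": "C5"},
]

# accumulate yields the running result after each element.
steps = list(accumulate(group, add_dicts))
for step in steps:
    print(step)
```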

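Notes:

The dictionaries must be homogeneous (all of them must have the same structure, i.e. the same keys)
Only the name key is hardcoded (but if you decide to add other keys whose values are not strings, you will have to adjust func as well)
It could be split up for readability
Not sure about the lambdas (performance-wise)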
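Following the note about readability, here is one way the expression might be split into named steps. This is a sketch, not the answerer's code: it swaps `itertools.accumulate(...)[-1]` for `functools.reduce`, which returns only the final "sum", and the names `merge_pair` and `merge_groups` are invented here.

```python
from functools import reduce
from itertools import groupby
from operator import itemgetter

NAME_KEY = "name"


def merge_pair(x, y):
    """Combine two dictionaries that share the same name."""
    return {
        key: x[key] if key == NAME_KEY else " ".join([x[key], y[key]])
        for key in x
    }


def merge_groups(records):
    """Sort by name, group equal names, and fold each group into one dict."""
    get_name = itemgetter(NAME_KEY)
    ordered = sorted(records, key=get_name)
    return [reduce(merge_pair, group) for _, group in groupby(ordered, key=get_name)]


initial_list = [
    {"name": "A1", "url": "B1", "location": "C1"},
    {"name": "A1", "url": "B2", "location": "C2"},
    {"name": "A2", "url": "B3", "location": "C3"},
    {"name": "A2", "url": "B4", "location": "C4"},
]
final_list = merge_groups(initial_list)
print(final_list)
```

The same homogeneity requirement applies: every dictionary is assumed to have the same keys, and every value other than the name is assumed to be a string.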

6 solutions

Solution 1
4 2018-05-04 05:43:38

Solution 2
4 2018-05-04 06:05:01

Solution 3
1 Accepted 2018-05-04 06:00:13

Solution 4
0 2018-05-04 05:58:01

Solution 5
0 2018-05-04 05:58:59

Solution 6
0 2018-05-04 07:03:40
