
Merge two lists of dictionaries based on a condition

I have two lists of dictionaries, and I need to merge the records whenever their USA and GOOG values are the same.

list1 = [{'USA': 'Eastern', 
  'GOOG': '2019', 
  'Up': {'Upfront': 45}, 
  'Right': {'Upfront': 12}}, 

 {'USA': 'Western', 
  'GOOG': '2019', 
  'Up': {'Upfront': 10}, 
  'Right': {'Upfront': 15}}]

list2 = [{'USA': 'Western', 
  'GOOG': '2019', 
  'Down': {'Downback': 35}, 
  'Right': {'Downback': 25}}, 

 {'USA': 'Eastern', 
  'GOOG': '2018', 
  'Down': {'Downback': 15}, 
  'Right': {'Downback': 55}}]

Since USA and GOOG have the same values in the 2nd element of list1 and the 1st element of list2, those two records should be merged. The expected result is as follows:

Result = [{'USA': 'Eastern', 
  'GOOG': '2019', 
  'Up': {'Upfront': 45}, 
  'Right': {'Upfront': 12}}, 

 {'USA': 'Western', 
  'GOOG': '2019', 
  'Up': {'Upfront': 10}, 
  'Down': {'Downback': 35}, 
  'Right': {'Upfront': 15, 'Downback': 25}},

 {'USA': 'Eastern', 
  'GOOG': '2018', 
  'Down': {'Downback': 15}, 
  'Right': {'Downback': 55}}]

How can I write generic code for this? I tried using defaultdict, but did not know how to merge the remaining keys of an arbitrary number of dictionaries.

My attempt:

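from collections import defaultdict

dics = list1 + list2
dd = defaultdict(list)  # field name -> list of every value seen under that field

# Note: this collects values per field name across all records;
# it does not group records by their (USA, GOOG) pair.
for dic in dics:
    for key, val in dic.items():
        dd[key].append(val)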
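For comparison with the attempt above, here is a minimal sketch of how defaultdict can group records by their (USA, GOOG) pair and merge any number of lists. It assumes every field other than USA and GOOG holds a dictionary, as in the sample data; merge_with_defaultdict and KEY_FIELDS are illustrative names, not part of the question.

```python
from collections import defaultdict

# Fields that identify a record (assumption: both are present in every record).
KEY_FIELDS = ("USA", "GOOG")


def merge_with_defaultdict(*lists):
    """Merge records from any number of lists that share the same KEY_FIELDS values.

    Assumes every non-key field holds a dict, so values can be combined with update().
    """
    # (USA, GOOG) -> {field name -> merged sub-dictionary}
    groups = defaultdict(lambda: defaultdict(dict))
    for records in lists:
        for record in records:
            ckey = tuple(record[f] for f in KEY_FIELDS)
            fields = groups[ckey]  # create the group even if the record has no other fields
            for field, value in record.items():
                if field not in KEY_FIELDS:
                    # Copies the entries into a new dict, so the input records are not mutated.
                    fields[field].update(value)
    # Rebuild plain dicts with the key fields first; groups keep first-seen order.
    return [dict(zip(KEY_FIELDS, ckey), **fields) for ckey, fields in groups.items()]
```

Because the groups keep insertion order, calling merge_with_defaultdict(list1, list2) on the question's data yields the records in the same order as the expected result, and extra lists can simply be passed as further arguments.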

There are two algorithmic tasks in what you need: find the records that have the same values for USA and GOOG, and then join them in such a way that, if the same key exists in both records, their values are merged.

The naive approach for the first task would be a for loop that iterates over the values of list1 and, for each value, iterates over all values of list2. Two separate loops won't cut it; you need two nested for loops:

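for element in list1:
    for other_element in list2:
        if (element["USA"], element["GOOG"]) == (other_element["USA"], other_element["GOOG"]):
            ...  # merge the two records here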

While this approach works and is fine for small lists (fewer than about 1,000 records, for example), the time it takes is proportional to the product of the two list sizes, that is, to the square of the size when both lists are about as long. For lists of close to 1,000 items each, we are talking about 1 million comparisons. If the lists themselves have 1,000,000 items, the computation would take 10^12 comparisons, which is not practical on today's computers.

So a better solution is to re-create one of the lists so that the comparison key is used as a hash: copy the list into a dictionary whose keys are the values you want to compare, and then iterate over the second list just once. Since dictionary lookups take constant time on average, the number of operations becomes proportional to the list sizes rather than to their product.
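To make the difference concrete, here is a small sketch that counts the work each approach performs on synthetic data. naive_join_cost and indexed_join_cost are hypothetical helper names, and the generated records exist only for the count.

```python
def naive_join_cost(list1, list2):
    """Nested-loop join: returns (matching pairs, key comparisons performed)."""
    matches = comparisons = 0
    for a in list1:
        for b in list2:
            comparisons += 1
            if (a["USA"], a["GOOG"]) == (b["USA"], b["GOOG"]):
                matches += 1
    return matches, comparisons


def indexed_join_cost(list1, list2):
    """Hash-based join: index list1 once, then do one dict lookup per list2 record."""
    index = {(a["USA"], a["GOOG"]): a for a in list1}  # built in a single pass
    matches = lookups = 0
    for b in list2:
        lookups += 1
        if (b["USA"], b["GOOG"]) in index:
            matches += 1
    return matches, lookups


# Synthetic data: 1,000 records with distinct (USA, GOOG) pairs.
records = [{"USA": f"region{i}", "GOOG": "2019"} for i in range(1000)]
print(naive_join_cost(records, records))    # (1000, 1000000)
print(indexed_join_cost(records, records))  # (1000, 1000)
```

Both find the same 1,000 matches, but the nested loops perform a million comparisons while the indexed version does one lookup per record (plus one pass to build the index).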

The second part of your task is to copy one record to a result list and update the keys on the resulting copy so that any duplicate keys are merged. To avoid problems when copying the first record, it is safer to use Python's copy.deepcopy, which ensures the sub-dictionaries are different objects from the ones in the original record and stay isolated.
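A quick illustration of why the deep copy matters: with a shallow copy.copy, the nested 'Right' dictionary is shared, so merging into the copy changes the original record too. The record below is taken from the question's data.

```python
import copy

original = {"USA": "Western", "GOOG": "2019", "Right": {"Upfront": 15}}

shallow = copy.copy(original)    # new outer dict, but the same nested dicts
deep = copy.deepcopy(original)   # nested dicts are copied as well

# Merging into the shallow copy leaks into the original record...
shallow["Right"].update({"Downback": 25})
print(original["Right"])  # {'Upfront': 15, 'Downback': 25}

# ...while the deep copy still holds its own, unmodified sub-dictionary.
print(deep["Right"])      # {'Upfront': 15}
```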

from copy import deepcopy

def merge_lists(list1, list2):
    # create dictionary from list1:
    dict1 = {(record["GOOG"], record["USA"]): record for record in list1}

    # compare elements in list2 to those in list1:

    result = {}
    for record in list2:
        ckey = record["GOOG"], record["USA"]
        new_record = deepcopy(record)
        if ckey in dict1:
            for key, value in dict1[ckey].items():
                if key in ("GOOG", "USA"):
                    # Do not merge these keys
                    continue
                # Dict's "setdefault" finds a key/value, and if it is missing
                # creates a new one with the second parameter as value
                new_record.setdefault(key, {}).update(value)

        result[ckey] = new_record

    # Add values from list1 that were not matched in list2:
    for key, value in dict1.items():
        if key not in result:
            result[key] = deepcopy(value)

    return list(result.values())
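Note that merge_lists returns the records keyed from list2 first and appends the unmatched list1 records afterwards, so for the question's data the order is Western/2019, Eastern/2018, Eastern/2019 rather than the order shown in the expected result. If that order matters, one option is to index list2 instead and walk list1, as in the sketch below. merge_lists_ordered and its key_fields parameter are illustrative names; like merge_lists, it assumes every non-key value is a dictionary and that the (USA, GOOG) pairs are unique within each list.

```python
from copy import deepcopy


def merge_lists_ordered(list1, list2, key_fields=("USA", "GOOG")):
    """Hash join that keeps list1's order, then appends unmatched list2 records.

    Assumes key pairs are unique within each list and non-key values are dicts.
    """
    # Index list2 by the comparison key: one pass, average O(1) lookups afterwards.
    index2 = {tuple(r[f] for f in key_fields): r for r in list2}
    result, used = [], set()
    for record in list1:
        ckey = tuple(record[f] for f in key_fields)
        merged = deepcopy(record)  # never modify the caller's records
        if ckey in index2:
            used.add(ckey)
            for key, value in index2[ckey].items():
                if key not in key_fields:
                    merged.setdefault(key, {}).update(value)
        result.append(merged)
    # Records from list2 that matched nothing in list1 go at the end.
    for record in list2:
        if tuple(record[f] for f in key_fields) not in used:
            result.append(deepcopy(record))
    return result
```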

Here is my attempt. Not sure if this is the best way, but it's a start.

Steps:

  • combine the lists of dictionaries
  • create a sorted collection of the relevant (USA, GOOG) values and their indices in the combined list
  • group by the relevant values
  • iterate over the keys and groups: if a value pair appears only once, add its dictionary as-is; if it appears more than once, update the first matching dictionary with the others

Code:

import operator as op
import itertools as it
from functools import reduce
from pprint import pprint

dictionaries = reduce(op.add, (list1, list2,))
groups = it.groupby(sorted([(op.itemgetter('USA', 'GOOG')(d), i)
                            for i, d in enumerate(dictionaries)]),
                    key=op.itemgetter(0))
results = []
for key, group in groups:
    _, indices = zip(*group)
    if len(indices) == 1:
        i, = indices
        results.append(dictionaries[i])
    else:
        merge = dictionaries[indices[0]]
        for i in indices[1:]:
            merge.update(dictionaries[i])
        results.append(merge)
pprint(results, indent=4)

OUTPUT:

[ { 'Down': {'Downback': 15}, 'GOOG': '2018', 'Right': {'Downback': 55}, 'USA': 'Eastern'},
  { 'GOOG': '2019', 'Right': {'Upfront': 12}, 'USA': 'Eastern', 'Up': {'Upfront': 45}},
  { 'Down': {'Downback': 35}, 'GOOG': '2019', 'Right': {'Downback': 25}, 'USA': 'Western', 'Up': {'Upfront': 10}}]
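In this output the Western/2019 record has 'Right': {'Downback': 25}: dict.update replaces the whole 'Right' sub-dictionary, so 'Upfront': 15 from list1 is lost, and merge is the dictionary object from list1 itself, which is modified in place. A variant of the same sort-and-groupby idea that merges nested dictionaries (one level deep) into a fresh dict might look like this; group_and_merge and key_fields are hypothetical names, and it assumes a field holds the same type in every record.

```python
import itertools as it
import operator as op


def group_and_merge(dictionaries, key_fields=("USA", "GOOG")):
    """Group dicts by key_fields and merge each group into a new dict.

    Nested dicts are combined key by key (one level deep); any other value
    is overwritten by the last dict in the group.
    """
    get_key = op.itemgetter(*key_fields)
    results = []
    # groupby only groups adjacent items, so sort by the same key first.
    for _, group in it.groupby(sorted(dictionaries, key=get_key), key=get_key):
        merged = {}
        for d in group:
            for field, value in d.items():
                if isinstance(value, dict):
                    merged.setdefault(field, {}).update(value)
                else:
                    merged[field] = value
        results.append(merged)
    return results
```

Called as group_and_merge(list1 + list2), it returns the records sorted by (USA, GOOG), with the Western/2019 record holding 'Right': {'Upfront': 15, 'Downback': 25}.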

Here is my attempt at a solution. It manages to reproduce the results you requested. Please ignore how badly named my variables are. I found this problem quite interesting.

def joinListByDictionary(list1, list2):
    """Join lists on USA and GOOG having the same value"""
    list1 = list1 + list2  # combined copy, so the caller's list1 is not extended
    matchIndx = []
    matches = []

    for dicts in range(len(list1)):
        for dicts2 in range(len(list1)):
            if dicts == dicts2:
                continue
            if list1[dicts]["GOOG"] == list1[dicts2]["GOOG"] and list1[dicts]["USA"] == list1[dicts2]["USA"]:
                matches.append(list1[dicts])
                matchIndx.append(dicts)
                break  # record each matching dict only once

    # Merge matched dicts that share the same (USA, GOOG) pair into the first one found,
    # without removing items from `matches` while iterating over it.
    merged = {}
    for dictz in matches:
        ckey = (dictz["USA"], dictz["GOOG"])
        if ckey not in merged:
            merged[ckey] = dictz
            continue
        for key, value in dictz.items():
            if isinstance(value, dict):
                merged[ckey].setdefault(key, {}).update(value)
    matches = list(merged.values())

    newList = [list1[ele] for ele in range(len(list1)) if ele not in matchIndx]
    newList.extend(matches)
    print(newList)
    return newList

joinListByDictionary(list1, list2)
list1 = [{'USA': 'Eastern', 
  'GOOG': '2019', 
  'Up': {'Upfront': 45}, 
  'Right': {'Upfront': 12}}, 

 {'USA': 'Western', 
  'GOOG': '2019', 
  'Up': {'Upfront': 10}, 
  'Right': {'Upfront': 15}}]

list2=[{'USA': 'Western', 
  'GOOG': '2019', 
  'Down': {'Downback': 35}, 
  'Right': {'Downback': 25}}, 

 {'USA': 'Eastern', 
  'GOOG': '2018', 
  'Down': {'Downback': 15}, 
  'Right': {'Downback': 55}}]



def mergeDicts(d1, d2):
    """Recursively merge d2 into d1 (modifies d1 in place)."""
    for k, v in d2.items():
        if k in d1 and isinstance(v, dict):
            mergeDicts(d1[k], v)
        else:
            d1[k] = v
        
def merge_lists(list1, list2):
    merged_list = []
    for d1 in list1:
        for d2 in list2:
            if d1['USA'] == d2['USA'] and d1['GOOG'] == d2['GOOG']:
                mergeDicts(d1, d2)
                merged_list.append(d1)
                break
        else:
            merged_list.append(d1)
    for d2 in list2:
        for d1 in list1:
            if d1['USA'] == d2['USA'] and d1['GOOG'] == d2['GOOG']:
                break
        else:
            merged_list.append(d2)
    return merged_list

res1 = merge_lists(list1, list2)
print(res1)
               
"""
[{'USA': 'Eastern', 'GOOG': '2019', 'Up': {'Upfront': 45}, 'Right': {'Upfront': 12}}, 
{'USA': 'Western', 'GOOG': '2019', 'Up': {'Upfront': 10}, 
'Right': {'Upfront': 15, 'Downback': 25},
 'Down': {'Downback': 35}}, 
 {'USA': 'Eastern', 'GOOG': '2018', 'Down': {'Downback': 15}, 'Right': {'Downback': 55}}]
"""                
                

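Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.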

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM