
Merging multiple lists of dicts with same value in common key

I have two lists of dicts:

l1 = [{'company': 'XYZ', 'url': '/xyz', 'industry': 'Utilities', 'sector': 'Conventional electricity'}, {...}]
l2 = [{'url': '/xyz', 'industry': ['Electric utility'],  'website': ['xyz.com']}, {...}]

Every dict has a common key "url".

My requirement: dicts from all the lists that have the same value for the 'url' key should be merged into one dict, and any key that does not exist in the dicts being merged should be assigned None/null.

Desired output should be:

[
{'company': 'XYZ', 'url': '/xyz', 'industry': ['Electric utility', 'Utilities'], 'sector': 'Conventional electricity',  'website': ['xyz.com']},
{...}
]

What I've tried so far is:

from itertools import groupby
from collections import ChainMap
from operator import itemgetter
from pprint import pprint


def merge_lists_of_dicts(list1, list2, by_key):
    dict_list = list1 + list2
    by_key = itemgetter(by_key)
    res = map(lambda dict_tuple: dict(ChainMap(*dict_tuple[1])),
              groupby(sorted(dict_list, key=by_key), key=by_key))

    return list(res)

pprint(merge_lists_of_dicts(l1, l2, "url"))

And the output I get is:

[{'company': 'XYZ',
  'industry': 'Utilities',
  'sector': 'Conventional electricity',
  'url': '/xyz',
  'website': ['xyz.com']}, {...}]

Any help is appreciated, and a clean, Pythonic approach would be ideal. Thanks in advance.
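
The attempt in the question loses the 'Electric utility' value because ChainMap does not combine values: for a key present in several mappings, a lookup returns the value from the first mapping that contains it. A minimal sketch of that behavior, using trimmed copies of the sample records from the question (the names `a`, `b`, and `merged` are only for illustration):

```python
from collections import ChainMap

a = {'url': '/xyz', 'industry': 'Utilities'}
b = {'url': '/xyz', 'industry': ['Electric utility'], 'website': ['xyz.com']}

# Lookups search the mappings left to right, so `a` shadows `b` for 'industry'.
merged = dict(ChainMap(a, b))

print(merged['industry'])  # value from `a` only; the list from `b` is dropped
print(merged['website'])   # only `b` has this key, so its value survives
```

Merging the values therefore needs an explicit step that collects every value seen for a key, rather than a ChainMap.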

You can use itertools.groupby and then apply a custom merge function:

from itertools import groupby

l1 = [{'company': 'XYZ', 'url': '/xyz', 'industry': 'Utilities', 'sector': 'Conventional electricity'}]
l2 = [{'url': '/xyz', 'industry': ['Electric utility'],  'website': ['xyz.com']}]

def merge(dicts, url):
    merged = {}
    # dict.fromkeys gives the unique keys in first-seen order
    for key in dict.fromkeys(k for d in dicts for k in d):
        values = [d[key] for d in dicts if key in d]
        if len(values) == 1:
            merged[key] = values[0]  # only one source: keep the value unchanged
        else:
            # several sources: flatten scalars and lists into a single list
            merged[key] = [x for v in values for x in (v if isinstance(v, list) else [v])]
    merged['url'] = url  # the grouping key stays a scalar
    return merged

# groupby only groups adjacent items, so sort by 'url' first
newl = [merge(list(group), url) for url, group in groupby(sorted(l1 + l2, key=lambda d: d['url']), key=lambda d: d['url'])]

Output:

[{'company': 'XYZ', 'url': '/xyz', 'industry': ['Utilities', 'Electric utility'], 'sector': 'Conventional electricity', 'website': ['xyz.com']}]
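
The question also asks for keys that are missing from a merged record to be set to None, and for more than two lists to be supported. One possible sketch of that, assuming "missing" means "a key that appears somewhere in the input but not in this group". The helper name `merge_records` and the second record ('ABC', '/abc') are made up for illustration and are not part of the original post:

```python
from collections import defaultdict

def merge_records(*lists, key='url'):
    """Merge dicts sharing the same `key` value (hypothetical helper)."""
    grouped = defaultdict(list)  # key value -> all dicts carrying it
    for record in (d for lst in lists for d in lst):
        grouped[record[key]].append(record)

    # Every field name seen anywhere in the input, in first-seen order.
    all_fields = list(dict.fromkeys(
        f for group in grouped.values() for d in group for f in d))

    result = []
    for key_value, group in grouped.items():
        merged = {}
        for field in all_fields:
            values = [d[field] for d in group if field in d]
            if not values:
                merged[field] = None       # field absent from this group
            elif len(values) == 1:
                merged[field] = values[0]  # single source: keep as-is
            else:
                # several sources: flatten scalars and lists into one list
                merged[field] = [x for v in values
                                 for x in (v if isinstance(v, list) else [v])]
        merged[key] = key_value            # the grouping key stays a scalar
        result.append(merged)
    return result

l1 = [{'company': 'XYZ', 'url': '/xyz', 'industry': 'Utilities',
       'sector': 'Conventional electricity'},
      {'company': 'ABC', 'url': '/abc'}]  # made-up record with missing keys
l2 = [{'url': '/xyz', 'industry': ['Electric utility'], 'website': ['xyz.com']}]

merged = merge_records(l1, l2)
```

Here the '/abc' record gets None for 'industry', 'sector', and 'website'. Merged list values follow input order and are not de-duplicated, so 'industry' comes out as ['Utilities', 'Electric utility'] rather than in the order shown in the question's desired output.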

