
Python merge two lists of dicts, where dict key matches

I have two lists, each containing ~100 dicts. A dummy illustration is shown here:

first = [
    {"ip-10-1-1-1": {"job": {"company": "IBM", "title": "engineer"}}},
    {"ip-10-1-1-20": {"job": {"company": "Dell", "title": "manager"}}},
    {"ip-10-1-1-35": {"job": {"company": "Apple", "title": "CEO"}}},
]

second = [
    {"ip-10-1-1-1": {"demographics": {"age": 30, "gender": "female"}}},
    {"ip-10-1-1-20": {"demographics": {"age": "30", "gender": "male"}}},
    {"ip-10-1-1-49": {"demographics": {"age": "32", "gender": "female"}}},
]

I'm trying to merge these with this result:

[
    {
        "ip-10-1-1-1": {
            "demographics": {"age": 30, "gender": "female"},
            "job": {"company": "IBM", "title": "engineer"},
        }
    },
    {
        "ip-10-1-1-20": {
            "demographics": {"age": "30", "gender": "male"},
            "job": {"company": "Dell", "title": "manager"},
        }
    },
    {"ip-10-1-1-35": {"job": {"company": "Apple", "title": "CEO"}}},
    {"ip-10-1-1-49": {"demographics": {"age": "32", "gender": "female"}}},
]


I can nearly accomplish this by looping over second and first like so:

merged = []
for d1 in second:
    for k1 in d1.keys():
        for d2 in first:
            for k2 in d2.keys():
                if k2 == k1:
                    d1[k1]["job"] = d2[k2]["job"]
    merged.append(d1)
print(merged)

I'm new to Python, however, and I'm hoping there's a more Pythonic way to do this.

EDIT: To further complicate things, I can have keys in first but not in second, and vice versa. I've updated the example to reflect this.

Here is a possible one-line solution:

result = {f'user_{i + 1}': dict(**d1[f'user_{i + 1}'], **d2[f'user_{i + 1}'])
          for i, (d1, d2) in enumerate(zip(first, second))}

If you are using Python 3.8 or later, you can use an assignment expression in the dict comprehension:

result = {(key := f'user_{i + 1}'): dict(**d1[key], **d2[key])
          for i, (d1, d2) in enumerate(zip(first, second))}
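
The two comprehensions above pair entries by position and assume keys named user_1, user_2, and so on, so they do not cover keys that appear in only one list. A minimal sketch that pairs entries by their actual key instead, assuming each dict holds exactly one key (the helper name flatten is made up for illustration, and the data is a shortened copy of the question's example):

```python
first = [
    {"ip-10-1-1-1": {"job": {"company": "IBM", "title": "engineer"}}},
    {"ip-10-1-1-35": {"job": {"company": "Apple", "title": "CEO"}}},
]
second = [
    {"ip-10-1-1-1": {"demographics": {"age": 30, "gender": "female"}}},
    {"ip-10-1-1-49": {"demographics": {"age": "32", "gender": "female"}}},
]


def flatten(list_of_dicts):
    """Collapse [{key: value}, ...] into one {key: value, ...} dict (hypothetical helper)."""
    return {key: value for d in list_of_dicts for key, value in d.items()}


by_key_1, by_key_2 = flatten(first), flatten(second)

# dict.fromkeys keeps first-seen order while dropping duplicate keys.
all_keys = dict.fromkeys([*by_key_1, *by_key_2])

# A key missing from one list contributes an empty dict, so nothing raises.
result = [{key: {**by_key_1.get(key, {}), **by_key_2.get(key, {})}} for key in all_keys]
print(result)
```

Building new inner dicts with {**a, **b} leaves first and second unmodified, which the nested-loop version in the question does not.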
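
A more general approach merges both lists into a single dict keyed by the shared key. With this sample data:

first = [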
    {"user_1": {"job": {"company": "IBM", "title": "engineer"}}},
    {"user_2": {"job": {"company": "Dell", "title": "manager"}}},
    {"user_3": {"job": {"company": "Microsoft", "title": "manager"}}},
]
second = [
    {"user_2": {"demographics": {"age": "30", "gender": "male"}}},
    {"user_1": {"demographics": {"age": "30", "gender": "female"}}},
]


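def merge_list_of_dicts(list_of_dicts, current=None):
    # Use None rather than a mutable default ({}), which would be shared across calls.
    if current is None:
        current = {}

    for d in list_of_dicts:
        # Each dict holds exactly one key/value pair.
        key, value = next(iter(d.items()))

        if key not in current:
            # Copy so that later updates do not mutate the input dicts.
            current[key] = dict(value)
        else:
            current[key].update(value)

    return current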


output = merge_list_of_dicts(second, merge_list_of_dicts(first))
print(output)

This handles extra keys in either list, as well as dictionaries that appear in a different order (note the first and second lists above). It outputs:

{
    "user_1": {
        "job": {"company": "IBM", "title": "engineer"},
        "demographics": {"age": "30", "gender": "female"},
    },
    "user_2": {
        "job": {"company": "Dell", "title": "manager"},
        "demographics": {"age": "30", "gender": "male"},
    },
    "user_3": {"job": {"company": "Microsoft", "title": "manager"}},
}
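
The question asks for a list of single-key dicts rather than one combined dict. One possible way to convert, shown here with a shortened copy of the output above (the name as_list is illustrative):

```python
output = {
    "user_1": {
        "job": {"company": "IBM", "title": "engineer"},
        "demographics": {"age": "30", "gender": "female"},
    },
    "user_3": {"job": {"company": "Microsoft", "title": "manager"}},
}

# Wrap each key/value pair in its own dict; dicts keep insertion order (Python 3.7+).
as_list = [{key: value} for key, value in output.items()]
print(as_list)
```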

If you want a cleaner API:

def merge_list_of_dicts(list1: list, list2: list) -> dict:
    def merge(list_of_dicts, current):
        for d in list_of_dicts:
            # Each dict holds exactly one key/value pair.
            key, value = next(iter(d.items()))

            if key not in current:
                # Copy so that later updates do not mutate the input dicts.
                current[key] = dict(value)
            else:
                current[key].update(value)

        return current

    # Start from a fresh dict, merge the first list, then the second.
    return merge(list2, merge(list1, {}))


output = merge_list_of_dicts(first, second)
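
One caveat: dict.update is shallow. If the same inner key (for example "job") appears under the same outer key in both lists, the later value replaces the earlier one entirely. If nested values should be combined instead, a recursive merge is one option. The sketch below is only an assumption about what such a merge could look like; deep_merge is a made-up name and not part of the answer above:

```python
import copy


def deep_merge(target, source):
    """Recursively merge source into target (hypothetical helper); returns target."""
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            # Both sides are dicts: merge their contents instead of replacing.
            deep_merge(target[key], value)
        else:
            target[key] = value
    return target


a = {"job": {"company": "IBM"}}
b = {"job": {"title": "engineer"}}

shallow = {**a, **b}                      # "job" from b replaces "job" from a
deep = deep_merge(copy.deepcopy(a), b)    # "job" contents are combined

print(shallow)
print(deep)
```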
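
Another option is to sort the combined list by key and group the entries with itertools.groupby:

import itertools


def first_key(d):
    # Each dict holds exactly one key.
    return next(iter(d))


final_list = []
# groupby only groups consecutive items, so sort by the same key first.
for key, group in itertools.groupby(sorted(first + second, key=first_key), key=first_key):
    temp = {key: {}}
    for d in group:
        # Merge the inner dict of every entry that shares this key.
        temp[key].update(d[key])
    final_list.append(temp)
print(final_list)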

Output:

[{'ip-10-1-1-1': {'job': {'company': 'IBM', 'title': 'engineer'}, 'demographics': {'age': 30, 'gender': 'female'}}}, {'ip-10-1-1-20': {'job': {'company': 'Dell', 'title': 'manager'}, 'demographics': {'age': '30', 'gender': 'male'}}}, {'ip-10-1-1-35': {'job': {'company': 'Apple', 'title': 'CEO'}}}, {'ip-10-1-1-49': {'demographics': {'age': '32', 'gender': 'female'}}}]
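
The sort matters because itertools.groupby only groups consecutive items with equal keys. A small illustration with made-up data:

```python
import itertools

keys = ["a", "b", "a"]

# Without sorting, the two "a" items are not adjacent and form separate groups.
unsorted_groups = [(k, len(list(g))) for k, g in itertools.groupby(keys)]

# After sorting, equal keys are adjacent and land in one group.
sorted_groups = [(k, len(list(g))) for k, g in itertools.groupby(sorted(keys))]

print(unsorted_groups)  # [('a', 1), ('b', 1), ('a', 1)]
print(sorted_groups)    # [('a', 2), ('b', 1)]
```

The result also comes back in sorted key order, which is plain string order and not the original order of either list.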
