简体   繁体   中英

Python sum list of dicts by key with nested dicts

I have a list of dicts and would like to design a function to output a new dict which contains the sum for each unique key across all the dicts in the list.

For the list:

[
    {
         'apples': 1,
         'oranges': 1,
         'grapes': 2
    },
    {
         'apples': 3,
         'oranges': 5,
         'grapes': 8
    },
    {
         'apples': 13,
         'oranges': 21,
         'grapes': 34
    }
]

So far so good, this can be done with a counter:

def sumDicts(listToProcess):
    c = Counter()
    for entry in listToProcess:
        c.update(entry)
    return (dict(c))

Which correctly returns:

{'apples': 17, 'grapes': 44, 'oranges': 27}

The trouble comes when the dicts in my list start to contain nested dicts:

[
    {
        'fruits': {
            'apples': 1,
            'oranges': 1,
            'grapes': 2
            },
        'vegetables': {
            'carrots': 6,
            'beans': 3,
            'peas': 2
        },
        'grains': 4,
        'meats': 1  
    },
    {
        'fruits': {
            'apples': 3,
            'oranges': 5,
            'grapes': 8
            },
        'vegetables': {
            'carrots': 7,
            'beans': 4,
            'peas': 3
        },
        'grains': 3,
        'meats': 2  
    },
    {
        'fruits': {
            'apples': 13,
            'oranges': 21,
            'grapes': 34
            },
        'vegetables': {
            'carrots': 8,
            'beans': 5,
            'peas': 4
        },
        'grains': 2,
        'meats': 3
    },
]

Now the same function will give a TypeError because the counter can't add two Dicts.

The desired result would be:

{
    'fruits': {
        'apples': 17,
        'oranges': 27,
        'grapes': 44
        },
    'vegetables': {
        'carrots': 21,
        'beans': 12,
        'peas': 9
    },
    'grains': 9,
    'meats': 6  
}

Any ideas on how to do this in a reasonably efficient, Pythonic, generalizable way?

I would do this by performing a recursive merge on a recursively defined collections.defaultdict object.

from collections import defaultdict

def merge(d, new_d):
    for k, v in new_d.items():
        if isinstance(v, dict):
            merge(d[k], v)
        else: 
            d[k] = d.setdefault(k, 0) + v

# https://stackoverflow.com/a/19189356/4909087    
nested = lambda: defaultdict(nested)
d = nested()

for subd in data:
    merge(d, subd)

Using default_to_regular to convert it back, we have:

default_to_regular(d)
# {
#     "fruits": {
#         "apples": 17,
#         "oranges": 27,
#         "grapes": 44
#     },
#     "vegetables": {
#         "carrots": 21,
#         "beans": 12,
#         "peas": 9
#     },
#     "grains": 9,
#     "meats": 6
# }

You can use recursion. This solution finds all the dictionary keys in the input passed to merge , and then sums the values for each key if the values are integers. If the values are dictionaries, however, merge is called again:

def merge(c):
  _keys = {i for b in c for i in b}
  return {i:[sum, merge][isinstance(c[0][i], dict)]([h[i] for h in c]) for i in _keys}

d = [{'fruits': {'apples': 1, 'oranges': 1, 'grapes': 2}, 'vegetables': {'carrots': 6, 'beans': 3, 'peas': 2}, 'grains': 4, 'meats': 1}, {'fruits': {'apples': 3, 'oranges': 5, 'grapes': 8}, 'vegetables': {'carrots': 7, 'beans': 4, 'peas': 3}, 'grains': 3, 'meats': 2}, {'fruits': {'apples': 13, 'oranges': 21, 'grapes': 34}, 'vegetables': {'carrots': 8, 'beans': 5, 'peas': 4}, 'grains': 2, 'meats': 3}]

import json
print(json.dumps(merge(d), indent=4))

Output:

{
 "meats": 6,
 "grains": 9,
 "fruits": {
    "grapes": 44,
    "oranges": 27,
    "apples": 17
 },
"vegetables": {
     "beans": 12,
     "peas": 9,
     "carrots": 21
  }
}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM