
Sum values grouped by key in list of dict

I have a list of dicts, and I am trying to find the total number of jobs for each remote identifier.

In this case I expect 11 jobs for id 64 and 0 jobs for id 68.

[{
    'jobs': {
        'count': 4
    },
    'remote_identifier': {
        'id': '64'
    }
}, {
    'jobs': {
        'count': 0
    },
    'remote_identifier': {
        'id': '68'
    }
}, {
    'jobs': {
        'count': 7
    },
    'remote_identifier': {
        'id': '64'
    }
}]

I already tried something like this, but I don't know how to adapt it to my needs, since that only counts the number of occurrences.

from collections import Counter
print(Counter(item['remote_identifier']['id'] for item in items))

This is pretty straightforward with a defaultdict (data is your original list).

>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> 
>>> for d_inner in data:
...     id_ = d_inner['remote_identifier']['id']
...     d[int(id_)] += d_inner['jobs']['count']
... 
>>> d
defaultdict(<type 'int'>, {64: 11, 68: 0})
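
The Counter from the question can also be made to sum values rather than count occurrences, by adding each job count to the entry for its id instead of adding 1. The following is a minimal sketch under that assumption; the names data, entry and totals are illustrative, and the sample list is copied from the question.

```python
from collections import Counter

# Sample data copied from the question.
data = [
    {'jobs': {'count': 4}, 'remote_identifier': {'id': '64'}},
    {'jobs': {'count': 0}, 'remote_identifier': {'id': '68'}},
    {'jobs': {'count': 7}, 'remote_identifier': {'id': '64'}},
]

totals = Counter()
for entry in data:
    # Adding the job count (instead of 1) turns the tally into a sum.
    # Missing keys start at 0, just like defaultdict(int).
    totals[entry['remote_identifier']['id']] += entry['jobs']['count']

print(dict(totals))
```

Here the keys stay as strings; wrap the id in int() as in the answer above if integer keys are preferred.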

You can use a defaultdict to add up the counts:

from collections import defaultdict

jobs = [{
    'jobs': {
        'count': 4
    },
    'remote_identifier': {
        'id': '64'
    }
}, {
    'jobs': {
        'count': 0
    },
    'remote_identifier': {
        'id': '68'
    }
}, {
    'jobs': {
        'count': 7
    },
    'remote_identifier': {
        'id': '64'
    }
}]

counts = defaultdict(int)

for job in jobs:
    counts[job['remote_identifier']['id']] += job['jobs']['count']

print(counts)

Output:

defaultdict(<class 'int'>, {'64': 11, '68': 0})

One straightforward way is to use the itertools module, which provides the groupby function.

import itertools as it

def get_id(entry):
    return entry['remote_identifier']['id']

data.sort(key=get_id)
for key, group in it.groupby(data, get_id):
    print(key, sum(entry['jobs']['count'] for entry in group))

Note that groupby only groups consecutive elements that share a key, so the data must already be sorted by that key. This is why data.sort(key=get_id) is called before grouping.
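
To illustrate why the sort matters, here is a small sketch using the sample list from the question. It first groups the unsorted list to show the split groups, then uses sorted() (my choice here, so that the original list is not reordered in place) and collects the sums into a dict. The names unsorted_keys and totals are illustrative.

```python
from itertools import groupby

# Sample data copied from the question.
data = [
    {'jobs': {'count': 4}, 'remote_identifier': {'id': '64'}},
    {'jobs': {'count': 0}, 'remote_identifier': {'id': '68'}},
    {'jobs': {'count': 7}, 'remote_identifier': {'id': '64'}},
]

def get_id(entry):
    return entry['remote_identifier']['id']

# Without sorting, the two '64' entries are not adjacent,
# so groupby yields the key '64' twice.
unsorted_keys = [key for key, _ in groupby(data, get_id)]

# sorted() returns a new list and leaves data untouched.
totals = {
    key: sum(entry['jobs']['count'] for entry in group)
    for key, group in groupby(sorted(data, key=get_id), get_id)
}

print(unsorted_keys)
print(totals)
```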

This should do the trick:

result = {}
for i in items:
    ri = i['remote_identifier']['id']
    j = i['jobs']['count']
    if ri in result:
        result[ri] += j
    else:
        result[ri] = j
print(result)
# {'64': 11, '68': 0}
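
The if/else branch can also be folded into a single line with dict.get, which returns a default when the key is not present yet. This is only a sketch of the same loop, assuming items is the list from the question (repeated here so the snippet runs on its own).

```python
# Sample data copied from the question.
items = [
    {'jobs': {'count': 4}, 'remote_identifier': {'id': '64'}},
    {'jobs': {'count': 0}, 'remote_identifier': {'id': '68'}},
    {'jobs': {'count': 7}, 'remote_identifier': {'id': '64'}},
]

result = {}
for item in items:
    key = item['remote_identifier']['id']
    # get() falls back to 0 the first time a key is seen.
    result[key] = result.get(key, 0) + item['jobs']['count']

print(result)
```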

Another solution is as follows:

input = [{
    'jobs': {
        'count': 4
    },
    'remote_identifier': {
        'id': '64'
    }
}, {
    'jobs': {
        'count': 0
    },
    'remote_identifier': {
        'id': '68'
    }
}, {
    'jobs': {
        'count': 7
    },
    'remote_identifier': {
        'id': '64'
    }
}]

res = dict()
for item in input:
    # Look the id up once instead of repeating the nested access.
    key = item['remote_identifier']['id']
    if key in res:
        total = res[key] + item['jobs']['count']
    else:
        total = item['jobs']['count']
    res[key] = total

print(res)

Output:

{'68': 0, '64': 11}
