
Add the other values of a dictionary if certain keys are the same

This is my input. I have a list of dictionaries:

[{'name1':'a', 'name2':'b','val1':10,'val2':20},
 {'name1':'a', 'name2':'b','val1':15,'val2':25},
 {'name1':'r', 'name2':'s','val1':30,'val2':20}] 

If two dictionaries have the same values for both name1 and name2, then add their val1 and val2 values.

Here is the expected output:

[{'name1':'a', 'name2':'b','val1':25,'val2':45},
 {'name1':'r', 'name2':'s','val1':30,'val2':20}] 

In the first and second dicts, name1 is a and name2 is b, so we add their values.

I was trying with a loop but was not getting anywhere.

You can use collections.Counter and itertools.groupby:

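>>> from collections import Counter
>>> from itertools import groupby
>>> dicts = [{'name1':'a', 'name2':'b','val1':10,'val2':20},
...          {'name1':'a', 'name2':'b','val1':15,'val2':25},
...          {'name1':'r', 'name2':'s','val1':30,'val2':20}]
>>> new_dicts = []
>>> for (name1, name2), groups in groupby(dicts, lambda d: (d['name1'], d['name2'])):
...     # groupby yields one group per run of consecutive dicts with the same key;
...     # sum their non-key fields with Counter without modifying the input dicts
...     totals = sum((Counter({k: v for k, v in g.items() if k not in ('name1', 'name2')})
...                   for g in groups), Counter())
...     new_dicts.append({'name1': name1, 'name2': name2, **totals})
...
>>> new_dicts
[{'name1': 'a', 'name2': 'b', 'val1': 25, 'val2': 45},
 {'name1': 'r', 'name2': 's', 'val1': 30, 'val2': 20}]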
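One caveat with the Counter approach: adding two Counter objects keeps only keys whose total is positive, so a group whose val1 sums to zero or a negative number silently loses that field. A minimal demonstration, using made-up rows rather than the data from the question:

```python
from collections import Counter

# Hypothetical rows whose val1 values cancel out to zero
rows = [{'val1': 5, 'val2': 1}, {'val1': -5, 'val2': 2}]

# Summing Counters with +, the same way as the groupby answer does
totals = sum((Counter(r) for r in rows), Counter())
print(dict(totals))  # val1 is missing: Counter addition drops non-positive totals

# Counter.update() adds counts in place and keeps zero or negative results
kept = Counter()
for r in rows:
    kept.update(r)
print(dict(kept))
```

If zero or negative totals are possible in your data, accumulating with update() (or plain addition into a regular dict) avoids losing fields.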

Alternatively, if you use pandas:

>>> import pandas as pd
>>> pd.DataFrame(dicts).groupby(['name1', 'name2']).sum().reset_index().to_dict('records')
[{'name1': 'a', 'name2': 'b', 'val1': 25, 'val2': 45},
 {'name1': 'r', 'name2': 's', 'val1': 30, 'val2': 20}]

If you want to do this without importing any modules, you can try:

>>> new_dicts = []
>>> for d in dicts:
...     if not new_dicts:
...         new_dicts.append(dict(d))  # copy, so the += below does not modify the input
...     else:
...         last_dict = new_dicts[-1]
...         # same (name1, name2) as the previous group: add the values into it
...         if (last_dict['name1'], last_dict['name2']) == (d['name1'], d['name2']):
...             last_dict['val1'] += d['val1']
...             last_dict['val2'] += d['val2']
...         else:
...             new_dicts.append(dict(d))
...
>>> new_dicts
[{'name1': 'a', 'name2': 'b', 'val1': 25, 'val2': 45},
 {'name1': 'r', 'name2': 's', 'val1': 30, 'val2': 20}]
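The loop above hard-codes val1 and val2. A sketch that sums every field other than the key fields, still without imports and still assuming rows with equal keys are adjacent; the function name merge_consecutive and its key_fields parameter are hypothetical, chosen for illustration:

```python
def merge_consecutive(rows, key_fields=('name1', 'name2')):
    """Merge adjacent dicts that share the same key fields by summing the other fields.

    Assumes rows with equal keys are consecutive (sort first otherwise) and that
    every non-key field is numeric.
    """
    merged = []
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        if merged and tuple(merged[-1][f] for f in key_fields) == key:
            # Same key as the previous group: add every non-key field into it
            for field, value in row.items():
                if field not in key_fields:
                    merged[-1][field] = merged[-1].get(field, 0) + value
        else:
            # New key: start a group with a copy so the input is left untouched
            merged.append(dict(row))
    return merged


dicts = [{'name1': 'a', 'name2': 'b', 'val1': 10, 'val2': 20},
         {'name1': 'a', 'name2': 'b', 'val1': 15, 'val2': 25},
         {'name1': 'r', 'name2': 's', 'val1': 30, 'val2': 20}]
print(merge_consecutive(dicts))
```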

NOTE:

The first and third solutions assume that your list is sorted, i.e. entries with the same name1 and name2 appear consecutively. If that is not the case, add this line at the beginning:

>>> dicts = sorted(dicts, key=lambda x: (x['name1'], x['name2']))
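To see why the ordering matters: itertools.groupby only merges runs of adjacent items with equal keys, so an unsorted list produces the same key more than once. A small illustration with made-up rows:

```python
from itertools import groupby

# Hypothetical unsorted input: the ('a', 'b') rows are not adjacent
rows = [{'name1': 'a', 'name2': 'b', 'val1': 10},
        {'name1': 'r', 'name2': 's', 'val1': 30},
        {'name1': 'a', 'name2': 'b', 'val1': 15}]

def key(d):
    return (d['name1'], d['name2'])

# Unsorted: ('a', 'b') shows up as two separate groups
unsorted_keys = [k for k, _ in groupby(rows, key)]
print(unsorted_keys)  # [('a', 'b'), ('r', 's'), ('a', 'b')]

# Sorted by the same key first: each key forms a single group
sorted_keys = [k for k, _ in groupby(sorted(rows, key=key), key)]
print(sorted_keys)  # [('a', 'b'), ('r', 's')]
```

Sorting also reorders the output by key; if the first-seen order matters, a dictionary keyed on (name1, name2) avoids sorting altogether.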

You can also iterate once and use an intermediate dictionary keyed on (name1, name2), which gives linear time complexity.

>>> res = {}
>>> for d in dicts:
...     key = (d['name1'], d['name2'])
...     if key in res:
...         res[key] = (res[key][0] + d['val1'], res[key][1] + d['val2'])
...     else:
...         res[key] = (d['val1'], d['val2'])
...
>>> res
{('a', 'b'): (25, 45), ('r', 's'): (30, 20)}
>>> output = [{'name1': k[0], 'name2': k[1], 'val1': v[0], 'val2': v[1]} for k, v in res.items()]
>>> output
[{'name1': 'a', 'name2': 'b', 'val1': 25, 'val2': 45}, {'name1': 'r', 'name2': 's', 'val1': 30, 'val2': 20}]
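The same linear-time idea can be written with collections.defaultdict, summing every non-key field instead of only val1 and val2. Since Python 3.7, dicts preserve insertion order, so groups come out in the order their key first appears and no sorting is needed. A sketch; the function name sum_by_key is hypothetical:

```python
from collections import defaultdict

def sum_by_key(rows, key_fields=('name1', 'name2')):
    # Maps each key tuple to a dict of running totals for the non-key fields
    totals = defaultdict(lambda: defaultdict(int))
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        for field, value in row.items():
            if field not in key_fields:
                totals[key][field] += value
    # Rebuild one flat dict per key: key fields first, then the summed fields
    return [{**dict(zip(key_fields, key)), **sums} for key, sums in totals.items()]


# Unsorted input on purpose: the two ('a', 'b') rows are not adjacent
dicts = [{'name1': 'a', 'name2': 'b', 'val1': 10, 'val2': 20},
         {'name1': 'r', 'name2': 's', 'val1': 30, 'val2': 20},
         {'name1': 'a', 'name2': 'b', 'val1': 15, 'val2': 25}]
print(sum_by_key(dicts))
```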

Run it through pandas, which is very good at this type of task (and yes, this could probably be collapsed into one or two chained statements):

In [37]: a
Out[37]:
[{'name1': 'a', 'name2': 'b', 'val1': 10, 'val2': 20},
 {'name1': 'a', 'name2': 'b', 'val1': 15, 'val2': 25},
 {'name1': 'r', 'name2': 's', 'val1': 30, 'val2': 20}]

In [38]: df = pd.DataFrame(a)

In [39]: df
Out[39]:
  name1 name2  val1  val2
0     a     b    10    20
1     a     b    15    25
2     r     s    30    20

In [40]: grouped_sum = df.groupby(['name1', 'name2']).sum()

In [41]: grouped_sum
Out[41]:
             val1  val2
name1 name2
a     b        25    45
r     s        30    20

In [42]: grouped_sum.reset_index(inplace=True)

In [43]: data = grouped_sum.to_dict('records')

In [44]: data
Out[44]:
[{'name1': 'a', 'name2': 'b', 'val1': 25, 'val2': 45},
 {'name1': 'r', 'name2': 's', 'val1': 30, 'val2': 20}]

I suggest posting the code you tried when you ask for help, so others can help by suggesting changes. In the meantime, something like this may help:

di = [{'name1': 'a', 'name2': 'a', 'val1': 10, 'val2': 20},
      {'name1': 'a', 'name2': 'b', 'val1': 15, 'val2': 25},
      {'name1': 'r', 'name2': 's', 'val1': 30, 'val2': 20}]

for i in di:
    if i['name1'] == i['name2']:
        print("sum:", i['val1']+i['val2'])

It prints the sum of val1 and val2 for each dictionary in which name1 and name2 are equal.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM