How can I combine values based on some key in a list of Python dicts, just like SQL GROUP BY?
L = [{'id':1, 'quantity':1}, {'id':2, 'quantity':2}, {'id':1, 'quantity':3}]
I want to sum quantity based on id.
So for the list above I would like the output to be:
[{'id':1,'quantity':4},{'id':2,'quantity':2}]
Another example:
L = [{'id':1, 'quantity':1}, {'id':2, 'quantity':2}, {'id':1, 'quantity':2}, {'id':1, 'quantity':3}]
So for the list above I would like the output to be:
[{'id':1, 'quantity':6}, {'id':2, 'quantity':2}]
In Python, "group by" functionality can be achieved with the itertools.groupby() function:
import itertools
l = [{'id':1, 'quantity':1}, {'id':2, 'quantity':2}, {'id':1, 'quantity':3}]
# groupby() only groups adjacent items, so sort by id first
result = [{'id': k, 'quantity': sum(d['quantity'] for d in g)}
          for k, g in itertools.groupby(sorted(l, key=lambda x: x['id']), key=lambda x: x['id'])]
print(result)
The output:
[{'id': 1, 'quantity': 4}, {'id': 2, 'quantity': 2}]
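The sorted() call matters here: itertools.groupby() only groups consecutive items with equal keys, so unsorted input can yield the same id more than once. A minimal sketch of the difference, using the first list from the question (the helper name `sum_by_id` and its `presort` flag are made up for illustration):

```python
import itertools
from operator import itemgetter

def sum_by_id(rows, presort):
    # Hypothetical helper: sums 'quantity' for each run of equal 'id' values.
    get_id = itemgetter('id')
    if presort:
        rows = sorted(rows, key=get_id)
    return [{'id': k, 'quantity': sum(r['quantity'] for r in g)}
            for k, g in itertools.groupby(rows, key=get_id)]

l = [{'id': 1, 'quantity': 1}, {'id': 2, 'quantity': 2}, {'id': 1, 'quantity': 3}]

unsorted_result = sum_by_id(l, presort=False)
sorted_result = sum_by_id(l, presort=True)
print(unsorted_result)  # id 1 shows up twice because its rows are not adjacent
print(sorted_result)    # one entry per id
```

Sorting costs O(n log n), which is the price of using groupby() on data that is not already ordered by the key.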
This should do what you want:
from collections import defaultdict
def combine(items):
    # missing ids start at 0, so each quantity can be added directly
    counts = defaultdict(int)
    for d in items:
        counts[d["id"]] += d["quantity"]
    return [{"id": id, "quantity": q} for id, q in counts.items()]
Examples:
>>> combine([{'id':1, 'quantity':1}, {'id':2, 'quantity':2}, {'id':1, 'quantity':3}])
[{'quantity': 4, 'id': 1}, {'quantity': 2, 'id': 2}]
>>> combine([{'id':1, 'quantity':1}, {'id':2, 'quantity':2}, {'id':1, 'quantity':2}, {'id':1, 'quantity':3}])
[{'quantity': 6, 'id': 1}, {'quantity': 2, 'id': 2}]
This is about as simple and efficient as you're going to get.
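If the field names vary, the same defaultdict pattern can be parameterized. This is only a sketch, assuming every dict contains both fields and the summed values are numeric; `group_sum`, `key`, and `value` are illustrative names, not part of the answer above:

```python
from collections import defaultdict

def group_sum(items, key, value):
    # Illustrative generalization of combine(): sum `value` per distinct `key`.
    totals = defaultdict(int)
    for d in items:
        totals[d[key]] += d[value]
    # dicts preserve insertion order on Python 3.7+, so groups come out
    # in order of first occurrence.
    return [{key: k, value: total} for k, total in totals.items()]

rows = [{'id': 1, 'quantity': 1}, {'id': 2, 'quantity': 2},
        {'id': 1, 'quantity': 2}, {'id': 1, 'quantity': 3}]
grouped = group_sum(rows, 'id', 'quantity')
print(grouped)
```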
Convert it to a DataFrame, group by id, and then convert back to a dict:
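import pandas as pd
L = [{'id':1, 'quantity':1}, {'id':2, 'quantity':2}, {'id':1, 'quantity':3}]
# Series.to_dict() gives a mapping of id -> total: {1: 4, 2: 2}
output = pd.DataFrame(L).groupby('id')['quantity'].sum().to_dict()
# for a list of {'id': ..., 'quantity': ...} records, as asked for in the question:
records = pd.DataFrame(L).groupby('id')['quantity'].sum().reset_index().to_dict('records')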
Assuming the input is well formed, here is an intuitive way to achieve this:
output = {}
keys = []
for e in L:
    if e['id'] not in keys:
        # first time this id is seen
        keys.append(e['id'])
        output[e['id']] = e['quantity']
    else:
        output[e['id']] += e['quantity']
# build the list of dicts in the requested shape
result = [{'id': key, 'quantity': value} for key, value in output.items()]
I was wondering whether there are any further requirements, for instance, whether you need higher efficiency to process a large volume of data. If so, this method is likely too slow, since every `not in keys` check scans a list.
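On the efficiency question, here is a sketch of the same loop without the separate keys list, assuming the ids are hashable. The lookup goes against the dict itself, which is constant time on average, so the whole pass is linear in the length of L. The function name `sum_quantities` is made up for this example.

```python
def sum_quantities(rows):
    # Made-up helper name; same logic as the loop above, minus the keys list.
    totals = {}
    for e in rows:
        # dict.get returns 0 for an id not seen yet, so no membership test is needed.
        totals[e['id']] = totals.get(e['id'], 0) + e['quantity']
    return [{'id': key, 'quantity': total} for key, total in totals.items()]

L = [{'id': 1, 'quantity': 1}, {'id': 2, 'quantity': 2}, {'id': 1, 'quantity': 3}]
summed = sum_quantities(L)
print(summed)
```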