Calculating from multiple columns based on groupby() in pandas
Suppose we want to do calculations between columns based on groups.
The original dataframe:
import pandas as pd

data = {'order_id': [1, 1, 1, 2, 2, 3],
        'quantity': [1, 3, 1, 1, 2, 2],
        'item_price': [10, 6, 4, 5, 3, 6]}
df = pd.DataFrame(data, columns=['order_id', 'quantity', 'item_price'])
order_id  quantity  item_price
1         1         10
1         3         6
1         1         4
2         1         5
2         2         3
3         2         6
I want to calculate the total price for each order. The result should look like this:
order_id  quantity  item_price  order_price
1         1         10          32
1         3         6           32
1         1         4           32
2         1         5           11
2         2         3           11
3         2         6           12
I get this by adding a new column item_price_total:
df['item_price_total'] = df['quantity'] * df['item_price']
Then I use groupby(['order_id'])['item_price_total'].transform('sum'):
order_id  quantity  item_price  item_price_total  order_price
1         1         10          10                32
1         3         6           18                32
1         1         4           4                 32
2         1         5           5                 11
2         2         3           6                 11
3         2         6           12                12
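For context, transform('sum') differs from a plain sum() in that it broadcasts each group's total back onto every row of that group, which is why its result can be assigned as a column. A minimal sketch, assuming the sample data above (the names grouped, per_order and per_row are introduced here for illustration only):

```python
import pandas as pd

df = pd.DataFrame({'order_id': [1, 1, 1, 2, 2, 3],
                   'quantity': [1, 3, 1, 1, 2, 2],
                   'item_price': [10, 6, 4, 5, 3, 6]})
df['item_price_total'] = df['quantity'] * df['item_price']

grouped = df.groupby('order_id')['item_price_total']
per_order = grouped.sum()            # one value per order_id
per_row = grouped.transform('sum')   # one value per original row, aligned to df.index

print(per_order)
print(per_row)
```

Because per_row shares the index of df, it can be assigned directly with df['order_price'] = per_row.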
My question is how to get the result directly from quantity and item_price grouped on order_id, without using item_price_total. My thought is to use groupby(['order_id']).apply() with a lambda function, but after many attempts I still haven't found a solution.
Thanks to Anky's idea, you can try this:
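# Wrap the whole chain in parentheses so that .join(df) can sit on its own line.
result = (
    pd.DataFrame(
        df['quantity'].mul(df['item_price'])   # per-row quantity * item_price
                      .groupby(df['order_id'])  # group the products by order
                      .transform('sum'),        # broadcast each order total to its rows
        columns=['order_price'])
    .join(df)
)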
print(result)
# order_price order_id quantity item_price
# 0 32 1 1 10
# 1 32 1 3 6
# 2 32 1 1 4
# 3 11 2 1 5
# 4 11 2 2 3
# 5 12 3 2 6
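If the extra DataFrame and the join are not needed, the same idea can probably be written as a direct column assignment. The apply() route mentioned in the question can also be made to work by computing one total per order and mapping it back onto the rows. The following is a sketch under those assumptions, not the only way to do it; totals and order_price_via_apply are names introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'order_id': [1, 1, 1, 2, 2, 3],
                   'quantity': [1, 3, 1, 1, 2, 2],
                   'item_price': [10, 6, 4, 5, 3, 6]})

# Variant 1: multiply first, then group the product by the order_id column.
df['order_price'] = (
    (df['quantity'] * df['item_price'])
    .groupby(df['order_id'])
    .transform('sum')
)

# Variant 2: apply() with a lambda yields one total per order_id,
# which is then mapped back onto every row of that order.
totals = (
    df.groupby('order_id')[['quantity', 'item_price']]
      .apply(lambda g: (g['quantity'] * g['item_price']).sum())
)
df['order_price_via_apply'] = df['order_id'].map(totals)

print(df)
```

Variant 1 stays vectorized, so it is likely the better fit for large frames; Variant 2 calls the lambda once per group and is shown mainly because it matches the approach attempted in the question.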