How do I use the pandas groupby function to apply a formula based on the groupby value?
My question may be a little confusing, so let me explain. I have a dataframe that I would like to group by the unique order id, producing the following columns:
sum qty = the total amount that was executed per order id.
csv = the sum of the csv column per order id divided by the sum of the executed amount for that order id.
The first column is easy to create with groupby; it's the second column that I am having issues with. Here is the sample data that I am working with:
      qty   sym   price  ordrefno  ord_bidprice  ord_askprice    csv
0  -25000  TEST   0.044    984842        0.0435         0.044   12.5
1     100  TEST  0.0443    984702        0.0435         0.044   0.03
2  -10000  TEST  0.0405    983375         0.039        0.0405     15
3    -100  TEST  0.0443    984842        0.0435         0.044   0.03
This is my code:
cs1 = lambda x: np.sum(test.csv / test.qty)
f2 = {'qty' : ['sum'], 'csv' : {'es' : cs1}}
agg_td = trades.groupby('ordrefno').agg(f2)
Writing a named function and using apply works:
def func(group):
    # Total executed quantity for this order.
    sum_ = group.qty.sum()
    # Sum of the per-row csv / qty ratios.
    es = (group.csv / group.qty).sum()
    return pd.Series([sum_, es], index=['qty', 'es'])

trades.groupby('ordrefno').apply(func)
Result:
             qty      es
ordrefno
983375    -10000 -0.0015
984702       100  0.0003
984842    -25100 -0.0008
Assuming you want the ratio of the sums rather than the sum of the ratios (the wording of the question suggests this, but the function in your code would give the sum of the ratios if applied to the df), I think the cleanest way to do this is in two steps. First get the sum of the two columns, then divide:
agg_td = trades.groupby('ordrefno')[['qty', 'csv']].sum()
# eval returns a new DataFrame by default, so assign the result back
agg_td = agg_td.eval('es = csv / qty')
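For reference, the two-step approach above can be run end to end as follows. This is a minimal sketch that rebuilds the four sample rows from the question by hand and uses plain column division in place of eval; it assumes the column names are exactly those shown in the sample data.

```python
import pandas as pd

# Sample rows copied from the question.
trades = pd.DataFrame({
    "qty": [-25000, 100, -10000, -100],
    "sym": ["TEST"] * 4,
    "price": [0.044, 0.0443, 0.0405, 0.0443],
    "ordrefno": [984842, 984702, 983375, 984842],
    "ord_bidprice": [0.0435, 0.0435, 0.039, 0.0435],
    "ord_askprice": [0.044, 0.044, 0.0405, 0.044],
    "csv": [12.5, 0.03, 15, 0.03],
})

# Step 1: per-order sums of both columns.
agg_td = trades.groupby("ordrefno")[["qty", "csv"]].sum()

# Step 2: ratio of the sums (not the sum of the per-row ratios).
agg_td["es"] = agg_td["csv"] / agg_td["qty"]

print(agg_td)
```

Note that for order 984842, which has two rows, the ratio of the sums comes out to roughly -0.0005, whereas the sum of the ratios is -0.0008. The two definitions only agree for orders with a single row.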
You could also create a special function and pass it to the groupby apply method:
es = trades.groupby('ordrefno').apply(lambda df: df.csv.sum() / df.qty.sum())
But this will only get you the 'es' column. The problem with using agg is that the functions in the dict are column-specific, whereas here you need to combine two columns.
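One possible way around the column-specific limitation, if the sum of the ratios is what you want, is to compute the row-wise ratio as a helper column first and then aggregate it like any other column. The sketch below is not from the original answers: the helper column name ratio is made up for illustration, and the keyword form of agg (named aggregation) assumes pandas 0.25 or later.

```python
import pandas as pd

# Only the columns needed here, copied from the question's sample data.
trades = pd.DataFrame({
    "qty": [-25000, 100, -10000, -100],
    "ordrefno": [984842, 984702, 983375, 984842],
    "csv": [12.5, 0.03, 15, 0.03],
})

out = (
    trades
    # Hypothetical helper column: per-row csv / qty.
    .assign(ratio=trades["csv"] / trades["qty"])
    .groupby("ordrefno")
    # Each output column names one input column and one function.
    .agg(qty=("qty", "sum"), es=("ratio", "sum"))
)

print(out)
```

Because the two-column arithmetic happens before the groupby, each aggregation only needs to see a single column, which is exactly what agg supports.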