How do I use the pandas groupby function to apply a formula based on the groupby value

My question may be a little confusing, so let me explain. I have a dataframe of information that I would like to group by the unique order id to produce the following columns:

sum qty = the total amount that was executed per order id.
csv = the sum of the csv column per order id divided by the sum of the executed amount for that order id.

The first column is easy to create with groupby; it's the second column that I am having issues with. Here is the sample data that I am working with:

    qty     sym     price   ordrefno    ord_bidprice    ord_askprice    csv
0   -25000  TEST    0.044   984842      0.0435          0.044          12.5
1   100     TEST    0.0443  984702      0.0435          0.044          0.03
2   -10000  TEST    0.0405  983375      0.039           0.0405         15
3   -100    TEST    0.0443  984842      0.0435          0.044          0.03

This is my code:

cs1 = lambda x: np.sum(test.csv / test.qty)
f2 = {'qty' : ['sum'], 'csv' : {'es' : cs1}}

agg_td = trades.groupby('ordrefno').agg(f2)

Writing a named function and using apply works:

def func(group):
    sum_ = group.qty.sum()
    es = (group.csv / group.qty).sum()
    return pd.Series([sum_, es], index=['qty', 'es'])

trades.groupby('ordrefno').apply(func)

Result:

            qty     es
ordrefno               
983375   -10000 -0.0015
984702      100  0.0003
984842   -25100 -0.0008

Assuming you want the ratio of the sums rather than the sum of the ratios (the wording of the question suggests this, but the function in your code would give the sum of the ratios if applied to the df), I think the cleanest way to do this is in two steps. First get the sum of the two columns, then divide:

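# Step 1: sum both columns per order id.
agg_td = trades.groupby('ordrefno')[['qty', 'csv']].sum()
# Step 2: eval returns a new DataFrame by default, so assign the result back.
agg_td = agg_td.eval('es = csv / qty')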
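To make the distinction between the two quantities concrete, here is a minimal sketch that rebuilds the sample rows from the question (only the three columns that matter) and computes both. It uses plain column division instead of eval, which is just a stylistic choice, and the name sum_of_ratios is made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question; only the columns used here.
trades = pd.DataFrame({
    "qty": [-25000, 100, -10000, -100],
    "ordrefno": [984842, 984702, 983375, 984842],
    "csv": [12.5, 0.03, 15.0, 0.03],
})

# Step 1: per-order sums of both columns.
agg_td = trades.groupby("ordrefno")[["qty", "csv"]].sum()

# Step 2: ratio of the sums, stored as a new column.
agg_td["es"] = agg_td["csv"] / agg_td["qty"]

# For comparison: the sum of the per-row ratios (illustrative name).
sum_of_ratios = (trades["csv"] / trades["qty"]).groupby(trades["ordrefno"]).sum()

print(agg_td)
print(sum_of_ratios)
```

The two results agree for orders with a single row and differ for order 984842, which has two rows.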

You could also create a special function and pass it to the groupby apply method:

es = trades.groupby('ordrefno').apply(lambda df: df.csv.sum() / df.qty.sum())

But this only gives you the 'es' column. The problem with using agg is that the functions in the dict are column-specific, whereas here you need to combine two columns.
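If both columns are wanted from a single apply call, one possible sketch is to return a Series from the function, so that each group contributes one row with several values. The helper name summarize is hypothetical, and the data is the sample from the question reduced to the relevant columns. Selecting the two columns before apply is an assumption made here to keep the grouping column out of the function's input.

```python
import pandas as pd

# Sample rows copied from the question; only the columns used here.
trades = pd.DataFrame({
    "qty": [-25000, 100, -10000, -100],
    "ordrefno": [984842, 984702, 983375, 984842],
    "csv": [12.5, 0.03, 15.0, 0.03],
})

def summarize(group):
    # Hypothetical helper: sees all columns of one group at once,
    # so it can combine qty and csv (ratio of the sums).
    total_qty = group["qty"].sum()
    return pd.Series({"qty": total_qty, "es": group["csv"].sum() / total_qty})

result = trades.groupby("ordrefno")[["qty", "csv"]].apply(summarize)
print(result)
```

This works because apply hands the function the whole sub-frame for each group, while the functions in an agg dict each receive a single column.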
