Pandas: How to return multiple columns with a custom apply function on a groupby object

I have a computation that takes multiple columns from a dataframe and returns multiple columns, and I'd like to add those results to the dataframe. I'd like to do something like this:

import pandas as pd

df = pd.DataFrame({'id':['i1', 'i1', 'i2', 'i2'], 'a':[1,2,3,4], 'b':[5,6,7,8]})

def custom_f(a, b):
    computation = a+b
    return computation + 1, computation*2

df['c1'], df['c2'] = df.groupby('id').apply(lambda x: custom_f(x.a, x.b))

Desired output:

    id  a   b  c1     c2
0   i1  1   5  7      12
1   i1  2   6  9      16
2   i2  3   7  11     20
3   i2  4   8  13     24

I know how I could do this one column at a time, but in reality the 'computation' step that uses the two columns is quite expensive, so I'm trying to figure out how to run it only once.

EDIT: I realised that the given example can be solved without the groupby. In my actual use case, though, I need the groupby because the 'computation' uses the first and last values of the arrays in each group. I omitted that for the sake of simplicity, but imagine that it is needed.

df['c1'], df['c2'] = custom_f(df['a'], df['b'])  # you don't need apply for your desired output here

If you do need the groupby, you can try:

def custom_f(a, b):
    computation = a+b
    return pd.concat([(computation + 1),(computation*2)],axis=1)

Finally:

df[['c1','c2']]=df.groupby('id').apply(lambda x: custom_f(x.a, x.b)).values

Output of df:

    id  a   b   c1  c2
0   i1  1   5   7   12
1   i1  2   6   9   16
2   i2  3   7   11  20
3   i2  4   8   13  24
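
One caveat with assigning through .values is that it matches rows by position, so depending on the pandas version it may misalign when the rows of a group are not contiguous. A minimal sketch of an index-aligned variant follows, assuming the per-group function can return a DataFrame that keeps the group's index. The name custom_group_f and the interleaved sample data are illustrative and not part of the original question.

```python
import pandas as pd

# Rows are deliberately interleaved so that group order differs from row order.
df = pd.DataFrame({'id': ['i1', 'i2', 'i1', 'i2'],
                   'a': [1, 3, 2, 4],
                   'b': [5, 7, 6, 8]})

def custom_group_f(group):
    # Hypothetical stand-in for the expensive step; it runs once per group.
    # Per-group values such as group['a'].iloc[0] or group['b'].iloc[-1]
    # could be used here as well.
    computation = group['a'] + group['b']
    # Returning a DataFrame that keeps the group's index lets pandas align rows later.
    return pd.DataFrame({'c1': computation + 1, 'c2': computation * 2})

# Selecting ['a', 'b'] keeps the grouping column out of what the function receives.
result = df.groupby('id', group_keys=False)[['a', 'b']].apply(custom_group_f)

# join aligns on the index, so the original row order does not matter.
out = df.join(result)
print(out)
```

Because the join is done on the index rather than by position, each row gets the values computed from its own a and b, regardless of how the groups are ordered.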
