How to pass multiple interrelated columns to the function on groupby and agg?

I have the following pandas DataFrame df:

id  col1   col2
1   7      1.2
1   6      0.8
1   12     0.9
1   1      1.1
2   3      2.0
2   6      1.8
3   10     0.7
3   11     0.9
3   12     1.2

Here is the code to create this df:

import pandas as pd
df = pd.DataFrame({'id': [1,1,1,1,2,2,3,3,3], 
                   'col1': [7,6,12,1,3,6,10,11,12],
                   'col2': [1.2,0.8,0.9,1.1,2.0,1.8,0.7,0.9,1.2]})

I need to group by id and apply the function myfunc to each group. The problem is that myfunc requires several interrelated columns as input. The final goal is to create a new column new_col for each id.

How can I do it?

This is my current code:

def myfunc(df, col1, col2):

    df1 = col1
    df2 = df[df[col2] < 1][[col1]]
    var1 = df1.iloc[0]
    var2 = df2.iloc[0][0]

    result = var2 - var1

    return result


df["new_col"] = df.groupby("id").agg(myfunc(...??))

In groupby-apply, myfunc() is passed the entire group, with all columns. You can simply select the columns from that group:

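def myfunc(g):
    # g is the sub-DataFrame holding the rows of one id
    var1 = g['col1'].iloc[0]                     # first col1 value in the group
    var2 = g.loc[g['col2'] > 1, 'col1'].iloc[0]  # first col1 value where col2 > 1

    return var1 / var2

# apply() returns one value per id (indexed by id), so map it back onto the rows
df['new_col'] = df['id'].map(df.groupby("id").apply(myfunc))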
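
For reference, here is a minimal sketch of the same approach using the logic from the question (first col1 value where col2 < 1, minus the first col1 value of the group). The helper name first_below_minus_first and its threshold parameter are made up for illustration. Returning NaN for a group with no matching row is also an assumption, since the question does not say what should happen for id 2, where no col2 value is below 1.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 1, 2, 2, 3, 3, 3],
                   'col1': [7, 6, 12, 1, 3, 6, 10, 11, 12],
                   'col2': [1.2, 0.8, 0.9, 1.1, 2.0, 1.8, 0.7, 0.9, 1.2]})

def first_below_minus_first(g, threshold=1.0):
    """Hypothetical helper: (first col1 where col2 < threshold) - (first col1)."""
    first_val = g['col1'].iloc[0]
    below = g.loc[g['col2'] < threshold, 'col1']
    if below.empty:
        # Assumption: a group without any matching row gets NaN
        return np.nan
    return below.iloc[0] - first_val

# Select only the columns the function needs, then compute one value per id
per_group = df.groupby('id')[['col1', 'col2']].apply(first_below_minus_first)

# Broadcast the per-id result back onto every row of that id
df['new_col'] = df['id'].map(per_group)
print(df)
```

Mapping through df['id'] matters because the result of apply() is indexed by id, not by the original row index, so assigning it directly would line up values with the wrong rows.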
