How to pass multiple interrelated columns to the function on groupby and agg?

Question

I have the following pandas DataFrame df:

id  col1   col2
1   7      1.2
1   6      0.8
1   12     0.9
1   1      1.1
2   3      2.0
2   6      1.8
3   10     0.7
3   11     0.9
3   12     1.2

Here is the code to create this df:

import pandas as pd
df = pd.DataFrame({'id': [1,1,1,1,2,2,3,3,3], 
                   'col1': [7,6,12,1,3,6,10,11,12],
                   'col2': [1.2,0.8,0.9,1.1,2.0,1.8,0.7,0.9,1.2]})

I need to group by id and apply the function myfunc to each group. The problem is that myfunc requires several interrelated columns as input. The final goal is to create a new column new_col that holds the result for each id.

How can I do it?

This is my current code:

def myfunc(df, col1, col2):

    df1 = col1
    df2 = df[df[col2] < 1][[col1]]
    var1 = df1.iloc[0]
    var2 = df2.iloc[0][0]

    result = var2 - var1

    return result


df["new_col"] = df.groupby("id").agg(myfunc(...??))

Answer 1

In groupby-apply, myfunc() is passed the entire group as a DataFrame, with all columns. You can simply select the columns you need from that group:

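def myfunc(g):
    # g is the sub-DataFrame holding the rows of one id
    var1 = g['col1'].iloc[0]
    # first col1 value among the rows where col2 > 1
    var2 = g.loc[g['col2'] > 1, 'col1'].iloc[0]

    return var1 / var2

# apply returns one value per id (indexed by id), so map it back onto the rows
df['new_col'] = df['id'].map(df.groupby("id").apply(myfunc))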
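For comparison, here is a minimal sketch that follows the question's own logic (col2 < 1 and var2 - var1) rather than the condition used above. The helper name first_diff and the choice to return NaN when a group has no row with col2 < 1 are assumptions made for illustration; they are not part of the original question or answer. Without that guard, .iloc[0] would raise an IndexError for id 2, where no col2 value is below 1.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 1, 2, 2, 3, 3, 3],
                   'col1': [7, 6, 12, 1, 3, 6, 10, 11, 12],
                   'col2': [1.2, 0.8, 0.9, 1.1, 2.0, 1.8, 0.7, 0.9, 1.2]})

def first_diff(g):
    # Hypothetical helper: g holds the col1/col2 rows of a single id.
    var1 = g['col1'].iloc[0]
    # col1 values of the rows where col2 < 1 (may be empty).
    matches = g.loc[g['col2'] < 1, 'col1']
    if matches.empty:
        # Assumption: report NaN when no row satisfies the condition.
        return np.nan
    return matches.iloc[0] - var1

# One value per id; selecting the columns explicitly keeps the
# grouping column out of the frame passed to first_diff.
per_id = df.groupby('id')[['col1', 'col2']].apply(first_diff)

# Broadcast the per-id result back to every row of that id.
df['new_col'] = df['id'].map(per_id)
```

Mapping through df['id'] is used here because assigning the groupby result directly would align on the row index instead of on id.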

1 answer

solution1
0 ACCEPTED 2019-07-18 09:41:01