I have the following pandas DataFrame df:
id col1 col2
1 7 1.2
1 6 0.8
1 12 0.9
1 1 1.1
2 3 2.0
2 6 1.8
3 10 0.7
3 11 0.9
3 12 1.2
Here is the code to create this df:
import pandas as pd
df = pd.DataFrame({'id': [1,1,1,1,2,2,3,3,3],
                   'col1': [7,6,12,1,3,6,10,11,12],
                   'col2': [1.2,0.8,0.9,1.1,2.0,1.8,0.7,0.9,1.2]})
I need to group by id and apply the function myfunc to each group. The problem is that myfunc requires several interrelated columns as an input. The final goal is to create a new column new_col for each id.
How can I do it?
This is my current code:
def myfunc(df, col1, col2):
    df1 = col1
    df2 = df[df[col2] < 1][[col1]]
    var1 = df1.iloc[0]
    var2 = df2.iloc[0][0]
    result = var2 - var1
    return result

df["new_col"] = df.groupby("id").agg(myfunc(...??))
In groupby-apply, myfunc() is passed the entire group as a DataFrame, with all of its columns. You can simply select the columns you need from that group:
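def myfunc(g):
    # g is the sub-DataFrame for a single id
    var1 = g['col1'].iloc[0]
    # first col1 value among the rows where col2 > 1
    var2 = g.loc[g['col2'] > 1, 'col1'].iloc[0]
    return var1 / var2

# apply returns one value per id; map it back onto the rows via the id column
df['new_col'] = df['id'].map(df.groupby("id").apply(myfunc))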
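For reference, here is a minimal self-contained sketch of the same pattern using the calculation from the question (first col1 value where col2 < 1, minus the first col1 value of the group). The name myfunc_diff and the NaN fallback are my own assumptions, not part of the original code. The fallback is there because id 2 has no row with col2 < 1, so .iloc[0] would otherwise raise an IndexError. Since apply yields one value per id, the result is mapped back onto the rows through the id column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 1, 2, 2, 3, 3, 3],
                   'col1': [7, 6, 12, 1, 3, 6, 10, 11, 12],
                   'col2': [1.2, 0.8, 0.9, 1.1, 2.0, 1.8, 0.7, 0.9, 1.2]})

def myfunc_diff(g):
    # hypothetical helper: g is the sub-DataFrame for one id
    var1 = g['col1'].iloc[0]
    below = g.loc[g['col2'] < 1, 'col1']
    if below.empty:
        # assumption: return NaN when no row in the group has col2 < 1
        return np.nan
    return below.iloc[0] - var1

# one value per id; selecting the columns keeps 'id' out of the group frame
per_id = df.groupby('id')[['col1', 'col2']].apply(myfunc_diff)

# broadcast the per-id result to every row of that id
df['new_col'] = df['id'].map(per_id)
print(df)
```

With the sample data this gives -1 for id 1, NaN for id 2, and 0 for id 3. If a different fallback suits your data better (for example 0, or dropping those ids), change the branch that returns NaN.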