Groupby and apply a function to a pandas DataFrame
I have a pandas DataFrame with the columns client_id, customer_id, overall_date, fetched_date, cal_value, expo_value. Based on these columns, I have to apply a business formula to predict an output_value column, grouping by client_id and customer_id.
I am unable to iterate over each row and fetch the output from the given DataFrame. Below is the function I have written, but it is not working:
def cal_df(df):
    df= df.groupby(['client_id','customer_id'].reset_index()
    for i in df.iterows():
        # loop to iterate each row & calculate values
        df.iloc[i]= df.iloc['cal_value'][i]/30 * df.iloc['expo_value'][:-1][i] + df.iloc['cal_value'][i]/60 *df.iloc['expo_value'][:-2][i]
    return df

data = data.apply(lambda x:cal_df(df))
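One way to get a working loop is to iterate over the (client_id, customer_id) groups instead of over rows: each group is a small DataFrame that can be sorted by date and indexed by position. This is a minimal sketch, not the asker's intended implementation. It assumes one row per month per group, that the latest month in each group is the one being calculated, and that the formula is cal_value / 30 times the previous month's expo_value plus cal_value / 60 times the expo_value from two months back.

```python
import pandas as pd


def cal_df(df: pd.DataFrame) -> pd.DataFrame:
    """Compute output_value for the latest month of each (client_id, customer_id).

    Sketch only: assumes one row per month per group, so after sorting by
    overall_date the last three rows are the current month and the two before it.
    """
    records = []
    # iterate over groups, not rows: each `group` is a sub-DataFrame
    for _, group in df.groupby(["client_id", "customer_id"]):
        group = group.sort_values("overall_date")
        if len(group) < 3:
            continue  # not enough history for the two previous months
        record = group.iloc[-1].to_dict()  # the month being calculated, e.g. August
        cal = record["cal_value"]
        prev_1 = group["expo_value"].iloc[-2]  # one month back, e.g. July
        prev_2 = group["expo_value"].iloc[-3]  # two months back, e.g. June
        record["output_value"] = cal / 30 * prev_1 + cal / 60 * prev_2
        records.append(record)
    return pd.DataFrame(records)
```

It is called directly as `data = cal_df(data)` rather than through `data.apply(...)`, since the function already handles the grouping itself. Sorting the ISO-formatted date strings works because they sort chronologically as text.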
Formula: (df['cal_value']/30) * df['expo_value'][previous month] + (df['cal_value']/60) * df['expo_value'][two months back]. For example, when calculating August, the first term should pick July's expo_value and the second term June's.
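A vectorized alternative to looping is to sort each group by date and use `groupby(...).shift()`, so that "previous month" becomes `shift(1)` and "two months back" becomes `shift(2)` within each (client_id, customer_id) group. This sketch assumes there is exactly one row per month per group with no gaps; the function name `add_output_value` is hypothetical. It converts overall_date with `pd.to_datetime` for correct ordering and keeps only the latest row of each group.

```python
import pandas as pd


def add_output_value(df: pd.DataFrame) -> pd.DataFrame:
    """Vectorized sketch of the formula using per-group shifts.

    Assumes one row per month per (client_id, customer_id), so shifting by
    1 and 2 rows within a group gives the previous two months.
    """
    out = df.copy()
    out["overall_date"] = pd.to_datetime(out["overall_date"])
    out = out.sort_values(["client_id", "customer_id", "overall_date"])
    expo = out.groupby(["client_id", "customer_id"])["expo_value"]
    prev_1 = expo.shift(1)  # previous month, e.g. July for an August row
    prev_2 = expo.shift(2)  # two months back, e.g. June for an August row
    # rows without two earlier months get NaN here
    out["output_value"] = out["cal_value"] / 30 * prev_1 + out["cal_value"] / 60 * prev_2
    # keep only the latest month of each group (the row being calculated)
    latest = out.groupby(["client_id", "customer_id"]).tail(1)
    return latest.reset_index(drop=True)
```

Compared with a Python loop, the shifts are computed for all groups at once, which tends to scale better on large frames; the trade-off is that missing months silently shift the wrong values into place, so gaps would need to be filled or checked first.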
Example: grouping by client_id and customer_id, the formula above should be calculated as follows.
Input DataFrame
client_id  expo_value  overall_date  customer_id  fetched_date  cal_value
1          289         2022-06-01    1449         2022-08-01    45.9
1          777         2022-07-01    1449         2022-08-01    45.9
1          155         2022-08-01    1449         2022-08-01    45.9
2          181         2022-06-01    2700         2022-08-01    36.0
2          663         2022-07-01    2700         2022-08-01    36.0
2          136         2022-08-01    2700         2022-08-01    36.0
Output DataFrame
client_id  expo_value  overall_date  customer_id  fetched_date  cal_value  output_value
1          155         2022-08-01    1449         2022-08-01    45.9       1630.27
2          136         2022-08-01    2700         2022-08-01    36.0       1012.8
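As a sanity check, the formula can be evaluated by hand on the sample rows. Taken literally (July weighted by cal_value/30, June by cal_value/60), it gives 1409.895 for client 1 and 904.2 for client 2. The listed 1012.8 for client 2 instead equals 36.0/30 × (181 + 663), i.e. both months divided by 30, which is also what the answer's get_result computes; that reading gives 1630.98 for client 1, close to but not exactly the listed 1630.27. So it may be worth confirming which divisor is intended for the second month. The snippet below is only that arithmetic in plain Python; the helper names are hypothetical.

```python
# Hand evaluation of two readings of the formula on the sample rows (plain Python).
samples = {
    # client_id: (cal_value, June expo_value, July expo_value)
    1: (45.9, 289, 777),
    2: (36.0, 181, 663),
}


def stated_formula(cal, june, july):
    """Formula as written for August: July weighted by cal/30, June by cal/60."""
    return cal / 30 * july + cal / 60 * june


def both_over_30(cal, june, july):
    """Alternative reading matching the sample for client 2: both months by cal/30."""
    return cal / 30 * (june + july)


for client, (cal, june, july) in samples.items():
    print(client, round(stated_formula(cal, june, july), 3), round(both_over_30(cal, june, july), 3))
```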
You could also apply a regular function to a groupby, so this might work:
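def get_result(df0):
    # sum cal_value * expo_value / 30 over the first two rows of the group (assumes rows are ordered by overall_date)
    df0['output_value'] = (df0['cal_value'] * df0['expo_value'] / 30).iloc[:2].sum()
    # keep one row per client: the last one, i.e. the August row shown in the expected output
    return df0.drop_duplicates('client_id', keep='last')

df.groupby('client_id').apply(get_result).reset_index(drop=True)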