
groupby and apply function to pandas dataframe

I have a pandas DataFrame with the columns client_id, customer_id, overall_date, fetched_date, cal_value and expo_value. Based on these columns, I have to apply a business formula to compute an output_value column, grouping by client_id and customer_id.

I am unable to iterate over each row and compute the output from the given DataFrame.

Below is the function I have written for the DataFrame, but it's not working.

def cal_df(df):
    df= df.groupby(['client_id','customer_id'].reset_index()
    for i in df.iterows():
# loop to iterate each row & calculate values
        df.iloc[i]= df.iloc['cal_value'][i]/30 * df.iloc['expo_value'][:-1][i] + df.iloc['cal_value'][i]/60 *df.iloc['expo_value'][:-2][i]
    return df

data = data.apply(lambda x:cal_df(df))

Formula: (df['cal_value']/30) * df['expo_value'][previous month, e.g. the July value when calculating for August] + (df['cal_value']/60) * df['expo_value'][two months back, e.g. the June value when calculating for August]

Example: grouping by client_id and customer_id, the following should be calculated:

  • For client_id 1: (45.9/30) * 777 + (45.9/30) * 289 = 1188.81 + 442.17 = 1630.98
  • For client_id 2: (36.0/30) * 663 + (36.0/30) * 181 = 795.6 + 217.2 = 1012.8

Input DataFrame

client_id    expo_value  overall_date  customer_id   fetched_date     cal_value
1             289      2022-06-01      1449          2022-08-01        45.9
1             777      2022-07-01      1449          2022-08-01        45.9
1             155      2022-08-01      1449          2022-08-01        45.9

2             181      2022-06-01      2700          2022-08-01        36.0
2             663      2022-07-01      2700          2022-08-01        36.0
2             136      2022-08-01      2700          2022-08-01        36.0

Output DataFrame

client_id  expo_value  overall_date  customer_id  fetched_date  cal_value  output_value
1          155         2022-08-01    1449         2022-08-01    45.9       1630.98
2          136         2022-08-01    2700         2022-08-01    36.0       1012.8

You can also apply a regular function to a groupby, so this might work:

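def get_result(df0):
    # Assumes each group is sorted by overall_date, so the first two rows
    # are the two months before the fetched month.
    df0 = df0.copy()
    df0['output_value'] = (df0['cal_value'] * df0['expo_value'] / 30).iloc[:2].sum()
    # Keep the latest row per client, as in the expected output.
    return df0.drop_duplicates('client_id', keep='last')

df.groupby('client_id').apply(get_result).reset_index(drop=True)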
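
The same lagged calculation can probably also be written without apply, by shifting expo_value inside each (client_id, customer_id) group. This is only a sketch under a few assumptions: the helper name add_output_value and its div_prev / div_prev2 parameters are made up for illustration, each group is assumed to have one row per consecutive month, and both divisors default to 30 to match the worked example (the formula in the question uses 60 for the second term, so pass div_prev2=60 if that is what is intended).

```python
import pandas as pd

# Sample rows copied from the question's input table.
df = pd.DataFrame({
    "client_id": [1, 1, 1, 2, 2, 2],
    "expo_value": [289, 777, 155, 181, 663, 136],
    "overall_date": pd.to_datetime(["2022-06-01", "2022-07-01", "2022-08-01"] * 2),
    "customer_id": [1449] * 3 + [2700] * 3,
    "fetched_date": pd.to_datetime(["2022-08-01"] * 6),
    "cal_value": [45.9] * 3 + [36.0] * 3,
})


def add_output_value(frame, div_prev=30, div_prev2=30):
    """Hypothetical helper: combine the previous two months' expo_value per group."""
    keys = ["client_id", "customer_id"]
    # Sort so that shift(1) / shift(2) really mean "one / two rows (months) earlier".
    out = frame.sort_values(keys + ["overall_date"]).copy()
    expo = out.groupby(keys)["expo_value"]
    prev1 = expo.shift(1)  # previous month's value, NaN where there is none
    prev2 = expo.shift(2)  # value from two months back
    out["output_value"] = (
        out["cal_value"] / div_prev * prev1
        + out["cal_value"] / div_prev2 * prev2
    )
    # Keep only the row for the month being calculated (assumed to be fetched_date).
    return out[out["overall_date"] == out["fetched_date"]].reset_index(drop=True)


result = add_output_value(df)
print(result)
```

With the sample data this should give an output_value of about 1630.98 for client 1 (1.53 * 777 is 1188.81) and 1012.8 for client 2. Because shift works on row positions, a group with a missing month would silently pick the wrong row, so real data may need reindexing to a full monthly range first.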


 