I am trying to create a calculated column in a pandas data frame that runs a different calculation based on another column in the data frame.
First I tried:
df_rollup['modeled_days'] = abs(round(((df_rollup.risk_avg) - 31) / (master_weight /100) / (prod_tolerance / 100), 0)).where(df_rollup['completion_status'] == 'PRODUCING')
df_rollup['modeled_days'] = abs(round(((df_rollup.risk_avg) - 31) / (master_weight / 100) / (shutin_tolerance / 100), 0)).where(df_rollup['completion_status'] == 'SHUT IN')
df_rollup['modeled_days'] = abs(round(((df_rollup.risk_avg) - 31) / (master_weight / 100) / (abandoned_tolerance / 100), 0)).where(df_rollup['completion_status'].str.contains('ABANDONED'))
I quickly realized that this would overwrite every row with the last updates calculatation and replace the non matching rows to Nan.
So I researched another approach that I believe is on the right track but I receive the error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
This is my approach:
def production_type_calc(df_rollup, master_weight, prod_tolerance, shutin_tolerance, abandoned_tolerance):
if df_rollup['completion_status'] == 'PRODUCING':
return abs(round((df_rollup.risk_avg - 31) / (master_weight / 100) / (prod_tolerance / 100), 0))
elif df_rollup['completion_status'] == 'SHUT IN':
return abs(round((df_rollup.risk_avg - 31) / (master_weight / 100) / (shutin_tolerance / 100), 0))
elif df_rollup['completion_status'].str.contains('ABANDONED'):
return abs(round((df_rollup.risk_avg - 31) / (master_weight / 100) / (abandoned_tolerance / 100), 0))
else:
return 0
I ran this function using this the.apply method as such:
df_rollup['modeled_days'] = df_rollup.apply(production_type_calc(df_rollup, master_weight, prod_tolerance, shutin_tolerance, abandoned_tolerance), axis=1)
I have ran into this problem before and it seems like I need to nest the data frame eg df = df[df[''] or something of the sort but I don't know how to begin. I would appreciate any help on this.
I still like your first method, and we can do the np.select
con1=df_rollup['completion_status'] == 'PRODUCING'
con2=df_rollup['completion_status'] == 'SHUT IN'
con3=df_rollup['completion_status'].str.contains('ABANDONED')
v1=abs(round(((df_rollup.risk_avg) - 31) / (master_weight /100) / (prod_tolerance / 100), 0))
v2=abs(round(((df_rollup.risk_avg) - 31) / (master_weight / 100) / (shutin_tolerance / 100), 0))
v3=abs(round(((df_rollup.risk_avg) - 31) / (master_weight / 100) / (abandoned_tolerance / 100), 0))
df_rollup['modeled_days']=np.select([con1, con2, con3], [v1,v2,v3])
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.