
Set a calculated column value in a data frame derived from another column's value

I am trying to create a calculated column in a pandas data frame that runs a different calculation based on another column in the data frame.

First I tried:

df_rollup['modeled_days'] = abs(round(((df_rollup.risk_avg) - 31) / (master_weight /100) / (prod_tolerance / 100), 0)).where(df_rollup['completion_status'] == 'PRODUCING')
df_rollup['modeled_days'] = abs(round(((df_rollup.risk_avg) - 31) / (master_weight / 100) / (shutin_tolerance / 100), 0)).where(df_rollup['completion_status'] == 'SHUT IN')
df_rollup['modeled_days'] = abs(round(((df_rollup.risk_avg) - 31) / (master_weight / 100) / (abandoned_tolerance / 100), 0)).where(df_rollup['completion_status'].str.contains('ABANDONED'))

I quickly realized that each assignment overwrites the whole column, so only the last calculation survives and every non-matching row is set to NaN.
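A minimal sketch of that overwrite, assuming a tiny hypothetical frame and a simplified formula (risk_avg - 31) in place of the full calculation. The sample values are invented for illustration only:

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({
    "completion_status": ["PRODUCING", "SHUT IN", "ABANDONED - PLUGGED"],
    "risk_avg": [41.0, 51.0, 61.0],
})

# Simplified stand-in for the full formula.
base = df["risk_avg"] - 31

# Each assignment replaces the whole column; rows where the condition
# is False receive NaN from .where().
df["modeled_days"] = base.where(df["completion_status"] == "PRODUCING")
df["modeled_days"] = base.where(df["completion_status"] == "SHUT IN")

# Only the 'SHUT IN' row keeps a value; the 'PRODUCING' result is gone.
print(df)
```

After the second assignment, the first row's value has been replaced by NaN even though it matched the first condition.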

So I researched another approach that I believe is on the right track but I receive the error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

This is my approach:

def production_type_calc(df_rollup, master_weight, prod_tolerance, shutin_tolerance, abandoned_tolerance):
    if df_rollup['completion_status'] == 'PRODUCING':
        return abs(round((df_rollup.risk_avg - 31) / (master_weight / 100) / (prod_tolerance / 100), 0))
    elif df_rollup['completion_status'] == 'SHUT IN':
        return abs(round((df_rollup.risk_avg - 31) / (master_weight / 100) / (shutin_tolerance / 100), 0))
    elif df_rollup['completion_status'].str.contains('ABANDONED'):
        return abs(round((df_rollup.risk_avg - 31) / (master_weight / 100) / (abandoned_tolerance / 100), 0))
    else:
        return 0

I ran this function using the .apply method as follows:

df_rollup['modeled_days'] = df_rollup.apply(production_type_calc(df_rollup, master_weight, prod_tolerance, shutin_tolerance, abandoned_tolerance), axis=1)

I have run into this problem before, and it seems like I need to nest the data frame, e.g. df = df[df[''], or something of the sort, but I don't know how to begin. I would appreciate any help on this.
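For reference, here is a sketch of how the row-wise version could be made to run. The ValueError arises because the function is called with the whole data frame, so each `if` compares an entire Series. `DataFrame.apply` with `axis=1` expects the function object itself and passes it one row at a time; extra parameters can go through `args`. The sample data and parameter values below are hypothetical:

```python
import pandas as pd

# Hypothetical sample data and parameter values, invented for illustration.
df_rollup = pd.DataFrame({
    "completion_status": ["PRODUCING", "SHUT IN", "TEMPORARILY ABANDONED", "DRILLING"],
    "risk_avg": [41.0, 51.0, 61.0, 71.0],
})
master_weight, prod_tolerance, shutin_tolerance, abandoned_tolerance = 50, 20, 40, 80


def production_type_calc(row, master_weight, prod_tolerance,
                         shutin_tolerance, abandoned_tolerance):
    # `row` is a single row (a Series), so each field is a scalar and
    # the comparisons below yield plain booleans.
    status = row["completion_status"]  # assumed to be a string, never NaN
    if status == "PRODUCING":
        tolerance = prod_tolerance
    elif status == "SHUT IN":
        tolerance = shutin_tolerance
    elif "ABANDONED" in status:
        tolerance = abandoned_tolerance
    else:
        return 0
    return abs(round((row["risk_avg"] - 31) / (master_weight / 100) / (tolerance / 100), 0))


# Pass the function itself (not the result of calling it); the extra
# parameters are forwarded to every call through `args`.
df_rollup["modeled_days"] = df_rollup.apply(
    production_type_calc,
    axis=1,
    args=(master_weight, prod_tolerance, shutin_tolerance, abandoned_tolerance),
)
print(df_rollup)
```

This calls a Python function once per row, so it is likely to be slower on large frames than a vectorized approach.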

I still like your first method. You can make it work with np.select:

import numpy as np

# Boolean masks, one per completion status
con1 = df_rollup['completion_status'] == 'PRODUCING'
con2 = df_rollup['completion_status'] == 'SHUT IN'
con3 = df_rollup['completion_status'].str.contains('ABANDONED')
# Candidate values for each status, computed for every row
v1 = abs(round((df_rollup.risk_avg - 31) / (master_weight / 100) / (prod_tolerance / 100), 0))
v2 = abs(round((df_rollup.risk_avg - 31) / (master_weight / 100) / (shutin_tolerance / 100), 0))
v3 = abs(round((df_rollup.risk_avg - 31) / (master_weight / 100) / (abandoned_tolerance / 100), 0))
# For each row, take the value of the first condition that is True; rows matching none get the default, 0
df_rollup['modeled_days'] = np.select([con1, con2, con3], [v1, v2, v3])
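To try this out end to end, here is a self-contained sketch of the same idea. The sample frame and the four parameter values are hypothetical, and the shared part of the formula is factored into `base` purely for readability:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data and parameter values, invented for illustration.
df_rollup = pd.DataFrame({
    "completion_status": ["PRODUCING", "SHUT IN", "TEMPORARILY ABANDONED", "DRILLING"],
    "risk_avg": [41.0, 51.0, 61.0, 71.0],
})
master_weight, prod_tolerance, shutin_tolerance, abandoned_tolerance = 50, 20, 40, 80

# Part of the formula shared by all three cases.
base = (df_rollup["risk_avg"] - 31) / (master_weight / 100)

conditions = [
    df_rollup["completion_status"] == "PRODUCING",
    df_rollup["completion_status"] == "SHUT IN",
    # na=False treats missing statuses as "no match" instead of NaN.
    df_rollup["completion_status"].str.contains("ABANDONED", na=False),
]
tolerances = [prod_tolerance, shutin_tolerance, abandoned_tolerance]
choices = [(base / (t / 100)).round(0).abs() for t in tolerances]

# The first True condition wins for each row; unmatched rows get `default`.
df_rollup["modeled_days"] = np.select(conditions, choices, default=0)
print(df_rollup)
```

With these made-up inputs, the 'DRILLING' row matches no condition and falls back to the default of 0, which mirrors the `else: return 0` branch in the question.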


 