
Copy value from one column to another based on condition (using pandas)

I have a data frame as seen below...

Month    JUN    JUL    AUG    SOI_Final    JUN_bool    JUL_bool    AUG_bool
Aug      1.2    0.8    0.1    NaN          False       False       True
Aug      0.2    0      -2     NaN          False       False       True
Jun      3.2    -2.5   0.6    NaN          True        False       False
Jul      2.2    -0.7   -0.8   NaN          False       True        False

What I'm trying to do is, for each row in the table, look up the month in the 'Month' column and assign the matching value from column JUN, JUL or AUG to 'SOI_Final'. For instance, if 'Month' is 'Jun' for a given row, then 'SOI_Final' for that row should get the value from column 'JUN'. Here is the code I have so far...

df_merged['JUN_bool'] = (df_merged['Month'] == 'Jun')
df_merged['JUL_bool'] = (df_merged['Month'] == 'Jul')
df_merged['AUG_bool'] = (df_merged['Month'] == 'Aug')

if df_merged['JUN_bool'] is True:
    df_merged['SOI_Final']=df_merged['JUN']
elif df_merged['JUL_bool'] is True:
    df_merged['SOI_Final']=df_merged['JUL']
elif df_merged['AUG_bool'] is True:
    df_merged['SOI_Final']=df_merged['AUG']  
else:
    df_merged['SOI_Final']=np.NaN

My dataframe is only showing NaN's for 'SOI_Final' and is not picking up the correct value. I created a Boolean column for each of the 3 months and the correct monthly value should only be copied over if the bool value is 'True'. Does anyone have any suggestions as to what I might be missing here?

Thanks, Jeff

The problem here is that each of the bool columns, e.g. df_merged['JUN_bool'], is a Series, so a comparison with the is operator never evaluates to True. Every if/elif branch is skipped, the else clause runs, and the whole 'SOI_Final' column is assigned NaN.
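To make the point concrete, here is a minimal sketch using the four sample rows from the question. It first shows that the identity test on a Series is False, then fills 'SOI_Final' row by row with np.select. Using np.select is my own choice of fix for illustration, not something the question's code used; the frame below is rebuilt by hand from the table shown above.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the table in the question.
df_merged = pd.DataFrame({
    "Month": ["Aug", "Aug", "Jun", "Jul"],
    "JUN": [1.2, 0.2, 3.2, 2.2],
    "JUL": [0.8, 0.0, -2.5, -0.7],
    "AUG": [0.1, -2.0, 0.6, -0.8],
})

# A Series object is never the singleton True, so this identity test is False
# no matter what values the Series holds.
is_true = (df_merged["Month"] == "Jun") is True

# Element-wise alternative: one boolean mask per month, one source column per mask.
conditions = [
    df_merged["Month"] == "Jun",
    df_merged["Month"] == "Jul",
    df_merged["Month"] == "Aug",
]
choices = [df_merged["JUN"], df_merged["JUL"], df_merged["AUG"]]

# Rows matching none of the masks get NaN.
df_merged["SOI_Final"] = np.select(conditions, choices, default=np.nan)
```

With this approach the separate JUN_bool, JUL_bool and AUG_bool columns are not needed, since the masks are built on the fly.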

If the values in the month column match the column names once they are capitalized, you can upper-case them and use the stack method. This only works if the index is unique. Here is a three-month example:

import numpy as np
import pandas as pd

np.random.seed(10)
months = np.random.choice(['Aug', 'Jun', 'Jul'], 100)
JUN = np.random.random(100)
JUL = np.random.random(100)
AUG = np.random.random(100)
index = list(range(1900, 2000))  # unique index, one row per year

data = pd.DataFrame(dict(months=months, JUN=JUN, JUL=JUL, AUG=AUG), index=index)

Upper-case the months column, stack the three value columns, and keep only the rows where the month matches the stacked column name:

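# Make the month labels match the column names ('Jun' -> 'JUN').
data['months'] = data.months.str.upper()

# Long format: one row per (index, column); the column name lands in 'level_1'.
df2 = data[['JUN', 'JUL', 'AUG']].stack(
    ).reset_index(level=1)

df2.rename(columns={0: 'month_value'}, inplace=True)

# Aligns on the index, so each stacked row gets the month of its original row.
df2['months'] = data['months']
# Keep only the stacked row whose column name equals the row's month.
SOI = df2[df2['months'] == df2['level_1']].month_value
data['SOI'] = SOI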

data.head(4)

#       months  JUN         JUL         AUG         SOI
# 1900  JUN     0.637952    0.933852    0.384843    0.637952
# 1901  JUN     0.372520    0.558900    0.820415    0.372520
# 1902  AUG     0.002407    0.672449    0.895022    0.895022
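For comparison, the same row-wise lookup can be sketched without stack, by turning each month label into a column position and indexing the underlying NumPy array. This is an alternative I am adding for illustration, not part of the answer above. The helper name pick_month_value and its parameters are hypothetical, and the sample frame is the question's table with an assumed year index.

```python
import numpy as np
import pandas as pd

def pick_month_value(df, month_col, value_cols):
    """Hypothetical helper: for each row, return the value from the column
    named by the upper-cased month label, or NaN if there is no such column."""
    # Position of each row's month among value_cols; -1 where nothing matches.
    col_idx = pd.Index(value_cols).get_indexer(df[month_col].str.upper())
    values = df[value_cols].to_numpy(dtype=float)
    # Pick one value per row by pairing row positions with column positions.
    picked = values[np.arange(len(df)), col_idx]
    # A -1 would silently select the last column, so mask those rows with NaN.
    return pd.Series(np.where(col_idx >= 0, picked, np.nan), index=df.index)

# Sample rows from the question; the year index is an assumption.
sample = pd.DataFrame(
    {
        "Month": ["Aug", "Aug", "Jun", "Jul"],
        "JUN": [1.2, 0.2, 3.2, 2.2],
        "JUL": [0.8, 0.0, -2.5, -0.7],
        "AUG": [0.1, -2.0, 0.6, -0.8],
    },
    index=[1900, 1901, 1902, 1903],
)
sample["SOI"] = pick_month_value(sample, "Month", ["JUN", "JUL", "AUG"])
```

Because this indexes by position rather than by label, it does not depend on the index being unique, unlike the stack-based version.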


 