I have a data frame as seen below...
Month  JUN   JUL   AUG  SOI_Final  JUN_bool  JUL_bool  AUG_bool
Aug    1.2   0.8   0.1  NaN        False     False     True
Aug    0.2   0    -2    NaN        False     False     True
Jun    3.2  -2.5   0.6  NaN        True      False     False
Jul    2.2  -0.7  -0.8  NaN        False     True      False
What I'm trying to do is, for each row in the table, look up which month is in the 'Month' column and assign the matching value from column JUN, JUL or AUG to 'SOI_Final'. For instance, if 'Month' is 'Jun' for a given row, then 'SOI_Final' for that row should get the value from column 'JUN'. Here is the code I have so far...
df_merged['JUN_bool'] = (df_merged['Month'] == 'Jun')
df_merged['JUL_bool'] = (df_merged['Month'] == 'Jul')
df_merged['AUG_bool'] = (df_merged['Month'] == 'Aug')
if df_merged['JUN_bool'] is True:
    df_merged['SOI_Final']=df_merged['JUN']
elif df_merged['JUL_bool'] is True:
    df_merged['SOI_Final']=df_merged['JUL']
elif df_merged['AUG_bool'] is True:
    df_merged['SOI_Final']=df_merged['AUG']
else:
    df_merged['SOI_Final']=np.NaN
My dataframe is only showing NaN's for 'SOI_Final' and is not picking up the correct value. I created a Boolean column for each of the 3 months and the correct monthly value should only be copied over if the bool value is 'True'. Does anyone have any suggestions as to what I might be missing here?
Thanks, Jeff
The problem here is that each of the bool columns, e.g. df_merged['JUN_bool'], is a Series, so the comparison with the is operator never evaluates to True. Every if/elif test therefore fails, the else branch runs, and everything is assigned NaN.
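A minimal sketch of why the identity check fails, using a small hypothetical two-row frame that only borrows the column names from the question. The `is` operator compares object identity, and a Series is never the same object as the singleton True, whatever values it holds:

```python
import pandas as pd

# Hypothetical two-row frame, only to illustrate the identity check.
df_merged = pd.DataFrame({'Month': ['Jun', 'Aug']})
df_merged['JUN_bool'] = (df_merged['Month'] == 'Jun')

# `is` compares object identity, not the values stored in the Series.
check = df_merged['JUN_bool'] is True
print(check)  # False

# The element-wise values are still there; they are just never inspected by `is`.
print(df_merged['JUN_bool'].tolist())  # [True, False]
```

Because the test looks at the whole column object rather than at each row, an if/elif chain of this kind cannot pick values row by row.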
If the month values match the column names, you can upper-case them and use the stack method. This only works if the index is unique. Here is an example with three months:
import numpy as np
import pandas as pd

np.random.seed(10)
months = np.random.choice(['Aug', 'Jun', 'Jul'], 100)
JUN = np.random.random(100)
JUL = np.random.random(100)
AUG = np.random.random(100)
index = list(range(1900, 2000))  # unique index, one label per row
data = pd.DataFrame(dict(months=months, JUN=JUN, JUL=JUL, AUG=AUG), index=index)
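Upper-case the months column so it matches the column names, stack the three month columns, and keep only the rows where the stacked column name equals the month: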
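data['months'] = data.months.str.upper()
# Stack the month columns into long format; the column names end up in 'level_1'
df2 = data[['JUN', 'JUL', 'AUG']].stack().reset_index(level=1)
df2.rename(columns={0: 'month_value'}, inplace=True)
# Align each stacked row with the month of its original row (needs a unique index)
df2['months'] = data['months']
# Keep only the value whose column name matches the row's month
SOI = df2[df2['months'] == df2['level_1']].month_value
data['SOI'] = SOI
data.head(4)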
# months JUN JUL AUG SOI
# 1900 JUN 0.637952 0.933852 0.384843 0.637952
# 1901 JUN 0.372520 0.558900 0.820415 0.372520
# 1902 AUG 0.002407 0.672449 0.895022 0.895022
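As a rough sketch, the same stack-and-filter steps can be applied to the four rows from the question. The frame is rebuilt by hand here, and the names `stacked`, `column`, `value` and `month` are illustrative choices rather than anything from the original posts. It assumes the default unique index:

```python
import pandas as pd

# The four rows from the question (the bool helper columns are not needed).
df_merged = pd.DataFrame({
    'Month': ['Aug', 'Aug', 'Jun', 'Jul'],
    'JUN': [1.2, 0.2, 3.2, 2.2],
    'JUL': [0.8, 0.0, -2.5, -0.7],
    'AUG': [0.1, -2.0, 0.6, -0.8],
})

# Long format: one row per (original row, month column) pair.
stacked = df_merged[['JUN', 'JUL', 'AUG']].stack().reset_index(level=1)
stacked.columns = ['column', 'value']

# Put the upper-cased month of each original row next to its stacked values.
# This alignment relies on df_merged having a unique index.
stacked['month'] = df_merged['Month'].str.upper()

# Keep only the value whose column name matches the row's month.
matches = stacked.loc[stacked['month'] == stacked['column'], 'value']
df_merged['SOI_Final'] = matches

print(df_merged['SOI_Final'].tolist())  # [0.1, -2.0, 3.2, -0.7]
```

Rows whose month has no matching column would simply be left as NaN by the final assignment, which mirrors the else branch in the question.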