I have a Data frame like this
Temp_in_C Temp_in_F Date Year Month Day
23 65 2011-12-12 2011 12 12
12 72 2011-12-12 2011 12 12
NaN 67 2011-12-12 2011 12 12
0 0 2011-12-12 2011 12 12
7 55 2011-12-13 2011 12 13
I am trying to get output in this format (The NaN and zero values of pertuculer day is replaced by avg temp of that day only) Output will be
Temp_in_C Temp_in_F Date Year Month Day
23 65 2011-12-12 2011 12 12
12 72 2011-12-12 2011 12 12
17.5 67 2011-12-12 2011 12 12
17.5 68 2011-12-12 2011 12 12
7 55 2011-12-13 2011 12 13
These vales will be replaced by mean of that perticuler day. I am trying to do this
temp_df = csv_data_df[csv_data_df["Temp_in_C"]!=0]
temp_df["Temp_in_C"] =
temp_df["Temp_in_C"].replace('*',np.nan)
x=temp_df["Temp_in_C"].mean()
csv_data_df["Temp_in_C"]=csv_data_df["Temp_in_C"]
.replace(0.0,x)
csv_data_df["Temp_in_C"]=csv_data_df["Temp_in_C"]
.fillna(x)
This code is taking the mean of whole columns and replacing it directly. How can i group by day and take mean and then replace values for that particular day only.
First, replace zeros with NaN
df = df.replace(0,np.nan)
Then fill the missing values using transform
(see this post )
df.groupby('Date').transform(lambda x: x.fillna(x.mean()))
Gives:
Temp_in_C Temp_in_F Year Month Day
0 23.0 65.0 2011 12 12
1 12.0 72.0 2011 12 12
2 17.5 67.0 2011 12 12
3 17.5 68.0 2011 12 12
4 7.0 55.0 2011 12 13
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.