Pandas fillna() method not filling all missing values

I have rain and temp data sourced from Environment Canada but it contains some NaN values.

start_date = '2015-12-31'
end_date = '2021-05-26'
mask = (data['date'] > start_date) & (data['date'] <= end_date)
df = data.loc[mask]
print(df)
            date      time  rain_gauge_value  temperature
8760  2016-01-01  00:00:00               0.0         -2.9
8761  2016-01-01  01:00:00               0.0         -3.4
8762  2016-01-01  02:00:00               0.0         -3.6
8763  2016-01-01  03:00:00               0.0         -3.6
8764  2016-01-01  04:00:00               0.0         -4.0
...          ...       ...               ...          ...
56107 2021-05-26  19:00:00               0.0         22.0
56108 2021-05-26  20:00:00               0.0         21.5
56109 2021-05-26  21:00:00               0.0         21.1
56110 2021-05-26  22:00:00               0.0         19.5
56111 2021-05-26  23:00:00               0.0         18.5

[47352 rows x 4 columns]

Find the rows with a NaN value

null = df[df['rain_gauge_value'].isnull()]
print(null)
            date      time  rain_gauge_value  temperature
11028 2016-04-04  12:00:00               NaN         -6.9
11986 2016-05-14  10:00:00               NaN          NaN
11987 2016-05-14  11:00:00               NaN          NaN
11988 2016-05-14  12:00:00               NaN          NaN
11989 2016-05-14  13:00:00               NaN          NaN
...          ...       ...               ...          ...
49024 2020-08-04  16:00:00               NaN          NaN
49025 2020-08-04  17:00:00               NaN          NaN
50505 2020-10-05  09:00:00               NaN         11.3
54083 2021-03-03  11:00:00               NaN         -5.1
54084 2021-03-03  12:00:00               NaN         -4.5

[6346 rows x 4 columns]

This is the dataframe I want to use to fill the NaN values:

print(rain_df)
             date      time  rain_gauge_value  temperature
0      2015-12-28  00:00:00               0.1         -6.0
1      2015-12-28  01:00:00               0.0         -7.0
2      2015-12-28  02:00:00               0.0         -8.0
3      2015-12-28  03:00:00               0.0         -8.0
4      2015-12-28  04:00:00               0.0         -7.0
...           ...       ...               ...          ...
48043  2021-06-19  19:00:00               0.6         20.0
48044  2021-06-19  20:00:00               0.6         19.0
48045  2021-06-19  21:00:00               0.8         18.0
48046  2021-06-19  22:00:00               0.4         17.0
48047  2021-06-19  23:00:00               0.0         16.0

[48048 rows x 4 columns]

But when I use the fillna() method, some of the values don't get substituted.

null = null.fillna(rain_df)
null = null[null['rain_gauge_value'].isnull()]
print(null)
            date      time  rain_gauge_value  temperature
48057 2020-06-25  09:00:00               NaN          NaN
48058 2020-06-25  10:00:00               NaN          NaN
48059 2020-06-25  11:00:00               NaN          NaN
48060 2020-06-25  12:00:00               NaN          NaN
48586 2020-07-17  10:00:00               NaN          NaN
48587 2020-07-17  11:00:00               NaN          NaN
48588 2020-07-17  12:00:00               NaN          NaN
49022 2020-08-04  14:00:00               NaN          NaN
49023 2020-08-04  15:00:00               NaN          NaN
49024 2020-08-04  16:00:00               NaN          NaN
49025 2020-08-04  17:00:00               NaN          NaN
50505 2020-10-05  09:00:00               NaN         11.3
54083 2021-03-03  11:00:00               NaN         -5.1
54084 2021-03-03  12:00:00               NaN         -4.5

How can I resolve this issue?

When you call fillna, you probably want to specify a fill method, such as using the previous/next value or the column mean. For example:

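nulls_index = df['rain_gauge_value'].isnull()  # remember which rows were NaN before filling
df = df.ffill()  # forward fill as an example; fillna(method='ffill') is deprecated since pandas 2.1
nulls_after_fill = df[nulls_index]  # inspect the rows that were just filled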

Take a look at: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html
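
To make the options above concrete, here is a minimal sketch comparing a few common fill strategies on a tiny series. The values are hypothetical and only stand in for a column such as rain_gauge_value; which strategy is appropriate depends on the data.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly readings with gaps (not the asker's data).
s = pd.Series([0.0, np.nan, np.nan, 0.6, np.nan, 0.2], name="rain_gauge_value")

forward = s.ffill()            # carry the last valid reading forward
backward = s.bfill()           # pull the next valid reading backward
by_mean = s.fillna(s.mean())   # one constant: the mean of the observed values
linear = s.interpolate()       # straight line between neighbouring readings

# Side-by-side view of the strategies.
comparison = pd.DataFrame(
    {"raw": s, "ffill": forward, "bfill": backward,
     "mean": by_mean, "interpolate": linear}
)
print(comparison)
```

Forward and backward fill copy a neighbouring value unchanged, while interpolate spreads the change evenly across the gap, which may or may not be reasonable for rainfall.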

You need to tell pandas how you want to patch. It may be obvious to you that the "patch" dataframe's values should be used where the dates and times line up, but it isn't obvious to pandas. See my dummy example:

import numpy as np
import pandas as pd
from datetime import date, time

raw = pd.DataFrame(dict(date=[date(2015,12,28), date(2015,12,28)], time=[time(0,0,0), time(0,0,1)], temp=[1., np.nan], rain=[4., np.nan]))
raw
         date      time  temp  rain
0  2015-12-28  00:00:00   1.0   4.0
1  2015-12-28  00:00:01   NaN   NaN

patch = pd.DataFrame(dict(date=[date(2015,12,28), date(2015,12,28)], time=[time(0,0,0), time(0,0,1)], temp=[5., 5.], rain=[10., 10.]))
patch
         date      time  temp  rain
0  2015-12-28  00:00:00   5.0  10.0
1  2015-12-28  00:00:01   5.0  10.0

The indexes of raw and patch need to correspond to how you want to patch the raw data (in this case, you want to patch based on date and time):

raw.set_index(['date','time']).fillna(patch.set_index(['date','time'])) 

returns

                     temp  rain
date       time
2015-12-28 00:00:00   1.0   4.0
           00:00:01   5.0  10.0
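
The sketch below shows why fillna with a DataFrame can leave gaps, using made-up frames shaped like those in the question. It assumes the (date, time) pairs are unique in both frames. fillna aligns on row labels, so a row in raw only gets a value if patch has a row with the same label, whatever the dates say. That would be consistent with the output in the question: rain_df is labelled 0 to 48047, and every row that is still NaN has a label of 48057 or higher.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's frames (values are made up).
# "raw" keeps the row labels left over from slicing a larger frame.
raw = pd.DataFrame(
    {
        "date": ["2020-06-25", "2020-06-25", "2020-06-25"],
        "time": ["09:00:00", "10:00:00", "11:00:00"],
        "rain_gauge_value": [np.nan, np.nan, np.nan],
    },
    index=[1, 2, 48057],
)

# "patch" has a fresh 0..n-1 index and covers the same timestamps.
patch = pd.DataFrame(
    {
        "date": ["2020-06-25", "2020-06-25", "2020-06-25"],
        "time": ["09:00:00", "10:00:00", "11:00:00"],
        "rain_gauge_value": [0.4, 0.6, 0.8],
    }
)

# fillna with a DataFrame matches rows by label, not by date/time:
# label 1 takes patch row 1 (the 10:00 value, although raw's row is 09:00),
# label 2 takes patch row 2, and label 48057 has no match, so it stays NaN.
naive = raw.fillna(patch)
print(naive)

# Keying both frames on date and time makes the labels meaningful.
keys = ["date", "time"]
fixed = raw.set_index(keys).fillna(patch.set_index(keys)).reset_index()
print(fixed)
```

The naive version also fills the matched rows with values from the wrong hour, so rows that look filled are not necessarily right. reset_index restores date and time as columns but discards the original row labels.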
