
Filling missing values in a pandas DataFrame

I'm trying to fill in missing data values in a pandas DataFrame based on the date column.

df.head()

            col1 col2 col3
date            
2014-06-20  3    752     4028
2014-06-21  4    752     4028
2014-06-22  32   752     4028
2014-06-25  44   882     4548
2014-06-26  32   882     4548

I tried the following:

idx = pd.date_range(df.index[0], df.index[-1])

df = df.reindex(idx).reset_index()

But I get a DataFrame full of NaNs:

    index       col1 col2   col3
0   2014-06-20  NaN  NaN    NaN
1   2014-06-21  NaN  NaN    NaN
2   2014-06-22  NaN  NaN    NaN
3   2014-06-23  NaN  NaN    NaN
4   2014-06-24  NaN  NaN    NaN

What am I missing here?

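The behavior you describe would happen if the index is a pd.Index containing strings, rather than a pd.DatetimeIndex containing timestamps. reindex matches labels exactly, so none of the Timestamps in idx match the string labels and every row comes back as NaN.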

For example,

import pandas as pd

df = pd.DataFrame(
    {'col1': [3, 4, 32, 44, 32],
     'col2': [752, 752, 752, 882, 882],
     'col3': [4028, 4028, 4028, 4548, 4548]},
    index = ['2014-06-20', '2014-06-21', '2014-06-22', '2014-06-25', '2014-06-26'])

idx = pd.date_range(df.index[0], df.index[-1])
print(df.reindex(idx).reset_index())
#        index  col1  col2  col3
# 0 2014-06-20   NaN   NaN   NaN
# 1 2014-06-21   NaN   NaN   NaN
# 2 2014-06-22   NaN   NaN   NaN
# 3 2014-06-23   NaN   NaN   NaN
# 4 2014-06-24   NaN   NaN   NaN
# 5 2014-06-25   NaN   NaN   NaN
# 6 2014-06-26   NaN   NaN   NaN

In contrast, if you make the index a DatetimeIndex:

df.index = pd.DatetimeIndex(df.index)

then

print(df.reindex(idx).reset_index())
       index  col1  col2  col3
0 2014-06-20     3   752  4028
1 2014-06-21     4   752  4028
2 2014-06-22    32   752  4028
3 2014-06-23   NaN   NaN   NaN
4 2014-06-24   NaN   NaN   NaN
5 2014-06-25    44   882  4548
6 2014-06-26    32   882  4548
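
To check which case applies to your own data, a minimal sketch along these lines may help. It assumes the dates arrived as strings (for instance from a CSV file), and the helper name fill_missing_dates is hypothetical, not part of pandas.

```python
import pandas as pd


def fill_missing_dates(df):
    """Return a copy of df with one row per calendar day (hypothetical helper)."""
    out = df.copy()
    # Convert a string index to real timestamps so labels can match pd.date_range.
    if not isinstance(out.index, pd.DatetimeIndex):
        out.index = pd.to_datetime(out.index)
    full_range = pd.date_range(out.index.min(), out.index.max(), freq='D')
    # Dates absent from the original index become rows of NaN.
    return out.reindex(full_range)


df = pd.DataFrame(
    {'col1': [3, 4, 32, 44, 32],
     'col2': [752, 752, 752, 882, 882]},
    index=['2014-06-20', '2014-06-21', '2014-06-22', '2014-06-25', '2014-06-26'])

print(type(df.index))  # a plain Index of strings, not a DatetimeIndex
result = fill_missing_dates(df)
print(result)
```

Printing type(df.index) before reindexing is a quick way to confirm whether the string-index explanation actually applies to your data.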

Pandas has a built-in method to achieve this. Have a look at http://pandas.pydata.org/pandas-docs/stable/timeseries.html.

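You can use df.asfreq('1d') to convert your data to a daily frequency based on the date index. This requires a DatetimeIndex and inserts the missing dates as rows of NaN; pass method (for example 'ffill') or fill_value if you want those gaps filled with values.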
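
As a rough illustration of the asfreq approach, the sketch below builds a small frame with a DatetimeIndex and shows three variants: leaving the gaps as NaN, forward-filling them, and filling them with a constant. The sample data and variable names are made up for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {'col1': [3, 4, 32, 44, 32]},
    index=pd.to_datetime(['2014-06-20', '2014-06-21', '2014-06-22',
                          '2014-06-25', '2014-06-26']))

# Missing days are inserted as rows of NaN.
daily = df.asfreq('D')

# Carry the last known value forward into the inserted rows.
daily_ffill = df.asfreq('D', method='ffill')

# Or give the inserted rows a constant value instead.
daily_zero = df.asfreq('D', fill_value=0)

print(daily)
print(daily_ffill)
print(daily_zero)
```

Which variant is appropriate depends on what a missing day means in your data: forward-filling suits slowly changing values, while a constant such as 0 suits counts.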
