I'm trying to fill missing data values in a pandas DataFrame based on the date column.
df.head()
            col1  col2  col3
date
2014-06-20     3   752  4028
2014-06-21     4   752  4028
2014-06-22    32   752  4028
2014-06-25    44   882  4548
2014-06-26    32   882  4548
I tried the following:
idx = pd.date_range(df.index[0], df.index[-1])
df = df.reindex(idx).reset_index()
But I get a DataFrame full of NaNs:
       index  col1  col2  col3
0 2014-06-20   NaN   NaN   NaN
1 2014-06-21   NaN   NaN   NaN
2 2014-06-22   NaN   NaN   NaN
3 2014-06-23   NaN   NaN   NaN
4 2014-06-24   NaN   NaN   NaN
What am I missing here?
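The behavior you describe would happen if the index is a pd.Index
containing strings, rather than a pd.DatetimeIndex
containing timestamps. The strings never compare equal to the Timestamps in idx, so reindex finds no matches and every row becomes NaN.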
For example,
import pandas as pd
df = pd.DataFrame(
{'col1': [3, 4, 32, 44, 32],
'col2': [752, 752, 752, 882, 882],
'col3': [4028, 4028, 4028, 4548, 4548]},
index = ['2014-06-20', '2014-06-21', '2014-06-22', '2014-06-25', '2014-06-26'])
idx = pd.date_range(df.index[0], df.index[-1])
print(df.reindex(idx).reset_index())
#        index  col1  col2  col3
# 0 2014-06-20   NaN   NaN   NaN
# 1 2014-06-21   NaN   NaN   NaN
# 2 2014-06-22   NaN   NaN   NaN
# 3 2014-06-23   NaN   NaN   NaN
# 4 2014-06-24   NaN   NaN   NaN
# 5 2014-06-25   NaN   NaN   NaN
# 6 2014-06-26   NaN   NaN   NaN
In contrast, if you make the index a DatetimeIndex:
df.index = pd.DatetimeIndex(df.index)
then
print(df.reindex(idx).reset_index())
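#        index  col1   col2    col3
# 0 2014-06-20   3.0  752.0  4028.0
# 1 2014-06-21   4.0  752.0  4028.0
# 2 2014-06-22  32.0  752.0  4028.0
# 3 2014-06-23   NaN    NaN     NaN
# 4 2014-06-24   NaN    NaN     NaN
# 5 2014-06-25  44.0  882.0  4548.0
# 6 2014-06-26  32.0  882.0  4548.0
(The integer columns are upcast to float because NaN cannot be stored in an int64 column.)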
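If the goal is to actually fill the new rows rather than leave them as NaN, one option is to parse the string index with pd.to_datetime, reindex, and then forward-fill. This is a minimal sketch using the sample data from the question; forward-filling is an assumption about what "fill" should mean here, and interpolate() or fillna() with a constant are equally valid choices.

```python
import pandas as pd

# Sample data from the question, with the dates stored as strings (the problematic case)
df = pd.DataFrame(
    {'col1': [3, 4, 32, 44, 32],
     'col2': [752, 752, 752, 882, 882],
     'col3': [4028, 4028, 4028, 4548, 4548]},
    index=['2014-06-20', '2014-06-21', '2014-06-22', '2014-06-25', '2014-06-26'])

# Quick diagnostic: a plain Index of strings is not a DatetimeIndex
print(isinstance(df.index, pd.DatetimeIndex))  # False

# Parse the strings into a DatetimeIndex so they can match the Timestamps in idx
df.index = pd.to_datetime(df.index)
print(isinstance(df.index, pd.DatetimeIndex))  # True

# Build the complete daily range, reindex (missing days become NaN rows),
# then carry the last observed row forward into the gaps
idx = pd.date_range(df.index[0], df.index[-1])
filled = df.reindex(idx).ffill()
print(filled)
```

With this data, 2014-06-23 and 2014-06-24 receive the values from 2014-06-22.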
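Pandas has a built-in method for this. Have a look at http://pandas.pydata.org/pandas-docs/stable/timeseries.html .
If the index is a DatetimeIndex, you can use df.asfreq('1d')
to conform your data to a daily frequency based on the date index. The missing dates are inserted as NaN rows; asfreq only fills them if you also pass method (e.g. 'ffill' or 'bfill') or fill_value.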
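A short sketch of the asfreq variants, assuming the question's data with the index already parsed into a DatetimeIndex. It uses the 'D' alias for daily frequency; which fill strategy is appropriate depends on what the missing days should mean.

```python
import pandas as pd

# Question's data, with the index already converted to a DatetimeIndex
df = pd.DataFrame(
    {'col1': [3, 4, 32, 44, 32],
     'col2': [752, 752, 752, 882, 882],
     'col3': [4028, 4028, 4028, 4548, 4548]},
    index=pd.to_datetime(['2014-06-20', '2014-06-21', '2014-06-22',
                          '2014-06-25', '2014-06-26']))

# No method: asfreq only inserts the missing dates, as NaN rows
gaps = df.asfreq('D')

# method='ffill': each inserted date takes the last observed row
ffilled = df.asfreq('D', method='ffill')

# fill_value: the rows created by asfreq get a constant instead of NaN
zeros = df.asfreq('D', fill_value=0)

print(gaps)
print(ffilled)
print(zeros)
```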