Filling missing values in a pandas DataFrame
I'm trying to fill in missing data values in a pandas DataFrame based on the date column.
df.head()
col1 col2 col3
date
2014-06-20 3 752 4028
2014-06-21 4 752 4028
2014-06-22 32 752 4028
2014-06-25 44 882 4548
2014-06-26 32 882 4548
I tried the following:
idx = pd.date_range(df.index[0], df.index[-1])
df = df.reindex(idx).reset_index()
But I get a DataFrame full of NaNs:
index col1 col2 col3
0 2014-06-20 NaN NaN NaN
1 2014-06-21 NaN NaN NaN
2 2014-06-22 NaN NaN NaN
3 2014-06-23 NaN NaN NaN
4 2014-06-24 NaN NaN NaN
What am I missing here?
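The behavior you describe would happen if the index is a pd.Index containing strings, rather than a pd.DatetimeIndex containing timestamps. reindex matches labels exactly, and a string such as '2014-06-20' never compares equal to a Timestamp, so none of the new labels find a matching row and every value comes back as NaN.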
For example,
import pandas as pd
df = pd.DataFrame(
{'col1': [3, 4, 32, 44, 32],
'col2': [752, 752, 752, 882, 882],
'col3': [4028, 4028, 4028, 4548, 4548]},
index = ['2014-06-20', '2014-06-21', '2014-06-22', '2014-06-25', '2014-06-26'])
idx = pd.date_range(df.index[0], df.index[-1])
print(df.reindex(idx).reset_index())
# index col1 col2 col3
# 0 2014-06-20 NaN NaN NaN
# 1 2014-06-21 NaN NaN NaN
# 2 2014-06-22 NaN NaN NaN
# 3 2014-06-23 NaN NaN NaN
# 4 2014-06-24 NaN NaN NaN
# 5 2014-06-25 NaN NaN NaN
# 6 2014-06-26 NaN NaN NaN
In contrast, if you make the index a DatetimeIndex:
df.index = pd.DatetimeIndex(df.index)
then
print(df.reindex(idx).reset_index())
index col1 col2 col3
0 2014-06-20 3 752 4028
1 2014-06-21 4 752 4028
2 2014-06-22 32 752 4028
3 2014-06-23 NaN NaN NaN
4 2014-06-24 NaN NaN NaN
5 2014-06-25 44 882 4548
6 2014-06-26 32 882 4548
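Putting the pieces above together, here is a minimal sketch that checks the index type, converts it, and then fills the gap days. It assumes the same sample data as above; the use of pd.to_datetime for the conversion and of forward fill (method='ffill') as the fill strategy are illustrative choices, not something the question specified.

```python
import pandas as pd

# Same sample data as above, with the dates stored as plain strings.
df = pd.DataFrame(
    {'col1': [3, 4, 32, 44, 32],
     'col2': [752, 752, 752, 882, 882],
     'col3': [4028, 4028, 4028, 4548, 4548]},
    index=['2014-06-20', '2014-06-21', '2014-06-22', '2014-06-25', '2014-06-26'])

# Diagnose: a string index is not a DatetimeIndex.
was_datetime = isinstance(df.index, pd.DatetimeIndex)
print(was_datetime, df.index.dtype)

# Convert the string labels to timestamps.
df.index = pd.to_datetime(df.index)

# Build the full daily range and reindex against it.
idx = pd.date_range(df.index[0], df.index[-1])
gaps = df.reindex(idx)                    # missing days become rows of NaN
filled = df.reindex(idx, method='ffill')  # missing days copy the previous row (assumed strategy)
print(filled)
```

Forward fill is only one option. If the gap days should hold something else, calling fillna or interpolate on the reindexed frame may be a better fit.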
Pandas has a built-in method to achieve this. Have a look at http://pandas.pydata.org/pandas-docs/stable/timeseries.html.
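You can use df.asfreq('1d') to convert your data to a daily frequency based on the date index. The missing dates are added as rows of NaN; to fill them in the same call, pass the method argument (for example method='ffill') or the fill_value argument. As with reindex, the index must hold timestamps rather than strings.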
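As a rough sketch of how asfreq behaves on the sample data from the question: the snippet below assumes the index has already been converted with pd.to_datetime, and it uses the uppercase 'D' alias for daily frequency, since recent pandas versions may warn about the lowercase form. The forward-fill and constant-fill variants are illustrative choices.

```python
import pandas as pd

# Sample data from the question, with a proper DatetimeIndex.
df = pd.DataFrame(
    {'col1': [3, 4, 32, 44, 32],
     'col2': [752, 752, 752, 882, 882],
     'col3': [4028, 4028, 4028, 4548, 4548]},
    index=pd.to_datetime(['2014-06-20', '2014-06-21', '2014-06-22',
                          '2014-06-25', '2014-06-26']))

# Plain asfreq only inserts the missing days; their values are NaN.
daily = df.asfreq('D')

# Fill the inserted days from the previous observation.
daily_ffill = df.asfreq('D', method='ffill')

# Or fill the inserted days with a constant.
daily_zero = df.asfreq('D', fill_value=0)

print(daily)
print(daily_ffill)
print(daily_zero)
```

Note that fill_value only applies to the rows asfreq inserts; NaNs that were already present in the data are left as they are.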