Filling missing values in a pandas DataFrame

I'm trying to fill missing data values in a pandas DataFrame based on its date column (the index).

df.head()

            col1  col2  col3
date
2014-06-20     3   752  4028
2014-06-21     4   752  4028
2014-06-22    32   752  4028
2014-06-25    44   882  4548
2014-06-26    32   882  4548

I tried the following:

idx = pd.date_range(df.index[0], df.index[-1])

df = df.reindex(idx).reset_index()

But I get a DataFrame full of NaNs:

    index       col1 col2   col3
0   2014-06-20  NaN  NaN    NaN
1   2014-06-21  NaN  NaN    NaN
2   2014-06-22  NaN  NaN    NaN
3   2014-06-23  NaN  NaN    NaN
4   2014-06-24  NaN  NaN    NaN

What am I missing here?

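The behavior you describe would happen if the index is a pd.Index containing strings rather than a pd.DatetimeIndex containing timestamps. reindex matches labels exactly, and a string such as '2014-06-20' is not the same label as the timestamp 2014-06-20 produced by pd.date_range, so none of the new labels match an existing row and every value comes back as NaN.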

For example:

import pandas as pd

df = pd.DataFrame(
    {'col1': [3, 4, 32, 44, 32],
     'col2': [752, 752, 752, 882, 882],
     'col3': [4028, 4028, 4028, 4548, 4548]},
    index = ['2014-06-20', '2014-06-21', '2014-06-22', '2014-06-25', '2014-06-26'])

idx = pd.date_range(df.index[0], df.index[-1])
print(df.reindex(idx).reset_index())
#        index  col1  col2  col3
# 0 2014-06-20   NaN   NaN   NaN
# 1 2014-06-21   NaN   NaN   NaN
# 2 2014-06-22   NaN   NaN   NaN
# 3 2014-06-23   NaN   NaN   NaN
# 4 2014-06-24   NaN   NaN   NaN
# 5 2014-06-25   NaN   NaN   NaN
# 6 2014-06-26   NaN   NaN   NaN

In contrast, if you convert the index to a DatetimeIndex:

df.index = pd.DatetimeIndex(df.index)

then:

print(df.reindex(idx).reset_index())
#        index  col1  col2  col3
# 0 2014-06-20     3   752  4028
# 1 2014-06-21     4   752  4028
# 2 2014-06-22    32   752  4028
# 3 2014-06-23   NaN   NaN   NaN
# 4 2014-06-24   NaN   NaN   NaN
# 5 2014-06-25    44   882  4548
# 6 2014-06-26    32   882  4548
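A minimal sketch of the whole fix, assuming the dates arrive as strings as in the example above: check the index type, convert it with pd.to_datetime (used here as an alternative to pd.DatetimeIndex), reindex, and then, if you want the gaps filled rather than left as NaN, forward-fill them with ffill(). Forward-filling is only one choice; interpolate() or fillna(0) may suit other data better.

```python
import pandas as pd

# Sample data from the question, with the dates stored as plain strings
df = pd.DataFrame(
    {'col1': [3, 4, 32, 44, 32],
     'col2': [752, 752, 752, 882, 882],
     'col3': [4028, 4028, 4028, 4548, 4548]},
    index=['2014-06-20', '2014-06-21', '2014-06-22', '2014-06-25', '2014-06-26'])

# Diagnose: a string index is not a DatetimeIndex, so date labels will not match it
print(isinstance(df.index, pd.DatetimeIndex))  # False

# Convert the string labels to timestamps
df.index = pd.to_datetime(df.index)
print(isinstance(df.index, pd.DatetimeIndex))  # True

# Build the full daily range; missing dates are inserted as NaN rows
idx = pd.date_range(df.index[0], df.index[-1])
reindexed = df.reindex(idx)

# Fill the gaps by carrying the last observed row forward
filled = reindexed.ffill()
print(filled)
```

df.reindex(idx, method='ffill') would produce the same filled frame in one step; keeping ffill() separate makes the NaN-producing step visible.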

Pandas has a built-in method to achieve this. Have a look at http://pandas.pydata.org/pandas-docs/stable/timeseries.html.

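You can use df.asfreq('1d') to conform your data to a daily frequency based on the date index. Note that by default asfreq does not fill anything: it inserts the missing dates as rows of NaN. To fill them, pass a method such as method='ffill', or call .ffill() afterwards. Like reindex, it relies on the index holding real timestamps, so convert a string index first.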
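A short sketch of the asfreq route, assuming the index has already been converted to a DatetimeIndex. It uses 'D', pandas' standard daily frequency alias, and contrasts the default behavior (new dates become NaN rows) with method='ffill' (new dates copy the previous day's values).

```python
import pandas as pd

# Sample data with a proper DatetimeIndex; asfreq needs timestamps
# to know which calendar days are missing
df = pd.DataFrame(
    {'col1': [3, 4, 32, 44, 32],
     'col2': [752, 752, 752, 882, 882],
     'col3': [4028, 4028, 4028, 4548, 4548]},
    index=pd.to_datetime(['2014-06-20', '2014-06-21', '2014-06-22',
                          '2014-06-25', '2014-06-26']))

# Conform to a daily frequency: missing days appear as NaN rows
daily = df.asfreq('D')

# Same conversion, but the newly created rows are forward-filled
daily_filled = df.asfreq('D', method='ffill')

print(daily)
print(daily_filled)
```

The method argument only fills rows that asfreq itself creates; NaNs already present in the original data are left untouched.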

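Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site or the original source. For any questions, contact yoyou2525@163.com.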

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM