
Fill in missing values in pandas dataframe

I would like to fill in missing values in my pandas dataframe. Ideally, the minute column would run from 0 to 59 for each hour. Unfortunately, the data-generating process did not record any rows where sub_count = 0. Is there any way to do this? My data covers the dates 2014-03-31 and 2014-04-01.

df = 

   sub_count        date  hour  minute
0          1  2014-03-31     0       0
1          1  2014-03-31     0       4
2          1  2014-03-31     0       5
3          1  2014-03-31     0       6
4          2  2014-03-31     0       7
...

Construct a DatetimeIndex (you may be able to do this while reading the data in, depending on how it's stored):

import pandas as pd

df = df.set_index(pd.to_datetime(df.date + 'T' +
                                 df.hour.astype(str) + ':' +
                                 df.minute.astype(str)))

In [23]: df = df['sub_count']

In [24]: df
Out[24]: 
2014-03-31 00:00:00    1
2014-03-31 00:04:00    1
2014-03-31 00:05:00    1
2014-03-31 00:06:00    1
2014-03-31 00:07:00    2
Name: sub_count, dtype: int64
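
If the string concatenation feels fragile (it relies on the parser accepting unpadded values such as `0:4`), the same index can be built arithmetically. This is only a sketch: the frame is rebuilt from the five rows shown in the question, and the names `timestamps` and `counts` are mine, not part of the original.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "sub_count": [1, 1, 1, 1, 2],
    "date": ["2014-03-31"] * 5,
    "hour": [0, 0, 0, 0, 0],
    "minute": [0, 4, 5, 6, 7],
})

# Midnight of each date, plus the hour and minute offsets as timedeltas
timestamps = (
    pd.to_datetime(df["date"])
    + pd.to_timedelta(df["hour"], unit="h")
    + pd.to_timedelta(df["minute"], unit="min")
)

# Same result as the set_index / df['sub_count'] steps above
counts = df.set_index(timestamps)["sub_count"]
print(counts)
```
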

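Then resample to one-minute frequency. The output below comes from an older pandas in which resample('T') returned the result directly; since pandas 0.18 it returns a deferred Resampler object, so call .asfreq() (or an aggregation) on it to get the same Series: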

In [26]: df.resample('T')
Out[26]: 
2014-03-31 00:00:00     1
2014-03-31 00:01:00   NaN
2014-03-31 00:02:00   NaN
2014-03-31 00:03:00   NaN
2014-03-31 00:04:00     1
2014-03-31 00:05:00     1
2014-03-31 00:06:00     1
2014-03-31 00:07:00     2
Freq: T, Name: sub_count, dtype: float64
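
Resampling leaves NaN in the gaps and only spans the first to the last recorded timestamp, whereas the question asks for sub_count = 0 on every minute of both days. One way to get there, sketched here under the assumption that there is at most one row per minute, is to reindex against an explicit minute range. The names `full_index`, `filled` and `out` are illustrative, and the series is rebuilt from the five sample rows.

```python
import pandas as pd

# The series from the steps above, rebuilt from the sample rows
counts = pd.Series(
    [1, 1, 1, 1, 2],
    index=pd.to_datetime([
        "2014-03-31 00:00", "2014-03-31 00:04", "2014-03-31 00:05",
        "2014-03-31 00:06", "2014-03-31 00:07",
    ]),
    name="sub_count",
)

# Every minute of 2014-03-31 and 2014-04-01
full_index = pd.date_range("2014-03-31 00:00", "2014-04-01 23:59", freq="min")

# Minutes with no recorded row get sub_count = 0 instead of NaN
filled = counts.reindex(full_index, fill_value=0)

# Recover the original date / hour / minute columns
out = filled.rename_axis("timestamp").reset_index()
out["date"] = out["timestamp"].dt.strftime("%Y-%m-%d")
out["hour"] = out["timestamp"].dt.hour
out["minute"] = out["timestamp"].dt.minute
print(out.head(8))
```

If the real data can contain several rows for the same minute, reindex will refuse the duplicate index; aggregating first (for example with `counts.resample("min").sum()`) would be the thing to try in that case.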
