How to fill in missing 5-minute intervals in a pandas dataframe

Question

I have a dataframe holding trade data at 5-minute intervals, like

                    open  close
datetime                     
2015-02-02 08:00:00  43.5 NaN

2015-02-02 08:10:00  43.3   0
2015-02-02 08:15:00  43.2   7
2015-02-02 08:20:00   NaN NaN
2015-02-02 08:25:00  43.1   9

2015-02-02 08:35:00  43.0   9
2015-02-02 08:40:00  43.0  11
2015-02-02 08:45:00   NaN NaN
2015-02-02 08:50:00   NaN NaN
2015-02-02 08:55:00   NaN NaN
2015-02-02 09:00:00  43.1   9

and I am looking to fill the missing rows, like the one at the 08:30:00 timestamp, with just np.nan and then forward fill. I've looked into using the pd.date_range function to calculate the index per five-minute interval from a start to an end date, and just naively assigning that to be my dataframe's index, but as I thought, that raises an error.

I also looked at this question, which is very similar to what I'm asking, but the answer uses resample. I don't know how that solved the OP's problem because, as far as I know, you can't treat the resample object like a dataframe and query it in the same way.

EDIT: I ended up finding a way to get this done. I made a dataframe with the same columns covering the whole date range I want using date_range, and then updated this dataframe with the values I actually have from the trade data using update.
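The approach described in the edit can be sketched roughly as follows. This is a minimal illustration, not the asker's actual code: the three-row `trades` frame and the names `full_index` and `filled` are made up for the example, and the final forward fill is added on the assumption that it is still wanted.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the trade data; the 08:05 row is missing.
trades = pd.DataFrame(
    {"open": [43.5, 43.3, 43.2], "close": [np.nan, 0.0, 7.0]},
    index=pd.to_datetime(
        ["2015-02-02 08:00:00", "2015-02-02 08:10:00", "2015-02-02 08:15:00"]
    ),
)

# Complete 5-minute index from the first to the last timestamp.
full_index = pd.date_range(trades.index.min(), trades.index.max(), freq="5min")

# All-NaN frame with the same columns over the full range.
filled = pd.DataFrame(np.nan, index=full_index, columns=trades.columns)

# Copy in the values that exist; update() aligns on the index, works in place,
# and ignores NaN values in `trades`.
filled.update(trades)

# Forward fill the rows that are still empty.
filled = filled.ffill()
```

Rows with no earlier value to carry forward (here, `close` at 08:00 and 08:05) stay NaN after the forward fill.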

Answer 1

To get something out of the resample object, you need to add a dispatching method (see the docs), e.g.:

import numpy as np
import pandas as pd

# Sample trade data; the 08:05 and 08:30 rows are absent from the index.
df = pd.DataFrame({'open': [43.5, 43.3, 43.2, np.nan, 43.1, 43.0, 43.0, np.nan, np.nan, np.nan, 43.1],
                   'close': [np.nan, 0, 7, np.nan, 9, 9, 11, np.nan, np.nan, np.nan, 9]},
                  index=pd.to_datetime(['2015-02-02 08:00:00', '2015-02-02 08:10:00', '2015-02-02 08:15:00',
                                        '2015-02-02 08:20:00', '2015-02-02 08:25:00', '2015-02-02 08:35:00',
                                        '2015-02-02 08:40:00', '2015-02-02 08:45:00', '2015-02-02 08:50:00',
                                        '2015-02-02 08:55:00', '2015-02-02 09:00:00']))

# Resample into 5-minute bins; mean() turns the resample object back into a dataframe.
df1 = df.resample('5min').mean()
# df1
#                      open  close
# 2015-02-02 08:00:00  43.5    NaN
# 2015-02-02 08:05:00   NaN    NaN
# 2015-02-02 08:10:00  43.3    0.0
# 2015-02-02 08:15:00  43.2    7.0
# 2015-02-02 08:20:00   NaN    NaN
# 2015-02-02 08:25:00  43.1    9.0
# 2015-02-02 08:30:00   NaN    NaN
# 2015-02-02 08:35:00  43.0    9.0
# 2015-02-02 08:40:00  43.0   11.0
# 2015-02-02 08:45:00   NaN    NaN
# 2015-02-02 08:50:00   NaN    NaN
# 2015-02-02 08:55:00   NaN    NaN
# 2015-02-02 09:00:00  43.1    9.0
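The question also asks for a forward fill after the missing rows have been inserted. A minimal sketch of that step, using a shortened version of the data (the names `regular`, `reindexed` and `filled` are illustrative only), might look like this. It uses `asfreq`, which only inserts the missing rows, and shows the equivalent `date_range` plus `reindex` route that the question was reaching for.

```python
import numpy as np
import pandas as pd

# Shortened sample: the 08:05 and 08:15 rows are missing, 08:20 is present but NaN.
df = pd.DataFrame(
    {"open": [43.5, 43.3, np.nan, 43.1], "close": [np.nan, 0.0, np.nan, 9.0]},
    index=pd.to_datetime(
        [
            "2015-02-02 08:00:00",
            "2015-02-02 08:10:00",
            "2015-02-02 08:20:00",
            "2015-02-02 08:25:00",
        ]
    ),
)

# asfreq() inserts the missing 5-minute rows as NaN without aggregating anything.
regular = df.asfreq("5min")

# Same result via an explicit index: reindex() aligns the existing rows to the
# new index, whereas assigning a longer index directly raises a length mismatch error.
full_index = pd.date_range(df.index[0], df.index[-1], freq="5min")
reindexed = df.reindex(full_index)

# Forward fill both the inserted rows and the rows that were already NaN.
filled = regular.ffill()
```

Here `asfreq` and `resample('5min').mean()` give the same rows because each 5-minute bin holds at most one observation; with several rows per bin, only `resample` would aggregate them. Chaining `.ffill()` onto `df1` above works the same way.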
