Insert missing weekdays in pandas dataframe and fill them with NaN
I am trying to insert missing weekdays into a time series dataframe such as:
import pandas as pd
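from pandas.tseries.offsets import BDay  # business-day offset used below
# Build the frame with 'date' as an ordinary column first, so it can be converted
df = pd.DataFrame([['2016-09-30', 10, 2020], ['2016-10-03', 20, 2424], ['2016-10-05', 5, 232]], columns=['date', 'price', 'vol'])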
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')
The data looks like this:
Out[300]:
price vol
date
2016-09-30 10 2020
2016-10-03 20 2424
2016-10-05 5 232
I can easily create a series of weekdays with pd.date_range():
pd.date_range('2016-09-30', '2016-10-05', freq=BDay())
Out[301]: DatetimeIndex(['2016-09-30', '2016-10-03', '2016-10-04', '2016-10-05'], dtype='datetime64[ns]', freq='B')
Based on that DatetimeIndex, I would like to add the missing dates to my df and fill the column values with NaN, so that I get:
Out[300]:
price vol
date
2016-09-30 10 2020
2016-10-03 20 2424
2016-10-04 NaN NaN
2016-10-05 5 232
Is there an easy way to do this? Thanks!
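Before filling anything in, it can help to check which business days are actually absent. The sketch below is an illustration, not part of the original question: it builds the same frame, generates every business day between the first and last date with pd.bdate_range (equivalent here to pd.date_range(..., freq=BDay())), and takes the set difference with the existing index.

```python
import pandas as pd

# Same sample data as in the question, indexed by a DatetimeIndex
df = pd.DataFrame(
    [['2016-09-30', 10, 2020], ['2016-10-03', 20, 2424], ['2016-10-05', 5, 232]],
    columns=['date', 'price', 'vol'],
)
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')

# Every business day between the first and last date present in the data
all_bdays = pd.bdate_range(df.index.min(), df.index.max())

# Business days that have no row in df
missing = all_bdays.difference(df.index)
print(missing)
```

Here the only missing business day is 2016-10-04; the weekend of 2016-10-01/02 is not counted because it is not a business day.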
You can also use pandas.DataFrame.resample(), specifying 'B' (business day) as the frequency. There is no need to specify a start or end date, as long as the dataframe has a DatetimeIndex. Note that since pandas 0.22 the sum of an empty bin is 0 rather than NaN, so pass min_count=1 to keep the missing days as NaN:
df = df.resample('B').sum(min_count=1)  # min_count=1: empty business days become NaN, not 0
# price vol
# date
# 2016-09-30 10.0 2020.0
# 2016-10-03 20.0 2424.0
# 2016-10-04 NaN NaN
# 2016-10-05 5.0 232.0
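A self-contained version of the resample approach is sketched below, assuming the question's data with a DatetimeIndex. It shows two variants: sum(min_count=1), which aggregates each business day and leaves empty days as NaN, and asfreq(), which places the existing rows on the business-day grid without aggregating. The asfreq() variant is an alternative not mentioned in the answer above and is intended for data with at most one row per day.

```python
import pandas as pd

df = pd.DataFrame(
    [['2016-09-30', 10, 2020], ['2016-10-03', 20, 2424], ['2016-10-05', 5, 232]],
    columns=['date', 'price', 'vol'],
)
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')

# Aggregating variant: an empty business day sums to NaN because of min_count=1
summed = df.resample('B').sum(min_count=1)

# Non-aggregating variant: rows are aligned to the business-day grid, gaps become NaN
grid = df.resample('B').asfreq()

print(summed)
print(grid)
```

Without min_count=1, current pandas versions would report 0 for 2016-10-04, which is easy to mistake for real data.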
You can use reindex:
df.index = pd.to_datetime(df.index)
df.reindex(pd.date_range('2016-09-30', '2016-10-05', freq=BDay()))
Out:
price vol
2016-09-30 10.0 2020.0
2016-10-03 20.0 2424.0
2016-10-04 NaN NaN
2016-10-05 5.0 232.0
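The reindex answer hard-codes the start and end dates. A minimal sketch that derives them from the data instead is shown below; it uses pd.bdate_range in place of pd.date_range(..., freq=BDay()), and the variable name full_range is only illustrative. Passing name=df.index.name keeps the 'date' label on the new index.

```python
import pandas as pd

df = pd.DataFrame(
    [['2016-09-30', 10, 2020], ['2016-10-03', 20, 2424], ['2016-10-05', 5, 232]],
    columns=['date', 'price', 'vol'],
).set_index('date')

# reindex needs a DatetimeIndex so the labels line up with the generated dates
df.index = pd.to_datetime(df.index)

# Business days from the first to the last date actually present in df
full_range = pd.bdate_range(df.index.min(), df.index.max(), name=df.index.name)

# Dates absent from df are inserted as new rows filled with NaN
out = df.reindex(full_range)
print(out)
```

If a value other than NaN is wanted for the inserted rows, reindex also accepts a fill_value argument.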