Expanding multi-indexed dataframe with new dates as forecast
Note: I have followed Stack Overflow's instructions on how to create an MRE and pasted it into a code block as instructed (i.e. pasted it in the body and pressed Ctrl+K while it was highlighted). If I am still not doing it correctly, let me know.
Back to the question: suppose I have a df multi-indexed on both the date (df['DT']) and the ID (df['ID']):
DT,ID,value1,value2
2020-10-01,a,1,1
2020-10-01,b,2,1
2020-10-01,c,3,1
2020-10-01,d,4,1
2020-10-02,a,10,1
2020-10-02,b,11,1
2020-10-02,c,12,1
2020-10-02,d,13,1
df = df.set_index(['DT','ID'])
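For anyone who wants to reproduce the frame above, here is a minimal sketch that builds it from the CSV text. The `csv_text` variable and the use of `parse_dates` are my own assumptions, not part of the original post; parsing `DT` as dates matters later because reindexing against a `date_range` expects a datetime-like index rather than strings.

```python
import io

import pandas as pd

# The sample data from the question, inlined as CSV text (hypothetical variable name).
csv_text = """DT,ID,value1,value2
2020-10-01,a,1,1
2020-10-01,b,2,1
2020-10-01,c,3,1
2020-10-01,d,4,1
2020-10-02,a,10,1
2020-10-02,b,11,1
2020-10-02,c,12,1
2020-10-02,d,13,1
"""

# parse_dates turns DT into a datetime64 column, so the first index
# level is datetime-like instead of plain strings.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["DT"])
df = df.set_index(["DT", "ID"])
print(df)
```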
Now I want to expand the df to include '2020-10-03' and '2020-10-04' as my forecast period, with the same set of IDs {a,b,c,d}. To forecast value1, I assume each ID takes the average of its existing values, e.g. for a's value1 on both '2020-10-03' and '2020-10-04', I assume it will be (1+10)/2 = 5.5. For value2, I assume it stays constant at 1.
The expected df will look like this:
DT,ID,value1,value2
2020-10-01,a,1.0,1
2020-10-01,b,2.0,1
2020-10-01,c,3.0,1
2020-10-01,d,4.0,1
2020-10-02,a,10.0,1
2020-10-02,b,11.0,1
2020-10-02,c,12.0,1
2020-10-02,d,13.0,1
2020-10-03,a,5.5,1
2020-10-03,b,6.5,1
2020-10-03,c,7.5,1
2020-10-03,d,8.5,1
2020-10-04,a,5.5,1
2020-10-04,b,6.5,1
2020-10-04,c,7.5,1
2020-10-04,d,8.5,1
Appreciate your help and time.
For a simple mean-based forecast, use DataFrame.unstack to get a DatetimeIndex, add the next datetimes with DataFrame.reindex and date_range, then replace the missing values in the value1 level with DataFrame.fillna and set value2 to 1. Finally, reshape back with DataFrame.stack:
print (df)
               value1  value2
DT         ID
2020-10-01 a        1       1
           b        2       1
           c        3       1
           d        4       1
2020-10-02 a       10       1
           b       11       1
           c       12       1
           d       13       1
rng = pd.date_range('2020-10-01','2020-10-04', name='DT')
df1 = df.unstack().reindex(rng)
df1['value1'] = df1['value1'].fillna(df1['value1'].mean())
df1['value2'] = 1
df2 = df1.stack()
print (df2)
               value1  value2
DT         ID
2020-10-01 a      1.0       1
           b      2.0       1
           c      3.0       1
           d      4.0       1
2020-10-02 a     10.0       1
           b     11.0       1
           c     12.0       1
           d     13.0       1
2020-10-03 a      5.5       1
           b      6.5       1
           c      7.5       1
           d      8.5       1
2020-10-04 a      5.5       1
           b      6.5       1
           c      7.5       1
           d      8.5       1
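As a cross-check, the same result can also be reached without reshaping. This is a different technique from the unstack/stack approach above, sketched under my own assumptions: the names `csv_text`, `full_index` and `out` are hypothetical, and the frame is rebuilt from the question's data so the snippet runs on its own. It reindexes against the full (date, ID) grid and fills value1 with each ID's mean.

```python
import io

import pandas as pd

# Rebuild the question's frame so this snippet is self-contained.
csv_text = """DT,ID,value1,value2
2020-10-01,a,1,1
2020-10-01,b,2,1
2020-10-01,c,3,1
2020-10-01,d,4,1
2020-10-02,a,10,1
2020-10-02,b,11,1
2020-10-02,c,12,1
2020-10-02,d,13,1
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["DT"]).set_index(["DT", "ID"])

# Full grid of (date, ID) pairs: observed dates plus the two forecast dates.
all_dates = pd.date_range("2020-10-01", "2020-10-04", name="DT")
ids = df.index.get_level_values("ID").unique()
full_index = pd.MultiIndex.from_product([all_dates, ids], names=["DT", "ID"])

# Reindexing adds the forecast rows, initially filled with NaN.
out = df.reindex(full_index)

# value1: fill each ID's missing rows with that ID's mean over the observed dates.
per_id_mean = out.groupby(level="ID")["value1"].transform("mean")
out["value1"] = out["value1"].fillna(per_id_mean)

# value2: assumed constant at 1, as stated in the question.
out["value2"] = out["value2"].fillna(1).astype(int)
print(out)
```

Grouping on the ID level keeps the per-ID mean explicit, which may be easier to adapt if the fill rule later differs by ID.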
But forecasting is generally more complex than this; you can check this