Expanding multi-indexed dataframe with new dates as forecast

Note: I have followed Stack Overflow's instructions for creating an MRE and pasted it into a code block as instructed (i.e. paste it into the body, highlight it, and press Ctrl+K). If I am still not doing it correctly, let me know.

Back to the question: suppose I have a df with a MultiIndex on both the date (df['DT']) and the ID (df['ID']):

DT,ID,value1,value2
2020-10-01,a,1,1
2020-10-01,b,2,1
2020-10-01,c,3,1
2020-10-01,d,4,1
2020-10-02,a,10,1
2020-10-02,b,11,1
2020-10-02,c,12,1
2020-10-02,d,13,1

df = df.set_index(['DT','ID'])

Now I want to expand the df to include '2020-10-03' and '2020-10-04', with the same set of IDs {a,b,c,d}, as my forecast period. To forecast value1, I assume each ID takes the average of its existing values, e.g. for a's value1 on both '2020-10-03' and '2020-10-04', I assume it will be (1+10)/2 = 5.5. For value2, I assume it stays constant at 1.

The expected df will look like this:

DT,ID,value1,value2
2020-10-01,a,1.0,1
2020-10-01,b,2.0,1
2020-10-01,c,3.0,1
2020-10-01,d,4.0,1
2020-10-02,a,10.0,1
2020-10-02,b,11.0,1
2020-10-02,c,12.0,1
2020-10-02,d,13.0,1
2020-10-03,a,5.5,1
2020-10-03,b,6.5,1
2020-10-03,c,7.5,1
2020-10-03,d,8.5,1
2020-10-04,a,5.5,1
2020-10-04,b,6.5,1
2020-10-04,c,7.5,1
2020-10-04,d,8.5,1

Appreciate your help and time.

For a simple mean-based forecast, use DataFrame.unstack so that the frame is left with a DatetimeIndex, add the next datetimes with DataFrame.reindex and date_range, then replace the missing values in the value1 level with DataFrame.fillna (using the per-ID column means) and set value2 to 1. Finally, reshape back with DataFrame.stack:

print (df)
               value1  value2
DT         ID                
2020-10-01 a        1       1
           b        2       1
           c        3       1
           d        4       1
2020-10-02 a       10       1
           b       11       1
           c       12       1
           d       13       1

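# DT must be a datetime level (not strings) for reindex to match the existing rows
rng = pd.date_range('2020-10-01','2020-10-04', name='DT')
# one row per date, one column per (value, ID) pair; the new dates become NaN rows
df1 = df.unstack().reindex(rng)
# fill the new rows with each ID's mean of the observed value1
df1['value1'] = df1['value1'].fillna(df1['value1'].mean())
df1['value2'] = 1

df2 = df1.stack()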

print (df2)
               value1  value2
DT         ID                
2020-10-01 a      1.0       1
           b      2.0       1
           c      3.0       1
           d      4.0       1
2020-10-02 a     10.0       1
           b     11.0       1
           c     12.0       1
           d     13.0       1
2020-10-03 a      5.5       1
           b      6.5       1
           c      7.5       1
           d      8.5       1
2020-10-04 a      5.5       1
           b      6.5       1
           c      7.5       1
           d      8.5       1
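
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It is an alternative formulation, not the unstack/reindex steps above: it computes the per-ID mean with groupby and builds the forecast rows with MultiIndex.from_product. The names csv_text, id_mean, future, forecast and result are only illustrative, and it assumes the sample data from the question is loaded with DT parsed as datetimes.

```python
import io

import pandas as pd

# Sample data from the question (csv_text is an illustrative name).
csv_text = """DT,ID,value1,value2
2020-10-01,a,1,1
2020-10-01,b,2,1
2020-10-01,c,3,1
2020-10-01,d,4,1
2020-10-02,a,10,1
2020-10-02,b,11,1
2020-10-02,c,12,1
2020-10-02,d,13,1
"""

# parse_dates turns DT into datetimes, so it lines up with date_range below.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["DT"]).set_index(["DT", "ID"])

# Mean of the observed value1 for each ID.
id_mean = df.groupby(level="ID")["value1"].mean()

# Every combination of forecast date and ID.
future = pd.date_range("2020-10-03", "2020-10-04", name="DT")
idx = pd.MultiIndex.from_product([future, id_mean.index], names=["DT", "ID"])

# value1 repeats each ID's mean on every forecast date; value2 stays at 1.
forecast = pd.DataFrame(
    {
        "value1": id_mean.reindex(idx.get_level_values("ID")).to_numpy(),
        "value2": 1,
    },
    index=idx,
)

# Observed rows followed by forecast rows.
result = pd.concat([df, forecast])
print(result)
```

Because the forecast rows are built separately and then concatenated, the observed rows are left untouched, and a different rule per column (for example a rolling mean instead of the overall mean) could be swapped in at the id_mean step.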

But real forecasting is more complex than this; you can check this
