Pandas: How to fill missing dates in a long dataframe with multiple non-overlapping time series?
I have a long dataframe containing multiple non-overlapping time series.
import numpy as np
import pandas as pd
df = pd.DataFrame({'id':[1,1,1,1,1,2,2,2,2,2,2],
't':[0,1,2,3,4,2,3,4,5,6,7],
'price':[10,10.2,10.8,10.1,10.4,142.1,142.8,143.6,142.8,141.4,140.7]})
The df looks like this:
Out[65]:
id t price
0 1 0 10.0
1 1 1 10.2
2 1 2 10.8
3 1 3 10.1
4 1 4 10.4
5 2 2 142.1
6 2 3 142.8
7 2 4 143.6
8 2 5 142.8
9 2 6 141.4
10 2 7 140.7
For the time series with id 1, the missing timestamps are 5, 6 and 7, and the time series with id 2 is missing timestamps 0 and 1.
I would like to fill in the missing dates for all the time series in the dataframe, so that every series has a row for every date, with NaN as the price where no value was recorded:
df_target = pd.DataFrame({'id':[1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2],
't':[0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7],
'price':[10,10.2,10.8,10.1,10.4,np.nan,np.nan,np.nan,np.nan,np.nan,142.1,142.8,143.6,142.8,141.4,140.7]})
Out[68]:
id t price
0 1 0 10.0
1 1 1 10.2
2 1 2 10.8
3 1 3 10.1
4 1 4 10.4
5 1 5 NaN
6 1 6 NaN
7 1 7 NaN
8 2 0 NaN
9 2 1 NaN
10 2 2 142.1
11 2 3 142.8
12 2 4 143.6
13 2 5 142.8
14 2 6 141.4
15 2 7 140.7
The objective is to then be able to reshape this dataframe into a 3D array. Is there a simple way to fill in the missing dates for each time series? Thanks
Use Series.unstack with DataFrame.stack:
df1 = (df.set_index(['id','t'])['price']
         .unstack()
         .stack(dropna=False)
         .reset_index(name='price'))
print(df1)
id t price
0 1 0 10.0
1 1 1 10.2
2 1 2 10.8
3 1 3 10.1
4 1 4 10.4
5 1 5 NaN
6 1 6 NaN
7 1 7 NaN
8 2 0 NaN
9 2 1 NaN
10 2 2 142.1
11 2 3 142.8
12 2 4 143.6
13 2 5 142.8
14 2 6 141.4
15 2 7 140.7
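As an alternative that does not rely on the dropna argument of stack (which may raise a deprecation warning or be rejected in newer pandas releases), the same result can be sketched by reindexing against the full product of ids and timestamps. This is a minimal sketch, not part of the original answer: the names ids, ts, full_index, df2 and arr are introduced here for illustration, and it assumes each (id, t) pair appears at most once. The last line shows one possible way to get the 3D array mentioned in the question, assuming a layout of (series, time, feature).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1,1,1,1,1,2,2,2,2,2,2],
                   't': [0,1,2,3,4,2,3,4,5,6,7],
                   'price': [10,10.2,10.8,10.1,10.4,142.1,142.8,143.6,142.8,141.4,140.7]})

# All ids and all timestamps observed anywhere in the data (sorted)
ids = np.sort(df['id'].unique())
ts = np.sort(df['t'].unique())

# Every (id, t) combination; pairs absent from df become NaN rows after reindex
full_index = pd.MultiIndex.from_product([ids, ts], names=['id', 't'])

df2 = (df.set_index(['id', 't'])
         .reindex(full_index)
         .reset_index())

# Assumed target layout: (n_series, n_timestamps, n_features), with one feature here
arr = df2['price'].to_numpy().reshape(len(ids), len(ts), 1)
```

Like the unstack/stack approach, this only fills timestamps that occur in at least one series. If a timestamp is missing from every series, ts would need to be built explicitly, for example from the minimum and maximum of t.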