
Pandas: How to fill missing dates in a long dataframe with multiple non-overlapping time series?

I have a long dataframe containing multiple time series that do not overlap.

import numpy as np
import pandas as pd
df = pd.DataFrame({'id':[1,1,1,1,1,2,2,2,2,2,2],
                   't':[0,1,2,3,4,2,3,4,5,6,7],
                   'price':[10,10.2,10.8,10.1,10.4,142.1,142.8,143.6,142.8,141.4,140.7]})

The df looks like this:

Out[65]: 
    id  t  price
0    1  0   10.0
1    1  1   10.2
2    1  2   10.8
3    1  3   10.1
4    1  4   10.4
5    2  2  142.1
6    2  3  142.8
7    2  4  143.6
8    2  5  142.8
9    2  6  141.4
10   2  7  140.7

For the time series with id 1, the missing timestamps are 5, 6 and 7; the time series with id 2 is missing timestamps 0 and 1.

I would like to fill in the missing dates for every time series in the dataframe, so that each series covers all the dates and the price is NaN wherever there was no data:

    df_target = pd.DataFrame({'id':[1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2],
                              't':[0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7],
                              'price':[10,10.2,10.8,10.1,10.4,np.nan,np.nan,np.nan,np.nan,np.nan,142.1,142.8,143.6,142.8,141.4,140.7]})

Out[68]: 
    id  t  price
0    1  0   10.0
1    1  1   10.2
2    1  2   10.8
3    1  3   10.1
4    1  4   10.4
5    1  5    NaN
6    1  6    NaN
7    1  7    NaN
8    2  0    NaN
9    2  1    NaN
10   2  2  142.1
11   2  3  142.8
12   2  4  143.6
13   2  5  142.8
14   2  6  141.4
15   2  7  140.7

The objective is to then be able to reshape this dataframe into a 3D array. Is there a simple way to fill in the missing dates for each time series? Thanks.

Use Series.unstack with DataFrame.stack:

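# Pivot t into columns (absent id/t pairs become NaN), then stack back
# to long form while keeping the NaN entries.
df1 = (df.set_index(['id','t'])['price']
         .unstack()              # wide: one row per id, one column per t
         .stack(dropna=False)    # long again, NaN rows are kept
         .reset_index(name='price'))
print(df1)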
    id  t  price
0    1  0   10.0
1    1  1   10.2
2    1  2   10.8
3    1  3   10.1
4    1  4   10.4
5    1  5    NaN
6    1  6    NaN
7    1  7    NaN
8    2  0    NaN
9    2  1    NaN
10   2  2  142.1
11   2  3  142.8
12   2  4  143.6
13   2  5  142.8
14   2  6  141.4
15   2  7  140.7
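
As an alternative sketch (not part of the answer above), the same result can be built by reindexing against the full set of (id, t) pairs, which may be handy if stack(dropna=False) behaves differently on your pandas version. It assumes integer timestamps with a step of 1, so np.arange also fills any gap inside the observed range; with real dates something like pd.date_range would take its place. The names full_index, df_filled and arr are illustrative only, and the last step shows one possible 3D layout, (ids, timestamps, features), for the reshape mentioned in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1,1,1,1,1,2,2,2,2,2,2],
                   't': [0,1,2,3,4,2,3,4,5,6,7],
                   'price': [10,10.2,10.8,10.1,10.4,142.1,142.8,143.6,142.8,141.4,140.7]})

# Every (id, t) combination that should exist in the result.
# Assumption: timestamps are integers with a step of 1.
full_index = pd.MultiIndex.from_product(
    [df['id'].unique(), np.arange(df['t'].min(), df['t'].max() + 1)],
    names=['id', 't'])

# Reindexing inserts a NaN row for each combination absent from df.
df_filled = (df.set_index(['id', 't'])
               .reindex(full_index)
               .reset_index())

# Rows are now ordered by id, then t, so a plain reshape is safe.
# Assumed layout: (number of ids, number of timestamps, number of features).
n_ids = df_filled['id'].nunique()
n_t = df_filled['t'].nunique()
arr = df_filled[['price']].to_numpy().reshape(n_ids, n_t, 1)
```

Here arr[i, j, 0] holds the price of the i-th id at the j-th timestamp. Extra value columns could be added to the [['price']] selection, with the last dimension of the reshape increased to match.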

