Unstack a pandas dataframe by two indexes over a time window

Question

I have the following dataframe:

| car_id | timestamp           | gas | odometer | temperature |
|--------|---------------------|-----|----------|-------------|
| aac43f | 2019-10-05 14:00:00 | 70  | 152042   | 87          |
| aac43f | 2019-10-05 15:00:00 | 63  | 152112   | 88          |
| aac43f | 2019-10-05 18:00:00 | 44  | 152544   | 93          |
| bg112  | 2019-08-22 09:00:00 | 90  | 1242     | 85          |
| bg112  | 2019-08-22 10:00:00 | 89  | 1270     | 85          |
| 32rre  | 2019-01-01 12:00:00 | 20  | 84752    | 74          |
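
For anyone who wants to reproduce this, the table above can be built roughly like this. It is only a sketch: I am assuming timestamp is a real datetime column and the sensor columns are plain integers, which the question does not state.

```python
import pandas as pd

# Sample data copied from the table above.
# Assumption: timestamp is parsed to datetime64, sensor readings are ints.
df = pd.DataFrame({
    "car_id": ["aac43f", "aac43f", "aac43f", "bg112", "bg112", "32rre"],
    "timestamp": pd.to_datetime([
        "2019-10-05 14:00:00", "2019-10-05 15:00:00", "2019-10-05 18:00:00",
        "2019-08-22 09:00:00", "2019-08-22 10:00:00", "2019-01-01 12:00:00",
    ]),
    "gas": [70, 63, 44, 90, 89, 20],
    "odometer": [152042, 152112, 152544, 1242, 1270, 84752],
    "temperature": [87, 88, 93, 85, 85, 74],
})
print(df)
```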

I want to transform it by car_id and timestamp, adding new features (the sensor readings from 1 hour and 2 hours earlier), as shown below:

| car_id | timestamp           | gas | gas_1h_ago | gas_2h_ago | odometer | o_1h   | o_2h | temperature | t_1h_ago | t_2h_ago |
|--------|---------------------|-----|------------|------------|----------|--------|------|-------------|----------|----------|
| aac43f | 2019-10-05 14:00:00 | 70  | NaN        | NaN        | 152042   | NaN    | NaN  | 87          | NaN      | NaN      |
| aac43f | 2019-10-05 15:00:00 | 63  | 70         | NaN        | 152112   | 152042 | NaN  | 88          | 87       | NaN      |
| aac43f | 2019-10-05 18:00:00 | 44  | NaN        | NaN        | 152544   | NaN    | NaN  | 93          | NaN      | NaN      |
| bg112  | 2019-08-22 09:00:00 | 90  | NaN        | NaN        | 1242     | NaN    | NaN  | 85          | NaN      | NaN      |
| bg112  | 2019-08-22 10:00:00 | 89  | 90         | NaN        | 1270     | 1242   | NaN  | 85          | 85       | NaN      |
| 32rre  | 2019-01-01 12:00:00 | 20  | NaN        | NaN        | 84752    | NaN    | NaN  | 74          | NaN      | NaN      |

I thought I could use the unstack function, but I could not come up with a solution.

Answer 1

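You can use GroupBy.apply:

Use DataFrame.resample to build hourly bins for each car. Use DataFrame.sum + DataFrame.shift to move each hour's values down by the required number of hours. Use DataFrame.reindex to bring the dataframe back to its original rows. Add a suffix to the column names with DataFrame.add_suffix. Do this for each value of x hours ago and join the results with pd.concat.

Finally, use pd.concat again to join the resulting dataframe with the original data. Use DataFrame.set_index + DataFrame.sort_index + DataFrame.reset_index to rearrange the columns.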
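
To make the resample / sum / shift / reindex steps above more concrete, here is a minimal sketch on a single car, using only the gas readings from the question. The names one_car, hourly, one_hour_ago and result are mine, purely for illustration.

```python
import pandas as pd

# Readings for car aac43f, copied from the question (gas only, for brevity).
one_car = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2019-10-05 14:00:00", "2019-10-05 15:00:00", "2019-10-05 18:00:00",
    ]),
    "gas": [70, 63, 44],
})

# resample creates the missing 16:00 and 17:00 bins; min_count=1 leaves
# those empty bins as NaN instead of summing them to 0.
hourly = one_car.resample("h", on="timestamp")[["gas"]].sum(min_count=1)

# shift(1) moves every value one hourly row down, so each row now holds
# the reading from one hour earlier.
one_hour_ago = hourly.shift(1)

# reindex keeps only the timestamps that exist in the original data.
result = one_hour_ago.reindex(one_car["timestamp"])
print(result)
```

Only the 15:00 row gets a value (the 14:00 reading); 14:00 has nothing before it and 18:00 has no reading at 17:00.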

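import pandas as pd

hours_ago = [1, 2]
# Sensor columns to lag. Selecting them explicitly keeps the string car_id
# column out of the sum on recent pandas versions.
value_cols = ['gas', 'odometer', 'temperature']

# Build one lagged frame per entry in hours_ago, then concat them side by side
df_x_hours_ago = pd.concat(
    [(df.groupby('car_id')
        # hourly bins per car ('h' replaces 'H', deprecated since pandas 2.2);
        # min_count=1 keeps empty bins as NaN, shift(hour) moves values down `hour` rows
        .apply(lambda x: x.resample('h', on='timestamp')[value_cols]
                          .sum(min_count=1)
                          .shift(hour))
        .reset_index(level='car_id', drop=True)   # drop expects a bool
        .reindex(index=df['timestamp'])           # back to the original rows
        .add_suffix(f'_{hour}h_ago')
        .reset_index(drop=True))
     for hour in hours_ago],
    axis=1)

# Concat with the original data and sort the columns alphabetically
new_df = (pd.concat([df, df_x_hours_ago], axis=1)
            .set_index(['car_id', 'timestamp'])
            .sort_index(axis=1)
            .reset_index())
print(new_df)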

Output

   car_id           timestamp  gas  gas_1h_ago  gas_2h_ago  odometer  \
0  aac43f 2019-10-05 14:00:00   70         NaN         NaN    152042   
1  aac43f 2019-10-05 15:00:00   63        70.0         NaN    152112   
2  aac43f 2019-10-05 18:00:00   44         NaN         NaN    152544   
3   bg112 2019-08-22 09:00:00   90         NaN         NaN      1242   
4   bg112 2019-08-22 10:00:00   89        90.0         NaN      1270   
5   32rre 2019-01-01 12:00:00   20         NaN         NaN     84752   

   odometer_1h_ago  odometer_2h_ago  temperature  temperature_1h_ago  \
0              NaN              NaN           87                 NaN   
1         152042.0              NaN           88                87.0   
2              NaN              NaN           93                 NaN   
3              NaN              NaN           85                 NaN   
4           1242.0              NaN           85                85.0   
5              NaN              NaN           74                 NaN   

   temperature_2h_ago  
0                 NaN  
1                 NaN  
2                 NaN  
3                 NaN  
4                 NaN  
5                 NaN
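
One caveat: the reindex step in the code above looks rows up by timestamp alone, so it relies on no two cars sharing a timestamp (true for the sample data). If that does not hold for your data, a variant that reindexes on (car_id, timestamp) pairs could look like the sketch below. This is not the code from the answer; add_lagged_readings is a hypothetical helper name and the demo data is made up.

```python
import pandas as pd

def add_lagged_readings(df, value_cols, hours_ago=(1, 2)):
    """Hypothetical helper: append <col>_<n>h_ago columns for each car."""
    df = df.reset_index(drop=True)
    # Row keys of the original data, used to pick rows back out after resampling.
    keys = pd.MultiIndex.from_frame(df[["car_id", "timestamp"]])
    indexed = df.set_index("timestamp")

    lagged = []
    for hour in hours_ago:
        # One hourly, shifted frame per car, stacked under a (car_id, timestamp) index.
        shifted = pd.concat({
            car_id: group[value_cols].resample("h").sum(min_count=1).shift(hour)
            for car_id, group in indexed.groupby("car_id")
        })
        lagged.append(
            shifted.reindex(keys)
                   .add_suffix(f"_{hour}h_ago")
                   .reset_index(drop=True)
        )
    return pd.concat([df, *lagged], axis=1)


# Two cars that report at the same timestamps (made-up data).
demo = pd.DataFrame({
    "car_id": ["a", "a", "b", "b"],
    "timestamp": pd.to_datetime(["2019-01-01 10:00", "2019-01-01 11:00"] * 2),
    "gas": [5, 6, 7, 8],
})
out = add_lagged_readings(demo, ["gas"])
print(out)
```

Each car's 11:00 row picks up its own 10:00 reading, even though both cars share the same timestamps.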

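To fill the gaps with 0 instead of NaN, remove min_count=1. Empty hourly bins then sum to 0; rows with no earlier hour at all in the car's data still stay NaN.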
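
A quick way to see what min_count=1 changes, again on one car's gas readings. This is a sketch; the frame is copied from the question and the variable names are mine.

```python
import pandas as pd

one_car = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2019-10-05 14:00:00", "2019-10-05 15:00:00", "2019-10-05 18:00:00",
    ]),
    "gas": [70, 63, 44],
})

resampled = one_car.resample("h", on="timestamp")[["gas"]]

# With min_count=1 the empty 16:00 and 17:00 bins stay NaN.
with_nan = resampled.sum(min_count=1).shift(1)
# Without it, empty bins sum to 0, so the gaps show up as 0 after the shift.
with_zero = resampled.sum().shift(1)

print(with_nan.join(with_zero, lsuffix="_min_count", rsuffix="_default"))
```

The very first row is NaN in both cases, because shift has nothing to move into it.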

Solution 1
1 2019-11-07 11:14:42
