
Unstack a pandas DataFrame by two indexes within a time window

I have the following dataframe:

| car_id | timestamp           | gas | odometer | temperature |
|--------|---------------------|-----|----------|-------------|
| aac43f | 2019-10-05 14:00:00 | 70  | 152042   | 87          |
| aac43f | 2019-10-05 15:00:00 | 63  | 152112   | 88          |
| aac43f | 2019-10-05 18:00:00 | 44  | 152544   | 93          |
| bg112  | 2019-08-22 09:00:00 | 90  | 1242     | 85          |
| bg112  | 2019-08-22 10:00:00 | 89  | 1270     | 85          |
| 32rre  | 2019-01-01 12:00:00 | 20  | 84752    | 74          |
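
To make the example reproducible, the table above can be built as shown below. This is only a sketch: the variable name `df` is an assumption (it is the name the answer's code uses), and `timestamp` is parsed to a datetime dtype because time-based resampling needs it.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "car_id": ["aac43f", "aac43f", "aac43f", "bg112", "bg112", "32rre"],
    "timestamp": pd.to_datetime([
        "2019-10-05 14:00:00", "2019-10-05 15:00:00", "2019-10-05 18:00:00",
        "2019-08-22 09:00:00", "2019-08-22 10:00:00", "2019-01-01 12:00:00",
    ]),
    "gas": [70, 63, 44, 90, 89, 20],
    "odometer": [152042, 152112, 152544, 1242, 1270, 84752],
    "temperature": [87, 88, 93, 85, 85, 74],
})

print(df.dtypes)
```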

I would like to transform it by car_id and timestamp, adding new features that hold the sensor readings from 1 and 2 hours ago, like this:

| car_id | timestamp           | gas | gas_1h_ago | gas_2h_ago | odometer | o_1h   | o_2h | temperature | t_1h_ago | t_2h_ago |
|--------|---------------------|-----|------------|------------|----------|--------|------|-------------|----------|----------|
| aac43f | 2019-10-05 14:00:00 | 70  | NaN        | NaN        | 152042   | NaN    | NaN  | 87          | NaN      | NaN      |
| aac43f | 2019-10-05 15:00:00 | 63  | 70         | NaN        | 152112   | 152042 | NaN  | 88          | 87       | NaN      |
| aac43f | 2019-10-05 18:00:00 | 44  | NaN        | NaN        | 152544   | NaN    | NaN  | 93          | NaN      | NaN      |
| bg112  | 2019-08-22 09:00:00 | 90  | NaN        | NaN        | 1242     | NaN    | NaN  | 85          | NaN      | NaN      |
| bg112  | 2019-08-22 10:00:00 | 89  | 90         | NaN        | 1270     | 1242   | NaN  | 85          | 85       | NaN      |
| 32rre  | 2019-01-01 12:00:00 | 20  | NaN        | NaN        | 84752    | NaN    | NaN  | 74          | NaN      | NaN      |

I thought I could use the unstack function, but I can't think of a solution.

You can use GroupBy.apply to resample each car's readings into hourly bins with DataFrame.resample. Then use sum followed by DataFrame.shift to move each hourly value forward by x hours, so that every row holds the reading from x hours earlier. Use DataFrame.reindex to get back to the original rows, and add a suffix to the column names for that x with DataFrame.add_suffix. Repeat this for each x and join the results with pd.concat.

Finally, join the resulting DataFrame to the original with pd.concat again, and put the columns in order with DataFrame.set_index + DataFrame.sort_index + DataFrame.reset_index.
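
To see what the resample, sum and shift steps do, here is a minimal sketch for a single car and a single column. The names `one_car`, `hourly`, `one_hour_ago` and `result` are only illustrative, and the frequency is written as `"60min"` rather than `'H'` on the assumption that the `'H'` alias is deprecated in recent pandas releases.

```python
import pandas as pd

# Gas readings of car aac43f from the question
one_car = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2019-10-05 14:00:00", "2019-10-05 15:00:00", "2019-10-05 18:00:00",
    ]),
    "gas": [70, 63, 44],
})

# Hourly bins from 14:00 to 18:00; min_count=1 leaves the empty
# 16:00 and 17:00 bins as NaN instead of summing them to 0.
hourly = one_car.resample("60min", on="timestamp")["gas"].sum(min_count=1)

# Shift by one bin: each hour now holds the value of the previous hour.
one_hour_ago = hourly.shift(1)

# Keep only the timestamps that were actually observed.
result = one_hour_ago.reindex(one_car["timestamp"])
print(result)
```

Only the 15:00 row ends up with a value (the 14:00 reading), because 14:00 has no earlier bin and the 17:00 bin before 18:00 is empty.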

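import pandas as pd

# Assumes df has a default RangeIndex and df['timestamp'] is already datetime64.
hours_ago = [1, 2]
# Added here so that only the sensor columns are aggregated.
value_cols = ['gas', 'odometer', 'temperature']

# Build one frame of lagged readings per offset, then join them column-wise.
df_x_hours_ago = pd.concat(
    [(df.groupby('car_id')
        # hourly bins per car ('H' is deprecated in recent pandas; use 'h' there)
        .apply(lambda x: x.resample('H', on='timestamp')[value_cols]
                          .sum(min_count=1)   # empty bins stay NaN
                          .shift(hour))       # value from `hour` bins earlier
        .reset_index(level='car_id', drop=True)
        # back to the original rows; timestamps must be unique across cars
        .reindex(index=df['timestamp'])
        .add_suffix(f'_{hour}h_ago')
        .reset_index(drop=True))
     for hour in hours_ago],
    axis=1)

# Join with the original frame and sort the columns alphabetically.
new_df = (pd.concat([df, df_x_hours_ago], axis=1)
            .set_index(['car_id', 'timestamp'])
            .sort_index(axis=1)
            .reset_index())
print(new_df)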

Output

   car_id           timestamp  gas  gas_1h_ago  gas_2h_ago  odometer  \
0  aac43f 2019-10-05 14:00:00   70         NaN         NaN    152042   
1  aac43f 2019-10-05 15:00:00   63        70.0         NaN    152112   
2  aac43f 2019-10-05 18:00:00   44         NaN         NaN    152544   
3   bg112 2019-08-22 09:00:00   90         NaN         NaN      1242   
4   bg112 2019-08-22 10:00:00   89        90.0         NaN      1270   
5   32rre 2019-01-01 12:00:00   20         NaN         NaN     84752   

   odometer_1h_ago  odometer_2h_ago  temperature  temperature_1h_ago  \
0              NaN              NaN           87                 NaN   
1         152042.0              NaN           88                87.0   
2              NaN              NaN           93                 NaN   
3              NaN              NaN           85                 NaN   
4           1242.0              NaN           85                85.0   
5              NaN              NaN           74                 NaN   

   temperature_2h_ago  
0                 NaN  
1                 NaN  
2                 NaN  
3                 NaN  
4                 NaN  
5                 NaN  
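
One caveat: `.reindex(index=df['timestamp'])` looks rows up by timestamp alone, so it relies on timestamps being unique across cars. As a hedged alternative (a different technique from the one above, not a drop-in equivalent), the lag columns can be built with a self-merge on `car_id` and a shifted `timestamp`. The helper name `add_lags` is hypothetical, and the sketch assumes readings fall exactly on the hour, as in the sample data, because it matches timestamps exactly instead of binning them.

```python
import pandas as pd

df = pd.DataFrame({
    "car_id": ["aac43f", "aac43f", "aac43f", "bg112", "bg112", "32rre"],
    "timestamp": pd.to_datetime([
        "2019-10-05 14:00:00", "2019-10-05 15:00:00", "2019-10-05 18:00:00",
        "2019-08-22 09:00:00", "2019-08-22 10:00:00", "2019-01-01 12:00:00",
    ]),
    "gas": [70, 63, 44, 90, 89, 20],
    "odometer": [152042, 152112, 152544, 1242, 1270, 84752],
    "temperature": [87, 88, 93, 85, 85, 74],
})

keys = ["car_id", "timestamp"]
value_cols = ["gas", "odometer", "temperature"]


def add_lags(frame, hours):
    """Hypothetical helper: append <col>_<h>h_ago columns via a self-merge."""
    out = frame.copy()
    for h in hours:
        lagged = frame[keys + value_cols].copy()
        # Move each reading h hours forward so it lines up with the row
        # that should see it as "h hours ago".
        lagged["timestamp"] = lagged["timestamp"] + pd.Timedelta(hours=h)
        lagged = lagged.rename(columns={c: f"{c}_{h}h_ago" for c in value_cols})
        # A left merge keeps every original row; unmatched rows get NaN.
        out = out.merge(lagged, on=keys, how="left")
    return out


new_df = add_lags(df, [1, 2])
print(new_df[["car_id", "timestamp", "gas", "gas_1h_ago", "gas_2h_ago"]])
```

Because the merge key includes `car_id`, two cars may share a timestamp without interfering, as long as each (car_id, timestamp) pair appears only once.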

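To get 0 instead of NaN for hours without a reading, remove min_count=1 in the resample-based code: empty hourly bins then sum to 0. Rows with no earlier bin at all, such as the first reading of each car, still come out as NaN after the shift.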
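
The effect of `min_count` can be checked on a single car. This is a small sketch with illustrative names (`one_car`, `bins`, `with_nan`, `with_zero`); it uses `"60min"` as the frequency string on the assumption that the `'H'` alias is deprecated in recent pandas releases.

```python
import pandas as pd

one_car = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2019-10-05 14:00:00", "2019-10-05 15:00:00", "2019-10-05 18:00:00",
    ]),
    "gas": [70, 63, 44],
})

bins = one_car.resample("60min", on="timestamp")["gas"]

# With min_count=1 an empty bin sums to NaN ...
with_nan = bins.sum(min_count=1).shift(1).reindex(one_car["timestamp"])
# ... and without it an empty bin sums to 0.
with_zero = bins.sum().shift(1).reindex(one_car["timestamp"])

print(pd.DataFrame({"min_count=1": with_nan, "default": with_zero}))
```

In the second column the 18:00 row becomes 0 (the empty 17:00 bin), while the 14:00 row stays NaN because nothing precedes it.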

