Unstack pandas dataframe by two index in the time window
I have the following dataframe:
| car_id | timestamp | gas | odometer | temperature |
|--------|---------------------|-----|----------|-------------|
| aac43f | 2019-10-05 14:00:00 | 70 | 152042 | 87 |
| aac43f | 2019-10-05 15:00:00 | 63 | 152112 | 88 |
| aac43f | 2019-10-05 18:00:00 | 44 | 152544 | 93 |
| bg112 | 2019-08-22 09:00:00 | 90 | 1242 | 85 |
| bg112 | 2019-08-22 10:00:00 | 89 | 1270 | 85 |
| 32rre | 2019-01-01 12:00:00 | 20 | 84752 | 74 |
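For anyone who wants to reproduce this, the table above could be built roughly as follows. This is only a sketch: the variable name df and the conversion of timestamp with pd.to_datetime are assumptions, since the question does not show how the data is loaded.

```python
import pandas as pd

# Sample data copied from the table above.
# Assumption: timestamp is stored as datetime64, not as plain strings.
df = pd.DataFrame({
    "car_id": ["aac43f", "aac43f", "aac43f", "bg112", "bg112", "32rre"],
    "timestamp": pd.to_datetime([
        "2019-10-05 14:00:00", "2019-10-05 15:00:00", "2019-10-05 18:00:00",
        "2019-08-22 09:00:00", "2019-08-22 10:00:00", "2019-01-01 12:00:00",
    ]),
    "gas": [70, 63, 44, 90, 89, 20],
    "odometer": [152042, 152112, 152544, 1242, 1270, 84752],
    "temperature": [87, 88, 93, 85, 85, 74],
})
```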
I want to transform it by car_id and timestamp, adding new features that hold the sensor readings from 1 hour and 2 hours earlier, like this:
| car_id | timestamp | gas | gas_1h_ago | gas_2h_ago | odometer | o_1h | o_2h | temperature | t_1h_ago | t_2h_ago |
|--------|---------------------|-----|------------|------------|----------|--------|------|-------------|----------|----------|
| aac43f | 2019-10-05 14:00:00 | 70 | NaN | NaN | 152042 | NaN | NaN | 87 | NaN | NaN |
| aac43f | 2019-10-05 15:00:00 | 63 | 70 | NaN | 152112 | 152042 | NaN | 88 | 87 | NaN |
| aac43f | 2019-10-05 18:00:00 | 44 | NaN | NaN | 152544 | NaN | NaN | 93 | NaN | NaN |
| bg112 | 2019-08-22 09:00:00 | 90 | NaN | NaN | 1242 | NaN | NaN | 85 | NaN | NaN |
| bg112 | 2019-08-22 10:00:00 | 89 | 90 | NaN | 1270 | 1242 | NaN | 85 | 85 | NaN |
| 32rre | 2019-01-01 12:00:00 | 20 | NaN | NaN | 84752 | NaN | NaN | 74 | NaN | NaN |
I thought I could use the unstack function, but I couldn't come up with a solution.
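You can use GroupBy.apply with DataFrame.resample to put each car's readings on an hourly grid. Use DataFrame.sum + DataFrame.shift to move each hourly value forward by the required number of hours. Use DataFrame.reindex to bring the dataframe back to its original rows. Add a suffix to the columns for each "x hours ago" with DataFrame.add_suffix, do this once per value of x, and join the results with pd.concat.
Finally, use pd.concat again to combine the resulting dataframe with the original data. Use DataFrame.set_index + DataFrame.sort_index + DataFrame.reset_index to rearrange the columns.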
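To see what the resample, sum and shift steps do, they can be tried on a single car in isolation. This is a minimal sketch, not the answer's code: the names one_car, hourly and result are made up for illustration, only the gas column is used, and pd.offsets.Hour(1) is passed instead of a frequency string.

```python
import pandas as pd

# One car's gas readings from the question (car aac43f)
one_car = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2019-10-05 14:00:00", "2019-10-05 15:00:00", "2019-10-05 18:00:00",
    ]),
    "gas": [70, 63, 44],
})

# Resample onto a regular hourly grid (14:00 ... 18:00).
# min_count=1 leaves hours without any reading as NaN.
hourly = one_car.resample(pd.offsets.Hour(1), on="timestamp").sum(min_count=1)

# Each row is now exactly one hour, so shifting by 1 row
# moves every value to the following hour.
one_hour_ago = hourly.shift(1)

# Keep only the timestamps present in the original data and label the column.
result = one_hour_ago.reindex(one_car["timestamp"]).add_suffix("_1h_ago")
print(result)
```

Only the 15:00 row gets a value (70.0), because 14:00 is the only reading that has another reading exactly one hour after it.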
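import pandas as pd

# Assumes df['timestamp'] is already datetime64 (e.g. converted with pd.to_datetime)
hours_ago = [1, 2]

# Build one shifted DataFrame per lag and concatenate them side by side
df_x_hours_ago = (
    pd.concat(
        [(df.groupby('car_id')
            # newer pandas versions spell the hourly alias 'h'
            .apply(lambda x: x.resample('H', on='timestamp')
                              .sum(min_count=1)   # hours without readings stay NaN
                              .shift(hour))       # one row per hour, so this shifts by `hour` hours
            .reset_index(level='car_id', drop=True)
            # back to the original rows (timestamps must not repeat across cars)
            .reindex(index=df['timestamp'])
            .add_suffix(f'_{hour}h_ago')
            .reset_index(drop=True))
         for hour in hours_ago],
        axis=1)
)

# Concatenate with the original data and order the columns
new_df = (pd.concat([df, df_x_hours_ago], axis=1)
            .set_index(['car_id', 'timestamp'])
            .sort_index(axis=1)
            .reset_index())
print(new_df)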
Output
   car_id            timestamp  gas  gas_1h_ago  gas_2h_ago  odometer  \
0  aac43f  2019-10-05 14:00:00   70         NaN         NaN    152042
1  aac43f  2019-10-05 15:00:00   63        70.0         NaN    152112
2  aac43f  2019-10-05 18:00:00   44         NaN         NaN    152544
3   bg112  2019-08-22 09:00:00   90         NaN         NaN      1242
4   bg112  2019-08-22 10:00:00   89        90.0         NaN      1270
5   32rre  2019-01-01 12:00:00   20         NaN         NaN     84752

   odometer_1h_ago  odometer_2h_ago  temperature  temperature_1h_ago  \
0              NaN              NaN           87                 NaN
1         152042.0              NaN           88                87.0
2              NaN              NaN           93                 NaN
3              NaN              NaN           85                 NaN
4           1242.0              NaN           85                85.0
5              NaN              NaN           74                 NaN

   temperature_2h_ago
0                 NaN
1                 NaN
2                 NaN
3                 NaN
4                 NaN
5                 NaN
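A different route to the same columns, which is not the method used above, is to shift the timestamps and merge the frame with itself. The sketch below assumes there is at most one row per (car_id, timestamp) pair and that readings fall exactly on the hour; the names value_cols and lagged are illustrative. Unlike the reindex step, it does not need timestamps to be unique across cars, and it leaves the new columns at the end instead of sorting them.

```python
import pandas as pd

df = pd.DataFrame({
    "car_id": ["aac43f", "aac43f", "aac43f", "bg112", "bg112", "32rre"],
    "timestamp": pd.to_datetime([
        "2019-10-05 14:00:00", "2019-10-05 15:00:00", "2019-10-05 18:00:00",
        "2019-08-22 09:00:00", "2019-08-22 10:00:00", "2019-01-01 12:00:00",
    ]),
    "gas": [70, 63, 44, 90, 89, 20],
    "odometer": [152042, 152112, 152544, 1242, 1270, 84752],
    "temperature": [87, 88, 93, 85, 85, 74],
})

value_cols = ["gas", "odometer", "temperature"]  # sensor columns to lag
new_df = df.copy()

for hour in [1, 2]:
    # Copy the readings and move their timestamps forward by `hour` hours,
    # so a reading taken at 14:00 lines up with the row at 14:00 + hour.
    lagged = df[["car_id", "timestamp"] + value_cols].copy()
    lagged["timestamp"] = lagged["timestamp"] + pd.Timedelta(hours=hour)
    lagged = lagged.rename(columns={c: f"{c}_{hour}h_ago" for c in value_cols})

    # A left merge keeps every original row; rows without a matching
    # earlier reading get NaN in the new columns.
    new_df = new_df.merge(lagged, on=["car_id", "timestamp"], how="left")

print(new_df)
```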
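To fill with 0 instead, remove min_count=1. Hours that have no readings then sum to 0 rather than NaN; values shifted in from before a car's first reading are still NaN.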
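The effect of min_count can be checked on an empty Series, which is what an hour without readings amounts to. This is a small illustration only; the variable names are made up.

```python
import pandas as pd

# An hour with no readings behaves like summing an empty Series
empty = pd.Series([], dtype="float64")

total_default = empty.sum()               # no values: the default result is 0.0
total_min_count = empty.sum(min_count=1)  # fewer than 1 valid value: result is NaN

print(total_default, total_min_count)
```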