I have the following dataframe:
| car_id | timestamp | gas | odometer | temperature |
|--------|---------------------|-----|----------|-------------|
| aac43f | 2019-10-05 14:00:00 | 70 | 152042 | 87 |
| aac43f | 2019-10-05 15:00:00 | 63 | 152112 | 88 |
| aac43f | 2019-10-05 18:00:00 | 44 | 152544 | 93 |
| bg112 | 2019-08-22 09:00:00 | 90 | 1242 | 85 |
| bg112 | 2019-08-22 10:00:00 | 89 | 1270 | 85 |
| 32rre | 2019-01-01 12:00:00 | 20 | 84752 | 74 |
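
To make the example reproducible, the table above can be built roughly as follows. This is a sketch: it assumes `timestamp` should be parsed to a datetime dtype and that the sensor columns are integers, and the variable name `df` is only a convention.

```python
import pandas as pd

# Sample data copied from the table above.
df = pd.DataFrame(
    {
        "car_id": ["aac43f", "aac43f", "aac43f", "bg112", "bg112", "32rre"],
        # Parse the strings so that time-based operations work on this column.
        "timestamp": pd.to_datetime(
            [
                "2019-10-05 14:00:00",
                "2019-10-05 15:00:00",
                "2019-10-05 18:00:00",
                "2019-08-22 09:00:00",
                "2019-08-22 10:00:00",
                "2019-01-01 12:00:00",
            ]
        ),
        "gas": [70, 63, 44, 90, 89, 20],
        "odometer": [152042, 152112, 152544, 1242, 1270, 84752],
        "temperature": [87, 88, 93, 85, 85, 74],
    }
)
```
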
I would like to transform it by `car_id` and `timestamp`, adding new features: the sensor readings from 1 and 2 hours ago, like this:
| car_id | timestamp | gas | gas_1h_ago | gas_2h_ago | odometer | o_1h | o_2h | temperature | t_1h_ago | t_2h_ago |
|--------|---------------------|-----|------------|------------|----------|--------|------|-------------|----------|----------|
| aac43f | 2019-10-05 14:00:00 | 70 | NaN | NaN | 152042 | NaN | NaN | 87 | NaN | NaN |
| aac43f | 2019-10-05 15:00:00 | 63 | 70 | NaN | 152112 | 152042 | NaN | 88 | 87 | NaN |
| aac43f | 2019-10-05 18:00:00 | 44 | NaN | NaN | 152544 | NaN | NaN | 93 | NaN | NaN |
| bg112 | 2019-08-22 09:00:00 | 90 | NaN | NaN | 1242 | NaN | NaN | 85 | NaN | NaN |
| bg112 | 2019-08-22 10:00:00 | 89 | 90 | NaN | 1270 | 1242 | NaN | 85 | 85 | NaN |
| 32rre | 2019-01-01 12:00:00 | 20 | NaN | NaN | 84752 | NaN | NaN | 74 | NaN | NaN |
I thought I could use the `unstack` function, but I can't think of a solution.
You can use `GroupBy.apply` to resample each car's readings onto an hourly grid with `DataFrame.resample`. Use `sum` (with `min_count=1`, so hours without a reading stay NaN) + `DataFrame.shift` to move each hour's value forward by the required number of hours. Restore the original rows with `DataFrame.reindex`, then add a suffix to the column names for the x-hour lag with `DataFrame.add_suffix`. Perform this operation for each lag and join the results with `pd.concat`.

Finally, join the resulting dataframe with the original one, again with `pd.concat`, and rearrange the columns with `DataFrame.set_index` + `DataFrame.sort_index` + `DataFrame.reset_index`.
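
To see what the resample, shift and reindex steps do before applying them per group, here is a minimal single-car sketch. The name `one_car` is made up for illustration, only the `gas` column is used, and the `'60min'` frequency string is my choice for an hourly grid.

```python
import pandas as pd

# Readings for a single car (aac43f from the question), gas only.
one_car = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2019-10-05 14:00:00", "2019-10-05 15:00:00", "2019-10-05 18:00:00"]
        ),
        "gas": [70, 63, 44],
    }
)

# One row per hour between the first and last reading; hours with no
# reading become NaN because of min_count=1.
hourly = one_car.set_index("timestamp").resample("60min").sum(min_count=1)

# Move every value one row (one hour) later, then keep only the
# timestamps that exist in the original data.
gas_1h_ago = hourly.shift(1).reindex(one_car["timestamp"])["gas"]
print(gas_1h_ago)
```

Only the 15:00 row gets a value (the 14:00 reading), because 14:00 has no earlier row and 18:00 looks back to 17:00, an hour without a reading.
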
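
    import pandas as pd

    # Assumes df['timestamp'] has a datetime dtype and df has a default 0..n-1 index.
    hours_ago = [1, 2]
    # Columns to lag; selecting them explicitly keeps car_id out of the sum.
    sensor_cols = ['gas', 'odometer', 'temperature']

    # Create one DataFrame per lag and concatenate them column-wise.
    df_x_hours_ago = pd.concat(
        [
            (
                df.groupby('car_id')
                  # 'h' is the hourly alias on pandas >= 2.2; older versions use 'H'.
                  .apply(lambda x: x.resample('h', on='timestamp')[sensor_cols]
                                    .sum(min_count=1)
                                    .shift(hour))
                  .reset_index(level='car_id', drop=True)
                  .reindex(index=df['timestamp'])
                  .add_suffix(f'_{hour}h_ago')
                  .reset_index(drop=True)
            )
            for hour in hours_ago
        ],
        axis=1,
    )

    # Join with the original frame and order the columns alphabetically.
    new_df = (
        pd.concat([df, df_x_hours_ago], axis=1)
          .set_index(['car_id', 'timestamp'])
          .sort_index(axis=1)
          .reset_index()
    )
    print(new_df)
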
Output:

       car_id           timestamp  gas  gas_1h_ago  gas_2h_ago  odometer  \
    0  aac43f 2019-10-05 14:00:00   70         NaN         NaN    152042
    1  aac43f 2019-10-05 15:00:00   63        70.0         NaN    152112
    2  aac43f 2019-10-05 18:00:00   44         NaN         NaN    152544
    3   bg112 2019-08-22 09:00:00   90         NaN         NaN      1242
    4   bg112 2019-08-22 10:00:00   89        90.0         NaN      1270
    5   32rre 2019-01-01 12:00:00   20         NaN         NaN     84752

       odometer_1h_ago  odometer_2h_ago  temperature  temperature_1h_ago  \
    0              NaN              NaN           87                 NaN
    1         152042.0              NaN           88                87.0
    2              NaN              NaN           93                 NaN
    3              NaN              NaN           85                 NaN
    4           1242.0              NaN           85                85.0
    5              NaN              NaN           74                 NaN

       temperature_2h_ago
    0                 NaN
    1                 NaN
    2                 NaN
    3                 NaN
    4                 NaN
    5                 NaN

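To get 0 instead of NaN for hours that fall inside a car's time range but have no reading, remove `min_count=1` from the `sum` call. Lags that reach back before a car's first reading are still NaN.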
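
One caveat with the resample-based code: once `car_id` is dropped from the index, `reindex` needs the remaining timestamps to be unique, so it can fail when two cars have readings in overlapping time ranges. A possible alternative, sketched below, is a self-merge on `car_id` and a shifted `timestamp`. The helper name `add_lagged_readings` and its parameters are hypothetical, and the sketch assumes exactly one reading per car and timestamp, with readings that fall exactly on the hour. It is a different technique from the one above: it matches exact timestamps rather than summing within hourly bins.

```python
import pandas as pd


def add_lagged_readings(df, hours_ago, sensor_cols):
    """Append `<col>_<h>h_ago` columns via a self-merge on car and time.

    Hypothetical helper; assumes one row per (car_id, timestamp).
    """
    out = df.copy()
    for hour in hours_ago:
        lagged = df[["car_id", "timestamp"] + sensor_cols].copy()
        # A reading taken at time t is the "hour hours ago" value for t + hour.
        lagged["timestamp"] = lagged["timestamp"] + pd.Timedelta(hours=hour)
        lagged = lagged.rename(
            columns={col: f"{col}_{hour}h_ago" for col in sensor_cols}
        )
        # The left merge keeps every original row; unmatched lags become NaN.
        out = out.merge(lagged, on=["car_id", "timestamp"], how="left")
    return out


# Two cars that share the same timestamps (made-up demo data).
demo = pd.DataFrame(
    {
        "car_id": ["a", "a", "a", "b", "b"],
        "timestamp": pd.to_datetime(
            [
                "2019-10-05 10:00:00",
                "2019-10-05 11:00:00",
                "2019-10-05 12:00:00",
                "2019-10-05 10:00:00",
                "2019-10-05 12:00:00",
            ]
        ),
        "gas": [50, 45, 40, 80, 75],
    }
)
result = add_lagged_readings(demo, hours_ago=[1, 2], sensor_cols=["gas"])
print(result)
```

The new columns are appended at the end rather than sorted next to their source column; the `set_index` + `sort_index(axis=1)` + `reset_index` step from the answer can be applied to the result if that ordering is wanted.
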