Unstack pandas dataframe by two index in the time window

Question

I have the following dataframe:

| car_id | timestamp           | gas | odometer | temperature |
|--------|---------------------|-----|----------|-------------|
| aac43f | 2019-10-05 14:00:00 | 70  | 152042   | 87          |
| aac43f | 2019-10-05 15:00:00 | 63  | 152112   | 88          |
| aac43f | 2019-10-05 18:00:00 | 44  | 152544   | 93          |
| bg112  | 2019-08-22 09:00:00 | 90  | 1242     | 85          |
| bg112  | 2019-08-22 10:00:00 | 89  | 1270     | 85          |
| 32rre  | 2019-01-01 12:00:00 | 20  | 84752    | 74          |
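For anyone who wants to reproduce this, the table above can be built roughly as follows. This is only a sketch: the dtypes are an assumption (timestamp parsed as datetime, sensor columns as integers), since the question does not state them.

```python
import pandas as pd

# Sample data copied from the table above; dtypes are assumed.
df = pd.DataFrame({
    "car_id": ["aac43f", "aac43f", "aac43f", "bg112", "bg112", "32rre"],
    "timestamp": pd.to_datetime([
        "2019-10-05 14:00:00", "2019-10-05 15:00:00", "2019-10-05 18:00:00",
        "2019-08-22 09:00:00", "2019-08-22 10:00:00", "2019-01-01 12:00:00",
    ]),
    "gas": [70, 63, 44, 90, 89, 20],
    "odometer": [152042, 152112, 152544, 1242, 1270, 84752],
    "temperature": [87, 88, 93, 85, 85, 74],
})
print(df)
```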

I would like to transform it by car_id and timestamp so that the new features are the sensor readings from 1 and 2 hours ago, like this:

| car_id | timestamp           | gas | gas_1h_ago | gas_2h_ago | odometer | odometer_1h_ago | odometer_2h_ago | temperature | temperature_1h_ago | temperature_2h_ago |
|--------|---------------------|-----|------------|------------|----------|-----------------|-----------------|-------------|--------------------|--------------------|
| aac43f | 2019-10-05 14:00:00 | 70  | NaN        | NaN        | 152042   | NaN    | NaN  | 87          | NaN      | NaN      |
| aac43f | 2019-10-05 15:00:00 | 63  | 70         | NaN        | 152112   | 152042 | NaN  | 88          | 87       | NaN      |
| aac43f | 2019-10-05 18:00:00 | 44  | NaN        | NaN        | 152544   | NaN    | NaN  | 93          | NaN      | NaN      |
| bg112  | 2019-08-22 09:00:00 | 90  | NaN        | NaN        | 1242     | NaN    | NaN  | 85          | NaN      | NaN      |
| bg112  | 2019-08-22 10:00:00 | 89  | 90         | NaN        | 1270     | 1242   | NaN  | 85          | 85       | NaN      |
| 32rre  | 2019-01-01 12:00:00 | 20  | NaN        | NaN        | 84752    | NaN    | NaN  | 74          | NaN      | NaN      |

I thought I could use the unstack function, but I can't think of a solution.

Answer 1

You can use GroupBy.apply to resample each car's readings into hourly bins with DataFrame.resample . Use sum (with min_count=1 , so that hours without readings stay NaN) followed by shift to move each hourly value forward by x hours. Restore the original rows with DataFrame.reindex , and add a suffix describing the x-hour lag to the column names with DataFrame.add_suffix . Do this for each value of x and join the results with pd.concat .

Finally, join the resulting dataframe with the original one, again with pd.concat . Rearrange the columns with DataFrame.set_index + DataFrame.sort_index + DataFrame.reset_index
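To see what the resample, sum and shift steps do, here is a minimal sketch for a single car and a single sensor. The names one_car, hourly, lag1 and result are only illustrative, and the "60min" frequency string is used here in place of the hourly alias.

```python
import pandas as pd

# One car, one sensor, with a gap between 15:00 and 18:00.
one_car = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2019-10-05 14:00", "2019-10-05 15:00", "2019-10-05 18:00"]),
    "gas": [70, 63, 44],
})

# Hourly bins from 14:00 to 18:00; min_count=1 keeps empty bins as NaN.
hourly = (one_car.set_index("timestamp")
                 .resample("60min")
                 .sum(min_count=1))

# Shifting by one row moves every value one hour forward in time.
lag1 = hourly.shift(1)

# Keep only the timestamps that exist in the original rows.
result = lag1.reindex(one_car["timestamp"])
print(result)
```

Only the 15:00 row gets a value (the 14:00 reading). The 18:00 row stays NaN because the 17:00 bin is empty.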

import pandas as pd

# Assumes df['timestamp'] already has a datetime dtype
hours_ago = [1, 2]

# Build one shifted DataFrame per lag and concatenate them column-wise
# ('H' is the hourly alias; newer pandas versions prefer 'h')
df_x_hours_ago = pd.concat(
    [(df.groupby('car_id')
        .apply(lambda x: x.resample('H', on='timestamp')
                          .sum(min_count=1)
                          .shift(hour))
        .reset_index(level='car_id', drop=True)
        .reindex(index=df['timestamp'])
        .add_suffix(f'_{hour}h_ago')
        .reset_index(drop=True))
     for hour in hours_ago],
    axis=1)

# Concatenate with the original DataFrame and order the columns
new_df = (pd.concat([df, df_x_hours_ago], axis=1)
            .set_index(['car_id', 'timestamp'])
            .sort_index(axis=1)
            .reset_index())
print(new_df)

Output

   car_id           timestamp  gas  gas_1h_ago  gas_2h_ago  odometer  \
0  aac43f 2019-10-05 14:00:00   70         NaN         NaN    152042   
1  aac43f 2019-10-05 15:00:00   63        70.0         NaN    152112   
2  aac43f 2019-10-05 18:00:00   44         NaN         NaN    152544   
3   bg112 2019-08-22 09:00:00   90         NaN         NaN      1242   
4   bg112 2019-08-22 10:00:00   89        90.0         NaN      1270   
5   32rre 2019-01-01 12:00:00   20         NaN         NaN     84752   

   odometer_1h_ago  odometer_2h_ago  temperature  temperature_1h_ago  \
0              NaN              NaN           87                 NaN   
1         152042.0              NaN           88                87.0   
2              NaN              NaN           93                 NaN   
3              NaN              NaN           85                 NaN   
4           1242.0              NaN           85                85.0   
5              NaN              NaN           74                 NaN   

   temperature_2h_ago  
0                 NaN  
1                 NaN  
2                 NaN  
3                 NaN  
4                 NaN  
5                 NaN

To get 0 instead of NaN for hours without readings, remove min_count=1 .
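As an alternative sketch (not the method used above), the same lag columns can be built by shifting the timestamps forward and merging the result back on car_id and timestamp. The helper name add_lagged_readings is hypothetical. This assumes each (car_id, timestamp) pair is unique and that readings fall exactly on the hour, since the merge needs exact timestamp matches and does no hourly binning.

```python
import pandas as pd

def add_lagged_readings(df, hours=(1, 2), keys=("car_id", "timestamp")):
    """Hypothetical helper: add <sensor>_<h>h_ago columns via self-merges."""
    keys = list(keys)
    sensors = [c for c in df.columns if c not in keys]
    out = df.copy()
    for h in hours:
        lagged = df[keys + sensors].copy()
        # A reading taken at time t is the "h hours ago" value for time t + h.
        lagged["timestamp"] = lagged["timestamp"] + pd.Timedelta(hours=h)
        lagged = lagged.rename(columns={c: f"{c}_{h}h_ago" for c in sensors})
        # Left merge keeps every original row; missing lags become NaN.
        out = out.merge(lagged, on=keys, how="left")
    return out

df = pd.DataFrame({
    "car_id": ["aac43f", "aac43f", "aac43f", "bg112", "bg112", "32rre"],
    "timestamp": pd.to_datetime([
        "2019-10-05 14:00:00", "2019-10-05 15:00:00", "2019-10-05 18:00:00",
        "2019-08-22 09:00:00", "2019-08-22 10:00:00", "2019-01-01 12:00:00",
    ]),
    "gas": [70, 63, 44, 90, 89, 20],
    "odometer": [152042, 152112, 152544, 1242, 1270, 84752],
    "temperature": [87, 88, 93, 85, 85, 74],
})

result = add_lagged_readings(df)
print(result)
```

Because the merge keys include car_id, identical timestamps in different cars do not interfere with each other. The columns come out grouped by lag rather than by sensor; the set_index + sort_index + reset_index step from the answer can reorder them.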

solution1
1 2019-11-07 11:14:42
