DataFrame organise data based on time interval

Question

I have a DataFrame with timestamped data like the one below:

             Date      Time  Name   Value
0        20210322  08:00:00     A       1 
1        20210322  08:00:00     B       2  
2        20210322  08:28:00     A       3  
3        20210322  08:30:00     B       4  
4        20210322  09:00:00     A       5  
5        20210322  09:30:00     B       6  
6        20210322  09:55:00     A       7  
7        20210322  10:30:00     B       8  
8        20210322  10:30:00     A       9  
9        20210322  11:58:00     A      10  
10       20210322  12:00:00     B      11
...

The time series has a value every 30 seconds, but not every name is present at every timestamp.

The final goal is to merge it with another DataFrame that has Name and Datetime columns:

Name                 Datetime  
   A      2021-03-22 08:00:00 
   B      2021-03-22 09:25:15  
   A      2021-03-22 09:30:00
   A      2021-03-22 10:30:00
...

and fill the columns +0h, +1h, +2h and so on, where +1h contains the value from the first DataFrame at timestamp Datetime + 1h. If no value is available at that exact timestamp, ideally the next available one would be taken:

Name                 Datetime    +0h   +1h   +2h  +3h  ....
   A      2021-03-22 08:00:00      1     5     7    9  
   B      2021-03-22 09:25:15      6     8    11   NA  
   A      2021-03-22 09:30:00      7     9    10   NA    
   A      2021-03-22 10:30:00      9    10    NA   NA
...

Is there a pandas way to do this?

Don't hesitate to ask for clarification if I have not been clear enough. If you have a suggestion for a better title for this post, please share; I don't know if the current one is precise enough.

Thanks in advance, Irpie

Answer 1

I think you need merge_asof first:

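import pandas as pd

# df is the timestamped data, df1 is the frame with Name and Datetime
df1['Datetime'] = pd.to_datetime(df1['Datetime'])
# build a single datetime column from Date and Time
df['Datetime1'] = pd.to_datetime(df['Date'].astype(str) + ' ' + df['Time'])

# both frames must be sorted by their datetime key for merge_asof;
# each row gets the latest Datetime of the same Name at or before it
df = pd.merge_asof(df, df1, left_on='Datetime1', right_on='Datetime', by='Name')
print(df)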
        Date      Time Name  Value           Datetime1            Datetime
0   20210322  08:00:00    A      1 2021-03-22 08:00:00 2021-03-22 08:00:00
1   20210322  08:00:00    B      2 2021-03-22 08:00:00                 NaT
2   20210322  08:28:00    A      3 2021-03-22 08:28:00 2021-03-22 08:00:00
3   20210322  08:30:00    B      4 2021-03-22 08:30:00                 NaT
4   20210322  09:00:00    A      5 2021-03-22 09:00:00 2021-03-22 08:00:00
5   20210322  09:30:00    B      6 2021-03-22 09:30:00 2021-03-22 09:25:15
6   20210322  09:55:00    A      7 2021-03-22 09:55:00 2021-03-22 09:30:00
7   20210322  10:30:00    B      8 2021-03-22 10:30:00 2021-03-22 09:25:15
8   20210322  10:30:00    A      9 2021-03-22 10:30:00 2021-03-22 10:30:00
9   20210322  11:58:00    A     10 2021-03-22 11:58:00 2021-03-22 10:30:00
10  20210322  12:00:00    B     11 2021-03-22 12:00:00 2021-03-22 09:25:15

# subtract the matched Datetime, floor to whole hours, express as a number of hours
df['hour'] = (df['Datetime1'].sub(df['Datetime']).dt.floor('h')
                            .dt.total_seconds()
                            .div(3600)
                            # .astype(int) is left out: unmatched rows (NaT) give NaN
                            )

# pivot with sum per hour bucket and rename the hour columns to +Nh labels
f = lambda x: f'+{x}h'
df = (df.pivot_table(index=['Name','Datetime'], 
                    columns='hour', 
                    values='Value', 
                    aggfunc='sum',
                    fill_value=0)
          .rename(columns=f))
print (df)
hour                      +0.0h  +1.0h  +2.0h
Name Datetime                                
A    2021-03-22 08:00:00      4      5      0
     2021-03-22 09:30:00      7      0      0
     2021-03-22 10:30:00      9     10      0
B    2021-03-22 09:25:15      6      8     11
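
Note that the pivot above sums every value that falls into each hour bucket (for A at 08:00, +0h is 1 + 3 = 4), whereas the question asks for a single value at each offset. A possible alternative, sketched below, is to run merge_asof once per offset with direction='forward', so that each lookup takes the value at Datetime + k hours or, failing that, the next available one for the same Name. The frame names values and queries, the Timestamp and Target columns, and the fixed range of four offsets are assumptions made for illustration. The sample data is the truncated sample from the question, so the result need not match the table shown there.

```python
import pandas as pd

# Sample data copied from the question (truncated there with "...")
values = pd.DataFrame({
    "Date": [20210322] * 11,
    "Time": ["08:00:00", "08:00:00", "08:28:00", "08:30:00", "09:00:00",
             "09:30:00", "09:55:00", "10:30:00", "10:30:00", "11:58:00",
             "12:00:00"],
    "Name": list("ABABABABAAB"),
    "Value": list(range(1, 12)),
})
values["Timestamp"] = pd.to_datetime(
    values["Date"].astype(str) + " " + values["Time"],
    format="%Y%m%d %H:%M:%S",
)
# merge_asof requires the right frame to be sorted by its key
values = values.sort_values("Timestamp", kind="stable")

queries = pd.DataFrame({
    "Name": ["A", "B", "A", "A"],
    "Datetime": pd.to_datetime([
        "2021-03-22 08:00:00", "2021-03-22 09:25:15",
        "2021-03-22 09:30:00", "2021-03-22 10:30:00",
    ]),
})

result = queries.copy()
for k in range(4):  # offsets +0h .. +3h, an arbitrary choice for this sketch
    # shift every query by k hours; keep the original row label in "index"
    probe = (queries.assign(Target=queries["Datetime"] + pd.Timedelta(hours=k))
                    .reset_index()
                    .sort_values("Target"))
    matched = pd.merge_asof(
        probe,
        values[["Name", "Timestamp", "Value"]],
        left_on="Target",
        right_on="Timestamp",
        by="Name",
        direction="forward",  # exact match, else the first later value
    )
    # realign on the original row labels before assigning the new column
    result[f"+{k}h"] = matched.set_index("index")["Value"]

print(result)
```

Offsets with no later value for that Name come out as NaN. If the closest value in either direction is preferred, direction='nearest' can be used instead, and the tolerance parameter of merge_asof limits how far a match may be from the target.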

solution1
1 ACCEPTED 2021-03-23 11:07:07