
DataFrame organise data based on time interval

I have a DataFrame with timestamped data like the one below:

             Date      Time  Name   Value
0        20210322  08:00:00     A       1 
1        20210322  08:00:00     B       2  
2        20210322  08:28:00     A       3  
3        20210322  08:30:00     B       4  
4        20210322  09:00:00     A       5  
5        20210322  09:30:00     B       6  
6        20210322  09:55:00     A       7  
7        20210322  10:30:00     B       8  
8        20210322  10:30:00     A       9  
9        20210322  11:58:00     A      10  
10       20210322  12:00:00     B      11
...

The time series has a value every 30 seconds, but not every name is present at every timestamp.

The final goal is to merge it with another DataFrame that has Name and Datetime columns:

Name                 Datetime  
   A      2021-03-22 08:00:00 
   B      2021-03-22 09:25:15  
   A      2021-03-22 09:30:00
   A      2021-03-22 10:30:00
...

and fill the columns +1h, +2h and so on, where +1h contains the value from the first DataFrame at timestamp Datetime + 1h. If no value is available at exactly that timestamp, ideally the next available one would be used:

Name                 Datetime    +0h   +1h   +2h  +3h  ....
   A      2021-03-22 08:00:00      1     5     7    9  
   B      2021-03-22 09:25:15      6     8    11   NA  
   A      2021-03-22 09:30:00      7     9    10   NA    
   A      2021-03-22 10:30:00      9    10    NA   NA
...

Is there a pandas way to do that kind of trick?

Don't hesitate to ask for clarification if I have not been clear enough. If you have a suggestion for a better title for this post, please share it; I don't know if the current one is precise enough.

Thanks in advance, Irpie

I think you need merge_asof first:

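import pandas as pd

# convert both time columns to datetimes
df1['Datetime'] = pd.to_datetime(df1['Datetime'])
df['Datetime1'] = pd.to_datetime(df['Date'].astype(str) + ' ' + df['Time'])

# both frames must be sorted by their time key; by default each row of df gets
# the latest df1 Datetime at or before Datetime1 for the same Name
df = pd.merge_asof(df, df1, left_on='Datetime1', right_on='Datetime', by='Name')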
print (df)
        Date      Time Name  Value           Datetime1            Datetime
0   20210322  08:00:00    A      1 2021-03-22 08:00:00 2021-03-22 08:00:00
1   20210322  08:00:00    B      2 2021-03-22 08:00:00                 NaT
2   20210322  08:28:00    A      3 2021-03-22 08:28:00 2021-03-22 08:00:00
3   20210322  08:30:00    B      4 2021-03-22 08:30:00                 NaT
4   20210322  09:00:00    A      5 2021-03-22 09:00:00 2021-03-22 08:00:00
5   20210322  09:30:00    B      6 2021-03-22 09:30:00 2021-03-22 09:25:15
6   20210322  09:55:00    A      7 2021-03-22 09:55:00 2021-03-22 09:30:00
7   20210322  10:30:00    B      8 2021-03-22 10:30:00 2021-03-22 09:25:15
8   20210322  10:30:00    A      9 2021-03-22 10:30:00 2021-03-22 10:30:00
9   20210322  11:58:00    A     10 2021-03-22 11:58:00 2021-03-22 10:30:00
10  20210322  12:00:00    B     11 2021-03-22 12:00:00 2021-03-22 09:25:15
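
The NaT rows above come from the default matching direction of merge_asof, which only looks backward in time. A minimal sketch of the difference, using two hypothetical toy frames (`left` and `right` are made-up names, with timestamps borrowed from the question):

```python
import pandas as pd

# Toy frames for illustration only; they are not the full data from the question.
left = pd.DataFrame({
    "Name": ["B", "B"],
    "Datetime1": pd.to_datetime(["2021-03-22 08:00:00", "2021-03-22 09:30:00"]),
})
right = pd.DataFrame({
    "Name": ["B"],
    "Datetime": pd.to_datetime(["2021-03-22 09:25:15"]),
})

# Default direction="backward": take the last right-hand row at or before Datetime1.
backward = pd.merge_asof(left, right, left_on="Datetime1",
                         right_on="Datetime", by="Name")

# direction="forward": take the first right-hand row at or after Datetime1.
forward = pd.merge_asof(left, right, left_on="Datetime1",
                        right_on="Datetime", by="Name", direction="forward")

print(backward)
print(forward)
```

With the backward default, the 08:00:00 row has nothing earlier to match and stays NaT; with direction="forward" it is the 09:30:00 row that finds no match.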

# subtract the datetimes, floor to whole hours and convert to a number of hours
df['hour'] = (df['Datetime1'].sub(df['Datetime']).dt.floor('h')
                            .dt.total_seconds()
                            .div(3600)
                            # .astype(int) is left out: rows without a match (NaT) give NaN
                            )
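
The hour bucketing can be tried in isolation. This is only a sketch on made-up timedeltas (the `elapsed` Series is hypothetical); the cast to the nullable `Int64` dtype is an optional extra, not part of the answer above, for the case where whole-number hours are wanted despite missing values:

```python
import pandas as pd

# Hypothetical elapsed times, including one missing value (NaT).
elapsed = pd.Series(pd.to_timedelta(["0:28:00", "1:00:00", "2:34:45", None]))

# Floor to whole hours, then express the result as a float number of hours.
hours = elapsed.dt.floor("h").dt.total_seconds().div(3600)

# A plain .astype(int) would fail on the NaN; the nullable Int64 dtype keeps it as <NA>.
hours_int = hours.astype("Int64")

print(hours)
print(hours_int)
```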

# pivot with sum as the aggregation and rename the column names
f = lambda x: f'+{x}h'
df = (df.pivot_table(index=['Name','Datetime'], 
                    columns='hour', 
                    values='Value', 
                    aggfunc='sum',
                    fill_value=0)
          .rename(columns=f))
print (df)
hour                      +0.0h  +1.0h  +2.0h
Name Datetime                                
A    2021-03-22 08:00:00      4      5      0
     2021-03-22 09:30:00      7      0      0
     2021-03-22 10:30:00      9     10      0
B    2021-03-22 09:25:15      6      8     11
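
If the "first next available value" lookup from the question is wanted directly, one possible approach is to call merge_asof once per offset with direction="forward". The following is a minimal sketch under assumptions, not a definitive implementation: it uses only the rows visible in the question, and the names `values`, `targets`, `matched` and `result` are made up for illustration.

```python
import pandas as pd

# Only the rows shown in the question; the real data is longer.
df = pd.DataFrame({
    "Date": [20210322] * 11,
    "Time": ["08:00:00", "08:00:00", "08:28:00", "08:30:00", "09:00:00", "09:30:00",
             "09:55:00", "10:30:00", "10:30:00", "11:58:00", "12:00:00"],
    "Name": ["A", "B", "A", "B", "A", "B", "A", "B", "A", "A", "B"],
    "Value": range(1, 12),
})
df1 = pd.DataFrame({
    "Name": ["A", "B", "A", "A"],
    "Datetime": pd.to_datetime(["2021-03-22 08:00:00", "2021-03-22 09:25:15",
                                "2021-03-22 09:30:00", "2021-03-22 10:30:00"]),
})

# Right-hand side of the lookup: one timestamp column, sorted as merge_asof requires.
values = (df.assign(ValueTime=pd.to_datetime(df["Date"].astype(str) + " " + df["Time"]))
            [["Name", "ValueTime", "Value"]]
            .sort_values("ValueTime"))

result = df1.copy()
for h in range(4):  # offsets +0h .. +3h; extend the range as needed
    # Shift each lookup time by h hours, remembering the original row label in "index".
    targets = (df1.assign(Target=df1["Datetime"] + pd.Timedelta(hours=h))
                  .reset_index()
                  .sort_values("Target"))
    # direction="forward": first value at or after the target time for the same Name.
    matched = pd.merge_asof(targets, values, left_on="Target",
                            right_on="ValueTime", by="Name", direction="forward")
    # Align back on the original row labels; Int64 keeps missing values as <NA>.
    result[f"+{h}h"] = matched.set_index("index")["Value"].astype("Int64")

print(result)
```

On the visible rows, a strict forward lookup gives 9 and 10 for +2h and +3h in the first row, whereas the question's table shows 7 and 9 there. Those two cells match direction="nearest" instead, but that option would also fill cells the question leaves as NA, so the choice depends on which behaviour is actually wanted.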


 