
Pandas - how to merge dataframes on datetime column of different format?

I have two dataframes that I need to merge based on date. The first dataframe looks like:

             Time Stamp  HP_1H_mean  Coolant1_1H_mean  Extreme_1H_mean
0   2019-07-26 07:00:00  410.637966        414.607081              0.0   
1   2019-07-26 08:00:00  403.521735        424.787366              0.0   
2   2019-07-26 09:00:00  403.143925        425.739639              0.0   
3   2019-07-26 10:00:00  410.542895        426.210538              0.0
...
17  2019-07-27 00:00:00    0.000000          0.000000              0.0   
18  2019-07-27 01:00:00    0.000000          0.000000              0.0   
19  2019-07-27 02:00:00    0.000000          0.000000              0.0   
20  2019-07-27 03:00:00    0.000000          0.000000              0.0 

The second is like this:

    Time Stamp  Qty Compl
0   2019-07-26  150
1   2019-07-27  20
2   2019-07-29  230
3   2019-07-30  230
4   2019-07-31  170

Both Time Stamp columns are datetime64[ns]. I want to do a left merge and then forward fill Qty Compl into all the other rows of the same day. My problem is that after the merge, the Qty Compl from the second df is only applied at midnight of each day, and some days do not have a midnight time stamp, such as the first day in the first dataframe.
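The behaviour described above can be reproduced with a small sketch. The frame names `hourly` and `daily` are made up for illustration, and only a few rows and one value column from the sample data are used:

```python
import pandas as pd

# A cut-down version of the first dataframe (hourly rows).
hourly = pd.DataFrame({
    "Time Stamp": pd.to_datetime([
        "2019-07-26 07:00:00", "2019-07-26 08:00:00",
        "2019-07-27 00:00:00", "2019-07-27 01:00:00",
    ]),
    "HP_1H_mean": [410.637966, 403.521735, 0.0, 0.0],
})

# A cut-down version of the second dataframe (one row per day, at midnight).
daily = pd.DataFrame({
    "Time Stamp": pd.to_datetime(["2019-07-26", "2019-07-27"]),
    "Qty Compl": [150, 20],
})

# An exact left merge only matches rows whose time is exactly midnight.
naive = hourly.merge(daily, on="Time Stamp", how="left")

# Forward filling cannot recover 2019-07-26, because that day has no
# midnight row in `hourly` for the value to start from.
naive["Qty Compl"] = naive["Qty Compl"].ffill()
print(naive)
```

In this sketch the two 2019-07-26 rows stay NaN, while both 2019-07-27 rows end up with 20.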

Is there a way to merge and match every row that contains the same day? The desired output would look like this:

             Time Stamp  HP_1H_mean  Coolant1_1H_mean  Extreme_1H_mean  Qty Compl
0   2019-07-26 07:00:00  410.637966        414.607081              0.0      150   
1   2019-07-26 08:00:00  403.521735        424.787366              0.0      150
2   2019-07-26 09:00:00  403.143925        425.739639              0.0      150
3   2019-07-26 10:00:00  410.542895        426.210538              0.0      150
...
17  2019-07-27 00:00:00    0.000000          0.000000              0.0      20
18  2019-07-27 01:00:00    0.000000          0.000000              0.0      20
19  2019-07-27 02:00:00    0.000000          0.000000              0.0      20
20  2019-07-27 03:00:00    0.000000          0.000000              0.0      20

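Use merge_asof after sorting both DataFrames by their datetime column. With the default direction='backward', each row of df1 is matched to the last row of df2 whose Time Stamp is less than or equal to its own: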

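import pandas as pd

# convert to datetimes if the columns are not datetime64[ns] already
df1['Time Stamp'] = pd.to_datetime(df1['Time Stamp'])
df2['Time Stamp'] = pd.to_datetime(df2['Time Stamp'])

# merge_asof requires both frames to be sorted by the key
df1 = df1.sort_values('Time Stamp')
df2 = df2.sort_values('Time Stamp')

# each df1 row takes the most recent df2 row at or before its Time Stamp
df = pd.merge_asof(df1, df2, on='Time Stamp')
print(df)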
           Time Stamp  HP_1H_mean  Coolant1_1H_mean  Extreme_1H_mean  \
0 2019-07-26 07:00:00  410.637966        414.607081              0.0   
1 2019-07-26 08:00:00  403.521735        424.787366              0.0   
2 2019-07-26 09:00:00  403.143925        425.739639              0.0   
3 2019-07-26 10:00:00  410.542895        426.210538              0.0   
4 2019-07-27 00:00:00    0.000000          0.000000              0.0   
5 2019-07-27 01:00:00    0.000000          0.000000              0.0   
6 2019-07-27 02:00:00    0.000000          0.000000              0.0   
7 2019-07-27 03:00:00    0.000000          0.000000              0.0   

   Qty Compl  
0        150  
1        150  
2        150  
3        150  
4         20  
5         20  
6         20  
7         20  
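One caveat: merge_asof matches the most recent earlier row, so a day that is missing from df2 silently inherits the previous day's Qty Compl. If an exact same-day match is wanted instead, one alternative (not part of the answer above) is to merge on a helper column holding the timestamp truncated to midnight. The sketch below uses a few rows from the sample data; the `day` column and the 2019-07-28 row are added only for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Time Stamp": pd.to_datetime([
        "2019-07-26 07:00:00", "2019-07-26 08:00:00",
        "2019-07-27 00:00:00", "2019-07-27 01:00:00",
        "2019-07-28 05:00:00",  # hypothetical row: a day absent from df2
    ]),
    "HP_1H_mean": [410.637966, 403.521735, 0.0, 0.0, 0.0],
})
df2 = pd.DataFrame({
    "Time Stamp": pd.to_datetime(["2019-07-26", "2019-07-27", "2019-07-29"]),
    "Qty Compl": [150, 20, 230],
})

# Helper key: each hourly timestamp truncated to midnight of its day.
left = df1.assign(day=df1["Time Stamp"].dt.normalize())
right = df2.rename(columns={"Time Stamp": "day"})

# Exact match on the day; days missing from df2 stay NaN.
by_day = left.merge(right, on="day", how="left").drop(columns="day")

# For comparison: merge_asof carries 20 forward into 2019-07-28.
asof = pd.merge_asof(df1.sort_values("Time Stamp"),
                     df2.sort_values("Time Stamp"),
                     on="Time Stamp")

print(by_day)
print(asof)
```

Which behaviour is right depends on the data: if a missing day in df2 should count as "no quantity recorded", the day-key merge keeps that visible, whereas merge_asof fills the gap.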

