I have two dataframes df_A
and df_B
where each has date, time and a value. An example below:
import pandas as pd
df_A = pd.DataFrame({
'date_A': ["2021-02-01", "2021-02-01", "2021-02-02"],
'time_A': ["22:00:00", "23:00:00", "00:00:00"],
'val_A': [100, 200, 300]})
df_B = pd.DataFrame({
'date_B': ["2021-02-01", "2021-02-01", "2021-02-01", "2021-02-01", "2021-02-02"],
'time_B': ["22:01:12", "22:59:34", "23:00:17", "23:59:57", "00:00:11"],
'val_B': [104, 203, 195, 296, 294]})
I need to join this dataframes but date and time never match. So I want a left join by the closest datetime from df_B
to df_A
. So the output should be:
df_out = pd.DataFrame({
'date_A': ["2021-02-01", "2021-02-01", "2021-02-02"],
'time_A': ["22:00:00", "23:00:00", "00:00:00"],
'val_A': [100, 200, 300],
'date_B': ["2021-02-01", "2021-02-01", "2021-02-01"],
'time_B': ["22:01:12", "23:00:17", "23:59:57"],
'val_B': [104, 195, 296]})
Pandas has a handy merge_asof()
function for these types of problems ( https://pandas.pydata.org/docs/reference/api/pandas.merge_asof.html )
It requires a single key to merge on, so you can create a single date-time column in each dataframe and perform the merge:
df_A['date_time'] = pd.to_datetime(df_A.date_A + " " + df_A.time_A)
df_B['date_time'] = pd.to_datetime(df_B.date_B + " " + df_B.time_B)
# Sort the two dataframes by the new key, as required by merge_asof function
df_A.sort_values(by="date_time", inplace=True, ignore_index=True)
df_B.sort_values(by="date_time", inplace=True, ignore_index=True)
result_df = pd.merge_asof(df_A, df_B, on="date_time", direction="nearest")
Note the direction argument's value is "nearest" as you requested. There are other values you can choose, like "backward" and "forward".
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.