
Python pandas - How do I merge two data frames based on dates that are not consistent in both?

I currently have two very large data sets:

df1 :

                    created_at  PM1.0_CF1_ug/m3  ...  PM2.5_ATM_ug/m3  Unnamed: 9
0    2019-08-08 18:00:00+00:00             4.46  ...             8.78         NaN
1    2019-08-08 19:00:00+00:00             0.00  ...             0.00         NaN

df2 :

                    created_at  REF
0    2019-08-08 17:00:00+00:00             1.08
1    2019-08-08 18:00:00+00:00            84.31

Not all of the created_at values in df1 appear in df2, which is the smaller of the two data frames.

What I would like to do is merge/join the two tables on created_at, so that the merged table has a REF column that is filled in only for the dates that were originally in df2.

Here is an example of what I would like:

                    created_at  PM1.0_CF1_ug/m3  ...  PM2.5_ATM_ug/m3  Unnamed: 9         REF
0    2019-08-08 18:00:00+00:00             4.46  ...             8.78         NaN       84.31
1    2019-08-08 19:00:00+00:00             0.00  ...             0.00         NaN         NaN

Maybe it's possible to do this in SQL and then convert it to a pandas DF, however I'm familiar with SQL joins.

Thanks!

You should look into pd.merge_asof and specify a tolerance. Or, merge on just the dates instead of the datetimes.
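As a rough sketch of the merge_asof suggestion: the timestamps below are made up for illustration (the first df1 row is deliberately two minutes off the hour), and the 5-minute tolerance and direction="nearest" are assumptions you would tune to your own data. Both frames have to be sorted on the key before calling pd.merge_asof.

```python
import pandas as pd

# Illustrative data only: the first df1 timestamp is slightly off the hour
df1 = pd.DataFrame({
    "created_at": pd.to_datetime(
        ["2019-08-08 18:02:00", "2019-08-08 19:00:00"], utc=True
    ),
    "PM2.5_ATM_ug/m3": [8.78, 0.00],
})
df2 = pd.DataFrame({
    "created_at": pd.to_datetime(
        ["2019-08-08 17:00:00", "2019-08-08 18:00:00"], utc=True
    ),
    "REF": [1.08, 84.31],
})

# merge_asof requires both frames to be sorted on the key column
df1 = df1.sort_values("created_at")
df2 = df2.sort_values("created_at")

# Keep every df1 row; attach the nearest df2 row if it is within 5 minutes
result = pd.merge_asof(
    df1,
    df2,
    on="created_at",
    direction="nearest",            # assumed; the default is "backward"
    tolerance=pd.Timedelta("5min"), # assumed tolerance
)
print(result)
```

Rows in df1 with no df2 timestamp inside the tolerance end up with NaN in REF. If the timestamps in both frames already line up exactly, a plain left merge on created_at is simpler.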

In SQL, you would typically use a left join to optionally bring in the matching row from df2:

select df1.*, df2.ref
from df1
left join df2 on df2.created_at = df1.created_at

When there is no match in df2, column ref will come up as null in the result set.
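The same left join can be done directly in pandas, without going through SQL. This is a minimal sketch that assumes created_at has the same datetime dtype in both frames and uses a cut-down version of the sample data from the question (only one of the PM columns is kept).

```python
import pandas as pd

# Cut-down version of the sample frames from the question
df1 = pd.DataFrame({
    "created_at": pd.to_datetime(
        ["2019-08-08 18:00:00", "2019-08-08 19:00:00"], utc=True
    ),
    "PM2.5_ATM_ug/m3": [8.78, 0.00],
})
df2 = pd.DataFrame({
    "created_at": pd.to_datetime(
        ["2019-08-08 17:00:00", "2019-08-08 18:00:00"], utc=True
    ),
    "REF": [1.08, 84.31],
})

# how="left" keeps every row of df1; REF is NaN where df2 has no match
merged = df1.merge(df2, on="created_at", how="left")
print(merged)
```

The 17:00 row from df2 is dropped because it has no counterpart in df1, and the 19:00 row from df1 gets NaN in REF, which mirrors the null in the SQL result.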

