I currently have two very large data sets:
df1:
                  created_at  PM1.0_CF1_ug/m3  ...  PM2.5_ATM_ug/m3  Unnamed: 9
0  2019-08-08 18:00:00+00:00             4.46  ...             8.78         NaN
1  2019-08-08 19:00:00+00:00             0.00  ...             0.00         NaN
df2:
                  created_at    REF
0  2019-08-08 17:00:00+00:00   1.08
1  2019-08-08 18:00:00+00:00  84.31
Not all of the created_at values in df1 appear in df2, which is a smaller data frame than the first.
What I would like to do is merge/join the two tables on the created_at values, so that the merged table has a REF column that is filled in only on the dates that were originally in df2.
Here is an example of what I would like:
                  created_at  PM1.0_CF1_ug/m3  ...  PM2.5_ATM_ug/m3  Unnamed: 9    REF
0  2019-08-08 18:00:00+00:00             4.46  ...             8.78         NaN  84.31
1  2019-08-08 19:00:00+00:00             0.00  ...             0.00         NaN    NaN
Maybe it's possible to do this in SQL and then convert the result to a pandas DataFrame, but I'm not familiar with SQL joins.
Thanks!
You should look into pd.merge_asof and specify a tolerance. Or, merge on just the dates instead of the datetimes.
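A minimal sketch of the pd.merge_asof suggestion, using a two-row stand-in for the frames in the question. The 10-minute tolerance and direction="nearest" are assumptions chosen for illustration, not values from the question, and only one of the PM columns is reproduced:

```python
import pandas as pd

# Small stand-ins for the frames in the question (most columns omitted).
df1 = pd.DataFrame({
    "created_at": pd.to_datetime(
        ["2019-08-08 18:00:00+00:00", "2019-08-08 19:00:00+00:00"]
    ),
    "PM2.5_ATM_ug/m3": [8.78, 0.00],
})
df2 = pd.DataFrame({
    "created_at": pd.to_datetime(
        ["2019-08-08 17:00:00+00:00", "2019-08-08 18:00:00+00:00"]
    ),
    "REF": [1.08, 84.31],
})

# merge_asof requires both frames to be sorted on the key.
# Each df1 row takes the REF of the nearest df2 timestamp, but only
# if that timestamp is within the tolerance; otherwise REF is NaN.
merged_asof = pd.merge_asof(
    df1.sort_values("created_at"),
    df2.sort_values("created_at"),
    on="created_at",
    direction="nearest",
    tolerance=pd.Timedelta("10min"),  # assumed value, tune to your data
)
print(merged_asof)
```

This is mainly useful when the timestamps in the two frames do not line up exactly. If they always match to the second, a plain left merge is simpler.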
In SQL, you would typically use a left join to optionally bring in the matching row from df2:
select df1.*, df2.ref
from df1
left join df2 on df2.created_at = df1.created_at
When there is no match in df2, column ref will come up as null in the resultset.
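The same left join can be done directly in pandas with DataFrame.merge and how="left". The sketch below uses a two-row stand-in for the frames in the question and assumes created_at has the same datetime dtype in both (convert with pd.to_datetime first if one side holds strings):

```python
import pandas as pd

# Small stand-ins for the frames in the question (most columns omitted).
left = pd.DataFrame({
    "created_at": pd.to_datetime(
        ["2019-08-08 18:00:00+00:00", "2019-08-08 19:00:00+00:00"]
    ),
    "PM2.5_ATM_ug/m3": [8.78, 0.00],
})
right = pd.DataFrame({
    "created_at": pd.to_datetime(
        ["2019-08-08 17:00:00+00:00", "2019-08-08 18:00:00+00:00"]
    ),
    "REF": [1.08, 84.31],
})

# how="left" keeps every row of `left`; REF is NaN where `right`
# has no row with the same created_at.
merged = left.merge(right[["created_at", "REF"]], on="created_at", how="left")
print(merged)
```

Rows that exist only in the smaller frame (17:00 here) are dropped, and rows with no match get NaN in REF, which matches the desired output in the question.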