Merging two dataframes based on closest match without exact match
I want to merge two dataframes that both have a date column. However, the dates are not always aligned, so I want to merge in such a way that all the data in df1 is retained and the data from df2 is placed alongside the nearest matching date.
#Example dataframes
set.seed(5)
df1 <- data.frame(date=as.Date(c('2001-01-02','2001-01-03','2001-01-06','2001-01-15','2001-01-18','2001-01-21')),
val=rnorm(6))
df2 <- data.frame(date=as.Date(c('2001-01-01', '2001-01-08', '2001-01-15', '2001-01-21')),
info=rnorm(4))
df1
date val
1 2001-01-02 -0.84085548
2 2001-01-03 1.38435934
3 2001-01-06 -1.25549186
4 2001-01-15 0.07014277
5 2001-01-18 1.71144087
6 2001-01-21 -0.60290798
df2
date info
1 2001-01-01 -0.4721664
2 2001-01-08 -0.6353713
3 2001-01-15 -0.2857736
4 2001-01-21 0.1381082
The date columns in the dataframes above don't all match, but I want my final dataframe to look like this, created by matching each date in df1 with its closest date in df2:
df1merged
date val info
1 2001-01-02 -0.84085548 -0.4721664
2 2001-01-03 1.38435934 -0.4721664
3 2001-01-06 -1.25549186 -0.6353713
4 2001-01-15 0.07014277 -0.2857736
5 2001-01-18 1.71144087 -0.2857736
6 2001-01-21 -0.60290798 0.1381082
This looks like a pretty good use case for a rolling join in data.table:
library(data.table)
## Convert to data.tables
setDT(df1); setDT(df2)
## Set keys as date for both
setkey(df1, date); setkey(df2, date)
## Perform a rolling join
df2[df1, roll = "nearest"]
# date info val
# 1: 2001-01-02 -0.4721664 -0.84085548
# 2: 2001-01-03 -0.4721664 1.38435934
# 3: 2001-01-06 -0.6353713 -1.25549186
# 4: 2001-01-15 -0.2857736 0.07014277
# 5: 2001-01-18 -0.2857736 1.71144087
# 6: 2001-01-21 0.1381082 -0.60290798
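Note that setDT() and setkey() modify df1 and df2 by reference, and the join result lists its columns as date, info, val. If you would rather leave the original data frames untouched and get the column order shown in the question, a minimal sketch along these lines may help. It assumes a data.table version that supports the on= argument (1.9.6 or later); dt1, dt2, merged and the matched_date column are illustrative names, not part of the answer above.

```r
library(data.table)

# Same example data as in the question
set.seed(5)
df1 <- data.frame(date = as.Date(c('2001-01-02', '2001-01-03', '2001-01-06',
                                   '2001-01-15', '2001-01-18', '2001-01-21')),
                  val = rnorm(6))
df2 <- data.frame(date = as.Date(c('2001-01-01', '2001-01-08',
                                   '2001-01-15', '2001-01-21')),
                  info = rnorm(4))

# Work on data.table copies so df1 and df2 stay plain data frames
dt1 <- as.data.table(df1)
dt2 <- as.data.table(df2)

# Keep a copy of df2's own date; after the join, `date` holds df1's dates
dt2[, matched_date := date]

# Rolling join without setting keys: for each row of dt1, take the dt2 row
# whose date is nearest
merged <- dt2[dt1, on = "date", roll = "nearest"]

# Reorder the columns to match the layout asked for in the question
setcolorder(merged, c("date", "val", "info", "matched_date"))
merged
```

The extra matched_date column makes it easy to check which df2 date each df1 row was paired with; drop it with merged[, matched_date := NULL] once you are satisfied.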
You could do this in base R:
df1$info <- sapply(df1$date, function(x) df2$info[which.min(abs(df2$date-x))])
df1
date val info
1 2001-01-02 -0.84085548 -0.4721664
2 2001-01-03 1.38435934 -0.4721664
3 2001-01-06 -1.25549186 -0.6353713
4 2001-01-15 0.07014277 -0.2857736
5 2001-01-18 1.71144087 -0.2857736
6 2001-01-21 -0.60290798 0.1381082
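The sapply() call above compares every date in df1 against every date in df2, which is fine for small data. For larger inputs, one possible alternative (a different technique from the one in this answer) is a sorted lookup with findInterval(). The sketch below assumes df2$date is sorted in ascending order with no missing values; nearest_index is a hypothetical helper name. On a tie it keeps the earlier date, which is also what which.min() does.

```r
# Same example data as in the question
set.seed(5)
df1 <- data.frame(date = as.Date(c('2001-01-02', '2001-01-03', '2001-01-06',
                                   '2001-01-15', '2001-01-18', '2001-01-21')),
                  val = rnorm(6))
df2 <- data.frame(date = as.Date(c('2001-01-01', '2001-01-08',
                                   '2001-01-15', '2001-01-21')),
                  info = rnorm(4))

# Hypothetical helper: for each element of x, return the index of the
# nearest element of ref. Assumes ref is sorted ascending.
nearest_index <- function(x, ref) {
  x   <- as.numeric(x)
  ref <- as.numeric(ref)
  lo  <- findInterval(x, ref)         # last ref <= x, or 0 if x precedes all
  lo  <- pmax(lo, 1L)                 # clamp to the first element
  hi  <- pmin(lo + 1L, length(ref))   # next candidate, clamped to the last
  # Take the later candidate only when it is strictly closer
  ifelse(abs(ref[hi] - x) < abs(x - ref[lo]), hi, lo)
}

idx <- nearest_index(df1$date, df2$date)
df1$info <- df2$info[idx]
df1
```

If df2 is not already ordered by date, sort it first, for example with df2 <- df2[order(df2$date), ].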