
Merging two dataframes based on closest match without exact match

I want to merge two dataframes that both have a date column. However, the dates are not always aligned, so I want to merge in such a way that all the data in df1 is retained and the data from df2 is placed alongside the nearest matching date.

#Example dataframes
set.seed(5)
df1 <- data.frame(date=as.Date(c('2001-01-02','2001-01-03','2001-01-06','2001-01-15','2001-01-18','2001-01-21')), 
                  val=rnorm(6))
df2 <- data.frame(date=as.Date(c('2001-01-01', '2001-01-08', '2001-01-15', '2001-01-21')), 
                  info=rnorm(4))
df1
        date         val
1 2001-01-02 -0.84085548
2 2001-01-03  1.38435934
3 2001-01-06 -1.25549186
4 2001-01-15  0.07014277
5 2001-01-18  1.71144087
6 2001-01-21 -0.60290798
df2
        date       info
1 2001-01-01 -0.4721664
2 2001-01-08 -0.6353713
3 2001-01-15 -0.2857736
4 2001-01-21  0.1381082

The date columns in the dataframes above don't all match, but I want my final dataframe to look like this, created by pairing each date in df1 with the closest date in df2:

df1merged
        date         val       info
1 2001-01-02 -0.84085548 -0.4721664
2 2001-01-03  1.38435934 -0.4721664
3 2001-01-06 -1.25549186 -0.6353713
4 2001-01-15  0.07014277 -0.2857736
5 2001-01-18  1.71144087 -0.2857736
6 2001-01-21 -0.60290798  0.1381082

This looks like a pretty good use case for rolling joins in data.table. With roll = "nearest", each row of df1 is joined to the df2 row whose date is closest.

library(data.table)
## Convert to data.tables
setDT(df1);   setDT(df2)

## Set keys as date for both
setkey(df1, date);  setkey(df2, date)

## Perform a rolling join
df2[df1, roll = "nearest"]

#          date       info         val
# 1: 2001-01-02 -0.4721664 -0.84085548
# 2: 2001-01-03 -0.4721664  1.38435934
# 3: 2001-01-06 -0.6353713 -1.25549186
# 4: 2001-01-15 -0.2857736  0.07014277
# 5: 2001-01-18 -0.2857736  1.71144087
# 6: 2001-01-21  0.1381082 -0.60290798
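One thing the join above hides is which df2 date each row was actually matched to, because the date column in the result comes from df1. A minimal sketch of one way to keep it, assuming a copy of the join column is acceptable; the name date_df2 and the toy val/info values are illustrative, not from the original post:

```r
library(data.table)

# Same dates as the example above; simple values so the matches are easy to read
df1 <- data.table(date = as.Date(c("2001-01-02", "2001-01-03", "2001-01-06",
                                   "2001-01-15", "2001-01-18", "2001-01-21")),
                  val = 1:6)
df2 <- data.table(date = as.Date(c("2001-01-01", "2001-01-08",
                                   "2001-01-15", "2001-01-21")),
                  info = c("a", "b", "c", "d"))

# Copy the join column (hypothetical name) so the matched df2 date survives the join
df2[, date_df2 := date]

# For each row of df1, take the nearest row of df2 by date
merged <- df2[df1, on = "date", roll = "nearest"]

# Put df1's columns first, as in the desired output
setcolorder(merged, c("date", "val", "info", "date_df2"))
merged
```

Note that 2001-01-18 is equally far from 2001-01-15 and 2001-01-21; in the output shown above it receives the value from 2001-01-15, so it is worth checking tied rows if the choice matters for your data.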

You could also do this in base R, picking for each date in df1 the df2 row with the smallest absolute date difference:

df1$info <- sapply(df1$date, function(x) df2$info[which.min(abs(df2$date-x))])

df1
        date         val       info
1 2001-01-02 -0.84085548 -0.4721664
2 2001-01-03  1.38435934 -0.4721664
3 2001-01-06 -1.25549186 -0.6353713
4 2001-01-15  0.07014277 -0.2857736
5 2001-01-18  1.71144087 -0.2857736
6 2001-01-21 -0.60290798  0.1381082
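The sapply() version compares every date in df1 against every date in df2, which may get slow on large inputs. Here is a sketch of a vectorised base R alternative using findInterval(), assuming df2 is sorted by date; nearest_index is a hypothetical helper name, and the toy val/info values are illustrative:

```r
# Hypothetical helper: for each x, index of the closest value in `table`.
# Assumes `table` is sorted in ascending order; ties go to the earlier value.
nearest_index <- function(x, table) {
  x <- as.numeric(x)
  table <- as.numeric(table)
  lo <- pmax(findInterval(x, table), 1L)   # last table value <= x (clamped to 1)
  hi <- pmin(lo + 1L, length(table))       # next table value (clamped to the end)
  ifelse(abs(table[hi] - x) < abs(x - table[lo]), hi, lo)
}

df1 <- data.frame(date = as.Date(c("2001-01-02", "2001-01-03", "2001-01-06",
                                   "2001-01-15", "2001-01-18", "2001-01-21")),
                  val = 1:6)
df2 <- data.frame(date = as.Date(c("2001-01-01", "2001-01-08",
                                   "2001-01-15", "2001-01-21")),
                  info = c("a", "b", "c", "d"))

df2 <- df2[order(df2$date), ]              # make sure the lookup dates are sorted
idx <- nearest_index(df1$date, df2$date)
df1$info <- df2$info[idx]
df1
```

The tie rule here (earlier date wins) matches what which.min() does in the sapply() version, since which.min() returns the first minimum.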
