Match some columns exactly, and some partially with inner_join

I have two data frames from different sources that refer to the same people, but because the data are self-reported, the dates may be slightly off.

Example data:

df1 <- data.frame(name = c("Ann", "Betsy", "Charlie", "Dave"),
                  dob = as.Date(c("2000-01-01", "2001-01-01",
                                  "2002-01-01", "2003-01-01")),
                  stringsAsFactors = FALSE)

df2 <- data.frame(name = c("Ann", "Charlie", "Elmer", "Fred"),
                  dob = as.Date(c("2000-01-11", "2004-01-01",
                                  "2001-01-01", "2006-01-01")),
                  stringsAsFactors = FALSE)

I want to match on exact name, as with dplyr:

library(dplyr)
inner_join(df1, df2, by = c("name"))

name     dob.x       dob.y
Ann      2000-01-01  2000-01-11
Charlie  2002-01-01  2004-01-01

but also on date of birth (dob) within 30 days, as with the fuzzyjoin package:

library(fuzzyjoin)

difference_inner_join(df1, df2, by=c("dob"), max_dist = 30)

name.x  dob.x       name.y  dob.y
Ann     2000-01-01  Ann     2000-01-11
Betsy   2001-01-01  Elmer   2001-01-01

How can I combine both criteria, so that only Ann is returned?

This relies on dplyr and base R alone. I rarely need fuzzy joins; an inner_join followed by a filter is usually enough:

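# Join on exact name, then keep pairs whose dates of birth are at most
# 30 days apart (<= matches "within 30 days" and fuzzyjoin's max_dist = 30)
inner_join(df1, df2, by = "name") %>%
  filter(abs(difftime(dob.x, dob.y, units = "days")) <= 30)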

Result:

  name      dob.x      dob.y
1  Ann 2000-01-01 2000-01-11
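
The same join-then-filter idea can also be sketched in base R, for cases where dplyr is not available. This is only a sketch and not part of the answer above: it rebuilds the question's df1 and df2 so that it runs on its own, the names merged, day_gap and result are illustrative, and it assumes "within 30 days" is meant inclusively.

```r
# Rebuild the example data from the question.
df1 <- data.frame(name = c("Ann", "Betsy", "Charlie", "Dave"),
                  dob = as.Date(c("2000-01-01", "2001-01-01",
                                  "2002-01-01", "2003-01-01")),
                  stringsAsFactors = FALSE)
df2 <- data.frame(name = c("Ann", "Charlie", "Elmer", "Fred"),
                  dob = as.Date(c("2000-01-11", "2004-01-01",
                                  "2001-01-01", "2006-01-01")),
                  stringsAsFactors = FALSE)

# merge() does an inner join by default; the suffixes mirror dplyr's naming.
merged <- merge(df1, df2, by = "name", suffixes = c(".x", ".y"))

# Absolute gap between the two dates of birth, in days.
day_gap <- abs(as.numeric(difftime(merged$dob.x, merged$dob.y, units = "days")))

# Keep only pairs at most 30 days apart (inclusive bound is an assumption).
result <- merged[day_gap <= 30, ]
print(result)
```

With the example data only the Ann row should remain, because Charlie's two dates of birth are two years apart.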

Well, you could do this:

difference_inner_join(df1, df2, by = c("dob"), max_dist = 30) %>%
  filter(name.x == name.y)

  name.x      dob.x name.y      dob.y
1    Ann 2000-01-01    Ann 2000-01-11
