Join (or merge) data sets based on 2 variables in second data set
I want to join/merge two data sets based on 2 variables of the second data set.
Described in words: I want to join on variable 1 (VAR1) and, if this results in NA, join on variable 2 (VAR2) instead.
Here's an example and my solution to this:
df_x <- data.frame(VAR0 = c("A", "P", "C", "D", "Z"), stringsAsFactors = FALSE)
df_y <- data.frame(VAR1 = c("A", "B", "C", "D", "E"),
                   VAR2 = c("A", "F", "T", "D", "Z"),
                   VAR3 = c("YES", "YES", "NO", "MAYBE", "YES"),
                   stringsAsFactors = FALSE)
library(dplyr)
# LEFT JOIN TWICE TO MEET BOTH CONDITIONS
lj_1 <- left_join(df_x, df_y, by=c("VAR0" = "VAR1"))
lj_2 <- left_join(df_x, df_y, by=c("VAR0" = "VAR2"))
# THEN REPLACE NAs FROM FIRST LEFT JOIN WITH VALUE FROM SECOND LEFT JOIN
ifelse(lj_1$VAR3 %in% NA, lj_2$VAR3, lj_1$VAR3)
# [1] "YES" NA "NO" "MAYBE" "YES"
I was wondering if there is a better way to do that?
We can do the left_join in a loop and reduce the results to a single vector by applying coalesce on 'VAR3':
library(tidyverse)
map(paste0("VAR", 1:2), ~
left_join(df_x, df_y, by = c("VAR0" = .x)) %>%
pull(VAR3)) %>%
reduce(coalesce)
#[1] "YES" NA "NO" "MAYBE" "YES"
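To make the role of coalesce more concrete, here is a minimal sketch that reuses the data from the question and attaches the result to df_x as a column. The names via_var1, via_var2 and result are illustrative only, and the sketch assumes each key appears at most once in VAR1 and in VAR2, since match only returns the first hit.

```r
library(dplyr)

df_x <- data.frame(VAR0 = c("A", "P", "C", "D", "Z"), stringsAsFactors = FALSE)
df_y <- data.frame(VAR1 = c("A", "B", "C", "D", "E"),
                   VAR2 = c("A", "F", "T", "D", "Z"),
                   VAR3 = c("YES", "YES", "NO", "MAYBE", "YES"),
                   stringsAsFactors = FALSE)

# Look up VAR3 once per key column; rows without a match give NA
via_var1 <- df_y$VAR3[match(df_x$VAR0, df_y$VAR1)]
via_var2 <- df_y$VAR3[match(df_x$VAR0, df_y$VAR2)]

# coalesce() keeps the first non-NA value at each position,
# so the VAR1 match wins whenever it exists
result <- mutate(df_x, VAR3 = coalesce(via_var1, via_var2))
result
```

Because coalesce takes its arguments in priority order, a third fallback column could be added by passing one more lookup vector.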
Or using base R:
pmin(df_y$VAR3[match(df_x$VAR0, df_y$VAR1)],
df_y$VAR3[match(df_x$VAR0, df_y$VAR2)], na.rm = TRUE)
#[1] "YES" NA "NO" "MAYBE" "YES"
Or, to avoid repeating the df_x$ and df_y$ prefixes, use with:
with(df_y, with(df_x, pmin(VAR3[match(VAR0, VAR1)],
VAR3[match(VAR0, VAR2)], na.rm = TRUE)))
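One caveat on the pmin versions: pmin returns the smaller of the two strings, not the first non-NA one. In the example this makes no difference, because whenever both lookups match they return the same VAR3 value. The sketch below uses hypothetical data in which the two lookups disagree, and shows a base R alternative (ifelse with is.na) that keeps the "VAR1 first, VAR2 as fallback" priority. The names via_var1, via_var2 and first_non_na are illustrative.

```r
# Hypothetical data where the two lookups disagree for key "A"
df_x <- data.frame(VAR0 = "A", stringsAsFactors = FALSE)
df_y <- data.frame(VAR1 = c("A", "B"),
                   VAR2 = c("C", "A"),
                   VAR3 = c("YES", "NO"),
                   stringsAsFactors = FALSE)

via_var1 <- df_y$VAR3[match(df_x$VAR0, df_y$VAR1)]  # match on VAR1: "YES"
via_var2 <- df_y$VAR3[match(df_x$VAR0, df_y$VAR2)]  # match on VAR2: "NO"

# pmin() compares the strings and returns the one that sorts first,
# regardless of which lookup it came from
by_pmin <- pmin(via_var1, via_var2, na.rm = TRUE)

# Take the VAR1 match, and fall back to VAR2 only where VAR1 gave NA
first_non_na <- ifelse(is.na(via_var1), via_var2, via_var1)
first_non_na
```

Here first_non_na keeps the VAR1 match, whereas by_pmin depends on string ordering.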