Merge unequal data frames by matching two columns and replace missing values with 0 in R
I would like to create a new data frame by merging two unequal data frames on two matching columns, replacing the missing values with 0. These are two examples of the data frames I have:
df1
ID YEAR_INTERVIEW ID_HOUSEHOLD
1 2017 300
1 2018 300
1 2019 300
2 2017 150
2 2018 150
2 2019 150
3 2017 420
3 2018 420
df2
ID YEAR_INTERVIEW YEARS_EDU
1 2017 10
1 2018 10
1 2019 10
3 2017 3
3 2018 3
*Note that in the second data frame I don't have information for individual 2.
I would like to get the following data frame:
df3
ID YEAR_INTERVIEW ID_HOUSEHOLD YEARS_EDU
1 2017 300 10
1 2018 300 10
1 2019 300 10
2 2017 150 0
2 2018 150 0
2 2019 150 0
3 2017 420 3
3 2018 420 3
I am trying:
df3<-merge(df1,df2, by="ID", all=TRUE)
df3<-merge(df1,df2, by="ID","YEAR_INTERVIEW", all=TRUE)
The first option replicates hundreds of ID observations with years of interviews, while the second gives me 0 values.
Any help would be much appreciated :) Thank you.
The by argument needs to be a vector, i.e. we can create one with c(). Also, all = TRUE is a full join, but here it should be a left join, so use all.x = TRUE. If there is no match, the element will be NA by default.
out <- merge(df1,df2, by=c("ID","YEAR_INTERVIEW"), all.x=TRUE)
The NAs can be converted to 0:
out$YEARS_EDU[is.na(out$YEARS_EDU)] <- 0
-output
out
# ID YEAR_INTERVIEW ID_HOUSEHOLD YEARS_EDU
#1 1 2017 300 10
#2 1 2018 300 10
#3 1 2019 300 10
#4 2 2017 150 0
#5 2 2018 150 0
#6 2 2019 150 0
#7 3 2017 420 3
#8 3 2018 420 3
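The difference between all = TRUE and all.x = TRUE does not show up in the data above, because every key in df2 also exists in df1. The following is a minimal sketch on toy data; the names left, right and keys are made up for illustration. Here the second data frame holds a key that the first one lacks.

```r
# Toy data (hypothetical): 'right' contains ID 4, which 'left' does not have.
left <- data.frame(ID = c(1L, 2L),
                   YEAR_INTERVIEW = c(2017L, 2017L),
                   ID_HOUSEHOLD = c(300L, 150L))
right <- data.frame(ID = c(1L, 4L),
                    YEAR_INTERVIEW = c(2017L, 2017L),
                    YEARS_EDU = c(10L, 7L))

keys <- c("ID", "YEAR_INTERVIEW")

# Full join: keeps unmatched rows from both inputs (IDs 1, 2 and 4).
full_join_res <- merge(left, right, by = keys, all = TRUE)

# Left join: keeps every row of 'left' only (IDs 1 and 2).
left_join_res <- merge(left, right, by = keys, all.x = TRUE)

# Unmatched rows of 'left' get NA, which can then be set to 0.
left_join_res$YEARS_EDU[is.na(left_join_res$YEARS_EDU)] <- 0

print(full_join_res)
print(left_join_res)
```

With the left join, ID 4 is dropped and ID 2 ends up with YEARS_EDU equal to 0, which is the shape requested in the question.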
df1 <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
YEAR_INTERVIEW = c(2017L,
2018L, 2019L, 2017L, 2018L, 2019L, 2017L, 2018L), ID_HOUSEHOLD = c(300L,
300L, 300L, 150L, 150L, 150L, 420L, 420L)), class = "data.frame",
row.names = c(NA,
-8L))
df2 <- structure(list(ID = c(1L, 1L, 1L, 3L, 3L),
YEAR_INTERVIEW = c(2017L,
2018L, 2019L, 2017L, 2018L), YEARS_EDU = c(10L, 10L, 10L, 3L,
3L)), class = "data.frame", row.names = c(NA, -5L))
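As a side note on why the second attempt in the question found no matches: as far as I can tell from R's argument-matching rules, the unnamed "YEAR_INTERVIEW" in merge(df1, df2, by="ID", "YEAR_INTERVIEW", all=TRUE) is not appended to by. It is matched positionally to the next free argument of merge, which is by.x. The sketch below rebuilds the same data in a compact form and compares that call with the explicit by.x/by.y spelling; the object names attempt and explicit are only illustrative.

```r
# Compact copies of the data frames from this post (same values).
df1 <- data.frame(ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
                  YEAR_INTERVIEW = c(2017:2019, 2017:2019, 2017:2018),
                  ID_HOUSEHOLD = rep(c(300L, 150L, 420L), times = c(3, 3, 2)))
df2 <- data.frame(ID = c(1L, 1L, 1L, 3L, 3L),
                  YEAR_INTERVIEW = c(2017:2019, 2017:2018),
                  YEARS_EDU = c(10L, 10L, 10L, 3L, 3L))

# The call from the question: the unnamed string lands in by.x.
attempt <- merge(df1, df2, by = "ID", "YEAR_INTERVIEW", all = TRUE)

# The same call with the key arguments spelled out.
explicit <- merge(df1, df2, by.x = "YEAR_INTERVIEW", by.y = "ID", all = TRUE)

# Are the two results the same object?
print(identical(attempt, explicit))

# No interview year equals an ID, so nothing matches and the full join
# simply keeps the unmatched rows of both inputs.
print(nrow(attempt) == nrow(df1) + nrow(df2))
```

Wrapping both column names in c(), as in the answer, is what makes merge treat them as a two-column key.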