Merge unequal data frames by matching two columns and replace the missing values with 0 in R

I would like to create a new data frame by merging two unequal data frames on two matching columns, replacing the missing values with 0. These are two examples of the data frames I have:

df1
ID  YEAR_INTERVIEW  ID_HOUSEHOLD
1   2017            300
1   2018            300
1   2019            300
2   2017            150
2   2018            150
2   2019            150
3   2017            420
3   2018            420

df2
ID  YEAR_INTERVIEW  YEARS_EDU
1   2017            10
1   2018            10
1   2019            10
3   2017            3
3   2018            3

*Note that the second data frame has no information for individual 2. I would like to get the following data frame:

df3
ID  YEAR_INTERVIEW  ID_HOUSEHOLD  YEARS_EDU
1   2017            300           10
1   2018            300           10
1   2019            300           10
2   2017            150           0
2   2018            150           0
2   2019            150           0
3   2017            420           3
3   2018            420           3

I am trying:

df3<-merge(df1,df2, by="ID", all=TRUE)
df3<-merge(df1,df2, by="ID","YEAR_INTERVIEW", all=TRUE)

The first option duplicates rows, producing hundreds of observations per ID across the interview years, while the second gives me 0 values.

Any help would be much appreciated :) Thank you.

The by argument needs to be a single vector, which we can create with c(). Also, all = TRUE is a full join, but here it should be a left join, so use all.x = TRUE. If there is no match, the element will be NA by default.

out <- merge(df1,df2, by=c("ID","YEAR_INTERVIEW"), all.x=TRUE)
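To see why the two original calls misbehave, here is a small sketch that rebuilds the example data and compares row counts. The names by_id_only and misparsed are made up for illustration, and the explanation of the second call reflects my reading of how R matches unnamed arguments, so treat it as an assumption rather than something stated in the answer.

```r
# Example data, same values as in the question
df1 <- data.frame(
  ID             = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  YEAR_INTERVIEW = c(2017L, 2018L, 2019L, 2017L, 2018L, 2019L, 2017L, 2018L),
  ID_HOUSEHOLD   = c(300L, 300L, 300L, 150L, 150L, 150L, 420L, 420L)
)
df2 <- data.frame(
  ID             = c(1L, 1L, 1L, 3L, 3L),
  YEAR_INTERVIEW = c(2017L, 2018L, 2019L, 2017L, 2018L),
  YEARS_EDU      = c(10L, 10L, 10L, 3L, 3L)
)

# Joining on ID alone pairs every df1 row with every df2 row sharing that ID,
# so each matched ID contributes (rows in df1) x (rows in df2) rows.
by_id_only <- merge(df1, df2, by = "ID", all = TRUE)
nrow(by_id_only)   # more rows than df1 has

# Here "YEAR_INTERVIEW" is an unnamed argument, so it is matched by position
# (to by.x, as far as I can tell) instead of extending `by`.
misparsed <- suppressWarnings(
  merge(df1, df2, by = "ID", "YEAR_INTERVIEW", all = TRUE)
)
nrow(misparsed)    # again not the row count of df1

# Passing both key columns as one character vector, with a left join,
# keeps exactly one row per row of df1.
out <- merge(df1, df2, by = c("ID", "YEAR_INTERVIEW"), all.x = TRUE)
nrow(out) == nrow(df1)
```

Comparing nrow() before and after a merge is a cheap way to notice that a join key is missing or was not passed the way you intended.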

The NAs can then be converted to 0:

out$YEARS_EDU[is.na(out$YEARS_EDU)] <- 0
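If the second data frame contributes more than one column, the same replacement can be applied to all of them at once. The following is a minimal sketch, assuming every column that comes only from df2 is numeric and that 0 is a sensible fill value for each; key_cols and fill_cols are illustrative names, not part of the original answer.

```r
# Example data, same values as in the question
df1 <- data.frame(
  ID             = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  YEAR_INTERVIEW = c(2017L, 2018L, 2019L, 2017L, 2018L, 2019L, 2017L, 2018L),
  ID_HOUSEHOLD   = c(300L, 300L, 300L, 150L, 150L, 150L, 420L, 420L)
)
df2 <- data.frame(
  ID             = c(1L, 1L, 1L, 3L, 3L),
  YEAR_INTERVIEW = c(2017L, 2018L, 2019L, 2017L, 2018L),
  YEARS_EDU      = c(10L, 10L, 10L, 3L, 3L)
)

key_cols <- c("ID", "YEAR_INTERVIEW")

# Left join on both key columns; unmatched df1 rows get NA in df2's columns
out <- merge(df1, df2, by = key_cols, all.x = TRUE)

# Columns that exist only in df2 (here just YEARS_EDU)
fill_cols <- setdiff(names(df2), key_cols)

# Replace NA with 0 in those columns only, leaving df1's columns untouched
out[fill_cols][is.na(out[fill_cols])] <- 0
out
```

Restricting the replacement to fill_cols avoids silently overwriting a genuine NA that was already present in df1.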

-output

out
#  ID YEAR_INTERVIEW ID_HOUSEHOLD YEARS_EDU
#1  1           2017          300        10
#2  1           2018          300        10
#3  1           2019          300        10
#4  2           2017          150         0
#5  2           2018          150         0
#6  2           2019          150         0
#7  3           2017          420         3
#8  3           2018          420         3

data

df1 <- structure(list(
  ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  YEAR_INTERVIEW = c(2017L, 2018L, 2019L, 2017L, 2018L, 2019L, 2017L, 2018L),
  ID_HOUSEHOLD = c(300L, 300L, 300L, 150L, 150L, 150L, 420L, 420L)),
  class = "data.frame", row.names = c(NA, -8L))

df2 <- structure(list(
  ID = c(1L, 1L, 1L, 3L, 3L),
  YEAR_INTERVIEW = c(2017L, 2018L, 2019L, 2017L, 2018L),
  YEARS_EDU = c(10L, 10L, 10L, 3L, 3L)),
  class = "data.frame", row.names = c(NA, -5L))
