
Filling in gaps in column based on matching rows in two columns R

In df2 I would like to fill in the gaps (NA values) in column d based on matching records in columns b and c between the two data frames. What would be a quick and elegant way to do that? It is important that it also works when matching rows sit at different positions in the two data frames.

df1 <- data.frame( a = c(1,1,1,1,1,2,2,2,2,2) ,b = rep(seq(41,45,1),each=2), c = c(101:105,101:105), d = LETTERS[seq( from = 1, to = 10 )])
df2 <- data.frame( a = c(1,1,1,1,1,2,2,2,2,2) ,b = rep(seq(41,45,1),each=2), c = c(101:105,101:105), d = c(LETTERS[seq( from = 1, to = 6 )],rep(NA,4)))

> df1
   a  b   c d
1  1 41 101 A
2  1 41 102 B
3  1 42 103 C
4  1 42 104 D
5  1 43 105 E
6  2 43 101 F
7  2 44 102 G
8  2 44 103 H
9  2 45 104 I
10 2 45 105 J
> df2
   a  b   c    d
1  1 41 101    A
2  1 41 102    B
3  1 42 103    C
4  1 42 104    D
5  1 43 105    E
6  2 43 101    F
7  2 44 102 <NA>
8  2 44 103 <NA>
9  2 45 104 <NA>
10 2 45 105 <NA>

The result should be the following:

   a  b   c d
1  1 41 101 A
2  1 41 102 B
3  1 42 103 C
4  1 42 104 D
5  1 43 105 E
6  2 43 101 F
7  2 44 102 G
8  2 44 103 H
9  2 45 104 I
10 2 45 105 J

While you can do lookups with match and perhaps %in%, I'd think another (robust) way to do it is with a merge/join:

df2mod <- merge(df2, df1[,c('b','c','d')], by = c("b", "c"), all=TRUE)
df2mod
#     b   c a  d.x d.y
# 1  41 101 1    A   A
# 2  41 102 1    B   B
# 3  42 103 1    C   C
# 4  42 104 1    D   D
# 5  43 101 2    F   F
# 6  43 105 1    E   E
# 7  44 102 2 <NA>   G
# 8  44 103 2 <NA>   H
# 9  45 104 2 <NA>   I
# 10 45 105 2 <NA>   J

In this case, d.x is the original df2$d and d.y is the value brought in from df1. Because your data is factors (the default for strings in data.frame before R 4.0.0), some extra parts are necessary (as.character and the re-factor).

df2mod$d <- with(df2mod, ifelse(is.na(d.x), as.character(d.y), as.character(d.x)))
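# re-factor only if df1$d is a factor; on R >= 4.0.0 it is character by default and levels() would be NULL
if (is.factor(df1$d)) df2mod$d <- factor(df2mod$d, levels = levels(df1$d))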
df2mod
#     b   c a  d.x d.y d
# 1  41 101 1    A   A A
# 2  41 102 1    B   B B
# 3  42 103 1    C   C C
# 4  42 104 1    D   D D
# 5  43 101 2    F   F F
# 6  43 105 1    E   E E
# 7  44 102 2 <NA>   G G
# 8  44 103 2 <NA>   H H
# 9  45 104 2 <NA>   I I
# 10 45 105 2 <NA>   J J
df2mod[,c("d.x", "d.y")] <- NULL # cleanup unnecessary columns
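The match lookup mentioned at the top of this answer can be sketched as follows. This is only a minimal illustration, not a definitive implementation. It assumes each (b, c) pair occurs at most once in df1 and that d is stored as character (hence the explicit stringsAsFactors = FALSE). The names key1, key2, idx and gap are made up for this example. df1 is deliberately reversed to show that row positions do not need to line up.

```r
# Same example data as in the question, with d kept as character.
df1 <- data.frame(a = rep(c(1, 2), each = 5),
                  b = rep(seq(41, 45, 1), each = 2),
                  c = c(101:105, 101:105),
                  d = LETTERS[1:10],
                  stringsAsFactors = FALSE)
df2 <- df1
df2$d[7:10] <- NA

# Reverse df1 so that matching rows sit at different positions.
df1 <- df1[10:1, ]

# Build one key per row from the matching columns b and c
# (assumption: the separator "_" never occurs inside the key values).
key1 <- paste(df1$b, df1$c, sep = "_")
key2 <- paste(df2$b, df2$c, sep = "_")

# For each row of df2, position of the same (b, c) pair in df1 (NA if absent).
idx <- match(key2, key1)

# Overwrite only the missing entries; rows without a match stay NA.
gap <- is.na(df2$d)
df2$d[gap] <- df1$d[idx[gap]]
df2
```

Unlike merge, which sorts the result by the key columns, this keeps the row and column order of df2. If d is a factor, converting both columns with as.character first (as in the merge version above) should avoid level mismatches.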
