Filling in gaps in a column based on matching rows in two columns (R)
In `df2` I would like to fill in the gaps in column `d` based on matching records in columns `b` and `c` between the two data frames. What would be a quick and elegant way to do that? It is important that it also works when the matching rows are at different positions in the two data frames.
```r
df1 <- data.frame(a = c(1,1,1,1,1,2,2,2,2,2), b = rep(seq(41, 45, 1), each = 2), c = c(101:105, 101:105), d = LETTERS[seq(from = 1, to = 10)])
df2 <- data.frame(a = c(1,1,1,1,1,2,2,2,2,2), b = rep(seq(41, 45, 1), each = 2), c = c(101:105, 101:105), d = c(LETTERS[seq(from = 1, to = 6)], rep(NA, 4)))
> df1
   a  b   c d
1  1 41 101 A
2  1 41 102 B
3  1 42 103 C
4  1 42 104 D
5  1 43 105 E
6  2 43 101 F
7  2 44 102 G
8  2 44 103 H
9  2 45 104 I
10 2 45 105 J
> df2
   a  b   c    d
1  1 41 101    A
2  1 41 102    B
3  1 42 103    C
4  1 42 104    D
5  1 43 105    E
6  2 43 101    F
7  2 44 102 <NA>
8  2 44 103 <NA>
9  2 45 104 <NA>
10 2 45 105 <NA>
```
The result should be the following:
```
   a  b   c d
1  1 41 101 A
2  1 41 102 B
3  1 42 103 C
4  1 42 104 D
5  1 43 105 E
6  2 43 101 F
7  2 44 102 G
8  2 44 103 H
9  2 45 104 I
10 2 45 105 J
```
While you can do lookups with `match` and perhaps `%in%`, I think another (robust) way to do it is with a merge/join:
```r
df2mod <- merge(df2, df1[, c("b", "c", "d")], by = c("b", "c"), all = TRUE)
df2mod
#     b   c a  d.x d.y
# 1  41 101 1    A   A
# 2  41 102 1    B   B
# 3  42 103 1    C   C
# 4  42 104 1    D   D
# 5  43 101 2    F   F
# 6  43 105 1    E   E
# 7  44 102 2 <NA>   G
# 8  44 103 2 <NA>   H
# 9  45 104 2 <NA>   I
# 10 45 105 2 <NA>   J
```
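In this case, `d.x` holds the original `df2$d` values and `d.y` the matching values from `df1$d`. Because your `d` columns are factors, some extra steps are necessary (`as.character()` and re-applying `factor()`). Note that `data.frame()` only converts strings to factors by default before R 4.0.0 (`stringsAsFactors = TRUE`). In R 4.0.0 and later `d` is a character vector, `levels(df1$d)` is `NULL`, and calling `factor()` with those levels would turn every value into `NA`, so that step should only be applied when `d` really is a factor.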
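To see why the `as.character()` calls matter, here is a small illustration with two made-up factors (`f_na` and `f_full` are hypothetical stand-ins for `d.x` and `d.y`). Base `ifelse()` builds its result from the logical test vector, so factor inputs come back as their integer level codes rather than their labels.

```r
# Hypothetical stand-ins: f_na plays d.x (with a gap), f_full plays d.y
f_na   <- factor(c(NA, "A"))
f_full <- factor(c("G", "B"), levels = LETTERS[1:10])

# ifelse() on factors: the result holds integer level codes, not letters
bad <- ifelse(is.na(f_na), f_full, f_na)
print(bad)   # [1] 7 1

# Converting to character first keeps the labels
good <- ifelse(is.na(f_na), as.character(f_full), as.character(f_na))
print(good)  # [1] "G" "A"
```

The `7` is the position of `"G"` in `f_full`'s levels and the `1` is the position of `"A"` in `f_na`'s levels, which is why mixing codes from two differently-levelled factors gives meaningless results.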
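```r
# Take d.x where it exists, otherwise d.y; as.character() stops ifelse() returning factor codes
df2mod$d <- with(df2mod, ifelse(is.na(d.x), as.character(d.y), as.character(d.x)))
# Re-apply df1's levels only if d is a factor (the data.frame() default before R 4.0.0)
if (is.factor(df1$d)) df2mod$d <- factor(df2mod$d, levels = levels(df1$d))
df2mod
#     b   c a  d.x d.y d
# 1  41 101 1    A   A A
# 2  41 102 1    B   B B
# 3  42 103 1    C   C C
# 4  42 104 1    D   D D
# 5  43 101 2    F   F F
# 6  43 105 1    E   E E
# 7  44 102 2 <NA>   G G
# 8  44 103 2 <NA>   H H
# 9  45 104 2 <NA>   I I
# 10 45 105 2 <NA>   J J
df2mod[, c("d.x", "d.y")] <- NULL  # clean up the helper columns
```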
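For comparison, the `match()` lookup mentioned at the start of this answer can be sketched as follows. This is a sketch under a few assumptions: the (`b`, `c`) pairs are unique in `df1`, the composite keys `key1`/`key2` built with `paste()` are a hypothetical helper, and the data frames are created with `stringsAsFactors = FALSE` so that `d` is character. Unlike `merge()`, it fills `df2` in place, so `df2` keeps its original row order; `df1` is deliberately reversed to show that row positions do not need to line up.

```r
# Question's data, with d stored as character (assumption: stringsAsFactors = FALSE)
df1 <- data.frame(a = c(1,1,1,1,1,2,2,2,2,2), b = rep(seq(41, 45, 1), each = 2),
                  c = c(101:105, 101:105), d = LETTERS[1:10],
                  stringsAsFactors = FALSE)
df2 <- data.frame(a = c(1,1,1,1,1,2,2,2,2,2), b = rep(seq(41, 45, 1), each = 2),
                  c = c(101:105, 101:105), d = c(LETTERS[1:6], rep(NA, 4)),
                  stringsAsFactors = FALSE)

# Reverse df1 so matching rows sit at different positions than in df2
df1 <- df1[rev(seq_len(nrow(df1))), ]

# Hypothetical composite key: one string per (b, c) pair
key1 <- paste(df1$b, df1$c, sep = "_")
key2 <- paste(df2$b, df2$c, sep = "_")

# Row of df1 matching each row of df2 (NA where there is no match)
idx <- match(key2, key1)

# Rows of df2 with a missing d; optional %in% check that each has a partner in df1
gaps <- is.na(df2$d)
stopifnot(all(key2[gaps] %in% key1))

# Fill only the gaps; existing values in df2$d are left untouched
df2$d[gaps] <- df1$d[idx[gaps]]
df2
```

If `d` were a factor (the pre-R 4.0.0 default), the assignment would need `as.character()` first; otherwise letters that are not among `df2$d`'s levels would become `NA` with an "invalid factor level" warning.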