dplyr inner_join with NAs on character columns
I have two identical data frames:
a <- c(1,2,3)
b <- c(3,2,1)
c <- c('a','b',NA)
df1 <- data.frame(a=a, b=b, c=c, stringsAsFactors=FALSE)
df2 <- data.frame(a=a, b=b, c=c, stringsAsFactors=FALSE)
I would like to use dplyr::inner_join to
"return all rows from x where there are matching values in y, and all columns from x and y" (dplyr documentation)
(which is everything, as they are equal), but it doesn't seem to work with an NA in column c (type chr). Is it standard behaviour not to join on NAs?
For example:
library(dplyr)
> inner_join(df1, df2)
Joining by: c("a", "b", "c")
a b c
1 1 3 a
2 2 2 b
This doesn't join on the NA. However, I would like it to return the same as merge:
> merge(df1, df2)
a b c
1 1 3 a
2 2 2 b
3 3 1 <NA>
Have I misunderstood how inner_join works in this instance, and is it behaving as documented?
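One way to see why NA keys are ambiguous in a join can be sketched in base R alone. This is only an illustration of how R compares missing values, not a description of dplyr's internals; the variable names are made up for the example. The equality operator propagates NA, whereas match() and %in% treat NA as equal to NA, which is consistent with the merge output above.

```r
# Illustration in base R only; no dplyr internals are assumed.
key <- c("a", "b", NA)

# `==` propagates missingness: comparing NA with NA yields NA, not TRUE.
eq_result <- NA_character_ == NA_character_
print(eq_result)

# match() treats NA as matching NA and returns its position in `key`.
pos <- match(NA_character_, key)
print(pos)

# %in% is built on match(), so it reports TRUE for an NA key.
found <- NA_character_ %in% key
print(found)
```

A join that compares keys with equality semantics would drop the NA row, while one that compares them with match() semantics would keep it.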
Further Detail
inner_join matches NA on a numeric column:
a <- c(1,2,3)
b <- c(3,2,NA)
c <- c('a','b','c')
df1 <- data.frame(a=a, b=b, c=c, stringsAsFactors=FALSE)
df2 <- data.frame(a=a, b=b, c=c, stringsAsFactors=FALSE)
> inner_join(df1, df2)
Joining by: c("a", "b", "c")
a b c
1 1 3 a
2 2 2 b
3 3 NA c
Edit
As @thelatemail points out, inner_join also behaves like merge when the NA is in a factor column:
df1 <- data.frame(a=a, b=b, c=c, stringsAsFactors=TRUE)
df2 <- data.frame(a=a, b=b, c=c, stringsAsFactors=TRUE)
inner_join(df1, df2)
Joining by: c("a", "b", "c")
a b c
1 1 3 a
2 2 2 b
3 3 3 <NA>
Edit 2
Thanks to @shadow for pointing out that this is a known issue, here and here.
The issue occurred in version 0.4.1 and is now fixed in version 0.4.2:
sessionInfo()
...
other attached packages:
[1] dplyr_0.4.2
...
> inner_join(df1, df2)
Joining by: c("a", "b", "c")
a b c
1 1 3 a
2 2 2 b
3 3 1 <NA>
Check with merge:
> merge(df1, df2)
a b c
1 1 3 a
2 2 2 b
3 3 1 <NA>
> all.equal(inner_join(df1, df2), merge(df1, df2))
Joining by: c("a", "b", "c")
[1] TRUE
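For readers on a much newer dplyr than the 0.4.x releases discussed here, the join functions accept an na_matches argument that makes the NA handling explicit. The sketch below assumes an installed dplyr that supports this argument; the data frames repeat the ones from the question, and by_cols, with_na and without_na are names made up for the example.

```r
library(dplyr)

# Same data as in the question: the NA sits in the character column c.
df1 <- data.frame(a = c(1, 2, 3), b = c(3, 2, 1), c = c("a", "b", NA),
                  stringsAsFactors = FALSE)
df2 <- df1

by_cols <- c("a", "b", "c")

# na_matches = "na": NA keys match each other, as merge() does above.
with_na <- inner_join(df1, df2, by = by_cols, na_matches = "na")

# na_matches = "never": NA keys never match, so the row with c = NA is dropped.
without_na <- inner_join(df1, df2, by = by_cols, na_matches = "never")

print(nrow(with_na))
print(nrow(without_na))
```

Passing by explicitly also avoids the "Joining by" message and documents which columns are used as keys.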