
dplyr inner_join with NAs on character columns

I have two identical data frames:

a <- c(1,2,3)
b <- c(3,2,1)
c <- c('a','b',NA)

df1 <- data.frame(a=a, b=b, c=c, stringsAsFactors=FALSE)
df2 <- data.frame(a=a, b=b, c=c, stringsAsFactors=FALSE)

I would like to use dplyr::inner_join to

"return all rows from x where there are matching values in y, and all columns from x and y" (dplyr documentation)

(which here is every row, since the data frames are identical), but it doesn't seem to work with an NA in column c (type chr). Is it standard behaviour not to join on NAs?

For example,

library(dplyr)
> inner_join(df1, df2)
Joining by: c("a", "b", "c")
  a b c
1 1 3 a
2 2 2 b

doesn't join on the NA. However, I would like it to return the same result as merge:

> merge(df1, df2)
  a b    c
1 1 3    a
2 2 2    b
3 3 1 <NA>
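For context, base R itself has two different notions of whether NA "equals" NA, which may help explain why two join implementations can disagree. The following is a minimal sketch in base R only; it does not claim to show how either merge or inner_join is implemented internally, and the variable names eq and pos are just for illustration.

```r
# Element-wise comparison: NA compared with NA yields NA, not TRUE
eq <- NA_character_ == NA_character_

# match() looks a value up in a table and does treat NA as matching NA
pos <- match(NA_character_, c("a", "b", NA))

is.na(eq)  # the comparison result is itself NA
pos        # position of the NA in the lookup table
```

A join built on ==-style comparison would drop the NA row, while one built on match()-style lookup would keep it, which is the difference seen between the two outputs above.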

Have I misunderstood how inner_join works in this instance, and is this the documented behaviour?

Further Detail

inner_join does match NA on a numeric column:

a <- c(1,2,3)
b <- c(3,2,NA)
c <- c('a','b','c')

df1 <- data.frame(a=a, b=b, c=c, stringsAsFactors=FALSE)
df2 <- data.frame(a=a, b=b, c=c, stringsAsFactors=FALSE)

> inner_join(df1, df2)
Joining by: c("a", "b", "c")
  a  b c
1 1  3 a
2 2  2 b
3 3 NA c

Edit

As @thelatemail points out, inner_join also behaves like merge when the NA is in a factor column (using the original a, b and c from the top of the question):

df1 <- data.frame(a=a, b=b, c=c, stringsAsFactors=TRUE)
df2 <- data.frame(a=a, b=b, c=c, stringsAsFactors=TRUE)
inner_join(df1, df2)
Joining by: c("a", "b", "c")
  a b    c
1 1 3    a
2 2 2    b
3 3 1 <NA>

Edit 2

Thanks to @shadow for pointing out that this is a known issue (here and here).

The issue occurred in version 0.4.1 and is fixed in version 0.4.2:

sessionInfo()
...
other attached packages:
[1] dplyr_0.4.2
...

> inner_join(df1, df2)
Joining by: c("a", "b", "c")
  a b    c
1 1 3    a
2 2 2    b
3 3 1 <NA>

Check with merge:

> merge(df1, df2)
  a b    c
1 1 3    a
2 2 2    b
3 3 1 <NA>

> all.equal(inner_join(df1, df2), merge(df1, df2))
Joining by: c("a", "b", "c")
[1] TRUE
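As a side note, later dplyr releases let the NA handling be stated explicitly through the na_matches argument of the join functions. The sketch below assumes a dplyr version recent enough to provide that argument (it is not available in the 0.4.x versions discussed above), and the names by_cols, matched and unmatched are only illustrative.

```r
library(dplyr)

# Same data as in the question: column c is character and contains an NA
df1 <- data.frame(a = c(1, 2, 3), b = c(3, 2, 1), c = c("a", "b", NA),
                  stringsAsFactors = FALSE)
df2 <- df1

by_cols <- c("a", "b", "c")

# na_matches = "na": NA in a key column is treated as equal to NA (merge-like)
matched <- inner_join(df1, df2, by = by_cols, na_matches = "na")

# na_matches = "never": rows whose key contains NA never match
unmatched <- inner_join(df1, df2, by = by_cols, na_matches = "never")

nrow(matched)
nrow(unmatched)
```

Passing by explicitly also avoids the "Joining by" message, and spelling out na_matches makes the intended behaviour independent of the package default.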
