
Ignoring NA when summing multiple columns with dplyr

I am summing across multiple columns, some of which contain NA. I am using dplyr::mutate and writing out the arithmetic sum of the columns to get the total. The columns contain NA, though, and I would like to treat those values as zero. I was able to get this working with rowSums (see below), but I would now like to do it with mutate. Using mutate makes the code more readable and also lets me subtract columns. The example is below.

library(dplyr)
data(iris)
iris <- as_tibble(iris)  # tbl_df() is deprecated in current dplyr
iris[2, 3] <- NA         # row 2 of Petal.Length
iris <- mutate(iris, sum = Sepal.Length + Petal.Length)  # sum is NA in row 2

How do I ensure that NA in Petal.Length is handled as zero in the above expression? I know that with rowSums I can do something like:

iris$sum <- rowSums(DF[,c("Sepal.Length","Petal.Length")], na.rm = T)

but with mutate it is easy to also set diff = Sepal.Length - Petal.Length. What would be the suggested way to accomplish this using mutate?

Note that this post is similar to the following Stack Overflow posts:

Sum across multiple columns with dplyr

Subtract multiple columns ignoring NA

The problem with your rowSums is the reference to DF (which is undefined). This works:

mutate(iris, sum2 = rowSums(cbind(Sepal.Length, Petal.Length), na.rm = T))

For the difference, you could of course negate the column: rowSums(cbind(Sepal.Length, -Petal.Length), na.rm = T)
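A minimal sketch of both rowSums variants side by side, using a small made-up data frame (the columns a and b are hypothetical stand-ins for Sepal.Length and Petal.Length) so the result is easy to check by eye:

```r
library(dplyr)

# Hypothetical toy data; b has one missing value in row 2
df <- data.frame(a = c(5.1, 4.9, 4.7), b = c(1.4, NA, 1.3))

df <- mutate(
  df,
  total = rowSums(cbind(a, b), na.rm = TRUE),   # a + b, NA dropped
  diff  = rowSums(cbind(a, -b), na.rm = TRUE)   # a - b, NA dropped
)

print(df)
```

One thing to keep in mind with this approach: a row in which every value is NA comes out as 0 rather than NA, because rowSums with na.rm = TRUE sums an empty set of values.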

The general solution is to use ifelse or similar to set the missing values to 0 (or whatever else is appropriate):

mutate(iris, sum2 = Sepal.Length + ifelse(is.na(Petal.Length), 0, Petal.Length))
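If the same replacement is needed for several columns, the ifelse call can be wrapped in a small helper so the arithmetic stays readable. This is only a sketch: na0 is a name I made up for illustration, and the data frame is a hypothetical toy example.

```r
library(dplyr)

# Hypothetical helper: return x with every NA replaced by 0
na0 <- function(x) ifelse(is.na(x), 0, x)

# Hypothetical toy data; b has one missing value in row 2
df <- data.frame(a = c(5.1, 4.9, 4.7), b = c(1.4, NA, 1.3))

out <- mutate(
  df,
  total = na0(a) + na0(b),
  diff  = na0(a) - na0(b)
)

print(out)
```

Unlike rowSums, this works for any expression, not just sums and differences.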

More efficient than ifelse would be an implementation of coalesce; see the examples here. The following uses @krlmlr's answer from the previous link (see the bottom of this answer for the code, or use the kimisc package).

mutate(iris, sum2 = Sepal.Length + coalesce.na(Petal.Length, 0))
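As far as I know, dplyr itself has shipped a coalesce() function since version 0.5.0, so on a reasonably recent installation no extra helper or package should be needed. A sketch under that assumption, again with a hypothetical toy data frame:

```r
library(dplyr)

# Hypothetical toy data; b has one missing value in row 2
df <- data.frame(a = c(5.1, 4.9, 4.7), b = c(1.4, NA, 1.3))

out <- mutate(
  df,
  total = a + coalesce(b, 0),   # NA in b falls back to 0
  diff  = a - coalesce(b, 0)
)

print(out)
```

dplyr's coalesce() is strict about types, so it is safest to give the fallback the same type as the column (a double 0 for a double column, as here).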

To replace missing values across the whole data set, there is replace_na in the tidyr package.
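A short sketch of that route, assuming tidyr is installed. replace_na takes a named list giving the replacement for each column; the data frame here is a hypothetical toy example.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data; b has one missing value in row 2
df <- data.frame(a = c(5.1, 4.9, 4.7), b = c(1.4, NA, 1.3))

# Fill NA in column b with 0 first, then do plain arithmetic
filled <- replace_na(df, list(b = 0))
out <- mutate(filled, total = a + b, diff = a - b)

print(out)
```

Note that this changes the stored column itself, so the information that a value was missing is lost in the result. The ifelse and coalesce approaches leave the original column untouched.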


@krlmlr's coalesce.na, as found here:

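# Replace NA entries of x with values from each fallback in ..., in order
coalesce.na <- function(x, ...) {
  x.len <- length(x)
  ly <- list(...)
  for (y in ly) {
    y.len <- length(y)
    if (y.len == 1) {
      # Scalar fallback: fill every remaining NA with it
      x[is.na(x)] <- y
    } else {
      if (x.len %% y.len != 0)
        warning('object length is not a multiple of first object length')
      # Vector fallback: recycle y and take the value at each NA position
      pos <- which(is.na(x))
      x[pos] <- y[(pos - 1) %% y.len + 1]
    }
  }
  x
}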
