
Aggregation of NAs results in 0s instead of NAs

I am trying to find the sum of three different variables in a data frame while grouping by another variable, but there are several NAs. The sum of the NAs is interpreted as zero instead of NA. Here is an example:

my_data <- data.frame(Month = c("1995-01-01", "1995-01-01", "1995-01-01",
                            "1995-02-01", "1995-02-01"),
                  Value_1 = c(1, NA, 2, NA, NA),
                  Value_2 = c(2, 2, 3, NA, 1),
                  Value_3 = c(NA, NA, NA, NA, NA))

# summing through dplyr
library(dplyr)

my_data %>%
  group_by(Month) %>%
  summarise_each(funs(sum(., na.rm = TRUE)))

# summing through base R
my_vars <- c("Value_1", "Value_2", "Value_3")
aggregate(x = my_data[my_vars], by = my_data["Month"], FUN = sum,
          na.rm = TRUE)

For Value_3 in both months, for instance, I get that the sum is zero instead of NA. Any advice for how to sum NAs to get NA instead of zero would be greatly appreciated.

You can add an if/else that returns NA when all the values of a variable within a group are NA:

my_data %>% 
    group_by(Month) %>% 
    summarise_all(
        funs(if(all(is.na(.))) NA else sum(., na.rm = TRUE))
    )

# A tibble: 2 x 4
#       Month Value_1 Value_2 Value_3
#      <fctr>   <dbl>   <dbl>   <lgl>
#1 1995-01-01       3       7      NA
#2 1995-02-01      NA       1      NA
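
In dplyr 1.0.0 and later, summarise_all() and funs() have been superseded by across(). The following is a minimal sketch of the same idea in that style, not the original answer's code. The helper name sum_or_na is hypothetical, and it returns NA_real_ rather than a bare NA so that every summarised column stays numeric.

```r
library(dplyr)

my_data <- data.frame(
  Month   = c("1995-01-01", "1995-01-01", "1995-01-01",
              "1995-02-01", "1995-02-01"),
  Value_1 = c(1, NA, 2, NA, NA),
  Value_2 = c(2, 2, 3, NA, 1),
  Value_3 = c(NA, NA, NA, NA, NA)
)

# Hypothetical helper: NA if every value is missing,
# otherwise the sum of the non-missing values.
sum_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

result <- my_data %>%
  group_by(Month) %>%
  summarise(across(everything(), sum_or_na))

result
```

Inside a grouped summarise(), across(everything(), ...) skips the grouping column, so only the three Value columns are summed.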

Based on your own approach, add ifelse:

my_data %>%
    group_by(Month) %>%
    summarise_each(funs(ifelse(sum(is.na(.))==length(.),NA,sum(.,na.rm = TRUE))))
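
The same check can probably be carried over to the base R aggregate() call from the question. This is a sketch under that assumption; sum_or_na is a hypothetical helper name, and it uses a plain if/else because the condition is a single TRUE or FALSE per group.

```r
my_data <- data.frame(
  Month   = c("1995-01-01", "1995-01-01", "1995-01-01",
              "1995-02-01", "1995-02-01"),
  Value_1 = c(1, NA, 2, NA, NA),
  Value_2 = c(2, 2, 3, NA, 1),
  Value_3 = c(NA, NA, NA, NA, NA)
)

# Hypothetical helper: NA if every value is missing,
# otherwise the sum of the non-missing values.
sum_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

my_vars <- c("Value_1", "Value_2", "Value_3")

# na.rm is handled inside the helper, so it is not passed to aggregate()
result <- aggregate(x = my_data[my_vars], by = my_data["Month"],
                    FUN = sum_or_na)

result
```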

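We can also do this using data.table, multiplying the sum by NA^(all(is.na(x))), which is NA when every value is missing and 1 otherwise: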

library(data.table)
setDT(my_data)[, lapply(.SD, function(x)  sum(x, na.rm = TRUE) *NA^(all(is.na(x)))), Month]
#       Month Value_1 Value_2 Value_3
#1: 1995-01-01       3       7      NA
#2: 1995-02-01      NA       1      NA
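
The NA^(...) factor relies on two facts about R arithmetic: logical values are coerced to 0 or 1, and NA^0 evaluates to 1 while NA^1 stays NA. A small base R sketch of the trick in isolation (the function name mask_all_na is hypothetical):

```r
# TRUE is coerced to 1 and FALSE to 0 before exponentiation
all_missing_factor  <- NA^TRUE   # NA^1
some_present_factor <- NA^FALSE  # NA^0

# With na.rm = TRUE an all-NA vector is summed as an empty vector,
# which is why the plain sum comes back as 0.
plain_sum <- sum(c(NA, NA), na.rm = TRUE)

# Hypothetical wrapper around the expression used in the answer
mask_all_na <- function(x) sum(x, na.rm = TRUE) * NA^all(is.na(x))

all_na_result <- mask_all_na(c(NA, NA))
mixed_result  <- mask_all_na(c(1, NA, 2))
```

Here all_missing_factor and all_na_result are NA, some_present_factor is 1, plain_sum is 0, and mixed_result is 3.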
