简体   繁体   中英

Sum rows with start_with and don't ignore na when all rows are na in R

I'm working with a huge database and need to sum all columns that start with a specific string. I also need to ignore NA's when at least one of the columns is not NA (for example 1+5+NA = 6); but when all columns are NA, the sum must be NA.

I have done the NA ignore thing successfully with sum_ from package hablar, but when I tried to combine it with start_With, it took endless time and did not seem to work. Here is a reproducible example:

library(dplyr)
set.seed(8)
id <- seq(1,700, by = 1)
a1_x <- runif(700, 0, 10)
a1_y <- runif(700, 0, 10)
a2_x <- runif(700, 0, 10)

df <- data.frame(id, a1_x, a1_y, a2_x)

df <- round(df, digits = -0.1) 
df[df == 0] <- NA

#Try with rowsums. This works but I need to ignore na just when both columns are NA
df <- df %>%
  mutate(a1_sum = rowSums(select(., starts_with('a1'))))

#Tried with sum_ of package 'hablar' but it takes endless time
library(hablar)
df <- df %>%
  mutate(a1_sum = sum_(select(., starts_with('a1'))))

Does anybody knows how to achieve this? Thanks in advance!

Maybe this is also of help. Take a helper function and call it with c_across and rowwise().

my_fun <- function(x){
  if(all(is.na(x))) return(NA) else return(sum(x,na.rm = TRUE))
  }
df %>%
  rowwise() %>%
  mutate(a1_sum = my_fun(c_across(starts_with("a1"))))

We can use two sequential calls to mutate() . In the second call, we can use if_else() to replace values with NA if_all() values are NA in that row. We can also use replace or purrr::modify_if .

df %>% mutate(a1_sum = rowSums(across(starts_with('a1')), na.rm = TRUE),
              a1_sum = if_else(if_all(c(starts_with('a1'), -a1_sum), is.na),
                               NA_real_,
                               a1_sum))

### OR ###

df %>% mutate(a1_sum = rowSums(across(starts_with('a1')), na.rm = TRUE),
              a1_sum = replace(a1_sum, if_all(c(starts_with('a1'), -a1_sum), is.na),
                               NA))

In case it is impossible to have zeroes in the actual data (the example df was hard coded to have no zeroes, not sure if it would happen with actual data), we can also use the simpler mutate %>% na_if() :

df %>% mutate(a1_sum = rowSums(across(starts_with('a1')), na.rm = TRUE) %>%
                  na_if(0))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM