R rowSums for multiple groups of variables using mutate and for loops by prefix of variable names

I have multiple variables grouped by prefix (par___, fri___, gp___, etc.); there are 29 of these groups.

Each variable has a value of 0 or 1. For each row, I need to sum each group (i.e., par___1 + par___2, etc.) and, if that row sum is 0, set every variable in the group to NA for that row.

For example, my data looks like this:

par___1 par___2 fri___1 fri___2
      0       0       1       1
      0       1       0       0
      0       0       1       0
      0       0       0       0
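To make the example reproducible, the table above can be built as a data frame. This is a sketch that assumes the first column is meant to be named `par___1` and that the 0/1 values are stored as integers.

```r
# Example data from the question (assumes the first column is named par___1)
df <- data.frame(
  par___1 = c(0L, 0L, 0L, 0L),
  par___2 = c(0L, 1L, 0L, 0L),
  fri___1 = c(1L, 0L, 1L, 0L),
  fri___2 = c(1L, 0L, 0L, 0L)
)
df
```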

and I want it to look like this:

par___1 par___2 fri___1 fri___2
     NA      NA       1       1
      0       1      NA      NA
     NA      NA       1       0
     NA      NA      NA      NA

I can do it individually like this:

df <- df %>%
  mutate(rowsum = rowSums(.[grep("par___", names(.))])) %>%
  mutate_at(grep("par___", names(.)), funs(ifelse(rowsum == 0, NA, .))) %>%
  select(-rowsum)
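For reference, the same single-prefix step can be written with `across()` and a `~` lambda instead of `mutate_at()` and `funs()` (`funs()` is deprecated and the `_at` verbs are superseded in current dplyr). This is a sketch that assumes dplyr 1.0 or later and the example data with `par___1` as the first column.

```r
library(dplyr)

df <- data.frame(
  par___1 = c(0L, 0L, 0L, 0L), par___2 = c(0L, 1L, 0L, 0L),
  fri___1 = c(1L, 0L, 1L, 0L), fri___2 = c(1L, 0L, 0L, 0L)
)

df <- df %>%
  # helper column: per-row sum of the par___ columns
  mutate(rowsum = rowSums(select(., starts_with("par___")))) %>%
  # the ~ lambda is called once per par___ column; .x is that column
  mutate(across(starts_with("par___"), ~ ifelse(rowsum == 0, NA, .x))) %>%
  select(-rowsum)
df
```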

And I figured I could do something like this:

vars <- c("par___", "fri___", "gp___")


for (i in vars) {
  df <- df %>%
    # create a "rowsum" column holding the row sum of the columns whose names match prefix i
    mutate(rowsum = rowSums(.[grep(i, names(.))])) %>%
    # in those same columns, put NA wherever the row sum is 0
    mutate_at(grep(i, names(.)), funs(ifelse(rowsum == 0, NA, .))) %>%
    select(-rowsum)
}

There are no error messages, but it doesn't work.
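With 29 groups, typing every prefix by hand is error-prone. A minimal sketch, assuming every column name has the form prefix___number and every column is one of these 0/1 variables, derives the prefixes from the column names and runs the single-prefix step inside the loop. The name `prefixes` is hypothetical, and `across()` assumes dplyr 1.0 or later.

```r
library(dplyr)

df <- data.frame(
  par___1 = c(0L, 0L, 0L, 0L), par___2 = c(0L, 1L, 0L, 0L),
  fri___1 = c(1L, 0L, 1L, 0L), fri___2 = c(1L, 0L, 0L, 0L)
)

# Derive the group prefixes from the column names: "par___", "fri___"
prefixes <- unique(sub("___.*", "___", names(df)))

for (p in prefixes) {
  df <- df %>%
    # per-row sum of the columns that start with this prefix
    mutate(rowsum = rowSums(select(., starts_with(p)))) %>%
    # set those columns to NA in rows where the group sums to 0
    mutate(across(starts_with(p), ~ ifelse(rowsum == 0, NA, .x))) %>%
    select(-rowsum)
}
df
```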

I've also tried mutate(across()) instead of mutate_at() and got this error:

Error: Problem with mutate() input ..1.
x Can't convert a list to function
i Input ..1 is across(grep(i, names(.)), list(ifelse(rowsum == 0, NA, .))).

And when I used list() instead of funs(), I got this error:

Error in rowsum == 0: comparison (1) is possible only for atomic and list types
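The second error appears consistent with how `mutate_at()` handles its function argument: `list(ifelse(rowsum == 0, NA, .))` is evaluated right away rather than once per column, and outside the data frame the name `rowsum` refers to base R's `rowsum()` function, not the helper column, so `rowsum == 0` compares a function with 0. The `across()` error looks like the counterpart: the list holds the result of `ifelse()` instead of a function. In both cases the expression likely needs to be passed as a function, e.g. `~ ifelse(rowsum == 0, NA, .x)` or `function(x) ifelse(rowsum == 0, NA, x)`. The snippet below only reproduces the comparison error; some older R versions print an internal code such as `(1)` in place of `(==)`.

```r
# Outside the data frame, `rowsum` is base R's rowsum() function, not a column
is.function(rowsum)

# Comparing a function with 0 fails with the
# "comparison ... is possible only for atomic and list types" error
res <- try(rowsum == 0, silent = TRUE)
inherits(res, "try-error")
cat(res)
```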

Any help would be greatly appreciated!

Thanks heaps.

A tidyverse option would be:

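library(dplyr)
library(tidyr)
library(stringr)

df %>%
  stack() %>%                                  # long format: values plus ind (source column name)
  group_by(ind) %>%                            # so row_number() counts rows within each column
  group_by(grp = row_number(),                 # original row number
           grp2 = str_remove(ind, "_.*")) %>%  # prefix, e.g. "par" or "fri"
  mutate(values = values + na_if(all(values == 0), 1)) %>%
  pivot_wider(id_cols = grp, names_from = ind, values_from = values)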
  
# A tibble: 4 x 5
# Groups:   grp [4]
    grp par___1 par___2 fri___1 fri___2
  <int>   <int>   <int>   <int>   <int>
1     1      NA      NA       1       1
2     2       0       1      NA      NA
3     3      NA      NA       1       0
4     4      NA      NA      NA      NA
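The key step in the tidyverse pipeline above is `values + na_if(all(values == 0), 1)`. Within each (row, prefix) group, `all(values == 0)` is `TRUE` or `FALSE`; `na_if()` turns `TRUE` (which equals 1) into `NA` and leaves `FALSE` alone, and adding `NA` or `FALSE` (treated as 0) either blanks the whole group or leaves it unchanged. A small check of the pieces, using two hypothetical groups:

```r
library(dplyr)

values_zero  <- c(0L, 0L)  # a group where every variable is 0
values_mixed <- c(0L, 1L)  # a group with at least one 1

# na_if(x, y) replaces x with NA where x == y; TRUE == 1, FALSE != 1
na_if(TRUE, 1)
na_if(FALSE, 1)

# Adding NA blanks the whole group; adding FALSE (0) leaves it as is
values_zero  + na_if(all(values_zero == 0), 1)
values_mixed + na_if(all(values_mixed == 0), 1)
```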

If, on the other hand, you prefer base R, you could do:

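# group the cells by (row, prefix); x * NA^all(x == 0) is NA when the whole group is 0, x otherwise
d <- ave(unlist(df), row(df), sub("_.*", "", names(df))[col(df)],
         FUN = function(x) x * NA ^ all(x == 0))
array(d, dim(df), dimnames(df))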

  par___1 par___2 fri___1 fri___2
1      NA      NA       1       1
2       0       1      NA      NA
3      NA      NA       1       0
4      NA      NA      NA      NA

Note that the result of this last approach is a matrix; you can convert it to a data frame with as.data.frame().
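The base R version relies on `NA^0` being 1 in R while `NA^1` is `NA`, so `x * NA^all(x == 0)` returns `x` unchanged unless the whole group is 0. A sketch that names that rule and converts the resulting matrix back to a data frame; `blank_if_all_zero` and `out` are hypothetical names, and the example data assumes the first column is `par___1`.

```r
df <- data.frame(
  par___1 = c(0L, 0L, 0L, 0L), par___2 = c(0L, 1L, 0L, 0L),
  fri___1 = c(1L, 0L, 1L, 0L), fri___2 = c(1L, 0L, 0L, 0L)
)

# NA^0 is 1 and NA^1 is NA, so this keeps x unless every element is 0
blank_if_all_zero <- function(x) x * NA^all(x == 0)

# Group every cell by (row, prefix) and apply the rule within each group
d <- ave(unlist(df), row(df), sub("_.*", "", names(df))[col(df)],
         FUN = blank_if_all_zero)

# ave() returns a plain vector; reshape it and convert back to a data frame
out <- as.data.frame(array(d, dim(df), dimnames(df)))
out
```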

A base R option using split.default():

do.call(cbind, unname(lapply(
  split.default(df, sub("_.*", "", names(df))),  # list of data frames, one per prefix
  function(x) {
    x[rowSums(x) == 0, ] <- NA  # blank out rows where the whole group sums to 0
    x
  })))

#  fri___1 fri___2 par___1 par___2
#1       1       1      NA      NA
#2      NA      NA       0       1
#3       1       0      NA      NA
#4      NA      NA      NA      NA
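`split.default()` groups the columns by a factor whose levels are sorted alphabetically, which is why the fri___ columns come before the par___ columns in the output above. A minimal sketch, assuming the original column order should be kept, reindexes the combined result by `names(df)`; `out` is a hypothetical name.

```r
df <- data.frame(
  par___1 = c(0L, 0L, 0L, 0L), par___2 = c(0L, 1L, 0L, 0L),
  fri___1 = c(1L, 0L, 1L, 0L), fri___2 = c(1L, 0L, 0L, 0L)
)

out <- do.call(cbind, unname(lapply(
  split.default(df, sub("_.*", "", names(df))),  # one data frame per prefix
  function(x) {
    x[rowSums(x) == 0, ] <- NA  # blank rows where the group sums to 0
    x
  })))

# Restore the original column order (the groups were sorted alphabetically)
out <- out[names(df)]
out
```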
