
Replicating a grouped function across multiple variables to generate many new variables

I have a large data frame with tens of variables, and each variable has a corresponding group column. Below is an example data frame.

test <- data.frame(1:10)
test$ID <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")
test$Zone1 <- c(1,1,1,2,3,2,5,6,4,1)
test$Zone2 <- c(1,2,1,2,2,2,4,8,6,1)
test$Zone3 <- c(1,1,1,2,2,2,3,3,3,1)
test$Zone1_group<- c(1,1,1,2,2,2,3,3,3,4)
test$Zone2_group<- c(1,1,1,2,2,2,3,3,3,4)
test$Zone3_group<- c(1,1,1,2,2,2,3,3,3,4)

I would like to determine whether each group for a given variable has any variance. If a group has no variance, I would like to replace its values with NA. Below is the desired output, which I was able to achieve for one variable (if I exclude Zone1_group == 4) in dplyr using the following:

library(dplyr)
test2 <- test %>% group_by(Zone1_group) %>% summarise(Zone1_variance = sd(Zone1))
test3 <- left_join(test, test2, by = "Zone1_group")
test3 %>% mutate(Zone1_new = if_else(Zone1_variance == 0, NA_real_, Zone1))

  X1.9 ID Zone1 Zone2 Zone3 Zone1_group Zone2_group Zone3_group Zone1_variance Zone1_new
1    1  A     1     1     1           1           1           1      0.0000000        NA
2    2  B     1     2     1           1           1           1      0.0000000        NA
3    3  C     1     1     1           1           1           1      0.0000000        NA
4    4  D     2     2     2           2           2           2      0.5773503         2
5    5  E     3     2     2           2           2           2      0.5773503         3
6    6  F     2     2     2           2           2           2      0.5773503         2
7    7  G     5     4     3           3           3           3      1.0000000         5
8    8  H     6     8     3           3           3           3      1.0000000         6
9    9  I     4     6     3           3           3           3      1.0000000         4

As I need to replicate this process (and other similar processes) for tens of variables, is there a more elegant way to do this than copying, pasting and updating the code for each variable name?

Here's one way to do this:

library(dplyr)
library(purrr)
library(rlang)

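add_new_var_cols <- function(data, col) {
    # Name of the group column that belongs to `col`, e.g. "Zone1_group"
    group_col <- paste0(col, '_group')
    col1 <- sym(col)

    data %>% 
     group_by(!!sym(group_col)) %>% 
     # Keep the values only if the group has more than one row and a nonzero sd
     transmute(!!paste0(col, '_new') := if(length(!!col1) > 1 && 
                                    sd(!!col1) != 0) !!col1 else NA_real_) %>%
     ungroup() %>%
     # all_of() selects by a character vector held in a variable
     select(-all_of(group_col))
}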

Now apply this function to each 'Zone' column:

cols <- paste0('Zone', 1:3)
bind_cols(test, map_dfc(cols, add_new_var_cols, data = test))


#  X1.9 ID Zone1 Zone2 Zone3 Zone1_group Zone2_group Zone3_group Zone1_new Zone2_new Zone3_new
#1    1  A     1     1     1           1           1           1        NA         1        NA
#2    2  B     1     2     1           1           1           1        NA         2        NA
#3    3  C     1     1     1           1           1           1        NA         1        NA
#4    4  D     2     2     2           2           2           2         2        NA        NA
#5    5  E     3     2     2           2           2           2         3        NA        NA
#6    6  F     2     2     2           2           2           2         2        NA        NA
#7    7  G     5     4     3           3           3           3         5         4        NA
#8    8  H     6     8     3           3           3           3         6         8        NA
#9    9  I     4     6     3           3           3           3         4         6        NA

We pass the column names as character strings in cols. Inside the function, sym() converts each string to a symbol and !! unquotes it, so that dplyr evaluates it as a column of the data.
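
To see the sym() and !! mechanism in isolation, here is a minimal sketch. The data frame df and the helper mean_of are hypothetical names made up for illustration; they are not part of the question or the answer above.

```r
library(dplyr)
library(rlang)

# Hypothetical toy data, not taken from the question
df <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30))

# Hypothetical helper: receives a column name as a character string
mean_of <- function(data, col) {
  col_sym <- sym(col)                 # turn the string "x" into the symbol x
  data %>%
    summarise(result = mean(!!col_sym))  # !! unquotes the symbol inside the dplyr call
}

mean_of(df, "x")
mean_of(df, "y")
```

Without the !!, summarise() would look for a column literally named col_sym rather than the column whose name is stored in it.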
