Using rlang and purrr to create new columns based on subset of existing columns

Question

I have a dataframe that covers multiple years like this:

library(dplyr)

df <- tibble(good_2018 = 0,
             bad_2018 = 1,
             id_2018 = 0,
             good_2019 = 3,
             bad_2019 = 1,
             id_2019 = 1)

I want to derive new columns based on the data for each year t (e.g., 2018 and 2019). If the id variable for year t equals 0, the outcome should be 0; otherwise it should be the percentage identified as good for year t. The resulting dataset should look like this:

df %>% 
  mutate(pct_good_2018 = if_else(id_2018 == 0, 0,
                                 100*good_2018/(good_2018 + bad_2018)),
         pct_good_2019 = if_else(id_2019 == 0, 0,
                                 100*good_2019/(good_2019 + bad_2019)))
#> # A tibble: 1 × 8
#>   good_2018 bad_2018 id_2018 good_2019 bad_2019 id_2019 pct_good_2018 pct_good…¹
#>       <dbl>    <dbl>   <dbl>     <dbl>    <dbl>   <dbl>         <dbl>      <dbl>
#> 1         0        1       0         3        1       1             0         75
#> # … with abbreviated variable name ¹pct_good_2019

Instead of generating the pct_good columns for each year individually, I would like to use the purrr package, but I cannot figure out how to do it. I believe it requires rlang, but the various configurations of != and {{}} that I try yield errors that I do not understand.

Answer 1

We can use glue to build the column names dynamically inside a custom function:

library(purrr)
library(glue)
pct_good <- function(df, year) {
    if_else(pull(df, glue('id_{year}')) == 0,
            0,
            100 * pull(df, glue('good_{year}')) / (pull(df, glue('good_{year}')) + pull(df, glue('bad_{year}'))))
}

Then we can use purrr::map_dfc to create one data frame column per year:

df %>%
    mutate(map_dfc(c(2018, 2019), ~pct_good(df, .x)))

# A tibble: 1 × 8
  good_2018 bad_2018 id_2018 good_2019 bad_2019 id_2019  ...1  ...2
      <dbl>    <dbl>   <dbl>     <dbl>    <dbl>   <dbl> <dbl> <dbl>
1         0        1       0         3        1       1     0    75
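
The new columns above come out with the automatic names ...1 and ...2. Below is a minimal sketch of one way to get pct_good_2018 and pct_good_2019 instead, using the `.data` pronoun and glue-style names with `:=` inside mutate. It assumes the good_/bad_/id_ naming pattern from the question; `add_pct_good` is a hypothetical helper name, not part of any package.

```r
library(dplyr)
library(purrr)

df <- tibble(good_2018 = 0, bad_2018 = 1, id_2018 = 0,
             good_2019 = 3, bad_2019 = 1, id_2019 = 1)

# Hypothetical helper: adds one pct_good_<year> column to `data`.
add_pct_good <- function(data, year) {
  good <- paste0("good_", year)
  bad  <- paste0("bad_", year)
  id   <- paste0("id_", year)
  data %>%
    mutate("pct_good_{year}" := if_else(
      .data[[id]] == 0,
      0,
      100 * .data[[good]] / (.data[[good]] + .data[[bad]])
    ))
}

# Fold over the years, starting from the original data frame.
result <- reduce(c(2018, 2019), add_pct_good, .init = df)
```

Here reduce() passes the growing data frame along as the first argument, so each year adds one column without a per-year mutate call.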

Answer 2

You can try this approach using data.table.

Load the library, convert df to a data.table, and make a vector of years:

library(data.table)
setDT(df)
yrs = c("2018","2019")

Define a function that takes one year's values in the order (good, bad, id) and returns 0 if id is 0, otherwise the percentage:

f <- function(d) fifelse(d[3]==0,0,d[1]*100/(d[1]+d[2]))

Apply the function to each year, row by row:

df[, (paste0("pct_good_",yrs)):=lapply(yrs, \(y) {.SD[,f(t(.SD)),.SDcols = patterns(paste0("_",y,"$"))]}), by=.I]

Output:

   good_2018 bad_2018 id_2018 good_2019 bad_2019 id_2019 pct_good_2018 pct_good_2019
1:         0        1       0         3        1       1             0            75
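
Because fifelse is vectorized, the same columns can also be built without grouping by row. The following is only a sketch of that alternative, assuming the same column naming pattern; the name `dt` and the loop are illustrative choices, not part of the answer above.

```r
library(data.table)

dt <- data.table(good_2018 = 0, bad_2018 = 1, id_2018 = 0,
                 good_2019 = 3, bad_2019 = 1, id_2019 = 1)
yrs <- c("2018", "2019")

for (y in yrs) {
  # Look up this year's columns by name.
  good <- dt[[paste0("good_", y)]]
  bad  <- dt[[paste0("bad_", y)]]
  id   <- dt[[paste0("id_", y)]]
  # Add the new column by reference; fifelse works on whole columns.
  set(dt, j = paste0("pct_good_", y),
      value = fifelse(id == 0, 0, 100 * good / (good + bad)))
}
```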

However, as pointed out in the comments on the question, you are generally better off with long-format data.
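
As a rough illustration of the long-format idea, here is a sketch using tidyr::pivot_longer, assuming every column follows the name_year pattern with a single underscore. With more than one row of data you would likely want to keep an identifier column out of the pivot; the example data here has only one row.

```r
library(dplyr)
library(tidyr)

df <- tibble(good_2018 = 0, bad_2018 = 1, id_2018 = 0,
             good_2019 = 3, bad_2019 = 1, id_2019 = 1)

long <- df %>%
  # Split "good_2018" into a value column "good" and a year "2018".
  pivot_longer(everything(),
               names_to = c(".value", "year"),
               names_sep = "_") %>%
  # One vectorized calculation covers every year.
  mutate(pct_good = if_else(id == 0, 0, 100 * good / (good + bad)))
```

In this shape there is one row per year, so a new year needs no new code.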
