
Using rlang and purrr to create new columns based on subset of existing columns

I have a dataframe that covers multiple years like this:

library(dplyr)

df <- tibble(good_2018 = 0,
             bad_2018 = 1,
             id_2018 = 0,
             good_2019 = 3,
             bad_2019 = 1,
             id_2019 = 1)

I want to derive a new column for each year t (e.g., 2018 and 2019). If the id variable for year t equals 0, the outcome should be 0; otherwise it should be the percentage identified as good for year t, that is, 100 * good / (good + bad). The resulting dataset should look like this:

df %>% 
  mutate(pct_good_2018 = if_else(id_2018 == 0, 0,
                                 100*good_2018/(good_2018 + bad_2018)),
         pct_good_2019 = if_else(id_2019 == 0, 0,
                                 100*good_2019/(good_2019 + bad_2019)))
#> # A tibble: 1 × 8
#>   good_2018 bad_2018 id_2018 good_2019 bad_2019 id_2019 pct_good_2018 pct_good…¹
#>       <dbl>    <dbl>   <dbl>     <dbl>    <dbl>   <dbl>         <dbl>      <dbl>
#> 1         0        1       0         3        1       1             0         75
#> # … with abbreviated variable name ¹​pct_good_2019

Instead of generating the pct_good columns for each year individually, I would like to use the purrr package, but I cannot figure out how to do it. I believe it requires rlang, but the various configurations of != and {{}} that I try yield errors that I do not understand.

We can use glue to build the column names for each year inside a custom function:

library(purrr)
library(glue)
pct_good <- function(df, year) {
    if_else(pull(df, glue('id_{year}')) == 0,
            0,
            100 * pull(df, glue('good_{year}')) /
                (pull(df, glue('good_{year}')) + pull(df, glue('bad_{year}'))))
}

Then we can use purrr::map_dfc to create one data frame column per year:

df %>%
    mutate(map_dfc(c(2018, 2019), ~pct_good(df, .x)))

# A tibble: 1 × 8
  good_2018 bad_2018 id_2018 good_2019 bad_2019 id_2019  ...1  ...2
      <dbl>    <dbl>   <dbl>     <dbl>    <dbl>   <dbl> <dbl> <dbl>
1         0        1       0         3        1       1     0    75
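
The new columns above come back as ...1 and ...2 rather than pct_good_2018 and pct_good_2019. A minimal sketch of two ways to get the requested names follows; it is one possible approach, not necessarily what the answer's author intended. The helper add_pct_good and the variable years are hypothetical names introduced here, and the column names are built with paste0 instead of glue purely for illustration. The first variant names the input vector so that map_dfc carries those names over to the output columns. The second uses the .data pronoun and a glue-style name on the left of := (assuming dplyr 1.0 or later), adding one column per year with reduce.

```r
library(dplyr)
library(purrr)

df <- tibble(good_2018 = 0, bad_2018 = 1, id_2018 = 0,
             good_2019 = 3, bad_2019 = 1, id_2019 = 1)

# Same logic as the answer's pct_good(), with names built by paste0()
pct_good <- function(df, year) {
  good <- df[[paste0("good_", year)]]
  bad  <- df[[paste0("bad_", year)]]
  id   <- df[[paste0("id_", year)]]
  if_else(id == 0, 0, 100 * good / (good + bad))
}

# Variant 1: name the input vector; map_dfc() reuses the names as column names
years <- c(2018, 2019)
names(years) <- paste0("pct_good_", years)
result_map <- df %>%
  mutate(map_dfc(years, ~ pct_good(df, .x)))

# Variant 2 (hypothetical helper): look columns up by string with .data[[ ]]
# and name the new column with a glue-style string on the left of :=
add_pct_good <- function(data, year) {
  good <- paste0("good_", year)
  bad  <- paste0("bad_", year)
  id   <- paste0("id_", year)
  data %>%
    mutate("pct_good_{year}" := if_else(.data[[id]] == 0, 0,
                                        100 * .data[[good]] /
                                          (.data[[good]] + .data[[bad]])))
}

# reduce() threads the data frame through one call per year
result_reduce <- reduce(c(2018, 2019), add_pct_good, .init = df)
```

Both result_map and result_reduce should contain the six original columns followed by pct_good_2018 and pct_good_2019.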

You can try this approach using data.table:

  1. Load the library, convert df to a data.table, and make a vector of years:
library(data.table)
setDT(df)
yrs = c("2018","2019")
  2. Define a function that returns 0 or the percentage. It relies on the column order within each year being good, bad, id:
f <- function(d) fifelse(d[3]==0,0,d[1]*100/(d[1]+d[2]))
  3. Apply the function to each of the years, by row:
df[, (paste0("pct_good_",yrs)):=lapply(yrs, \(y) {.SD[,f(t(.SD)),.SDcols = patterns(paste0("_",y,"$"))]}), by=.I]

Output:

   good_2018 bad_2018 id_2018 good_2019 bad_2019 id_2019 pct_good_2018 pct_good_2019
1:         0        1       0         3        1       1             0            75

However, as pointed out in the comments on the original question, you are generally better off with long-format data.
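
To illustrate the long-format suggestion, here is a minimal sketch using tidyr::pivot_longer. It assumes every column follows the name_year pattern from the question, so the names can be split on the underscore; the object name long is hypothetical. Once each year is its own row, the percentage is a single mutate call with no per-year looping.

```r
library(dplyr)
library(tidyr)

df <- tibble(good_2018 = 0, bad_2018 = 1, id_2018 = 0,
             good_2019 = 3, bad_2019 = 1, id_2019 = 1)

long <- df %>%
  # ".value" keeps good/bad/id as columns; the suffix becomes the year column
  pivot_longer(everything(),
               names_to = c(".value", "year"),
               names_sep = "_") %>%
  # One vectorised computation covers every year
  mutate(pct_good = if_else(id == 0, 0, 100 * good / (good + bad)))
```

Here long has one row per year, with year stored as a character column. If the wide layout is still needed afterwards, tidyr::pivot_wider can reshape it back.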

