dplyr::mutate_if - using created variables to build new variables

Question

I'm using mutate_if to modify columns of some data frames in my workspace. When using plain mutate, I can create variables based on previously created ones, e.g.

x %>% 
mutate(new = column_a * 2,
       new_2 = new * 2)

But this approach doesn't work with mutate_if, so I have to use a kind of 'recursive method' that builds each variable from the 'beginning', e.g.

mutate_if(!str_detect(names(.), 'date|PIB|Deflator|[$]'), 
          .funs = list(Real =     ~ . / Deflator, 
                       Real_YoY = ~ (((. / Deflator) / lag((. / Deflator), 12))-1) * 100))

What I would like to write is something like:

mutate_if(!str_detect(names(.), 'date|PIB|Deflator|[$]'), 
          .funs = list(Real =     ~ . / Deflator, 
                       Real_YoY = ~ ((Real / lag(Real, 12))-1) * 100))

Is there some way to organize the code to get close to this? Thank you!

Reproducible example:

library(dplyr)
library(stringr)

x <- data.frame(x = seq(1, 10),
                x1 = seq(21, 30),
                y = seq(10, 19))

x %>% mutate_if(str_detect(colnames(.), 'x'),
                .funs = list(new = ~ (. * 2),
                             new2 = ~ (. * 2) * 4)) # where (. * 2) could make reference to the variable 'new'

Answer 1

Instead of a list, return a tibble, which can refer to a previously created column by its name, and then unnest the tibble columns.

library(dplyr)
library(tidyr)
library(tibble)
x %>% 
  mutate(across(starts_with('x'), 
                ~ tibble(`1` = (.x * 2),
                         `2` = `1` * 4), .names = "{.col}_new")) %>% 
  unnest(where(is_tibble), names_sep = "")

-output

# A tibble: 10 × 7
       x    x1     y x_new1 x_new2 x1_new1 x1_new2
   <int> <int> <int>  <dbl>  <dbl>   <dbl>   <dbl>
 1     1    21    10      2      8      42     168
 2     2    22    11      4     16      44     176
 3     3    23    12      6     24      46     184
 4     4    24    13      8     32      48     192
 5     5    25    14     10     40      50     200
 6     6    26    15     12     48      52     208
 7     7    27    16     14     56      54     216
 8     8    28    17     16     64      56     224
 9     9    29    18     18     72      58     232
10    10    30    19     20     80      60     240
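Applied to the deflator case from the question, the same idea lets the year-over-year column reuse Real directly, because tibble() evaluates its arguments in order. This is a minimal sketch, assuming a hypothetical monthly data frame `econ` with columns `date`, `Deflator` and `sales` and a 12-period lag; it uses `tidyr::unpack()`, tidyr's function for data-frame columns, in place of `unnest()`:

```r
library(dplyr)
library(tibble)
library(tidyr)

# Hypothetical monthly data (assumption): a constant deflator and one nominal series
econ <- data.frame(
  date     = seq(as.Date("2020-01-01"), by = "month", length.out = 24),
  Deflator = rep(2, 24),
  sales    = seq(100, by = 10, length.out = 24)
)

out <- econ %>%
  mutate(across(
    !matches("date|PIB|Deflator|[$]"),
    # tibble() builds columns sequentially, so YoY can refer to Real
    ~ tibble(
      Real = .x / Deflator,
      YoY  = (Real / dplyr::lag(Real, 12) - 1) * 100
    ),
    .names = "{.col}_"
  )) %>%
  # spread each packed data-frame column into sales_Real and sales_YoY
  unpack(where(is.data.frame), names_sep = "")
```

The first 12 values of `sales_YoY` are NA because `lag(Real, 12)` has nothing to compare against yet.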

Or build a one-column tibble first and then add the dependent column with mutate()

x %>%
  transmute(across(starts_with('x'), ~ tibble(new1 = .x * 2) %>%
                     mutate(new2 = new1 * 4))) %>%
  unnest(where(is_tibble), names_sep = "_") %>%
  bind_cols(x, .)

-output

    x x1  y x_new1 x_new2 x1_new1 x1_new2
1   1 21 10      2      8      42     168
2   2 22 11      4     16      44     176
3   3 23 12      6     24      46     184
4   4 24 13      8     32      48     192
5   5 25 14     10     40      50     200
6   6 26 15     12     48      52     208
7   7 27 16     14     56      54     216
8   8 28 17     16     64      56     224
9   9 29 18     18     72      58     232
10 10 30 19     20     80      60     240
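The multi-step logic can also live in a named function instead of an inline lambda, which makes it reusable across data frames. A sketch on the reproducible example; `make_new_cols()` is a hypothetical helper name, and `tidyr::unpack()` is used to spread the resulting data-frame columns:

```r
library(dplyr)
library(tibble)
library(tidyr)

x <- data.frame(x = seq(1, 10), x1 = seq(21, 30), y = seq(10, 19))

# Hypothetical helper: creates new1, then derives new2 from it
make_new_cols <- function(v) {
  tibble(new1 = v * 2) %>%
    mutate(new2 = new1 * 4)
}

res <- x %>%
  # each selected column becomes a packed tibble column named "x_" / "x1_"
  mutate(across(starts_with("x"), make_new_cols, .names = "{.col}_")) %>%
  # "x_" + "new1" -> "x_new1", and so on
  unpack(where(is.data.frame), names_sep = "")
```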

Or wrap the multiple statements inside {}

x %>%
  mutate(across(starts_with('x'), ~ {
    new <- .x * 2
    new2 <- new * 4
    tibble(new, new2)
  }, .names = "{.col}_")) %>%
  unnest(where(is_tibble), names_sep = "")

-output

# A tibble: 10 × 7
       x    x1     y x_new x_new2 x1_new x1_new2
   <int> <int> <int> <dbl>  <dbl>  <dbl>   <dbl>
 1     1    21    10     2      8     42     168
 2     2    22    11     4     16     44     176
 3     3    23    12     6     24     46     184
 4     4    24    13     8     32     48     192
 5     5    25    14    10     40     50     200
 6     6    26    15    12     48     52     208
 7     7    27    16    14     56     54     216
 8     8    28    17    16     64     56     224
 9     9    29    18    18     72     58     232
10    10    30    19    20     80     60     240

Answer 2

You need to do this in two mutate calls: inside a single across() call, the new columns do not exist yet. For example, even if you try to use a specific column you know will be created, you get an error:

x %>% 
  mutate(across(
    .cols = contains('x'),
    .fns = list(
      new = ~(.x*2),
      new2 = x_new
    )
  ))
#> Error in `mutate()`:
#> ! Problem while computing `..1 = across(.cols = contains("x"), .fns =
#>   list(new = ~(.x * 2), new2 = x_new))`.
#> Caused by error:
#> ! object 'x_new' not found

The second issue is making sure each column uses its own *_new counterpart. You can do this by building a symbol from cur_column() and evaluating it in the context of the data frame.

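library(dplyr)

x %>% 
  # first pass: creates x_new and x1_new
  mutate(across(
    .cols = contains('x'),
    .fns = list(
      new = ~(.x*2)
    )
  )) %>%
  # second pass: for each original column (x, x1), look up its *_new column by name
  mutate(across(
    .cols = matches("x[[:digit:]]?$"),
    .fns = list(
      new2 = ~eval(as.symbol(paste0(cur_column(), "_new"))) * 4
    )
  ))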
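If the intermediate columns share a predictable suffix, the second mutate() can instead loop over those columns directly, so no name lookup through cur_column() is needed. A minimal sketch on the reproducible example, assuming every intermediate column name ends in "_new":

```r
library(dplyr)

x <- data.frame(x = seq(1, 10), x1 = seq(21, 30), y = seq(10, 19))

res <- x %>%
  # first pass: default names "{.col}_{.fn}" give x_new and x1_new
  mutate(across(contains("x"), list(new = ~ .x * 2))) %>%
  # second pass: work on the *_new columns themselves;
  # "{.col}2" turns x_new into x_new2 and x1_new into x1_new2
  mutate(across(ends_with("_new"), ~ .x * 4, .names = "{.col}2"))
```

For the question's case, the same pattern would be a second pass over the Real columns, for example across(ends_with("_Real"), ~ (.x / lag(.x, 12) - 1) * 100, .names = "{.col}_YoY"), assuming the first pass names them with a "_Real" suffix.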


Question description

2 solutions

Solution 1
3 2022-08-05 17:42:16

Solution 2
2 2022-08-05 17:00:35
