dplyr::mutate_if - Using created variables to build new ones

I'm using mutate_if to modify columns of some data frames in my workspace. When using plain mutate, I can create variables based on ones created earlier in the same call, e.g.:

x %>% 
mutate(new = column_a * 2,
       new_2 = new * 2)

But this approach doesn't work with mutate_if, so I have to resort to a kind of 'recursive' method, rebuilding each variable from scratch, e.g.:

mutate_if(!str_detect(names(.), 'date|PIB|Deflator|[$]'), 
          .funs = list(Real =     ~ . / Deflator, 
                       Real_YoY = ~ (((. / Deflator) / lag((. / Deflator), 12))-1) * 100)) 

What I would like to write instead is something like:

mutate_if(!str_detect(names(.), 'date|PIB|Deflator|[$]'), 
          .funs = list(Real =     ~ . / Deflator, 
                       Real_YoY = ~ ((Real / lag(Real, 12))-1) * 100))

Is there some way to organize the code to get close to this? Thank you!

Reproducible example:

library(dplyr)
library(stringr)  # for str_detect()

x <- data.frame(x = seq(1, 10),
                x1 = seq(21, 30),
                y = seq(10, 19))

x %>% mutate_if(str_detect(colnames(.), 'x'),
                .funs = list(new = ~ (. * 2),
                             new2 = ~ (. * 2) * 4)) # where (. * 2) could make reference to the variable 'new'

Instead of a list, return a tibble, whose later columns can refer to earlier ones by name, and then unnest the tibble columns:

library(dplyr)
library(tidyr)
library(tibble)  # provides is_tibble(); is.tibble() is deprecated

x %>%
  mutate(across(starts_with('x'),
                ~ tibble(`1` = .x * 2,
                         `2` = `1` * 4),
                .names = "{.col}_new")) %>%
  unnest(where(is_tibble), names_sep = "")

Output:

# A tibble: 10 × 7
       x    x1     y x_new1 x_new2 x1_new1 x1_new2
   <int> <int> <int>  <dbl>  <dbl>   <dbl>   <dbl>
 1     1    21    10      2      8      42     168
 2     2    22    11      4     16      44     176
 3     3    23    12      6     24      46     184
 4     4    24    13      8     32      48     192
 5     5    25    14     10     40      50     200
 6     6    26    15     12     48      52     208
 7     7    27    16     14     56      54     216
 8     8    28    17     16     64      56     224
 9     9    29    18     18     72      58     232
10    10    30    19     20     80      60     240

Or you could call mutate on the tibble after creating it:

x %>%
   transmute(across(starts_with('x'), ~ tibble(new1  = .x *2) %>% 
        mutate(new2 = new1 *4))) %>%
    unnest(where(is_tibble), names_sep = "_") %>% 
    bind_cols(x, .)

Output:

    x x1  y x_new1 x_new2 x1_new1 x1_new2
1   1 21 10      2      8      42     168
2   2 22 11      4     16      44     176
3   3 23 12      6     24      46     184
4   4 24 13      8     32      48     192
5   5 25 14     10     40      50     200
6   6 26 15     12     48      52     208
7   7 27 16     14     56      54     216
8   8 28 17     16     64      56     224
9   9 29 18     18     72      58     232
10 10 30 19     20     80      60     240

Or wrap the multiple statements in a {} block:

x %>%
   mutate(across(starts_with('x'), ~ 
      {
     new <- .x * 2
     new2 <- new * 4
     tibble(new, new2)}, .names = "{.col}_")) %>% 
   unnest(where(is_tibble), names_sep = "")

Output:

# A tibble: 10 × 7
       x    x1     y x_new x_new2 x1_new x1_new2
   <int> <int> <int> <dbl>  <dbl>  <dbl>   <dbl>
 1     1    21    10     2      8     42     168
 2     2    22    11     4     16     44     176
 3     3    23    12     6     24     46     184
 4     4    24    13     8     32     48     192
 5     5    25    14    10     40     50     200
 6     6    26    15    12     48     52     208
 7     7    27    16    14     56     54     216
 8     8    28    17    16     64     56     224
 9     9    29    18    18     72     58     232
10    10    30    19    20     80     60     240
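
The same {} pattern can be carried over to the Real / Real_YoY case from the question. The following is only a sketch under stated assumptions: the data frame df and its sales column are hypothetical stand-ins for the asker's data, and a lag of 1 is used instead of 12 so that three rows are enough to see the effect.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical data: one Deflator column and one column to deflate
df <- data.frame(
  Deflator = c(1, 2, 4),
  sales    = c(10, 40, 160)
)

out <- df %>%
  mutate(across(-Deflator, ~ {
    Real     <- .x / Deflator                     # first derived variable
    Real_YoY <- (Real / lag(Real, 1) - 1) * 100   # reuses Real; lag 1 is an assumption
    tibble(Real, Real_YoY)
  }, .names = "{.col}_")) %>%
  unnest(where(is_tibble), names_sep = "")

out
```

With these settings the derived columns should come out named sales_Real and sales_Real_YoY; for monthly data the lag would presumably be set back to 12.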

You need to do this in two mutate calls. Within a single across call, the newly created columns are not yet visible. For example, even if you refer to a specific column that you know will be created, this raises an error:

x %>% 
  mutate(across(
    .cols = contains('x'),
    .fns = list(
      new = ~(.x*2),
      new2 = x_new
    )
  ))
#> Error in `mutate()`:
#> ! Problem while computing `..1 = across(.cols = contains("x"), .fns =
#>   list(new = ~(.x * 2), new2 = x_new))`.
#> Caused by error:
#> ! object 'x_new' not found

The second issue is that each column has to refer to its own matching *_new column. This can be done by using cur_column() to build a symbol, which is then evaluated in the context of the data frame.

x %>% 
  mutate(across(
    .cols = contains('x'),
    .fns = list(
      new = ~(.x*2)
    )
  )) %>%
  mutate(across(
    .cols = matches("x[[:digit:]]?$"),
    .fns = list(
      new2 = ~eval(as.symbol(paste0(cur_column(), "_new"))) * 4
    )
  ))
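
For comparison, the same two-step dependency can be written without across at all. This is a minimal base R sketch, not part of the original answers: it loops over the matching column names and creates each *_new column before the *_new2 column that depends on it.

```r
x <- data.frame(x = 1:10, x1 = 21:30, y = 10:19)

# Compute the target columns once, before any new columns are added,
# so the freshly created x_new etc. are not picked up by the loop.
targets <- grep("^x", names(x), value = TRUE)

for (col in targets) {
  new_name  <- paste0(col, "_new")
  new2_name <- paste0(col, "_new2")
  x[[new_name]]  <- x[[col]] * 2        # step 1: create the first variable
  x[[new2_name]] <- x[[new_name]] * 4   # step 2: build on the one just created
}

x
```

Because each assignment completes before the next one runs, the second step can always see the column created by the first.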
