Using a custom dplyr quosure function with mutate_at

Question

I am trying to build a helper function that extracts the digits from the name of the column given as its argument. I'm able to use my function inside mutate (repeating it for every column of interest), but it doesn't seem to work inside mutate_at.

Here is an example of what my data looks like:

> set.seed(20190928)
> evalYr <- 2018
> n <- 5
> (df <- data.frame(
+     AY = sample(2016:2019, n, replace = T),
+     Pay00 = rgamma(n, 2, 1/1000),
+     Pay01 = rgamma(n, 2, 1/1000),
+     Pay02 = rgamma(n, 2, 1/1000),
+     Pay03 = rgamma(n, 2, 1/1000)
+ ))
    AY     Pay00     Pay01     Pay02     Pay03
1 2018 2520.3772 2338.9490  919.8245  629.1657
2 2016  259.7804 1543.4450  661.6488 2382.7916
3 2018 2446.3075  312.5143 2297.9717  942.5627
4 2017 1386.6288 4179.0352 2370.2669 1846.5838
5 2018  541.8261 2104.4589 2622.1758 2606.0694

So I've built (using dplyr syntax) this helper to apply with mutate to every PayXX column I have:

# Helper function to get the number inside column `PayXX` name
f1 <- function(pmt) enquo(pmt) %>% quo_name() %>% str_extract('(\\d)+') %>% as.numeric()

This function works fine with dplyr::mutate:

> df %>% mutate(Pay00_numcol = f1(Pay00),
+               Pay01_numcol = f1(Pay01),
+               Pay02_numcol = f1(Pay02),
+               Pay03_numcol = f1(Pay03))
    AY     Pay00     Pay01     Pay02     Pay03 Pay00_numcol Pay01_numcol Pay02_numcol Pay03_numcol
1 2018 2520.3772 2338.9490  919.8245  629.1657            0            1            2            3
2 2016  259.7804 1543.4450  661.6488 2382.7916            0            1            2            3
3 2018 2446.3075  312.5143 2297.9717  942.5627            0            1            2            3
4 2017 1386.6288 4179.0352 2370.2669 1846.5838            0            1            2            3
5 2018  541.8261 2104.4589 2622.1758 2606.0694            0            1            2            3

But when I try to use the same function inside mutate_at, it returns NAs:

> df %>% mutate_at(vars(starts_with('Pay')), list(numcol = ~f1(.)))
    AY     Pay00     Pay01     Pay02     Pay03 Pay00_numcol Pay01_numcol Pay02_numcol Pay03_numcol
1 2018 2520.3772 2338.9490  919.8245  629.1657           NA           NA           NA           NA
2 2016  259.7804 1543.4450  661.6488 2382.7916           NA           NA           NA           NA
3 2018 2446.3075  312.5143 2297.9717  942.5627           NA           NA           NA           NA
4 2017 1386.6288 4179.0352 2370.2669 1846.5838           NA           NA           NA           NA
5 2018  541.8261 2104.4589 2622.1758 2606.0694           NA           NA           NA           NA

Has anyone ever had a similar problem? How do I deal with the mutate_at function in this case?

Thanks,

Reproducible example

library(tidyverse)
library(stringr)
set.seed(20190928)
evalYr <- 2018
n <- 5
(df <- data.frame(
    AY = sample(2016:2019, n, replace = T),
    Pay00 = rgamma(n, 2, 1/1000),
    Pay01 = rgamma(n, 2, 1/1000),
    Pay02 = rgamma(n, 2, 1/1000),
    Pay03 = rgamma(n, 2, 1/1000)
))

# Helper function to get the number inside column `PayXX` name
f1 <- function(pmt) enquo(pmt) %>% quo_name() %>% str_extract('(\\d)+') %>% as.numeric()

# Working
df %>% mutate(Pay00_numcol = f1(Pay00),
              Pay01_numcol = f1(Pay01),
              Pay02_numcol = f1(Pay02),
              Pay03_numcol = f1(Pay03))

# Not working
df %>% mutate_at(vars(starts_with('Pay')), list(numcol = ~f1(.)))

Answer 1

My first thought was that this might be easier after reshaping the data. However, it still takes a tangle of tidyr functions to 1) get a key column of "Pay00", "Pay01", etc.; 2) extract the numbers; 3) manipulate the result so you can use tidyr::spread to get back to wide shape; and 4) spread and remove the "_value" bit I tacked on.

I believe there's a nicer way to do this with the recent version of tidyr, since the new pivot_wider function should be able to take more than one column as value (its values_from argument). I haven't tried this at all, but maybe someone else can write that up.

library(tidyverse)

df %>%
  rowid_to_column() %>%
  gather(key, value, -AY, -rowid) %>%
  mutate(numcol = as.numeric(str_extract(key, "\\d+$"))) %>%
  gather(key = coltype, value, value, numcol) %>%
  unite(key, key, coltype) %>%
  spread(key, value) %>%
  select(AY, ends_with("value"), ends_with("numcol")) %>%
  rename_all(str_remove, "_value")
#>     AY     Pay00     Pay01     Pay02     Pay03 Pay00_numcol Pay01_numcol
#> 1 2018 2520.3772 2338.9490  919.8245  629.1657            0            1
#> 2 2016  259.7804 1543.4450  661.6488 2382.7916            0            1
#> 3 2018 2446.3075  312.5143 2297.9717  942.5627            0            1
#> 4 2017 1386.6288 4179.0352 2370.2669 1846.5838            0            1
#> 5 2018  541.8261 2104.4589 2622.1758 2606.0694            0            1
#>   Pay02_numcol Pay03_numcol
#> 1            2            3
#> 2            2            3
#> 3            2            3
#> 4            2            3
#> 5            2            3

Or, if you want to stick with the tidyeval approach: have the function get the name of the column it is called on from the quosure. Just be careful: if you use the list(numcol = ~f1(.)) notation, all of those quosures will just come up as ".", so there are no digits to extract and the result is NA.
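To see why the formula notation loses the column name, here is a minimal sketch outside of dplyr. The names show_name and wrapper are hypothetical and only stand in for f1 and the lambda; the point is that enquo captures the expression written at the call site, which inside a wrapper is the placeholder rather than the original column.

```r
library(rlang)

# Hypothetical stand-in for f1: return the captured argument as a string
show_name <- function(pmt) as_name(enquo(pmt))

# Called directly, the captured expression is the column symbol itself
show_name(Pay00)
#> [1] "Pay00"

# Hypothetical stand-in for the ~f1(.) lambda: the inner call only sees `.`
wrapper <- function(.) show_name(.)
wrapper(Pay00)
#> [1] "."
```

Passing the function itself instead of a formula, as in the code that follows, avoids the extra layer.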

f1 <- function(pmt) {
  str_extract(rlang::as_name(enquo(pmt)), "\\d+$") %>%
    as.numeric()
}

df %>%
  mutate_at(vars(starts_with("Pay")), list(numcol = f1))
# same output as prev
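
As a possible alternative, assuming dplyr 1.0.0 or later (where mutate_at is superseded by across), the column name is available through cur_column(), so no quosure capture is needed. This is only a sketch on a small made-up data frame called toy, not the data from the question:

```r
library(dplyr)
library(stringr)

# Made-up data with the same column naming scheme as the question
toy <- data.frame(AY = c(2018, 2016), Pay00 = c(10, 20), Pay01 = c(30, 40))

res <- toy %>%
  mutate(across(
    starts_with("Pay"),
    # cur_column() gives the name of the column currently being processed
    ~ as.numeric(str_extract(cur_column(), "\\d+$")),
    .names = "{.col}_numcol"
  ))

res
```

Here each PayXX_numcol column holds the trailing digits of the corresponding column name, repeated for every row.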

0 · accepted · 2019-09-27 17:59:15
