Using dplyr quosure custom function with mutate_at
I am trying to build a helper function that extracts the digits from the name of the column passed as its argument. I'm able to use my function inside mutate (and repeat it for all columns of interest), but it doesn't seem to work inside mutate_at.
Here is an example of what my data looks like:
> set.seed(20190928)
> evalYr <- 2018
> n <- 5
> (df <- data.frame(
+ AY = sample(2016:2019, n, replace = T),
+ Pay00 = rgamma(n, 2, 1/1000),
+ Pay01 = rgamma(n, 2, 1/1000),
+ Pay02 = rgamma(n, 2, 1/1000),
+ Pay03 = rgamma(n, 2, 1/1000)
+ ))
AY Pay00 Pay01 Pay02 Pay03
1 2018 2520.3772 2338.9490 919.8245 629.1657
2 2016 259.7804 1543.4450 661.6488 2382.7916
3 2018 2446.3075 312.5143 2297.9717 942.5627
4 2017 1386.6288 4179.0352 2370.2669 1846.5838
5 2018 541.8261 2104.4589 2622.1758 2606.0694
So I've built (using dplyr syntax) this helper to apply to every PayXX column I have:
# Helper function to get the number inside column `PayXX` name
f1 <- function(pmt) enquo(pmt) %>% quo_name() %>% str_extract('(\\d)+') %>% as.numeric()
This function works fine with dplyr::mutate:
> df %>% mutate(Pay00_numcol = f1(Pay00),
+ Pay01_numcol = f1(Pay01),
+ Pay02_numcol = f1(Pay02),
+ Pay03_numcol = f1(Pay03))
AY Pay00 Pay01 Pay02 Pay03 Pay00_numcol Pay01_numcol Pay02_numcol Pay03_numcol
1 2018 2520.3772 2338.9490 919.8245 629.1657 0 1 2 3
2 2016 259.7804 1543.4450 661.6488 2382.7916 0 1 2 3
3 2018 2446.3075 312.5143 2297.9717 942.5627 0 1 2 3
4 2017 1386.6288 4179.0352 2370.2669 1846.5838 0 1 2 3
5 2018 541.8261 2104.4589 2622.1758 2606.0694 0 1 2 3
But when I try to use the same function inside mutate_at, it returns NAs:
> df %>% mutate_at(vars(starts_with('Pay')), list(numcol = ~f1(.)))
AY Pay00 Pay01 Pay02 Pay03 Pay00_numcol Pay01_numcol Pay02_numcol Pay03_numcol
1 2018 2520.3772 2338.9490 919.8245 629.1657 NA NA NA NA
2 2016 259.7804 1543.4450 661.6488 2382.7916 NA NA NA NA
3 2018 2446.3075 312.5143 2297.9717 942.5627 NA NA NA NA
4 2017 1386.6288 4179.0352 2370.2669 1846.5838 NA NA NA NA
5 2018 541.8261 2104.4589 2622.1758 2606.0694 NA NA NA NA
Has anyone had a similar problem? How should I handle mutate_at in this case?
Thanks,
library(tidyverse)
library(stringr)
set.seed(20190928)
evalYr <- 2018
n <- 5
(df <- data.frame(
AY = sample(2016:2019, n, replace = T),
Pay00 = rgamma(n, 2, 1/1000),
Pay01 = rgamma(n, 2, 1/1000),
Pay02 = rgamma(n, 2, 1/1000),
Pay03 = rgamma(n, 2, 1/1000)
))
# Helper function to get the number inside column `PayXX` name
f1 <- function(pmt) enquo(pmt) %>% quo_name() %>% str_extract('(\\d)+') %>% as.numeric()
# Working
df %>% mutate(Pay00_numcol = f1(Pay00),
Pay01_numcol = f1(Pay01),
Pay02_numcol = f1(Pay02),
Pay03_numcol = f1(Pay03))
# Not working
df %>% mutate_at(vars(starts_with('Pay')), list(numcol = ~f1(.)))
My first thought was that this might be easier after reshaping the data. However, it still takes a tangle of tidyr functions to 1) get a column of "Pay00", "Pay01", etc.; 2) extract the numbers; 3) manipulate the result so you can use tidyr::spread to get back to wide shape; and 4) spread and remove the "_value" bit I tacked on.
I believe there's a nicer way to do this with the recent version of tidyr, since the new pivot_wider function should be able to take more than one column as value. I haven't messed with this at all, but maybe someone else can write that up.
library(tidyverse)
df %>%
rowid_to_column() %>%
gather(key, value, -AY, -rowid) %>%
mutate(numcol = as.numeric(str_extract(key, "\\d+$"))) %>%
gather(key = coltype, value, value, numcol) %>%
unite(key, key, coltype) %>%
spread(key, value) %>%
select(AY, ends_with("value"), ends_with("numcol")) %>%
rename_all(str_remove, "_value")
#> AY Pay00 Pay01 Pay02 Pay03 Pay00_numcol Pay01_numcol
#> 1 2018 2520.3772 2338.9490 919.8245 629.1657 0 1
#> 2 2016 259.7804 1543.4450 661.6488 2382.7916 0 1
#> 3 2018 2446.3075 312.5143 2297.9717 942.5627 0 1
#> 4 2017 1386.6288 4179.0352 2370.2669 1846.5838 0 1
#> 5 2018 541.8261 2104.4589 2622.1758 2606.0694 0 1
#> Pay02_numcol Pay03_numcol
#> 1 2 3
#> 2 2 3
#> 3 2 3
#> 4 2 3
#> 5 2 3
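For reference, the pivot_wider idea mentioned above could look roughly like the sketch below. It assumes tidyr >= 1.1.0 (for the names_glue argument) and dplyr >= 1.0.0 (for rename_with); the tiny df here is made-up illustration data, not the data from the question, and I have not compared it against the gather/spread version on anything larger.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Made-up stand-in for the question's data: two rows, two PayXX columns
df <- data.frame(AY = c(2018, 2016), Pay00 = c(10, 20), Pay01 = c(30, 40))

out <- df %>%
  # keep a row id so rows with the same AY are not merged when widening
  mutate(rowid = row_number()) %>%
  pivot_longer(starts_with("Pay"), names_to = "key", values_to = "value") %>%
  # pull the trailing digits out of the former column name
  mutate(numcol = as.numeric(str_extract(key, "\\d+$"))) %>%
  # widen both value columns at once; names become e.g. Pay00_value, Pay00_numcol
  pivot_wider(names_from = key, values_from = c(value, numcol),
              names_glue = "{key}_{.value}") %>%
  select(-rowid) %>%
  # drop the "_value" suffix so the original column names come back
  rename_with(~ str_remove(.x, "_value$"))

out
```

The second gather and the unite step are no longer needed because pivot_wider accepts several values_from columns in one call.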
Or, if you want to stick with the tidyeval approach: get the names of the columns-as-quosures you're calling your function on. Just be careful that if you use list(numcol = ~f1(.)) notation, all of those quosures will just come up as "." rather than the column name, so there are no digits to extract and you get NA. Pass the bare function instead:
f1 <- function(pmt) {
str_extract(rlang::as_name(enquo(pmt)), "\\d+$") %>%
as.numeric()
}
df %>%
mutate_at(vars(starts_with("Pay")), list(numcol = f1))
# same output as prev
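To see why the ~f1(.) form gives NA, here is a minimal sketch outside of dplyr. The helper names captured_name and via_lambda are hypothetical and only for illustration; via_lambda is a rough stand-in for the function that the formula is turned into, on the assumption that its argument is named ".".

```r
library(rlang)
library(stringr)

# Hypothetical helper: returns the name of the expression it was called with
captured_name <- function(pmt) as_name(enquo(pmt))

# Hypothetical stand-in for the ~f1(.) lambda: the column arrives as `.`
via_lambda <- function(.) captured_name(.)

Pay00 <- c(2520.3772, 259.7804)

direct   <- captured_name(Pay00)  # the quosure wraps the symbol Pay00
indirect <- via_lambda(Pay00)     # the quosure wraps the symbol `.`

digits_direct   <- as.numeric(str_extract(direct, "(\\d)+"))
digits_indirect <- as.numeric(str_extract(indirect, "(\\d)+"))

direct
indirect
digits_direct
digits_indirect
```

Called directly, the helper sees Pay00 and the digits can be extracted; called through the wrapper, it only ever sees ".", which contains no digits.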