Referring to column names inside dplyr's across()
Is it possible to refer to column names in a lambda function inside across()?
df <- tibble(age = c(12, 45), sex = c('f', 'f'))
allowed_values <- list(age = 18:100, sex = c("f", "m"))
df %>%
  mutate(across(c(age, sex),
                # COLNAME stands for the name of the column currently being processed
                c(valid = ~ .x %in% allowed_values[[COLNAME]])))
I just came across this question, where the OP asks about validating the columns of a data frame against a list of allowed values. dplyr just gained across(), and it seems like a natural choice, but we need the column names to look up the allowed values.
The best I could come up with was a call to imap_dfr, but it is more cumbersome to integrate into an analysis pipeline, because the results need to be re-combined with the original data frame.
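For context, a minimal sketch of what such a purrr-based workaround might look like is below. It swaps in imap_dfc() (which binds the per-column results as columns) for the imap_dfr() mentioned above, and the "_valid" suffix is an arbitrary choice. The separate bind_cols() step is the re-combination that makes this approach awkward inside a pipeline.

```r
library(dplyr)
library(purrr)

df <- tibble(age = c(12, 45), sex = c('f', 'f'))
allowed_values <- list(age = 18:100, sex = c("f", "m"))

# imap_dfc() passes each column as .x and its name as .y,
# so the name can be used to look up that column's allowed values
validity <- imap_dfc(df, ~ .x %in% allowed_values[[.y]])

# The checks come back as a separate tibble: suffix the names so they
# don't clash with the original columns, then bind them back on
names(validity) <- paste0(names(validity), "_valid")
result <- bind_cols(df, validity)
result
```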
The answer is yes, you can refer to column names in dplyr's across(). You need to use cur_column(). Your original attempt was so close! Insert cur_column() into your solution where you want the column name:
library(tidyverse)
df <- tibble(age = c(12, 45), sex = c('f', 'f'))
allowed_values <- list(age = 18:100, sex = c("f", "m"))
df %>%
  mutate(across(c(age, sex),
                c(valid = ~ .x %in% allowed_values[[cur_column()]])))
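As a sketch building on this answer (not part of it), the output columns can be named explicitly with across()'s .names argument, and a later argument of the same mutate() call can combine the per-column flags into a single row-level flag. The all_valid column name is an illustrative choice.

```r
library(dplyr)

df <- tibble(age = c(12, 45), sex = c('f', 'f'))
allowed_values <- list(age = 18:100, sex = c("f", "m"))

result <- df %>%
  mutate(
    # cur_column() returns the name of the column across() is processing,
    # which is used to look up that column's allowed values
    across(c(age, sex),
           ~ .x %in% allowed_values[[cur_column()]],
           .names = "{.col}_valid"),
    # mutate() evaluates its arguments in order, so the new columns
    # created just above are already available here
    all_valid = age_valid & sex_valid
  )
result
```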
Reference: https://dplyr.tidyverse.org/articles/colwise.html#current-column
I think that you may be asking too much of across at this point (but this may spur additional development, so maybe someday it will work the way you suggest).
I think that the imap functions from the purrr package may give you what you want at this point:
> df <- tibble(age = c(12, 45), sex = c('f', 'f'))
> allowed_values <- list(age = 18:100, sex = c("f", "m"))
>
> df %>% imap( ~ .x %in% allowed_values[[.y]])
$age
[1] FALSE TRUE
$sex
[1] TRUE TRUE
> df %>% imap_dfc( ~ .x %in% allowed_values[[.y]])
# A tibble: 2 x 2
age sex
<lgl> <lgl>
1 FALSE TRUE
2 TRUE TRUE
If you want a single column with the combined validity, you can pass the result through reduce:
> df %>% imap( ~ .x %in% allowed_values[[.y]]) %>%
+ reduce(`&`)
[1] FALSE TRUE
This could then be added as a new column to the original data, or just used for subsetting the data. I am not expert enough with the tidyverse yet to know if this could be combined with mutate to add the columns directly.
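One way to use this inside mutate() or filter() is to wrap the imap()/reduce() pipeline in a small function and call it on the data from within the verb. This is a sketch, not from the original answer: the helper name rows_valid() is hypothetical, and passing df explicitly assumes the data is not grouped.

```r
library(dplyr)
library(purrr)

df <- tibble(age = c(12, 45), sex = c('f', 'f'))
allowed_values <- list(age = 18:100, sex = c("f", "m"))

# Hypothetical helper: TRUE for each row whose values are all allowed.
# Only the columns named in `allowed` are checked.
rows_valid <- function(data, allowed) {
  data[names(allowed)] %>%
    imap(~ .x %in% allowed[[.y]]) %>%  # one logical vector per column
    reduce(`&`)                         # combine them row by row
}

# Add the combined flag as a new column ...
with_flag <- df %>% mutate(valid = rows_valid(df, allowed_values))

# ... or use it directly to keep only the valid rows
kept <- df %>% filter(rows_valid(df, allowed_values))
```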