Referring to column names in dplyr's across()

Question

Is it possible to refer to column names in a lambda function inside across()?

df <- tibble(age = c(12, 45), sex = c('f', 'f'))
allowed_values <- list(age = 18:100, sex = c("f", "m"))

df %>%
  mutate(across(c(age, sex),
                c(valid = ~ .x %in% allowed_values[[COLNAME]])))

I just came across this question where OP asks about validating columns in a dataframe based on a list of allowed values.

dplyr just gained across() and it seems like a natural choice, but we need the column names to look up the allowed values.

The best I could come up with was a call to imap_dfr, but it is more cumbersome to integrate into an analysis pipeline, because the results need to be re-combined with the original dataframe.

Answer 1

The answer is yes, you can refer to column names inside dplyr's across(). You need to use cur_column(). Your original attempt was so close! Insert cur_column() into your solution wherever you want the column name:

library(tidyverse)

df <- tibble(age = c(12, 45), sex = c('f', 'f'))
allowed_values <- list(age = 18:100, sex = c("f", "m"))

df %>%
  mutate(across(c(age, sex),
                c(valid = ~ .x %in% allowed_values[[cur_column()]])
                )
         )
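As a minimal sketch of the same idea, the variant below selects the columns from the names of allowed_values and spells out the output names with the .names argument. The object name checked and the choice of all_of(names(allowed_values)) are illustrative assumptions, not part of the original answer.

```r
library(dplyr)

df <- tibble(age = c(12, 45), sex = c("f", "f"))
allowed_values <- list(age = 18:100, sex = c("f", "m"))

# For every column that has an entry in allowed_values, add a logical
# companion column named "<column>_valid".
checked <- df %>%
  mutate(across(
    all_of(names(allowed_values)),
    # cur_column() returns the name of the column currently being processed
    ~ .x %in% allowed_values[[cur_column()]],
    .names = "{.col}_valid"
  ))

checked
```

The original columns are kept, so the validity flags sit next to the data they describe and nothing has to be re-combined afterwards.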

Reference: https://dplyr.tidyverse.org/articles/colwise.html#current-column

Answer 2

I think that you may be asking too much of across() at this point (but this may spur additional development, so maybe someday it will work the way you suggest).

I think that the imap functions from the purrr package may give you what you want at this point:

> df <- tibble(age = c(12, 45), sex = c('f', 'f'))
> allowed_values <- list(age = 18:100, sex = c("f", "m"))
> 
> df %>% imap( ~ .x %in% allowed_values[[.y]])
$age
[1] FALSE  TRUE

$sex
[1] TRUE TRUE

> df %>% imap_dfc( ~ .x %in% allowed_values[[.y]])
# A tibble: 2 x 2
  age   sex  
  <lgl> <lgl>
1 FALSE TRUE 
2 TRUE  TRUE

If you want a single column with the combined validity, then you can pass the result through reduce:

> df %>% imap( ~ .x %in% allowed_values[[.y]]) %>%
+   reduce(`&`)
[1] FALSE  TRUE

This could then be added as a new column to the original data, or just used for subsetting the data. I am not expert enough with the tidyverse yet to know if this could be combined with mutate to add the columns directly.
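One possible way to do both, sketched under the assumption that dplyr and purrr are loaded: compute the combined validity vector first, then attach it with mutate or use it in filter. The names row_ok, all_valid, df_checked and df_valid are made up for this illustration.

```r
library(dplyr)
library(purrr)

df <- tibble(age = c(12, 45), sex = c("f", "f"))
allowed_values <- list(age = 18:100, sex = c("f", "m"))

# One logical vector per column, then AND them together element-wise
# so that each row gets a single TRUE/FALSE.
row_ok <- df %>%
  imap(~ .x %in% allowed_values[[.y]]) %>%
  reduce(`&`)

# Option 1: keep every row and add the flag as a new column.
df_checked <- df %>% mutate(all_valid = row_ok)

# Option 2: keep only the rows in which every column is valid.
df_valid <- df_checked %>% filter(all_valid)
```

Because row_ok has one element per row of df, it can be assigned directly as a column without any join or re-binding step.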

Answer 1
6 2020-12-10 11:51:22

Answer 2
2 Accepted 2020-06-02 18:14:35
