Map over columns and apply a custom function

Question

I'm missing something small here and struggling to pass columns to a function. I just want to map (or lapply) over columns and apply a custom function to each of them. Minimal example:

library(tidyverse)
set.seed(10)
df <- data.frame(id = c(1,1,1,2,3,3,3,3),
                 r_r1 = sample(c(0,1), 8, replace = TRUE),
                 r_r2 = sample(c(0,1), 8, replace = TRUE),
                 r_r3 = sample(c(0,1), 8, replace = TRUE))
df
#   id r_r1 r_r2 r_r3
# 1  1    0    0    1
# 2  1    0    0    1
# 3  1    1    0    1
# 4  2    1    1    0
# 5  3    1    0    0
# 6  3    0    0    1
# 7  3    1    1    1
# 8  3    1    0    0

A function that filters on a column and counts the unique ids remaining in the dataset:

cnt_un <-  function(var) {
  df %>% 
    filter({{var}} == 1) %>% 
    group_by({{var}}) %>% 
    summarise(n_uniq = n_distinct(id)) %>% 
    ungroup()
}

It works outside of map:

cnt_un(r_r1)
# A tibble: 1 x 2
   r_r1 n_uniq
  <dbl>  <int>
1     1      3

I want to apply the function over all r_r columns to get something like:

df2
#      y n_uniq
# 1 r_r1      3
# 2 r_r2      2
# 3 r_r3      2

I thought the following would work, but it doesn't:

map(dplyr::select(df, matches("r_r")), ~ cnt_un(.x))

Any suggestions? Thanks.

Answer 1

I'm not sure if there's a direct tidyeval way to do this with something like map. The issue you're running into is that when you call map(df, *whatever_function*), the function is called on each column of df as a vector, whereas your function expects a bare column name in the tidyeval style. To verify that:

map(df, class)

will return "numeric" for each column.
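
Because map hands each column over as a plain vector, one way around the problem is to give it a function that works on a vector directly. The following is only a minimal sketch of that idea, not the approach taken in the rest of this answer. It assumes the data frame printed in the question (typed out literally here so the result does not depend on sample()) and that it is acceptable to reference df$id from inside the lambda; the name counts is illustrative.

```r
library(dplyr)
library(purrr)

# The question's data, written out literally (assumption: matches the printed df)
df <- data.frame(
  id   = c(1, 1, 1, 2, 3, 3, 3, 3),
  r_r1 = c(0, 0, 1, 1, 1, 0, 1, 1),
  r_r2 = c(0, 0, 0, 1, 0, 0, 1, 0),
  r_r3 = c(1, 1, 1, 0, 0, 1, 1, 0)
)

# .x is the column as a numeric vector, so it can index df$id directly
counts <- df %>%
  select(matches("r_r")) %>%
  map_int(~ n_distinct(df$id[.x == 1]))

# Turn the named integer vector into the layout asked for in the question
df2 <- data.frame(y = names(counts), n_uniq = unname(counts),
                  stringsAsFactors = FALSE)
df2
#      y n_uniq
# 1 r_r1      3
# 2 r_r2      2
# 3 r_r3      2
```

This sidesteps tidy evaluation entirely, at the cost of hard-coding the reference to df inside the lambda.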

An alternative is to iterate over the column names as strings and convert those to symbols; this takes just one additional line in the function.

library(dplyr)
library(tidyr)
library(purrr)

cnt_un_name <- function(varname) {
  var <- ensym(varname)
  df %>% 
    filter({{var}} == 1) %>% 
    group_by({{var}}) %>% 
    summarise(n_uniq = n_distinct(id)) %>% 
    ungroup()
}

Calling the function is a little awkward because each result keeps its own column name (calling it on "r_r1" gives columns "r_r1" and "n_uniq", and so on). One way to handle that is to get the vector of column names you want, name it so you can add an ID column in map_dfr, and drop the extra columns, since they'll be mostly NA.

grep("^r_r\\d+", names(df), value = TRUE) %>%
  set_names() %>%
  map_dfr(cnt_un_name, .id = "y") %>%
  select(y, n_uniq)
#> # A tibble: 3 x 2
#>   y     n_uniq
#>   <chr>  <int>
#> 1 r_r1       3
#> 2 r_r2       2
#> 3 r_r3       2

A better way is to call the function on each name, reshape each result, and then bind them.

grep("^r_r\\d+", names(df), value = TRUE) %>%
  map(cnt_un_name) %>%
  map_dfr(pivot_longer, 1, names_to = "y") %>%
  select(y, n_uniq)
# same output as above

Alternatively (and maybe better and more scalable), you could do the column renaming inside the function definition.
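
A rough sketch of that last idea follows. It swaps ensym() for the .data pronoun, which looks a column up by its string name, and writes the name into a y column inside the function so the results bind cleanly. The function name cnt_un_labelled and its data argument are made up for illustration, and df is the question's data typed out literally.

```r
library(dplyr)
library(purrr)

# The question's data, written out literally (assumption: matches the printed df)
df <- data.frame(
  id   = c(1, 1, 1, 2, 3, 3, 3, 3),
  r_r1 = c(0, 0, 1, 1, 1, 0, 1, 1),
  r_r2 = c(0, 0, 0, 1, 0, 0, 1, 0),
  r_r3 = c(1, 1, 1, 0, 0, 1, 1, 0)
)

# Hypothetical helper: takes the column name as a string and labels its own output
cnt_un_labelled <- function(varname, data = df) {
  data %>%
    filter(.data[[varname]] == 1) %>%                # look the column up by name
    summarise(y = varname, n_uniq = n_distinct(id))  # one row: name and count
}

res <- grep("^r_r\\d+$", names(df), value = TRUE) %>%
  map(cnt_un_labelled) %>%
  bind_rows()
res
#      y n_uniq
# 1 r_r1      3
# 2 r_r2      2
# 3 r_r3      2
```

Since every call returns the same two columns, no reshaping or column dropping is needed after binding.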

Answer 2

Here's a base R solution that uses lapply. The tricky bit is that your function isn't actually running on single columns; it uses id too, so you can't use canned functions that iterate column-wise.

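do.call(rbind, lapply(grep("r_r", colnames(df), value = TRUE), function(i) {
  # keep the rows where column i equals 1
  X <- subset(df, df[, i] == 1)
  # one result row per column: its name and the number of distinct ids
  data.frame(y = i, n_uniq = length(unique(X$id)), stringsAsFactors = FALSE)
}))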

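With the data frame printed in the question this code returns 3, 2, 2. The output below is what was posted with this answer and appears to come from a different random draw:

     y n_uniq
1 r_r1      2
2 r_r2      3
3 r_r3      2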

Answer 3

Here is another solution. I changed the syntax of your function: you now supply the pattern of the columns you want to select.

cnt_un <-  function(var_pattern) {
  df %>%
    pivot_longer(cols = contains(var_pattern), values_to = "vals", names_to = "y") %>%
    filter(vals == 1) %>%
    group_by(y) %>%
    summarise(n_uniq = n_distinct(id)) %>% 
    ungroup()
}

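cnt_un("r_r")
#> # A tibble: 3 x 2
#>   y     n_uniq
#>   <chr>  <int>
#> 1 r_r1       2
#> 2 r_r2       3
#> 3 r_r3       2

Note that with the data frame printed in the question the counts would be 3, 2, 2; the output above appears to come from a different random draw.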
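
For comparison, the same count can be sketched in a single summarise() call with across(), which requires dplyr 1.0.0 or later. This is offered as an illustration under that assumption rather than as part of the answer above; df is the question's data typed out literally and res is an illustrative name.

```r
library(dplyr)
library(tidyr)

# The question's data, written out literally (assumption: matches the printed df)
df <- data.frame(
  id   = c(1, 1, 1, 2, 3, 3, 3, 3),
  r_r1 = c(0, 0, 1, 1, 1, 0, 1, 1),
  r_r2 = c(0, 0, 0, 1, 0, 0, 1, 0),
  r_r3 = c(1, 1, 1, 0, 0, 1, 1, 0)
)

res <- df %>%
  # for each r_r column, count the distinct ids among rows where it equals 1
  summarise(across(matches("r_r"), ~ n_distinct(id[.x == 1]))) %>%
  # one row per column: its name in y, its count in n_uniq
  pivot_longer(everything(), names_to = "y", values_to = "n_uniq")
res
```

Here the reshaping happens after the summary rather than before it, so the pivot only touches one row.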

3 answers

Answer 1 (score 3, 2020-01-14 14:56:24)

Answer 2 (score 2, 2020-01-14 13:18:06)

Answer 3 (score 1, accepted, 2020-01-14 13:20:28)
