
Map over columns and apply a custom function

I'm missing something small here and struggling to pass columns to a function. I just want to map (or lapply) over columns and apply a custom function to each of them. Minimal example:

library(tidyverse)
set.seed(10)
df <- data.frame(id = c(1,1,1,2,3,3,3,3),
                 r_r1 = sample(c(0,1), 8, replace = TRUE),
                 r_r2 = sample(c(0,1), 8, replace = TRUE),
                 r_r3 = sample(c(0,1), 8, replace = TRUE))
df
#   id r_r1 r_r2 r_r3
# 1  1    0    0    1
# 2  1    0    0    1
# 3  1    1    0    1
# 4  2    1    1    0
# 5  3    1    0    0
# 6  3    0    0    1
# 7  3    1    1    1
# 8  3    1    0    0

A function that filters and then counts the unique ids remaining in the dataset:

cnt_un <-  function(var) {
  df %>% 
    filter({{var}} == 1) %>% 
    group_by({{var}}) %>% 
    summarise(n_uniq = n_distinct(id)) %>% 
    ungroup()
}

It works outside of map:

cnt_un(r_r1)
# A tibble: 1 x 2
   r_r1 n_uniq
  <dbl>  <int>
1     1      3

I want to apply the function to all r_r columns to get something like:

df2
#      y n_uniq
# 1 r_r1      3
# 2 r_r2      2
# 3 r_r3      2

I thought the following would work, but it doesn't:

map(dplyr::select(df, matches("r_r")), ~ cnt_un(.x))

Any suggestions? Thanks.

I'm not sure there's a direct tidyeval way to do this with something like map. The issue you're running into is that when you call map(df, *whatever_function*), the function is called on each column of df as a vector, whereas your function expects a bare column name in the tidyeval style. To verify that:

map(df, class)

will return "numeric" for each column.

An alternative is to iterate over the column names as strings and convert them to symbols; this takes just one additional line in the function.

library(dplyr)
library(tidyr)
library(purrr)

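cnt_un_name <- function(varname) {
  # varname is always a string here, so sym() converts it to a symbol
  var <- rlang::sym(varname)
  df %>% 
    filter((!!var) == 1) %>%   # !! injects the symbol into the expression
    group_by(!!var) %>% 
    summarise(n_uniq = n_distinct(id)) %>% 
    ungroup()
}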
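To see what that one extra line does in isolation, here is a small standalone sketch. The data frame is hard-coded from the printed df in the question so the result does not depend on sample(), and col_name / col_sym are illustrative names only. rlang::sym() turns the string into a symbol, and !! injects that symbol into filter(); filtering on the bare string instead ("r_r2" == 1) would compare two constants and drop every row.

```r
library(dplyr)

# Data copied from the printed df in the question (hard-coded, no RNG)
df <- data.frame(
  id   = c(1, 1, 1, 2, 3, 3, 3, 3),
  r_r1 = c(0, 0, 1, 1, 1, 0, 1, 1),
  r_r2 = c(0, 0, 0, 1, 0, 0, 1, 0),
  r_r3 = c(1, 1, 1, 0, 0, 1, 1, 0)
)

col_name <- "r_r2"                # a string, e.g. one element of names(df)
col_sym  <- rlang::sym(col_name)  # the same name as a symbol

class(col_name)  # "character"
class(col_sym)   # "name"

# !! injects the symbol, so filter() evaluates r_r2 == 1 on the column
res <- df %>%
  filter((!!col_sym) == 1) %>%
  summarise(n_uniq = n_distinct(id))
res
```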

Calling the function is a little awkward because each result keeps only the relevant column name (calling it on "r_r1" gives the columns "r_r1" and "n_uniq", and so on). One way around this is to take the vector of column names you want, name it so that map_dfr can add an ID column, and then drop the extra columns, since they'll be mostly NA.

grep("^r_r\\d+", names(df), value = TRUE) %>%
  set_names() %>%
  map_dfr(cnt_un_name, .id = "y") %>%
  select(y, n_uniq)
#> # A tibble: 3 x 2
#>   y     n_uniq
#>   <chr>  <int>
#> 1 r_r1       3
#> 2 r_r2       2
#> 3 r_r3       2

A better way is to call the function, reshape each result, and then bind them.

grep("^r_r\\d+", names(df), value = TRUE) %>%
  map(cnt_un_name) %>%
  map_dfr(pivot_longer, 1, names_to = "y") %>%
  select(y, n_uniq)
# same output as above

Another option (maybe better and more scalable) is to do the column renaming inside the function definition.
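A minimal sketch of that idea, assuming the column name is passed as a string: the hypothetical cnt_un_renamed() below looks the column up with the .data pronoun and always returns the same two columns (y, n_uniq), so the per-column results bind directly without a select() afterwards. The data are hard-coded from the printed df in the question, and the data argument is an addition so the function does not have to rely on a global df.

```r
library(dplyr)
library(purrr)

# Data copied from the printed df in the question (hard-coded, no RNG)
df <- data.frame(
  id   = c(1, 1, 1, 2, 3, 3, 3, 3),
  r_r1 = c(0, 0, 1, 1, 1, 0, 1, 1),
  r_r2 = c(0, 0, 0, 1, 0, 0, 1, 0),
  r_r3 = c(1, 1, 1, 0, 0, 1, 1, 0)
)

# Hypothetical variant: returns a one-row result with a fixed schema
cnt_un_renamed <- function(varname, data = df) {
  data %>%
    filter(.data[[varname]] == 1) %>%        # column chosen by its name
    summarise(n_uniq = n_distinct(id)) %>%
    mutate(y = varname, .before = 1)         # record which column this was
}

df2 <- grep("^r_r\\d+", names(df), value = TRUE) %>%
  map_dfr(cnt_un_renamed)
df2
```

Because every call returns the same columns, adding more r_r columns only changes the vector of names that is passed to map_dfr.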

Here's a base R solution that uses lapply. The tricky bit is that your function isn't actually running on single columns; it uses id too, so you can't use canned functions that iterate column-wise.

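do.call(rbind, lapply(grep("r_r", colnames(df), value = TRUE), function(i) {
  # keep only the rows where column i equals 1
  X <- subset(df, df[[i]] == 1)
  # one row per column: its name and the number of distinct ids left
  data.frame(y = i, n_uniq = length(unique(X$id)), stringsAsFactors = FALSE)
}))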

     y n_uniq
1 r_r1      2
2 r_r2      3
3 r_r3      2
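The same base R idea can be written a bit more compactly with vapply(), which checks that each call returns a single integer. This is a sketch, not the answer's code; r_cols is an illustrative name, and the data are hard-coded from the printed df in the question.

```r
# Data copied from the printed df in the question (hard-coded, no RNG)
df <- data.frame(
  id   = c(1, 1, 1, 2, 3, 3, 3, 3),
  r_r1 = c(0, 0, 1, 1, 1, 0, 1, 1),
  r_r2 = c(0, 0, 0, 1, 0, 0, 1, 0),
  r_r3 = c(1, 1, 1, 0, 0, 1, 1, 0)
)

r_cols <- grep("^r_r", names(df), value = TRUE)

# For each column name, count distinct ids among the rows where it is 1
n_uniq <- vapply(r_cols,
                 function(i) length(unique(df$id[df[[i]] == 1])),
                 integer(1),
                 USE.NAMES = FALSE)

df2 <- data.frame(y = r_cols, n_uniq = n_uniq, stringsAsFactors = FALSE)
df2
```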

Here is another solution. I changed the signature of your function: now you supply a pattern matching the columns you want to select.

library(dplyr)
library(tidyr)

cnt_un <- function(var_pattern) {
  df %>%
    # stack the matching columns into name (y) / value (vals) pairs
    pivot_longer(cols = contains(var_pattern), values_to = "vals", names_to = "y") %>%
    filter(vals == 1) %>%
    group_by(y) %>%
    summarise(n_uniq = n_distinct(id)) %>% 
    ungroup()
}

cnt_un("r_r")
#> # A tibble: 3 x 2
#>   y     n_uniq
#>   <chr>  <int>
#> 1 r_r1       2
#> 2 r_r2       3
#> 3 r_r3       2

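Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.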
