Map over columns and apply a custom function
I'm missing something small here and struggling to pass columns to a function. I just want to map (or lapply) over columns and apply a custom function to each of them. Minimal example here:
library(tidyverse)
set.seed(10)
df <- data.frame(id = c(1,1,1,2,3,3,3,3),
                 r_r1 = sample(c(0,1), 8, replace = TRUE),
                 r_r2 = sample(c(0,1), 8, replace = TRUE),
                 r_r3 = sample(c(0,1), 8, replace = TRUE))
df
# id r_r1 r_r2 r_r3
# 1 1 0 0 1
# 2 1 0 0 1
# 3 1 1 0 1
# 4 2 1 1 0
# 5 3 1 0 0
# 6 3 0 0 1
# 7 3 1 1 1
# 8 3 1 0 0
A function that just filters on a column and counts the unique ids remaining in the dataset:
cnt_un <- function(var) {
  df %>%
    filter({{var}} == 1) %>%
    group_by({{var}}) %>%
    summarise(n_uniq = n_distinct(id)) %>%
    ungroup()
}
It works outside of map:
cnt_un(r_r1)
# # A tibble: 1 x 2
#    r_r1 n_uniq
#   <dbl>  <int>
# 1     1      3
I want to apply the function over all r_r columns to get something like:
df2
# y n_uniq
# 1 r_r1 3
# 2 r_r2 2
# 3 r_r3 2
I thought the following would work, but it doesn't:
map(dplyr::select(df, matches("r_r")), ~ cnt_un(.x))
Any suggestions? Thanks.
I'm not sure if there's a direct tidyeval way to do this with something like map. The issue you're running into is that in calling map(df, *whatever_function*), the function is called on each column of df as a vector, whereas your function expects a bare column name in the tidyeval style. To verify that, map(df, class) will return "numeric" for each column.
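Since map hands each column to the function as a plain vector, one option is to write the function against that vector and look up id in the data frame directly. The following is only a minimal sketch: it assumes the data frame printed in the question (hard-coded here so the result does not depend on the random seed), and count_ids is a hypothetical helper name, not part of the original code.

```r
library(dplyr)
library(purrr)
library(tibble)

# Data copied from the data frame printed in the question (assumption:
# hard-coded instead of sampled, so the values are fixed)
df <- data.frame(
  id   = c(1, 1, 1, 2, 3, 3, 3, 3),
  r_r1 = c(0, 0, 1, 1, 1, 0, 1, 1),
  r_r2 = c(0, 0, 0, 1, 0, 0, 1, 0),
  r_r3 = c(1, 1, 1, 0, 0, 1, 1, 0)
)

# Hypothetical helper: receives a column as a vector, not as a bare name
count_ids <- function(x) n_distinct(df$id[x == 1])

# map_int() keeps the column names as the names of the result
counts <- map_int(select(df, matches("r_r")), count_ids)

# Turn the named integer vector into a two-column tibble
df2 <- enframe(counts, name = "y", value = "n_uniq")
df2
```

This avoids tidyeval entirely, at the cost of the helper depending on df being visible from where it is defined.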
An alternative is to iterate over column names as strings and convert those to symbols; this takes just one additional line in the function.
library(dplyr)
library(tidyr)
library(purrr)
cnt_un_name <- function(varname) {
  # convert the column name string (e.g. "r_r1") to a symbol
  var <- ensym(varname)
  df %>%
    filter({{var}} == 1) %>%
    group_by({{var}}) %>%
    summarise(n_uniq = n_distinct(id)) %>%
    ungroup()
}
Calling the function is a little awkward because each result keeps its own column name (calling it on "r_r1" returns columns "r_r1" and "n_uniq", and so on). One way is to get the vector of column names you want, name it so you can add an ID column in map_dfr, and drop the extra columns, since they'll be mostly NA.
grep("^r_r\\d+", names(df), value = TRUE) %>%
  set_names() %>%
  map_dfr(cnt_un_name, .id = "y") %>%
  select(y, n_uniq)
#> # A tibble: 3 x 2
#> y n_uniq
#> <chr> <int>
#> 1 r_r1 3
#> 2 r_r2 2
#> 3 r_r3 2
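The naming step above is what makes the ID column possible: set_names() with no further arguments uses the values as their own names, and map_dfr stores each element's name in the column given by .id. A small sketch of just that mechanism, using a throwaway function (counting characters) in place of the real one; the names cols and demo are made up for illustration:

```r
library(dplyr)  # map_dfr() binds rows via dplyr
library(purrr)

cols <- set_names(c("r_r1", "r_r2", "r_r3"))
names(cols)  # the names are the same strings as the values

# Each element returns a one-row data frame;
# .id = "y" records the element's name in a column called y
demo <- map_dfr(cols, ~ data.frame(n_chars = nchar(.x)), .id = "y")
demo
```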
A better way is to call the function, then reshape each result before binding.
grep("^r_r\\d+", names(df), value = TRUE) %>%
  map(cnt_un_name) %>%
  map_dfr(pivot_longer, 1, names_to = "y") %>%
  select(y, n_uniq)
# same output as above
Alternatively (and maybe better/more scalable), you could do the column renaming inside the function definition.
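One way that could look is sketched below. It is an illustration rather than the answerer's code: cnt_un_named is a hypothetical name, the column is looked up with dplyr's .data pronoun instead of ensym, and the data frame is hard-coded from the one printed in the question so the result is reproducible.

```r
library(dplyr)
library(purrr)

# Data copied from the data frame printed in the question (assumption)
df <- data.frame(
  id   = c(1, 1, 1, 2, 3, 3, 3, 3),
  r_r1 = c(0, 0, 1, 1, 1, 0, 1, 1),
  r_r2 = c(0, 0, 0, 1, 0, 0, 1, 0),
  r_r3 = c(1, 1, 1, 0, 0, 1, 1, 0)
)

# Hypothetical variant: takes the column name as a string and
# returns one row with the name stored in y
cnt_un_named <- function(varname) {
  df %>%
    filter(.data[[varname]] == 1) %>%
    summarise(y = varname, n_uniq = n_distinct(id))
}

res <- map_dfr(c("r_r1", "r_r2", "r_r3"), cnt_un_named)
res
```

Because every call returns the same two columns, the results bind cleanly and no NA-filled columns need to be dropped afterwards.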
Here's a base R solution that uses lapply. The tricky bit is that your function isn't actually running on single columns; it's using id, too, so you can't use canned functions that iterate column-wise.
do.call(rbind, lapply(grep("r_r", colnames(df), value = TRUE), function(i) {
  X <- subset(df, df[,i] == 1)
  data.frame(y = i, n_uniq = length(unique(X$id)), stringsAsFactors = FALSE)
}))
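y n_uniq
1 r_r1 2
2 r_r2 3
3 r_r3 2
(These counts differ from the 3, 2, 2 expected in the question, most likely because set.seed(10) followed by sample() produces different data on a different R version. Run against the data frame printed in the question, the same code gives 3, 2, 2.)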
Here is another solution. I changed the syntax of your function: now you supply the pattern of the columns you want to select.
cnt_un <- function(var_pattern) {
  df %>%
    pivot_longer(cols = contains(var_pattern), values_to = "vals", names_to = "y") %>%
    filter(vals == 1) %>%
    group_by(y) %>%
    summarise(n_uniq = n_distinct(id)) %>%
    ungroup()
}
cnt_un("r_r")
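#> # A tibble: 3 x 2
#> y n_uniq
#> <chr> <int>
#> 1 r_r1 2
#> 2 r_r2 3
#> 3 r_r3 2
(The counts shown here do not match the 3, 2, 2 expected in the question, most likely because the seeded sample() call yields different data on a different R version. With the data frame printed in the question, this function returns 3, 2, 2.)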
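A related variant, offered only as a sketch, does the count per column with across() and reshapes afterwards. It assumes dplyr 1.0 or later, and the data frame is hard-coded from the one printed in the question so the result does not depend on the seed:

```r
library(dplyr)
library(tidyr)

# Data copied from the data frame printed in the question (assumption)
df <- data.frame(
  id   = c(1, 1, 1, 2, 3, 3, 3, 3),
  r_r1 = c(0, 0, 1, 1, 1, 0, 1, 1),
  r_r2 = c(0, 0, 0, 1, 0, 0, 1, 0),
  r_r3 = c(1, 1, 1, 0, 0, 1, 1, 0)
)

res <- df %>%
  # for every r_r column, count distinct ids among rows where it equals 1
  summarise(across(matches("r_r"), ~ n_distinct(id[.x == 1]))) %>%
  # one row per column name instead of one column per name
  pivot_longer(everything(), names_to = "y", values_to = "n_uniq")
res
```

Unlike the filter-based versions, a column with no 1s still appears in the result, with a count of 0.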