How to apply the same function to several variables in R?
I know that similar questions have already been asked (e.g. "Passing list element names as a variable to functions within lapply" or "R - iteratively apply a function of a list of variables"), but I couldn't find a solution to my problem based on these posts.
I have an event dataset (~100 variables, >2000 observations) that contains variables with information on the involved actors. One variable can only contain one actor, so if several actors have been involved in an event, they are spread over several variables (e.g. actor1, actor2, ...). These actors can be classified into two groups ("s" and "nons"). For later use, I need two lists of actors: one that contains all actors of the category "s" and one that contains all actors of "nons". "s" only consists of three actors, while "nons" consists of dozens of actors.
# create example data
library(dplyr)
df <- data.frame(id = c(1:8),
                 actor1 = c("A", "B", "D", "E", "F", "G", "H", NA),
                 actor2 = c("A", NA, "B", "C", "E", "I", "D", "G"))
# make sure the actor columns are character vectors, not factors
df <-
  df %>%
  mutate(actor1 = as.character(actor1),
         actor2 = as.character(actor2))
Since the script I am about to prepare is supposed to be used on updated versions of the dataset in the future, I would like to automate as much as possible and keep the parts of the script that need to be adapted as limited as possible. My idea was to create one function per category that extracts the actors of the respective category (e.g. "nons") from one variable (e.g. actor1) into a list and then "loop" this function over the other variables (ideally with the apply family).
I know which category each actor belongs to ("A", "B", and "C" are category "s"), which allows me to define a separation rule as used in the function below (the filter command).
# create function
nons_function <- function(col) {
  col_ <- enquo(col)
  nons_list <-
    df %>%
    filter(!is.na(!!col_), !!col_ != "A", !!col_ != "B", !!col_ != "C") %>%
    distinct(!!col_) %>%
    pull()
  nons_list
}
# create list of variables to "loop" over
actorlist <- c("actor1", "actor2")
This results in the following. Instead of two lists of actors, I get a list that contains the variable names as character strings.
> lapply(actorlist, nons_function)
[[1]]
[1] "actor1"
[[2]]
[1] "actor2"
What I would like to get is something like the following:
> lapply(actorlist, nons_function)
[[1]]
[1] "D" "E" "F" "G" "H"
[[2]]
[1] "E" "I" "D" "G"
The problem is probably the way I am passing the variable names to my function within lapply. Apparently, my function is not able to use a character input as a variable name. However, I have not found a way to either adapt my function so that it accepts character input or to provide my function with a list of variables to loop over in a form it can digest.
Any help appreciated!
EDIT: Initially I had named the actors in a misleading way (the actor names indicated which category an actor belongs to), which led to answers that do not really help in my case. I have now changed the actor names from "s1", "s2", "nons1", "nons2" etc. to "A", "B", "C" etc.
Here is an option using base R.
For nons-actors:
lapply( df[, 2:3], function(x) grep( "^nons", x, value = TRUE ) )
#$actor1
#[1] "nons1" "nons2" "nons3" "nons4" "nons5"
#
#$actor2
#[1] "nons2" "nons6" "nons1" "nons4"
And for s-actors:
lapply( df[, 2:3], function(x) grep( "^s", x, value = TRUE ) )
# $actor1
# [1] "s1" "s2"
#
# $actor2
# [1] "s1" "s2" "s3"
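The grep calls above match on name prefixes, which fits the original example data ("s1", "nons1", ...). For the edited data, where the category is only known from a list of actors, a membership test with %in% may be a closer fit. The following is a minimal base R sketch under that assumption; the vector name s_actors and the helper variable actor_cols are hypothetical and not part of the question.

```r
# Example data from the edited question
df <- data.frame(id = 1:8,
                 actor1 = c("A", "B", "D", "E", "F", "G", "H", NA),
                 actor2 = c("A", NA, "B", "C", "E", "I", "D", "G"),
                 stringsAsFactors = FALSE)

# Hypothetical lookup vector: the three actors known to be in category "s"
s_actors <- c("A", "B", "C")

# Pick up every column whose name starts with "actor"
actor_cols <- grep("^actor", names(df), value = TRUE)

# "nons": unique non-missing values that are not in s_actors
nons_list <- lapply(df[actor_cols], function(x) {
  unique(x[!is.na(x) & !(x %in% s_actors)])
})

# "s": unique values that are in s_actors (NA never matches)
s_list <- lapply(df[actor_cols], function(x) unique(x[x %in% s_actors]))

nons_list
s_list
```

Because the columns are selected by name pattern, new actor columns in an updated dataset would be picked up without editing the script; only s_actors would need maintenance.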
Here is an option using the tidyverse:
library(dplyr)
library(stringr)
library(purrr)
# all_of() selects columns named in a character vector
map(actorlist, ~ df %>%
      select(all_of(.x)) %>%
      filter(!str_detect(!! rlang::sym(.x), "^s\\d+$")) %>%
      pull(1))
#[[1]]
#[1] "nons1" "nons2" "nons3" "nons4" "nons5"
#[[2]]
#[1] "nons2" "nons6" "nons1" "nons4"
It can be wrapped as a function as well. Note that the input is a string, so instead of enquo, use sym to convert it to a symbol and then evaluate it (!!).
f1 <- function(dat, colNm) {
  dat %>%
    select(all_of(colNm)) %>%   # colNm is a column name given as a string
    filter(!str_detect(!! rlang::sym(colNm), "^s\\d+$")) %>%
    pull(1) %>%
    unique()
}
map(actorlist, f1, dat = df)
NOTE: This can be done more easily, but here we are using code similar to the OP's post.
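For the edited data in the question (actors "A", "B", "C" in category "s"), the same idea can be sketched with dplyr's .data pronoun, which looks a column up by its name given as a string. This is only a sketch under stated assumptions: the arguments dat and s_actors are additions that are not in the OP's function, and the category is decided by membership in s_actors rather than by a name pattern.

```r
library(dplyr)

# Example data from the edited question
df <- data.frame(id = 1:8,
                 actor1 = c("A", "B", "D", "E", "F", "G", "H", NA),
                 actor2 = c("A", NA, "B", "C", "E", "I", "D", "G"),
                 stringsAsFactors = FALSE)

# col is a column name passed as a character string;
# s_actors (hypothetical argument) lists the actors of category "s"
nons_function <- function(dat, col, s_actors = c("A", "B", "C")) {
  kept <- dat %>%
    filter(!is.na(.data[[col]]), !(.data[[col]] %in% s_actors))
  unique(kept[[col]])
}

actorlist <- c("actor1", "actor2")
result <- lapply(actorlist, nons_function, dat = df)
result
```

Passing the data frame as an argument instead of reading the global df keeps the function usable on updated versions of the dataset.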
Another option is to use split with grepl in base R, which returns a list of both 'nons' and 's' after removing the NAs.
lapply(df[2:3], function(x) {
  x1 <- x[!is.na(x)]
  split(x1, grepl("nons", x1))
})
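The split approach can also be sketched for the edited data, where the grouping comes from a known list of "s" actors instead of a name pattern. In the sketch below, s_actors and by_category are hypothetical names, and the grouping labels "s" and "nons" are built with ifelse so that the resulting list elements carry readable names.

```r
# Example data from the edited question
df <- data.frame(id = 1:8,
                 actor1 = c("A", "B", "D", "E", "F", "G", "H", NA),
                 actor2 = c("A", NA, "B", "C", "E", "I", "D", "G"),
                 stringsAsFactors = FALSE)

# Hypothetical lookup vector: the actors known to be in category "s"
s_actors <- c("A", "B", "C")

by_category <- lapply(df[c("actor1", "actor2")], function(x) {
  # Drop missing values and duplicates first
  x1 <- unique(x[!is.na(x)])
  # Split by category label; elements are named "nons" and "s"
  split(x1, ifelse(x1 %in% s_actors, "s", "nons"))
})

by_category$actor1$nons
by_category$actor2$s
```

This returns both categories per column in one pass, so the two lists do not have to be built by separate functions.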
Check my solution and see if it works for you.
require("dplyr")
# create example data
df <- data.frame(id = c(1:8),
actor1 = c("s1", "s2", "nons1", "nons2", "nons3", "nons4", "nons5", NA),
actor2 = c("s1", NA, "s2", "s3", "nons2", "nons6", "nons1", "nons4"))
df <-
df %>%
mutate(actor1 = as.character(actor1),
actor2 = as.character(actor2))
# Function for getting the category
category_function <- function(col, categ) {
  if (categ == "non") {
    outp <- grep("^non", col, value = TRUE)
  } else {
    outp <- grep("^s", col, value = TRUE)
  }
  return(outp)
}
# Apply the function to all variables whose name starts with "actor"
sapply(df[grep("actor",names(df),value=T)],category_function,categ="non")
sapply(df[grep("actor",names(df),value=T)],category_function,categ="s")
My output was the following:
> sapply(df[grep("actor",names(df),value=T)],category_function,categ="non")
$actor1
[1] "nons1" "nons2" "nons3" "nons4" "nons5"
$actor2
[1] "nons2" "nons6" "nons1" "nons4"
> sapply(df[grep("actor",names(df),value=T)],category_function,categ="s")
$actor1
[1] "s1" "s2"
$actor2
[1] "s1" "s2" "s3"