How to use the purrr::map family to apply a function to a list of data frames directly, without creating new objects
I want to apply a function to a set of data frames and have those data frames updated directly, instead of creating new output that I then use to overwrite the current data frames.
As an example, I have two data frames, df_a and df_b, and a function age_category, which adds a column stating whether the person is a child or an adult, depending on their age.
df_a <- data.frame(Region = c("North", "South"), Age = c(14, 50))
df_b <- data.frame(Staple = c("Rice", "Potato"), Age = c(35, 2))
df_a
>   Region Age
> 1  North  14
> 2  South  50
df_b
>   Staple Age
> 1   Rice  35
> 2 Potato   2
age_category <- function(x) {
  x$category <- ifelse(x$Age >= 18, "adult", "child")
  return(x)
}
I create a list of the data frames and apply the function to them.
df_list <- list(df_a, df_b)
library(purrr)
exmpl_1 <- purrr::map(df_list, age_category)
exmpl_1
> [[1]]
>   Region Age category
> 1  North  14    child
> 2  South  50    adult
> [[2]]
>   Staple Age category
> 1   Rice  35    adult
> 2 Potato   2    child
Now I could use exmpl_1[[1]] to overwrite df_a (df_a <- exmpl_1[[1]]) and do the same for df_b.
I am looking for a way to have the function overwrite the data frames directly as it goes. Since I am not creating any output, I think I need to change the function and use walk instead of map.
age_category_alt <- function(x) {
  x$category <- ifelse(x$Age >= 18, "adult", "child")
  assign(deparse(substitute(x)), x)
}
walk(df_list, age_category_alt)
But this does not work. The data frames do not change, and the only outcome is this warning:
Warning messages:
1: In assign(deparse(substitute(x)), x) :
only the first element is used as variable name
2: In assign(deparse(substitute(x)), x) :
only the first element is used as variable name
I would appreciate any assistance.
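One likely contributor to the failed attempt, shown here as a minimal sketch rather than a full diagnosis: assign() called without an envir argument creates the object in the environment it is called from, which is the function's own frame, so the global data frame is never touched. The helper name add_category_local is hypothetical and only used for illustration.

```r
df_a <- data.frame(Region = c("North", "South"), Age = c(14, 50))

# Hypothetical helper: same idea as the attempt above, but with the
# target name hard-coded so that only the environment issue is visible.
add_category_local <- function(x) {
  x$category <- ifelse(x$Age >= 18, "adult", "child")
  # No `envir` given: "df_a" is created inside this function's frame
  # and disappears when the function returns.
  assign("df_a", x)
  invisible(NULL)
}

add_category_local(df_a)

# The global df_a still has only its original two columns.
names(df_a)
```

On top of that, a function called through walk() does not receive the original variable name, so the name has to be supplied some other way, for example through a named list.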
There are multiple ways to handle this, although I personally prefer to keep data in a list instead of in separate data frames.
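As a sketch of the list-based preference mentioned above, the data frames can live in a named list that is overwritten with the result of map(), and individual data frames are then accessed by name. The data and age_category are taken from the question; nothing is written to separate global objects.

```r
library(purrr)

age_category <- function(x) {
  x$category <- ifelse(x$Age >= 18, "adult", "child")
  x
}

# Keep the data frames together in one named list
df_list <- list(
  df_a = data.frame(Region = c("North", "South"), Age = c(14, 50)),
  df_b = data.frame(Staple = c("Rice", "Potato"), Age = c(35, 2))
)

# Overwrite the list itself; map() keeps the element names
df_list <- map(df_list, age_category)

# Access a single data frame by name when needed
df_list$df_a
```

With this layout there is only one object to update, and any further map() call applies to every data frame at once.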
library(purrr)
1) Using a named list and the age_category function from the OP, we can use map and list2env.
df_list <- list(df_a = df_a, df_b = df_b)
df_list <- map(df_list, age_category)
list2env(df_list, .GlobalEnv)
df_a
#  Region Age category
#1  North  14    child
#2  South  50    adult
df_b
#  Staple Age category
#1   Rice  35    adult
#2 Potato   2    child
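A self-contained variant of approach 1, offered as a sketch: list2env() uses the list names as object names, and it can write into any environment, not only .GlobalEnv. The environment target_env below is an assumption made for illustration, so that the example does not overwrite anything in the global workspace.

```r
library(purrr)

age_category <- function(x) {
  x$category <- ifelse(x$Age >= 18, "adult", "child")
  x
}

df_list <- list(
  df_a = data.frame(Region = c("North", "South"), Age = c(14, 50)),
  df_b = data.frame(Staple = c("Rice", "Potato"), Age = c(35, 2))
)

# Illustrative target environment (use .GlobalEnv to mirror the answer)
target_env <- new.env()

# Each list element becomes an object named after its list name
list2env(map(df_list, age_category), envir = target_env)

ls(target_env)
names(target_env$df_a)
```

Replacing target_env with .GlobalEnv gives the behaviour shown in the answer, where df_a and df_b themselves are overwritten.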
2) Using the same named list from above and assign with imap.
age_category_alt <- function(x, y) {
  x$category <- ifelse(x$Age >= 18, "adult", "child")
  assign(y, x, envir = .GlobalEnv)
}
imap(df_list, age_category_alt)
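Since approach 2 is used only for its side effect, a possible variant is purrr's iwalk(), which calls the function with each element and its name and returns its input invisibly instead of printing a list. This is a sketch under assumptions: the function name age_category_env, its env parameter, and target_env are hypothetical names introduced here, not part of the original answer.

```r
library(purrr)

df_list <- list(
  df_a = data.frame(Region = c("North", "South"), Age = c(14, 50)),
  df_b = data.frame(Staple = c("Rice", "Potato"), Age = c(35, 2))
)

# Hypothetical variant of age_category_alt with an explicit environment
age_category_env <- function(x, name, env) {
  x$category <- ifelse(x$Age >= 18, "adult", "child")
  # Write the updated data frame under its list name into `env`
  assign(name, x, envir = env)
}

# Illustrative target environment (use .GlobalEnv to mirror the answer)
target_env <- new.env()

# iwalk() passes each element and its name; nothing is printed
iwalk(df_list, function(x, name) age_category_env(x, name, env = target_env))

target_env$df_b
```

Passing the environment explicitly keeps the side effect visible at the call site, which can make this pattern easier to follow than a hard-coded .GlobalEnv.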