Loop over several dataframes to do several actions in R
I have several dataframes (dataframe_1, dataframe_2, ...) that I want to loop over in order to apply the same operations to each of them. These operations are:
dataframe_1 <- dataframe_1[, c("Column_1", "Column_2")]
dataframe_1 <- rename(dataframe_1, New_Name_for_Column_1 = Column_1)
ifelse() function:
dataframe_1$Column_3 <- ifelse(dataframe_1$Column_1 == 5, 1, 0)
I have tested the code on some of the dataframes individually and it runs without errors.
However, if I execute the following loop:
list_dataframes = list(dataframe_1, dataframe_2)
for (dataframe in 1:length(list_dataframes)){
dataframe <- dataframe[, c("Column_1", "Column_2")]
dataframe <- rename(dataframe, New_Name_for_Column_1 = Column_1)
dataframe$Column_3 <- ifelse(dataframe$Column_1 == 5, 1, 0)
}
The following error arises:
Error in dataframe[, c("Column_1", "Column_2", :
incorrect number of dimensions
(All dataframes have the same column names.)
Any idea?
Thanks!
You are not iterating over the list of dataframes, but rather over the sequence 1:length(list_dataframes), so the loop variable holds a number, not a dataframe. Consider the following for illustration:
a = list("a", "b")
for (i in a){print(i)}
for (i in 1:length(a)){print(i)}
In your code, you need to explicitly access the list elements like this:
list_dataframes = list(dataframe_1, dataframe_2)
for (df_number in 1:length(list_dataframes)){
list_dataframes[[df_number]] <- list_dataframes[[df_number]][, c("Column_1", "Column_2")]
list_dataframes[[df_number]] <- rename(list_dataframes[[df_number]], New_Name_for_Column_1 = Column_1)
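# Column_1 was renamed on the previous line, so test the new name (comparison needs ==, not =)
list_dataframes[[df_number]]$Column_3 <- ifelse(list_dataframes[[df_number]]$New_Name_for_Column_1 == 5, 1, 0)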
}
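The same per-dataframe steps can also be wrapped in a function and applied with lapply(), which avoids repeating list_dataframes[[df_number]] on every line. The following is only a sketch under assumptions: the example data are made up, the helper name clean_dataframe is hypothetical, and the rename is done with base R names() rather than dplyr's rename() so that the snippet needs no packages.

```r
# Made-up example data; the real dataframes only need Column_1 and Column_2.
dataframe_1 <- data.frame(Column_1 = c(5, 3), Column_2 = c("a", "b"), Other = 1:2)
dataframe_2 <- data.frame(Column_1 = c(1, 5), Column_2 = c("c", "d"), Other = 3:4)
list_dataframes <- list(dataframe_1, dataframe_2)

# Hypothetical helper: applies the three steps to a single dataframe.
clean_dataframe <- function(df) {
  # Keep only the two columns of interest.
  df <- df[, c("Column_1", "Column_2")]
  # Compute the flag before renaming, while Column_1 still exists.
  df$Column_3 <- ifelse(df$Column_1 == 5, 1, 0)
  # Base R equivalent of rename(df, New_Name_for_Column_1 = Column_1).
  names(df)[names(df) == "Column_1"] <- "New_Name_for_Column_1"
  df
}

# lapply() returns a new list with one cleaned dataframe per input.
list_dataframes <- lapply(list_dataframes, clean_dataframe)
str(list_dataframes[[1]])
```

Note that the variables dataframe_1 and dataframe_2 themselves are left unchanged; the cleaned versions live in list_dataframes.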
The code for (dataframe in 1:length(list_dataframes)) creates a vector of numbers c(1, 2), whose values are stored one at a time in a variable named dataframe. This iteration variable is a plain integer vector of length 1 with no dimensions, so the two-index form [rows, columns] cannot be applied to it. This is why dataframe[, c("Column_1", "Column_2")] fails with "incorrect number of dimensions".
Do this instead: list_dataframes[[dataframe]][, c("Column_1", "Column_2")]
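To see this for yourself, a small sketch like the following inspects the loop variable and captures the error instead of stopping. The example data are made up, and tryCatch() is used here only for illustration.

```r
# Made-up example data with the column names from the question.
list_dataframes <- list(
  data.frame(Column_1 = c(5, 3), Column_2 = c("a", "b")),
  data.frame(Column_1 = c(1, 5), Column_2 = c("c", "d"))
)

for (dataframe in 1:length(list_dataframes)) {
  # The loop variable is an integer index, not a dataframe.
  print(class(dataframe))
  # It has no dim attribute, so dim() returns NULL.
  print(is.null(dim(dataframe)))

  # Two-index subsetting on it fails; capture the message instead of stopping.
  msg <- tryCatch(
    dataframe[, c("Column_1", "Column_2")],
    error = function(e) conditionMessage(e)
  )
  print(msg)

  # Indexing the list with the number yields the actual dataframe.
  print(dim(list_dataframes[[dataframe]]))
}
```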
You could try to iterate over the dataframes using purrr::map_dfr(), e.g.
list_dataframes = list(dataframe_1, dataframe_2)
library(dplyr)
library(purrr)
list_dataframes %>%
  map_dfr(~ .x %>%
    select(Column_1, Column_2) %>%
    rename(New_Name_for_Column_1 = Column_1) %>%
    # Column_1 no longer exists after rename(), so use the new name
    mutate(Column_3 = ifelse(New_Name_for_Column_1 == 5, 1, 0)))
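Keep in mind that map_dfr() row-binds the results into one combined dataframe. If you would rather keep one result per input dataframe, purrr::map() returns a list instead. The following is a sketch under assumptions: the example data and the list element names are made up, and dplyr and purrr are assumed to be installed.

```r
library(dplyr)
library(purrr)

# Made-up example data; the real dataframes only need Column_1 and Column_2.
dataframe_1 <- data.frame(Column_1 = c(5, 3), Column_2 = c("a", "b"), Extra = 1:2)
dataframe_2 <- data.frame(Column_1 = c(1, 5), Column_2 = c("c", "d"), Extra = 3:4)

# Naming the list elements is optional; map() carries the names over to the result.
list_dataframes <- list(dataframe_1 = dataframe_1, dataframe_2 = dataframe_2)

# map() applies the function to each element and returns a list of dataframes.
cleaned <- map(list_dataframes, function(df) {
  df %>%
    select(Column_1, Column_2) %>%
    rename(New_Name_for_Column_1 = Column_1) %>%
    mutate(Column_3 = ifelse(New_Name_for_Column_1 == 5, 1, 0))
})

# Each cleaned dataframe can be accessed by name or position.
str(cleaned$dataframe_1)
```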