
Loop over several dataframes to do several actions in R

I have several dataframes (dataframe_1, dataframe_2, ...) and I want to loop over them in order to apply the same operations to each one. These operations are:

  • Select specific columns:
dataframe_1 <- dataframe_1[, c("Column_1", "Column_2")]

  • Rename the columns:
dataframe_1 <- rename(dataframe_1, New_Name_for_Column_1 = Column_1)

  • Create new columns. For example, by using the ifelse() function:
dataframe_1$Column_3 <- ifelse(dataframe_1$New_Name_for_Column_1 == 5, 1, 0)

I have tested the code on some of the dataframes individually, and it runs without errors.

However, if I execute the following loop:

list_dataframes = list(dataframe_1, dataframe_2)

for (dataframe in 1:length(list_dataframes)){
 dataframe <- dataframe[, c("Column_1", "Column_2")]
 dataframe <- rename(dataframe, New_Name_for_Column_1 = Column_1)
 dataframe$Column_3 <- ifelse(dataframe$New_Name_for_Column_1 == 5, 1, 0)
}

The following error arises:

Error in dataframe[, c("Column_1", "Column_2",  : 
  incorrect number of dimensions

(All dataframes have the same column names.)

Any ideas?

Thanks!

You are not iterating over the list of dataframes, but over the sequence 1:length(list_dataframes). Compare the two loops below: the first prints the list elements, the second prints only the indices 1 and 2:

a = list("a", "b")
for (i in a){print(i)}
for (i in 1:length(a)){print(i)}

In your code, you need to explicitly access the list elements like this:

list_dataframes = list(dataframe_1, dataframe_2)

for (df_number in 1:length(list_dataframes)){
  list_dataframes[[df_number]] <- list_dataframes[[df_number]][, c("Column_1", "Column_2")]
  list_dataframes[[df_number]] <- rename(list_dataframes[[df_number]], New_Name_for_Column_1 = Column_1)
  list_dataframes[[df_number]]$Column_3 <- ifelse(list_dataframes[[df_number]]$New_Name_for_Column_1 == 5, 1, 0)
}
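The same idea can also be written without index bookkeeping. The sketch below is one possible variant rather than part of the original answer: it wraps the three steps in a helper and applies it with base R's lapply(). The helper name clean_dataframe and the sample data are made up for illustration, and the renaming step uses base R instead of rename() so that the example runs without extra packages.

```r
# Hypothetical sample data standing in for the real dataframes
dataframe_1 <- data.frame(Column_1 = c(5, 2, 5), Column_2 = c("a", "b", "c"), Other = 1:3)
dataframe_2 <- data.frame(Column_1 = c(1, 5), Column_2 = c("d", "e"), Other = 4:5)

# Hypothetical helper: the three steps from the question, applied to one dataframe
clean_dataframe <- function(df) {
  # Keep only the columns of interest
  df <- df[, c("Column_1", "Column_2")]
  # Rename Column_1 (base R counterpart of dplyr::rename)
  names(df)[names(df) == "Column_1"] <- "New_Name_for_Column_1"
  # Flag rows where the renamed column equals 5
  df$Column_3 <- ifelse(df$New_Name_for_Column_1 == 5, 1, 0)
  df
}

# A named list keeps track of which result belongs to which dataframe
list_dataframes <- list(dataframe_1 = dataframe_1, dataframe_2 = dataframe_2)
list_dataframes <- lapply(list_dataframes, clean_dataframe)

str(list_dataframes$dataframe_1)
```

Note that neither this version nor the for loop changes the original dataframe_1 and dataframe_2 objects; the cleaned results live in list_dataframes.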

The code for (dataframe in 1:length(list_dataframes)) iterates over the integer vector c(1, 2) (for a list of two dataframes), storing one value at a time in a variable named dataframe. That variable is therefore a single number, a length-1 vector with no rows or columns, and not a data frame. This is why dataframe[, c("Column_1", "Column_2")] fails with "incorrect number of dimensions". Subset the list element instead: list_dataframes[[dataframe]][, c("Column_1", "Column_2")]
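To make this concrete, here is a small sketch that records what the loop variable actually is on each iteration. The sample data and the three result vectors are invented for illustration; the loop header is the one from the question.

```r
# Hypothetical sample data standing in for the real dataframes
list_dataframes <- list(
  data.frame(Column_1 = c(5, 2), Column_2 = c("a", "b")),
  data.frame(Column_1 = c(1, 5), Column_2 = c("c", "d"))
)

index_classes <- character(0)    # class of the loop variable
element_classes <- character(0)  # class of the list element it points to
error_messages <- character(0)   # message raised by two-index subsetting

for (dataframe in 1:length(list_dataframes)) {
  # The loop variable holds an index, not a data frame
  index_classes <- c(index_classes, class(dataframe))

  # Row/column subsetting on a plain number fails; capture the message
  msg <- tryCatch(
    dataframe[, c("Column_1", "Column_2")],
    error = function(e) conditionMessage(e)
  )
  error_messages <- c(error_messages, msg)

  # Indexing the list with [[ ]] yields the data frame itself
  element_classes <- c(element_classes, class(list_dataframes[[dataframe]]))
}

print(index_classes)
print(error_messages)
print(element_classes)
```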

You could try iterating over the dataframes with purrr::map_dfr(), e.g.:

list_dataframes = list(dataframe_1, dataframe_2)

library(dplyr)
library(purrr)

list_dataframes %>% 
  map_dfr(~.x %>% 
            select(Column_1, Column_2) %>% 
            rename(New_Name_for_Column_1 = Column_1) %>% 
            mutate(Column_3 = ifelse(New_Name_for_Column_1 == 5, 1, 0)))
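One thing to be aware of: map_dfr() row-binds the per-dataframe results into a single data frame. If separate dataframes are wanted, purrr::map() returns a list instead, and the results can still be stacked later with dplyr::bind_rows(). The sketch below assumes dplyr and purrr are installed; the sample data, the helper name clean_dataframe and the "source" column name are made up for illustration.

```r
library(dplyr)
library(purrr)

# Hypothetical sample data standing in for the real dataframes
list_dataframes <- list(
  dataframe_1 = data.frame(Column_1 = c(5, 2, 5), Column_2 = c("a", "b", "c")),
  dataframe_2 = data.frame(Column_1 = c(1, 5), Column_2 = c("d", "e"))
)

# Hypothetical helper: select, rename, then derive the new column
clean_dataframe <- function(df) {
  df %>%
    select(Column_1, Column_2) %>%
    rename(New_Name_for_Column_1 = Column_1) %>%
    mutate(Column_3 = ifelse(New_Name_for_Column_1 == 5, 1, 0))
}

# map() keeps one cleaned dataframe per list element
cleaned_list <- map(list_dataframes, clean_dataframe)

# bind_rows() stacks them; .id records which dataframe each row came from
combined <- bind_rows(cleaned_list, .id = "source")

print(combined)
```

Since the new column is computed after the rename step, it has to refer to New_Name_for_Column_1; at that point Column_1 no longer exists.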
