Import a folder containing multiple .csv files and manipulate all the data frames at once in R

Question

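I have a folder containing 100 different .csv files. Not all of the files contain the same number of variables (they have different structures), so I am trying to import them all at once (creating a separate data frame for each csv), then standardize the data frames by adding new columns or converting the date column from character to Date, and finally export them again. Here is my attempt, which reads all the csv files as separate data frames: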

setwd("C:/Users/...")
files <- list.files(pattern="*.csv")
for(file in files)
{
  perpos <- which(strsplit(file, "")[[1]]==".")
  assign(
    gsub(" ","",substr(file, 1, perpos-1)), 
    read.csv(paste(path,file,sep="")))
}

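However, when I add mutate to the assign call to add a new column, the script runs but no column is added! What am I missing here? My goal is to add or manipulate some variables and export the files again, preferably with the tidyverse.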

for(file in files)
{
  perpos <- which(strsplit(file, "")[[1]]==".")
  assign(
    gsub(" ","",substr(file, 1, perpos-1)), 
    read_csv(paste(path,file,sep="")),
    mutate(. , Heading = "Data"))
}

Example

df1 <- structure(list(datadate = structure(c(17927, 17927, 17927, 17927, 
17927, 17927), class = "Date"), parent = c("grup", "grup", 
"grup", "grup", "grup", "grup"), ads = c("P9", 
"PS8", "PS7", "PS6", "PS5", "PS5"), chl = c("PSS9", 
"PSS8", "PSS7", "PSS6", "PSS5", "PSS5"), 
    average_monthly = c(196586.49, 289829.43, 
    1363529.14, 380446.43, 147296.09, 948669.38), current_month = c(987118.82, 
    1682872.03, 4356755.73, 2225040.29, 922506.21, 5756525.08
    ), current_month_minus_1 = c(585573.1, 
    635763.37, 6551477.37, 818531.11, 255862.51, 1832829.99), 
    current_month_minus_2 = c(0, 0, 0, 0, 0, 
    0)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-6L))

df2<-
  structure(
    list(
      network = c("STAR", "NPD", "GMD"),
      datadate = structure(c(18259, 18259, 18259)),
      brand = c("grup", "GFK", "MDG"),
      average_weekly = c(140389.14, 10281188.25, 172017.39),
      last_week_avg = c(89303.07, 6918460.99, 110594.64),
      last_week_1_minus_avg = c(141765.83, 10248501.1, 222484.9),
      last_week_2_minus_avg = c(138043.53, 9846538.57, 164185.21)
    ),
    class = c("tbl_df", "tbl", "data.frame"),
    row.names = c(NA, -3L)
  )

Answer 1

A base R solution that reads the files into a list; the changes needed to combine them depend on your data:

# Store a scalar of the path containing the csvs: 

example_dir <- "C:/Users/Example_Dir"

# Create a vector of the csv paths: 

files <- list.files(example_dir, pattern = "\\.csv$", full.names = TRUE)

# Create an empty list the same length as the number of files: 

X <- vector("list", length(files))

# Iterate through the files and store them in a list:

X[] <- lapply(seq_along(files), function(i){

    # stringsAsFactors belongs to read.csv, so strings stay character
    read.csv(files[i], stringsAsFactors = FALSE)

  }
)
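
Once the files are in a list, the standardizing and exporting steps from the question can be applied to every element with lapply. The following is only a minimal sketch under stated assumptions: the demo folder, the two tiny csv files, the output folder "out", and the choice to convert a column called datadate only when it exists are all hypothetical and stand in for your real data.

# Hypothetical demo folder with two small csv files of different structure
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(datadate = c("2019-01-31", "2019-02-28"), ads = c("P9", "PS8")),
          file.path(demo_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(network = "STAR", brand = "grup"),
          file.path(demo_dir, "b.csv"), row.names = FALSE)

# Read every csv into a list named after the files (without extension)
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
X <- lapply(files, read.csv, stringsAsFactors = FALSE)
names(X) <- tools::file_path_sans_ext(basename(files))

# Standardize each data frame: add a column, convert the date column if present
X <- lapply(X, function(df) {
  df$Heading <- "Data"
  if ("datadate" %in% names(df)) df$datadate <- as.Date(df$datadate)
  df
})

# Export each data frame to a separate (hypothetical) output folder
out_dir <- file.path(demo_dir, "out")
dir.create(out_dir, showWarnings = FALSE)
for (nm in names(X)) {
  write.csv(X[[nm]], file.path(out_dir, paste0(nm, ".csv")), row.names = FALSE)
}

Keeping the data frames in a named list means one function handles all 100 files, and files that lack a given column are simply skipped by the %in% check.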

Answer 2

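Apart from your code design, you seem to be using mutate the wrong way.

In your code, you pass the mutate call as the third argument of the assign function, which is supposed to be the position (the environment in which the variable is created).

What you really want to write is: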

for(file in files)
{
  perpos <- which(strsplit(file, "")[[1]]==".")
  assign(
    gsub(" ","",substr(file, 1, perpos-1)), 
    read_csv(paste(path,file,sep="")) %>% 
      mutate(Heading = "Data"))
}

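If you are not familiar with the pipe operator (%>%), I suggest reading a tutorial such as the dplyr vignette, which has a paragraph introducing it.

This code means: take the data frame read from the csv, mutate it to add the Heading column, and assign the result to a variable whose name is produced by the gsub call.

However, as in hello_friend's answer, I urge you to reconsider your design and use a list rather than a bunch of variables. The tidyverse way to do this is to use the purrr package.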
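
To make the role of the pipe concrete, here is a small sketch on a made-up data frame (df and my_df are hypothetical names, not part of the question). It illustrates that the pipe passes its left-hand side as the first argument of the call on its right, so the piped expression is a single value and assign receives only two arguments.

library(dplyr)

# Hypothetical toy data frame
df <- data.frame(x = 1:3)

# These two calls are equivalent: the pipe inserts df as the first argument
a <- df %>% mutate(Heading = "Data")
b <- mutate(df, Heading = "Data")

# assign(name, value): the piped expression as a whole is the value
assign("my_df", df %>% mutate(Heading = "Data"))
names(my_df)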
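
A minimal sketch of what a purrr-based version might look like, assuming the readr, dplyr and purrr packages are installed. The demo folder, the two sample files and the output folder "out" are hypothetical and only there to make the example self-contained; replace them with your own paths.

library(readr)
library(dplyr)
library(purrr)

# Hypothetical demo folder with two csv files of different structure
demo_dir <- file.path(tempdir(), "purrr_demo")
dir.create(demo_dir, showWarnings = FALSE)
write_csv(tibble(datadate = "2019-01-31", ads = "P9"), file.path(demo_dir, "a.csv"))
write_csv(tibble(network = "STAR", brand = "grup"), file.path(demo_dir, "b.csv"))

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file into a named list of tibbles, then add the column to each
dfs <- files %>%
  set_names(tools::file_path_sans_ext(basename(files))) %>%
  map(read_csv, col_types = cols()) %>%
  map(~ mutate(.x, Heading = "Data"))

# Export each tibble; iwalk supplies the element (.x) and its name (.y)
out_dir <- file.path(demo_dir, "out")
dir.create(out_dir, showWarnings = FALSE)
iwalk(dfs, ~ write_csv(.x, file.path(out_dir, paste0(.y, ".csv"))))

Individual data frames remain available by name, for example dfs$a or dfs[["b"]], so nothing is lost compared with creating one variable per file.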

Answer 1
1 2020-01-30 08:31:35

Answer 2
1 2020-01-30 09:16:56