简体   繁体   中英

How to merge csv files from nested folders in R

I have a large collection of csv files that are in different folders and in folders within folders that I need to merge into one file. It would be easy if they were all in one directory but I don't know of a simple way to pull them all out of the different folders. I could combine them one by one but there are A LOT of them.

ex:

+ working directory
|
+-- · data.csv
+-- · data2.csv
+-- + NewFolder
    |
    +-- · data3.csv
    +-- + NewFolder2
        |
        +-- · data4.csv

I want one file that combines all data csv files

You can use dir() with recursive set to TRUE to list all files in the folder tree, and you can use pattern to define a regular expression to filter the .csv files. An example:

csv_files <- dir(pattern='.*[.]csv', recursive = T)

or even better and simpler (thanks to speendo for his comment):

csv_files <- dir(pattern='*.csv$', recursive = T)

The explanation.

  • pattern='*.csv$ : The pattern argument must be a regular expression that filters the file names. This RegEx filters out the file names that end with .csv .

    If you want to filter that starts with data , you should try a pattern like this: pattern='^data.*.csv$'

  • recursive=T : Forces dir() to traverse recursively through all folders below the working directory.

After you get the file list, and assuming all of them have the same structure (ie all the files have the same columns), you can merge them with read.csv() and rbind() :

for(i in 1:length(csv_files)) {
  if(i == 1)
    df <- read.csv(csv_files[i])
  else
    df <- rdbind(df, read.csv(csv_files[i]))
}

Ramnath suggests in his comment a faster way to merge the .csv files (again, assuming all of them have the same structure ):

library(dplyr)
df <- rbind_all(lapply(csv_files, read_csv))

Here is a solution with dplyr

# get list of files ending in csv in directory root
dir(root, pattern='csv$', recursive = TRUE, full.names = TRUE) %>%
  # read files into data frames
  lapply(FUN = read.csv) %>%
  # bind all data frames into a single data frame
  rbind_all %>%
  # write into a single csv file
  write.csv("all.csv")

This solution has the assumption that all *.csv files have the same structure.

(Untested)

fileList <- list.files(
  pattern="*.csv$",
  recursive=TRUE,
  full.name=TRUE,
  )

completeCSV <- data.frame()


for(file in fileList) {
  print(file) # for debug: print current file
  if (nrow(completeCSV) == 0) {
    completeCSV <- read.csv(file)
  } else {
    curDF <- read.csv(file) # could also be read.csv2()
    rbind(completeCSV, curDF)
  }
}

write.csv(completeCSV) # could also be write.csv2()

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM