
Read multiple csv files (and skip 2 columns in each csv file) into one dataframe in R?

I have a folder of about 100 csv files that I want to read into one dataframe in R. I mostly know how to do this, but I need to skip the first two columns in every csv file, and that is the part I am stuck on. My code so far is:

library(plyr)

myfiles <- list.files(pattern = "\\.csv$") # create a list of all csv files in the directory
data_csv <- ldply(myfiles, read.csv)

Thank you for any help
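For reference, here is a minimal base-R sketch (no extra packages) of the same idea: read each file, drop the first two columns, and row-bind the results. The function name `read_all_csv` is just for illustration, and it assumes every file has the same remaining columns:

```r
## Minimal base-R sketch: read every csv in a directory,
## drop the first two columns, and row-bind into one data frame.
read_all_csv <- function(path = ".") {
  myfiles <- list.files(path, pattern = "\\.csv$", full.names = TRUE)
  ## Drop columns 1 and 2 from each file as it is read
  tables <- lapply(myfiles, function(f) read.csv(f)[, -c(1, 2)])
  ## Row-bind (assumes the remaining columns match across files)
  do.call(rbind, tables)
}
```

This will be slower than the `data.table` and `purrr` answers below on 100 files, but it has no dependencies.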

Using the data.table package functions fread() and rbindlist() will typically get you the result faster than the base or tidyverse alternatives.

library(data.table)

## Create a list of the files
FileList <- list.files(pattern = "\\.csv$")

## Pre-allocate a list to store all of the results of reading
## so that we aren't re-copying the list for each iteration
DTList <- vector(mode = "list", length = length(FileList))

## Read in all the files, excluding the first two columns
for(i in seq_along(DTList)) {
  DTList[[i]] <- data.table::fread(FileList[[i]], drop = c(1,2))
}

## Combine the results into a single data.table
DT <- data.table::rbindlist(DTList)

## Optionally, convert the data.table to a data.frame to match requested result
## Though I would recommend looking into using data.table instead!
data.table::setDF(DT)

Here's one way using purrr. You could write basically the same thing with the base lapply function. The map_dfr function used below applies read.csv (or fread) to each file and has the nice feature of simultaneously binding the resulting dataframes together (by row) to give you a single dataframe.

library(purrr)
myfiles <- list.files(pattern = "\\.csv$") # create a list of all csv files in the directory
data_csv <- map_dfr(myfiles, ~read.csv(.x)[,-c(1,2)])

And taking a note from Matt's answer, you can go even faster with fread and vectorization:

myfiles <- list.files(pattern = "\\.csv$") # create a list of all csv files in the directory
data_csv <- map_dfr(myfiles, ~data.table::fread(.x, drop = c(1,2)))

And if you want to go really fast, you could always parallelize with the furrr package.

library(purrr)
library(furrr)

# sets up the workers
plan("multisession")

myfiles <- list.files(pattern = "\\.csv$") # create a list of all csv files in the directory
data_csv <- future_map_dfr(myfiles, ~data.table::fread(.x, drop = c(1,2)))
