Read multiple csv files (skipping the first 2 columns of each) into one data frame in R?

Question

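I have a folder with about 100 csv files that I want to read into a single data frame in R. I know how to do that in general, but I have to skip the first two columns of every csv file, and that is the part I'm stuck on. My code so far is: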

library(plyr) # provides ldply

myfiles <- list.files(pattern = ".csv") # create a list of all csv files in the directory
data_csv <- ldply(myfiles, read.csv)

Thanks for any help.

Answer 1

Using the data.table functions fread() and rbindlist() will get you the result you want faster than any of the base or tidyverse alternatives.

library(data.table)

## Create a list of the files
## (pattern is a regular expression; "\\.csv$" matches only names ending in .csv)
FileList <- list.files(pattern = "\\.csv$")

## Pre-allocate a list to store all of the results of reading
## so that we aren't re-copying the list for each iteration
DTList <- vector(mode = "list", length = length(FileList))

## Read in all the files, excluding the first two columns
for (i in seq_along(DTList)) {
  DTList[[i]] <- data.table::fread(FileList[[i]], drop = c(1,2))
}

## Combine the results into a single data.table
DT <- data.table::rbindlist(DTList)

## Optionally, convert the data.table to a data.frame to match requested result
## Though I would recommend looking into using data.table instead!
data.table::setDF(DT)

Answer 2

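Here is one way to do it with purrr. You could do essentially the same thing with the base lapply function. The map_dfr function used below applies read.csv or fread to each file in turn. It also has the nice feature of row-binding the resulting data frames at the same time, so you end up with a single data frame.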

library(purrr)
myfiles <- list.files(pattern = ".csv") # create a list of all csv files in the directory
data_csv <- map_dfr(myfiles, ~read.csv(.x)[,-c(1,2)])
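
For the base lapply route mentioned above, here is a minimal sketch. It assumes every file has the same column names once the first two columns are dropped (rbind needs that). read_csv_folder is a hypothetical helper name used only for illustration; it does not come from either answer.

```r
# Hypothetical helper: read every csv in `dir`, drop the first two columns
# of each file, and row-bind the pieces into one data frame.
read_csv_folder <- function(dir = ".") {
  # pattern is a regular expression; "\\.csv$" matches names ending in .csv
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

  # drop = FALSE keeps a data frame even if only one column remains
  pieces <- lapply(files, function(path) {
    read.csv(path)[, -c(1, 2), drop = FALSE]
  })

  # Stack all pieces by row (returns NULL if no files were found)
  do.call(rbind, pieces)
}

# Example call for the current working directory:
# data_csv <- read_csv_folder(".")
```

This is likely to be slower than the fread versions on 100 files, but it needs no extra packages.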

And taking a cue from Matt's answer, you can make this faster by using fread:

myfiles <- list.files(pattern = "\\.csv$") # create a list of all csv files in the directory
data_csv <- map_dfr(myfiles, ~data.table::fread(.x, drop = c(1,2)))

If you want to go really fast, you can always run it in parallel with the furrr package.

library(purrr)
library(furrr)

# sets up the workers
plan("multisession")

myfiles <- list.files(pattern = "\\.csv$") # create a list of all csv files in the directory
data_csv <- future_map_dfr(myfiles, ~data.table::fread(.x, drop = c(1,2)))

2 answers

Answer 1: 3, accepted, 2019-12-30 14:42:25

Answer 2: 2, 2019-12-30 14:10:42