Reading multiple RDS files

I have a directory with multiple RDS files (300+) that I would like to read and combine. These RDS files share the same basic format but have a different number of rows and a few different columns in each file. I have simple code to read one RDS file (all files follow the same naming pattern, "Events-3digitnumber-4digitnumber-6digitnumber.RDS"):

    mydata <- readRDS("Events-104-2014-752043.RDS")

Being new to data science, I'm sure there is a simple answer I'm missing, but would I have to use something like list.files() together with either lapply or a for loop?

Just to add a tidyverse answer:

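library(tidyverse)

# Read every .RDS file in the working directory and stack the rows;
# bind_rows() fills columns that are missing from a file with NA.
df <- list.files(pattern = "\\.RDS$") %>%
  map(readRDS) %>%
  bind_rows()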

Update:

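It is advised to use map_dfr() for binding rows and map_dfc() for binding columns, which do the mapping and the binding in a single call (note that since purrr 1.0.0 both are superseded in favor of map() followed by list_rbind() or list_cbind()):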

df <- list.files(pattern = "\\.RDS$") %>%
  map_dfr(readRDS)

Because the solution from FMM did not work for me with huge data sets, I replaced bind_rows() with data.table::rbindlist():

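library(tidyverse)
library(data.table)

# list.files() matches case-sensitively by default, so ignore.case = TRUE is
# needed for ".rds" to also match the ".RDS" files; fill = TRUE pads columns
# that are missing from some files with NA.
df <- list.files(pattern = "\\.rds$", ignore.case = TRUE) %>%
      map(readRDS) %>%
      data.table::rbindlist(fill = TRUE)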

First, a reproducible example:

data(iris)
# make sure that the two data sets (iris, iris2) have different columns
iris2 = iris  # R copies on modify, so changing iris2 leaves iris untouched
iris2$Species2 = iris2$Species
iris2$Species = NULL

saveRDS(iris, "Events-104-2014-752043.RDS")
saveRDS(iris2, "Events-104-2015-782043.RDS")

Now you need to:

  1. find all file names
  2. read the data
  3. combine the data into one table (if you want that)

I would use data.table::rbindlist() because it handles differing columns for you when you set fill = TRUE:

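library(data.table)
# 1. file names matching Events-<3 digits>-<4 digits>-<6 digits>.RDS
files = list.files(path = '.', pattern = '^Events-[0-9]{3}-[0-9]{4}-[0-9]{6}\\.RDS$')
# 2. read each file and convert it to a data.table
dat_list = lapply(files, function (x) data.table(readRDS(x)))
# 3. stack them; fill = TRUE adds NA for columns a file lacks
dat = rbindlist(dat_list, fill = TRUE)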
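If you would rather avoid extra packages, the same three steps can be sketched in base R. This is a swapped-in technique, not the answer's data.table code: the helper `bind_rows_fill()` is a hypothetical name, and the sketch assumes every file holds a plain data frame. To stay self-contained, it first writes the two iris-based files from the reproducible example into a temporary directory.

```r
# Setup (illustration only): recreate the two example files, with differing
# columns, in a temporary directory instead of the working directory.
rds_dir <- file.path(tempdir(), "events")
dir.create(rds_dir, showWarnings = FALSE)
iris2 <- iris
iris2$Species2 <- iris2$Species
iris2$Species <- NULL
saveRDS(iris,  file.path(rds_dir, "Events-104-2014-752043.RDS"))
saveRDS(iris2, file.path(rds_dir, "Events-104-2015-782043.RDS"))

# Hypothetical helper: row-bind data frames whose columns differ,
# adding an all-NA column wherever a data frame lacks one.
bind_rows_fill <- function(dfs) {
  all_cols <- unique(unlist(lapply(dfs, names)))  # union, in order of first appearance
  padded <- lapply(dfs, function(df) {
    for (col in setdiff(all_cols, names(df))) df[[col]] <- rep(NA, nrow(df))
    df[all_cols]                                   # same column order everywhere
  })
  do.call(rbind, padded)
}

# 1. find all file names (full paths, so the working directory does not matter)
files <- list.files(rds_dir,
                    pattern = "^Events-[0-9]{3}-[0-9]{4}-[0-9]{6}\\.RDS$",
                    full.names = TRUE)
# 2. read the data: a list with one data frame per file
dat_list <- lapply(files, readRDS)
# 3. combine the data into one data frame
dat <- bind_rows_fill(dat_list)
```

With the two example files, `dat` has 300 rows and 6 columns: `Species` is NA for the rows from the second file and `Species2` is NA for the rows from the first. For 300+ files, rbindlist(fill = TRUE) is likely the faster choice; the base version mainly shows what "fill" does.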

Complementing FMM's answer above: depending on where your files are, you may need to include full.names = TRUE in the list.files() call so that map_dfr() can read them. Without it, list.files() returns bare file names, which readRDS() looks up relative to the working directory.

df <- list.files(path = "path/to/your/files",  # placeholder: folder holding the .RDS files
                 pattern = "\\.RDS$", full.names = TRUE) %>%
  map_dfr(readRDS)
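A small base R check of the difference, assuming the files sit in a directory other than the working directory. A temporary directory holding one hypothetical file stands in for it here, and base lapply() replaces map_dfr() so that no packages are needed:

```r
# Stand-in for a folder of .RDS files that is not the working directory.
rds_dir <- file.path(tempdir(), "rds_demo")
dir.create(rds_dir, showWarnings = FALSE)
saveRDS(head(mtcars), file.path(rds_dir, "Events-104-2014-752043.RDS"))

# Without full.names, only bare file names come back ...
bare <- list.files(rds_dir, pattern = "\\.RDS$")
# ... with full.names = TRUE, each name is prefixed with rds_dir.
full <- list.files(rds_dir, pattern = "\\.RDS$", full.names = TRUE)

file.exists(bare)  # FALSE unless the working directory happens to be rds_dir
file.exists(full)  # TRUE

# Given the full paths, reading works from any working directory.
dat <- do.call(rbind, lapply(full, readRDS))
```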

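Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.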

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM