Reading multiple RDS files
I have a directory with 300+ RDS files that I would like to read and combine. The files share the same basic format, but each has a different number of rows and a few different columns. I have simple code to read one RDS file (all files are named like "Events-3digitnumber-4digitnumber-6digitnumber.RDS"):
mydata <- readRDS("Events-104-2014-752043.RDS")
Being new to data science, I'm sure there is a simple answer that I'm missing. Would I have to use something like list.files() together with lapply or a for loop?
Just to add a tidyverse answer:
library(tidyverse)
df <- list.files(pattern = "\\.RDS$") %>%  # escape the dot and anchor the extension
map(readRDS) %>%
bind_rows()
Update: It is advised to use map_dfr for binding rows and map_dfc for binding columns, which is much more efficient:
df <- list.files(pattern = "\\.RDS$") %>%
map_dfr(readRDS)
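As a possible follow-up to the update above: newer purrr releases (1.0.0 and later, to the best of my knowledge) mark map_dfr() as superseded and suggest map() followed by list_rbind() instead. The sketch below assumes such a purrr version is installed. The demo_dir folder and the two small data frames are made up purely for illustration.

```r
library(purrr)

# Hypothetical demo folder holding two small files that differ in one column
demo_dir <- file.path(tempdir(), "rds_demo_rbind")
dir.create(demo_dir, showWarnings = FALSE)
saveRDS(data.frame(id = 1:2, a = c("x", "y")),
        file.path(demo_dir, "Events-104-2014-752043.RDS"))
saveRDS(data.frame(id = 3:5, b = c(1.5, 2.5, 3.5)),
        file.path(demo_dir, "Events-104-2015-782043.RDS"))

# full.names = TRUE so the paths work from any working directory
files <- list.files(demo_dir, pattern = "\\.RDS$", full.names = TRUE)

# Read every file into a list, then row-bind the list;
# columns missing from a file are filled with NA
df <- list_rbind(map(files, readRDS))
```

With these two demo files, df should contain the columns id, a and b, with NA wherever a file lacked a column.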
Because the solution from FMM did not work for me with huge data sets, I replaced bind_rows() with data.table::rbindlist():
library(tidyverse)
library(data.table)
df <- list.files(pattern = "\\.rds$", ignore.case = TRUE) %>%  # match .rds and .RDS
map(readRDS) %>%
data.table::rbindlist()
First a reproducible example:
data(iris)
# make sure that the two data sets (iris, iris2) have different columns
iris2 = iris  # plain assignment is enough here; copy() would require data.table to be loaded first
iris2$Species2 = iris2$Species
iris2$Species = NULL
saveRDS(iris, "Events-104-2014-752043.RDS")
saveRDS(iris2, "Events-104-2015-782043.RDS")
Now you need to list the files, read each one, and combine the results. I would use data.table::rbindlist because it handles differing columns for you when you set fill = TRUE:
require(data.table)
files = list.files(path = '.', pattern = '^Events-[0-9]{3}-[0-9]{4}-[0-9]{6}\\.RDS$')
dat_list = lapply(files, function (x) data.table(readRDS(x)))
dat = rbindlist(dat_list, fill = TRUE)
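If you also want to know which file each row came from, one option is the idcol argument of rbindlist. This is a minimal sketch rather than part of the answer above: the demo_dir folder and the source_file column name are illustrative assumptions, and the two files are recreated from iris so the block runs on its own.

```r
library(data.table)

# Hypothetical demo folder with the same two files as in the example above
demo_dir <- file.path(tempdir(), "rds_demo_idcol")
dir.create(demo_dir, showWarnings = FALSE)
iris2 <- iris
iris2$Species2 <- iris2$Species
iris2$Species <- NULL
saveRDS(iris, file.path(demo_dir, "Events-104-2014-752043.RDS"))
saveRDS(iris2, file.path(demo_dir, "Events-104-2015-782043.RDS"))

files <- list.files(path = demo_dir,
                    pattern = "^Events-[0-9]{3}-[0-9]{4}-[0-9]{6}\\.RDS$",
                    full.names = TRUE)

# Name each list element after its file so rbindlist can record the origin
dat_list <- lapply(files, readRDS)
names(dat_list) <- basename(files)

# fill = TRUE pads missing columns with NA; idcol adds a column of list names
dat <- rbindlist(dat_list, fill = TRUE, idcol = "source_file")
```

Rows that came from the first file have NA in Species2, and rows from the second file have NA in Species.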
Complementing FMM's answer above: depending on the path to your files, you may need to include full.names = TRUE in the list.files() call so that map_dfr can read them properly.
df <- list.files(pattern = "\\.RDS$", full.names = TRUE) %>%
map_dfr(readRDS)
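To see why full.names matters, here is a small sketch under the assumption that the files live in a folder other than the working directory. The demo_dir folder and its single file are hypothetical.

```r
# Hypothetical folder that is not the working directory
demo_dir <- file.path(tempdir(), "rds_demo_paths")
dir.create(demo_dir, showWarnings = FALSE)
saveRDS(data.frame(x = 1:3), file.path(demo_dir, "Events-104-2014-752043.RDS"))

# Without full.names, list.files() returns bare file names only
bare <- list.files(demo_dir, pattern = "\\.RDS$")

# With full.names = TRUE, each name is prefixed with the directory path
full <- list.files(demo_dir, pattern = "\\.RDS$", full.names = TRUE)

# readRDS() resolves relative names against the working directory,
# so only the full path is safe to read from anywhere
one <- readRDS(full[1])
```

Passing the bare names to readRDS would only work if the working directory happened to be demo_dir.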