Reading multiple .dat files as a list and saving as .RDATA files in R
I want to import multiple .DAT files from a directory, make them elements of a list, and then save them as .RDATA files.
I tried the following code:
files <- dir(pattern = "*.DAT")
library(tidyverse)
Data1 <-
files %>%
map(~ read.table(file = ., fill = TRUE))
which works sometimes and fails at other times. The files are also available on this link. I want to read all the files and then save them as .RDATA files with the same names.
Since some of the data behind the link are a little unclean, I will show the solution to the core problem of your question using this example data:
(name1 <- name2 <- name3 <- name4 <- name5 <- data.frame(matrix(1:12, 3, 4)))
# X1 X2 X3 X4
# 1 1 4 7 10
# 2 2 5 8 11
# 3 3 6 9 12
We save the data into a subdirectory of your working directory named "test".
l <- mget(ls(pattern="^name"))
DIR <- "test"
# dir.create(DIR) # leave out if dir already exists
sapply(1:length(l), function(x)
write.table(l[[x]], file=paste0(DIR, "/", names(l)[x], ".dat"), row.names=FALSE))
Now we look at what is inside "test".
dir(DIR)
# [1] "name1.dat" "name2.dat" "name3.dat" "name4.dat" "name5.dat"
Now we import the files in the directory by pattern. I use rio::import_list, which nicely imports the files into a list and uses data.table::fread internally. But your own code would also work fine.
# rm(list=ls()) # commented out for user safety
L <- rio::import_list(paste0(DIR, "/", dir(DIR, pattern="\\.dat$")), format="tsv")
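For readers who prefer to avoid the extra dependency, the same import step can be sketched in base R. This is only an illustration under assumptions: the directory `demo_dir` and the two example files are hypothetical, and `header = TRUE` matches files written by write.table() as above, not necessarily the linked data.

```r
# Hypothetical example directory; replace with your own path.
demo_dir <- file.path(tempdir(), "dat_demo")
dir.create(demo_dir, showWarnings = FALSE)

# Write two small example files so the sketch is self-contained.
write.table(data.frame(a = 1:3, b = 4:6),
            file.path(demo_dir, "one.dat"), row.names = FALSE)
write.table(data.frame(a = 7:9, b = 10:12),
            file.path(demo_dir, "two.dat"), row.names = FALSE)

# "pattern" is a regular expression, not a shell glob,
# so escape the dot and anchor the extension at the end.
paths <- list.files(demo_dir, pattern = "\\.dat$",
                    ignore.case = TRUE, full.names = TRUE)

# Read every file and name each list element after its file,
# dropping the extension.
dat_list <- lapply(paths, read.table, header = TRUE, fill = TRUE)
names(dat_list) <- tools::file_path_sans_ext(basename(paths))

str(dat_list, max.level = 1)
```

Naming the list elements at this point is what later allows each element to be saved under its own object name.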
To save them as .Rdata we want to assign names dynamically, which we achieve with the list option within save().
sapply(seq_along(L), function(x) {
tmp <- L[[x]]
assign(names(L)[x], tmp)
save(list=names(L)[x], file=paste0(DIR, "/", names(L)[x], ".Rdata"))
})
When we list the directory, we see that the files were created.
dir(DIR)
# [1] "name1.dat" "name1.Rdata" "name2.dat" "name2.Rdata" "name3.dat" "name3.Rdata"
# [7] "name4.dat" "name4.Rdata" "name5.dat" "name5.Rdata"
Now let's check whether the object names were also created correctly:
# rm(list=ls()) # commented out for user safety
load("test/name1.Rdata")
ls()
# [1] "name1"
name1
# X1 X2 X3 X4
# 1 1 4 7 10
# 2 2 5 8 11
# 3 3 6 9 12
Which is the case.
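If you would rather not clear your workspace for this check, a possible alternative is to load the file into a fresh environment and inspect that instead. A minimal sketch, assuming a hypothetical file written to a temporary directory:

```r
# Hypothetical file in a temporary directory, created here
# so the sketch runs on its own.
tmp_file <- file.path(tempdir(), "name1.Rdata")
name1 <- data.frame(matrix(1:12, 3, 4))
save(name1, file = tmp_file)

# Load into a separate environment so the global workspace is untouched.
check_env <- new.env()
loaded <- load(tmp_file, envir = check_env)

# load() returns the names of the restored objects.
loaded
get("name1", envir = check_env)
```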
Alternatively, we could attempt a more direct approach using rvest. First we fetch the data names:
library(rvest)
dat.names <- html_attr(html_nodes(read_html(
"https://www2.stat.duke.edu/courses/Spring03/sta113/Data/Hand/Hand.html"),
"a"), "href")
and create individual links:
links <- as.character(sapply(dat.names, function(x)
paste0("https://www2.stat.duke.edu/courses/Spring03/sta113/Data/Hand/", x)))
The remainder is basically the same as above:
DIR <- "test"
# dir.create(DIR) # leave out if dir already exists
library(rio)
system.time(L <- import_list(links, format="tsv") ) # this will take a minute
sapply(seq_along(L), function(x) {
tmp <- L[[x]]
assign(names(L)[x], tmp)
save(list=names(L)[x], file=paste0(DIR, "/", names(L)[x], ".Rdata"))
})
# rm(list=ls()) # commented out for user safety
load("test/clinical.Rdata") # test a data set
clinical
# V1 V2 V3
# 1 26 31 57
# 2 51 59 110
# 3 21 11 32
# 4 40 34 74
# 5 138 135 273
However, as noted at the start of this answer, some of the data are a little unclean, so you will probably have to handle those files individually and adapt the code case by case.
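One way to find out which files need individual treatment is to wrap the reader in tryCatch() so that a failing file is recorded instead of stopping the whole run. The following is a sketch, not a tested recipe for the linked data: `read_safely`, the directory, and both example files are hypothetical, and an empty file merely stands in for an unclean one.

```r
# Hypothetical helper: returns the data frame, or NULL if reading fails.
read_safely <- function(path) {
  tryCatch(
    read.table(path, fill = TRUE),
    error = function(e) {
      message("Could not read ", basename(path), ": ", conditionMessage(e))
      NULL
    }
  )
}

# Self-contained demo: one readable file and one empty file.
demo_dir <- file.path(tempdir(), "unclean_demo")
dir.create(demo_dir, showWarnings = FALSE)
good <- file.path(demo_dir, "good.dat")
bad <- file.path(demo_dir, "bad.dat")
writeLines(c("1 2 3", "4 5 6"), good)
file.create(bad)  # read.table() stops with an error on an empty file

results <- lapply(c(good = good, bad = bad), read_safely)

# Names of the files that failed, and the list of those that worked.
failed <- names(results)[vapply(results, is.null, logical(1))]
cleaned <- results[!names(results) %in% failed]
```

The files listed in `failed` can then be opened one by one to decide which arguments (for example skip or sep) they need.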
This should get you close. It reads all the .dat files from your directory and saves them as .RData files in the same directory with the appropriate names. One downside is that when you load them in R, the object keeps the name "temp.file", so you have to rename the objects manually or load the files one at a time. Not sure how to get around that.
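# List only the .dat files, once, so the other files in the directory are ignored
files <- dir(pattern="\\.dat$", ignore.case=TRUE)
file.list <- lapply(files, function(x) read.delim(x, header=FALSE))
# Strip the extension; the dot is escaped because the pattern is a regular expression
names.list <- lapply(files, function(x) sub("\\.dat$", "", x, ignore.case=TRUE))
for(i in seq_along(file.list)){
temp.file <- file.list[[i]]
temp.name <- paste(names.list[[i]], ".RData", sep="")
save(temp.file, file=temp.name)
}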
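A possible way around the "temp.file" naming issue is to assign each data frame to its intended name in a scratch environment and pass that name to save() through its list and envir arguments. This is a sketch under assumptions: the output directory and the two small data frames are hypothetical stand-ins for the imported files.

```r
# Hypothetical output directory and example data.
out_dir <- file.path(tempdir(), "rdata_demo")
dir.create(out_dir, showWarnings = FALSE)
file.list <- list(clinical = data.frame(V1 = 1:2),
                  survey = data.frame(V1 = 3:4))

for (nm in names(file.list)) {
  # Create the object under its own name in a scratch environment ...
  e <- new.env()
  assign(nm, file.list[[nm]], envir = e)
  # ... and save it by name, so load() restores it under that name.
  save(list = nm, envir = e, file = file.path(out_dir, paste0(nm, ".RData")))
}

# Check: the loaded object is called "clinical", not "temp.file".
check <- new.env()
load(file.path(out_dir, "clinical.RData"), envir = check)
ls(check)
```

If a single object per file is all you need, saveRDS() and readRDS() avoid the naming question entirely, because the name is chosen when the file is read back.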