Reading multiple .dat files as a list and saving as .RDATA files in R

I want to import multiple .DAT files from a directory, store them as elements of a list, and then save them as .RDATA files.

I tried the following code:

files <- dir(pattern = "*.DAT")
library(tidyverse)
Data1 <- 
  files %>%
    map(~ read.table(file = ., fill = TRUE))

which works for some files and fails for others. The files are also available at this link. I want to read all the files and then save them as .RDATA files with the same names.

Since the data behind the link are partly unclean, I'll show a solution to the core problem of your question using this example data:

(name1 <- name2 <- name3 <- name4 <- name5 <- data.frame(matrix(1:12, 3, 4)))
#   X1 X2 X3 X4
# 1  1  4  7 10
# 2  2  5  8 11
# 3  3  6  9 12

We save the data into a subdirectory of your working directory named "test".

l <- mget(ls(pattern="^name"))
DIR <- "test"
# dir.create(DIR)  # leave out if dir already exists
sapply(1:length(l), function(x) 
  write.table(l[[x]], file=paste0(DIR, "/", names(l)[x], ".dat"), row.names=FALSE))

Now let's look at what's inside "test".

dir(DIR)
# [1] "name1.dat" "name2.dat" "name3.dat" "name4.dat" "name5.dat"

Now we import the files in the directory by pattern. I use rio::import_list, which conveniently imports the files into a list and uses data.table::fread internally. Your own code would also work fine.

# rm(list=ls())  # commented out for user safety
L <- rio::import_list(paste0(DIR, "/", dir(DIR, pattern="\\.dat$")), format="tsv")
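
For readers who prefer not to depend on rio, the same import step can be sketched in base R. This is only an illustration: `read_dat_dir` is a hypothetical helper name, and the demo directory and files are made up for the example. The key point is that the list elements are named after the files, which the saving step relies on.

```r
# Sketch: base-R import of every .dat file in a directory into a named list.
# `read_dat_dir` is a hypothetical helper, not part of any package.
read_dat_dir <- function(path, ...) {
  files <- list.files(path, pattern = "\\.dat$", full.names = TRUE)
  out <- lapply(files, read.table, ...)
  # Name each element after its file, without directory or extension
  names(out) <- tools::file_path_sans_ext(basename(files))
  out
}

# Usage with a throwaway directory (assumed example data)
demo_dir <- file.path(tempdir(), "dat_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.table(data.frame(a = 1:3, b = 4:6),
            file.path(demo_dir, "name1.dat"), row.names = FALSE)
write.table(data.frame(a = 7:9, b = 1:3),
            file.path(demo_dir, "name2.dat"), row.names = FALSE)

L_base <- read_dat_dir(demo_dir, header = TRUE)
names(L_base)
```

Extra arguments such as `header = TRUE` or `fill = TRUE` are passed through to read.table, so the reader can be adapted to the files at hand.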

To save them as .Rdata files, we want to assign the object names dynamically, which we achieve with the list argument of save().

sapply(seq_along(L), function(x) {
  tmp <- L[[x]]
  assign(names(L)[x], tmp)
  save(list=names(L)[x], file=paste0(DIR, "/", names(L)[x], ".Rdata"))
})
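
A variant of the same idea, shown here only as a sketch: instead of calling assign() inside the function, the list can be turned into an environment with list2env(), and save() can be pointed at that environment through its envir argument. `save_each` and the demo directory are hypothetical names used for illustration, and the list is assumed to be fully named.

```r
# Sketch: save every element of a named list to its own .Rdata file,
# keeping the element name as the object name inside the file.
# `save_each` is a hypothetical helper, not part of any package.
save_each <- function(L, dir) {
  env <- list2env(L, envir = new.env())  # one object per list element
  for (nm in names(L)) {
    save(list = nm, envir = env,
         file = file.path(dir, paste0(nm, ".Rdata")))
  }
  invisible(names(L))
}

# Usage with a throwaway directory (assumed example data)
out_dir <- file.path(tempdir(), "rdata_demo")
dir.create(out_dir, showWarnings = FALSE)
save_each(list(name1 = data.frame(x = 1:3),
               name2 = data.frame(x = 4:6)), out_dir)
```

Loading one of these files restores an object called name1 or name2, just as with the assign() approach.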

When we list the directory, we see that the files were created.

dir(DIR)
# [1] "name1.dat"   "name1.Rdata" "name2.dat"   "name2.Rdata" "name3.dat"   "name3.Rdata"
# [7] "name4.dat"   "name4.Rdata" "name5.dat"   "name5.Rdata"

Now let's check whether the object names were also created correctly:

# rm(list=ls())  # commented out for user safety
load("test/name1.Rdata")
ls()
# [1] "name1"
name1
#   X1 X2 X3 X4
# 1  1  4  7 10
# 2  2  5  8 11
# 3  3  6  9 12

They were.

Add-on option

Alternatively, we could attempt a more direct approach using rvest. First we fetch the data names:

library(rvest)
dat.names <- html_attr(html_nodes(read_html(
  "https://www2.stat.duke.edu/courses/Spring03/sta113/Data/Hand/Hand.html"),
  "a"), "href")

and create individual links:

links <- as.character(sapply(dat.names, function(x)
  paste0("https://www2.stat.duke.edu/courses/Spring03/sta113/Data/Hand/", x)))

The remainder is basically the same as above:

DIR <- "test"
# dir.create(DIR)  # leave out if dir already exists

library(rio)
system.time(L <- import_list(links, format="tsv") ) # this will take a minute
sapply(seq_along(L), function(x) {
  tmp <- L[[x]]
  assign(names(L)[x], tmp)
  save(list=names(L)[x], file=paste0(DIR, "/", names(L)[x], ".Rdata"))
})

# rm(list=ls())  # commented out for user safety
load("test/clinical.Rdata")  # test a data set
clinical
#    V1  V2  V3
# 1  26  31  57
# 2  51  59 110
# 3  21  11  32
# 4  40  34  74
# 5 138 135 273

However, as noted at the beginning, the data are partly unclean, so you will probably have to handle some files individually and adapt the code case by case.
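
To find out which files need that individual treatment, one option is to wrap the reader in tryCatch() so that a single bad file does not abort the whole import. The following is a minimal sketch under assumptions: `safe_read` is a hypothetical helper, and the two demo files (one valid, one empty) stand in for the real data.

```r
# Sketch: read many files, collecting failures instead of stopping.
# `safe_read` is a hypothetical helper, not part of any package.
safe_read <- function(files, ...) {
  res <- lapply(files, function(f) {
    # Return the error object itself when reading fails
    tryCatch(read.table(f, fill = TRUE, ...), error = function(e) e)
  })
  names(res) <- basename(files)
  failed <- vapply(res, inherits, logical(1), what = "error")
  list(data = res[!failed], errors = res[failed])
}

# Usage: one readable file and one empty file that read.table rejects
chk_dir <- file.path(tempdir(), "safe_read_demo")
dir.create(chk_dir, showWarnings = FALSE)
write.table(data.frame(V1 = 1:2, V2 = 3:4),
            file.path(chk_dir, "good.dat"),
            row.names = FALSE, col.names = FALSE)
file.create(file.path(chk_dir, "empty.dat"))

res <- safe_read(file.path(chk_dir, c("good.dat", "empty.dat")))
names(res$data)    # files that were read
names(res$errors)  # files to inspect by hand
```

The error objects kept in `res$errors` still carry their messages, which can be inspected with conditionMessage() when deciding how to adapt the reader for each file.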

This should get you close. It reads all the .dat files from your directory and saves them as .RData files in the same directory with the appropriate names. One downside is that when you load them in R, each object keeps the name "temp.file", so you have to rename them manually or load them one at a time. Not sure how to get around that.

# List only the .dat files instead of everything in the working directory
files <- dir(pattern = "\\.dat$")
file.list <- lapply(files, function(f) read.delim(f, header = FALSE))
names.list <- sub("\\.dat$", "", files)  # strip the extension (escaped regex)

for (i in seq_along(file.list)) {
  temp.file <- file.list[[i]]
  temp.name <- paste0(names.list[[i]], ".RData")
  save(temp.file, file = temp.name)  # the object is stored under the name "temp.file"
}
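
One possible way around the fixed object name, sketched here under assumptions: load each file into its own fresh environment and return the object it contains, so the caller chooses the name at load time. `load_object` is a hypothetical helper, and it assumes each .RData file holds exactly one object, as the files written by the loop above do.

```r
# Sketch: load an .RData file that holds a single object and return that
# object, so it can be bound to any name.
# `load_object` is a hypothetical helper, not part of any package.
load_object <- function(file) {
  env <- new.env()
  nm <- load(file, envir = env)  # load() returns the restored object names
  stopifnot(length(nm) == 1)
  env[[nm]]
}

# Usage with a throwaway file (assumed example data)
tmp <- tempfile(fileext = ".RData")
temp.file <- data.frame(x = 1:3)
save(temp.file, file = tmp)

dat1 <- load_object(tmp)  # same data, now under a name of our choosing
```

Because nothing is loaded into the global environment, several files can be read in a row with lapply() without one "temp.file" overwriting another.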
