简体   繁体   中英

Reading multiple .dat files as a list and saving as .RDATA files in R

I want to import multiple .DAT files from a directory and make them as a list elements and then save them as .RDATA files.

I tried the following code

files <- dir(pattern = "*.DAT")
library(tidyverse)
Data1 <- 
  files %>%
    map(~ read.table(file = ., fill = TRUE))

which works sometimes and fails others. The files are also available on this link . I want to read all files and them save them as .RDATA with the same names.

Since the data of the link are partly a little bit unclean, I show you the solution of the core problem of your question on the basis of this example data:

(name1 <- name2 <- name3 <- name4 <- name5 <- data.frame(matrix(1:12, 3, 4)))
#   X1 X2 X3 X4
# 1  1  4  7 10
# 2  2  5  8 11
# 3  3  6  9 12

We save the data into a sub directory of your working directory named "test" .

l <- mget(ls(pattern="^name"))
DIR <- "test"
# dir.create(DIR)  # leave out if dir already exists
sapply(1:length(l), function(x) 
  write.table(l[[x]], file=paste0(DIR, "/", names(l)[x], ".dat"), row.names=FALSE))

Now we look what's inside "test" .

dir(DIR)
# [1] "name1.dat" "name2.dat" "name3.dat" "name4.dat" "name5.dat"

Now we import the files in the directory by pattern. I use rio::import_list , which nicely imports the files into a list an uses data.table::fread inside. But your own code also would work fine.

# rm(list=ls())  # commented out for user safety
L <- rio::import_list(paste0(DIR, "/", dir(DIR, pattern="\\.dat$")), format="tsv")

To save them as .Rdata we want to assign names dynamically which we achive with the list option within save() .

sapply(seq_along(L), function(x) {
  tmp <- L[[x]]
  assign(names(L)[x], tmp)
  save(list=names(L)[x], file=paste0(DIR, "/", names(L)[x], ".Rdata"))
})

When we list the directory we see that the data was created.

dir(DIR)
# [1] "name1.dat"   "name1.Rdata" "name2.dat"   "name2.Rdata" "name3.dat"   "name3.Rdata"
# [7] "name4.dat"   "name4.Rdata" "name5.dat"   "name5.Rdata"

Now let's look whether the object names also were created correctly:

# rm(list=ls())  # commented out for user safety
load("test/name1.Rdata")
ls()
# [1] "name1"
name1
#   X1 X2 X3 X4
# 1  1  4  7 10
# 2  2  5  8 11
# 3  3  6  9 12

Which is the case.

Add-on option

We alternatively could attempt a more direct approach using rvest . First we fetch the data names:

library(rvest)
dat.names <- html_attr(html_nodes(read_html(
  "https://www2.stat.duke.edu/courses/Spring03/sta113/Data/Hand/Hand.html"),
  "a"), "href")

and create individual links:

links <- as.character(sapply(dat.names, function(x)
  paste0("https://www2.stat.duke.edu/courses/Spring03/sta113/Data/Hand/", x)))

The remainder is basically the same as above:

DIR <- "test"
# dir.create(DIR)  # leave out if dir already exists

library(rio)
system.time(L <- import_list(links, format="tsv") ) # this will take a minute
sapply(seq_along(L), function(x) {
  tmp <- L[[x]]
  assign(names(L)[x], tmp)
  save(list=names(L)[x], file=paste0(DIR, "/", names(L)[x], ".Rdata"))
})

# rm(list=ls())  # commented out for user safety
load("test/clinical.Rdata")  # test a data set
clinical
#    V1  V2  V3
# 1  26  31  57
# 2  51  59 110
# 3  21  11  32
# 4  40  34  74
# 5 138 135 273

However, as noted earlier in the introduction, the data are partly a little bit unclean and you probably will have to handle them individually and adapt the code case-wise.

This should get you close. It reads all the .dat files from your directory and saves them as .RData files in your directory with the appropriate names. One downside is that when you open them in R they retain the "temp.file" name, so you have to rename them manually or just open them one at a time. Not sure how to get around that.

file.list <- lapply(1:length(dir()), function(x) read.delim(dir()[x], header=FALSE))
names.list <- lapply(1:length(dir()), function(x) gsub(".dat", "", dir()[x]))

for(i in 1:length(file.list)){
  temp.file <- file.list[[i]]
  temp.name <- paste(names.list[[i]], ".RData", sep="")
  save(temp.file, file=temp.name)
}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM