R: read in multiple .dat files
Hi, I am new here and a beginner in R.
My problem: when I have more than one file (test1.dat, test2.dat, ...) to work with in R, I use this code to read them in:
filelist <- list.files(pattern = "\\.dat$")
df_list <- lapply(filelist, function(x)
  read.table(x, header = FALSE, sep = ",",
             colClasses = "factor", comment.char = "",
             col.names = "raw"))
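In case it helps, the per-file data frames that lapply() returns can be stacked into one data frame afterwards with do.call(rbind, ...). A minimal self-contained sketch (the two tiny sample files written here are just stand-ins):

```r
# Write two tiny sample files so the snippet runs standalone,
# then read them in as in the question and stack the results.
writeLines(c("a", "b"), "test1.dat")
writeLines(c("c", "d"), "test2.dat")

filelist <- list.files(pattern = "\\.dat$")
df_list <- lapply(filelist, function(x)
  read.table(x, header = FALSE, sep = ",",
             colClasses = "factor", comment.char = "",
             col.names = "raw"))
df_all <- do.call(rbind, df_list)  # one data frame with all rows stacked
nrow(df_all)  # 4 (two rows per file)
```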
Now I have the problem that my data is big. I found a solution to speed things up using the sqldf package:
sql <- file("test2.dat")
df <- sqldf("select * from sql", dbname = tempfile(),
            file.format = list(header = FALSE, row.names = FALSE,
                               colClasses = "factor", comment.char = "",
                               col.names = "raw"))
It works well for one file, but I am not able to change the code to read in multiple files as in the first code snippet. Can someone help me?
Thank you!
Momo
This seems to work (but I assume there is a quicker SQL way to do this):
sql.l <- lapply(filelist, file)
df_list2 <- lapply(sql.l, function(i)
  sqldf("select * from i", dbname = tempfile(),
        file.format = list(header = TRUE, row.names = FALSE)))
Look at speeds - partially taken from mnel's post Quickly reading very large tables as dataframes in R:
library(data.table)
library(sqldf)

# test data
n <- 1e6
DT <- data.table(a = sample(1:1000, n, replace = TRUE),
                 b = sample(1:1000, n, replace = TRUE),
                 c = rnorm(n),
                 d = sample(c("foo","bar","baz","qux","quux"), n, replace = TRUE),
                 e = rnorm(n),
                 f = sample(1:1000, n, replace = TRUE))

# write 5 files out
lapply(1:5, function(i) write.table(DT, paste0("test", i, ".dat"),
                                    sep = ",", row.names = FALSE, quote = FALSE))
read: data.table
filelist <- list.files(pattern = "\\.dat$")
system.time(df_list <- lapply(filelist, fread))
# user system elapsed
# 5.244 0.200 5.457
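If the goal is one big table rather than a list of per-file tables, the fread() results can be combined with data.table::rbindlist(), which is typically much faster than do.call(rbind, ...). A small self-contained sketch (the two files written here are hypothetical stand-ins for the benchmark files):

```r
library(data.table)

# write two small stand-in files
fwrite(data.table(a = 1:3, b = c("x", "y", "z")), "part1.dat")
fwrite(data.table(a = 4:6, b = c("u", "v", "w")), "part2.dat")

filelist <- list.files(pattern = "part[12]\\.dat$")
df_list <- lapply(filelist, fread)
DT_all <- rbindlist(df_list, idcol = "file")  # stack, recording the source index
nrow(DT_all)  # 6
```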
read: sqldf
sql.l <- lapply(filelist, file)
system.time(df_list2 <- lapply(sql.l, function(i)
  sqldf("select * from i", dbname = tempfile(),
        file.format = list(header = TRUE, row.names = FALSE))))
# user system elapsed
# 35.594 1.432 37.357
Check - seems ok except for attributes:
all.equal(df_list , df_list2)
Somehow lapply() doesn't work for me.
map_dfr() works for me for combining 7000+ .dat files. I also skipped the first row of each file and filtered the column "V1":
library(purrr)
library(dplyr)

rawDATfile.list <- list.files(pattern = "\\.DAT$")
data <- rawDATfile.list %>%
  map_dfr(~ read.delim(.x, header = FALSE, sep = ";", skip = 1,
                       quote = "\"'") %>%
            mutate_all(as.character)) %>%
  filter(V1 == "B")
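For reference, a self-contained version of that pipeline (purrr and dplyr are assumed to be installed; the two sample .DAT files written below are hypothetical stand-ins for the real data):

```r
library(purrr)
library(dplyr)

# tiny stand-in files: a header row, then semicolon-separated records
writeLines(c("header", "B;1", "A;2"), "s1.DAT")
writeLines(c("header", "B;3"), "s2.DAT")

rawDATfile.list <- list.files(pattern = "s[12]\\.DAT$")
data <- rawDATfile.list %>%
  map_dfr(~ read.delim(.x, header = FALSE, sep = ";", skip = 1,
                       quote = "\"'") %>%
            mutate_all(as.character)) %>%
  filter(V1 == "B")
nrow(data)  # 2 rows whose first column is "B"
```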