How to read multiple .txt files in RStudio and make a data frame?

Question

I want to build a recommender system. I have 17,770 .txt files; each file holds the metadata for one movie and contains user IDs and ratings.

I am having trouble importing the data into RStudio.

I searched for many ways to import multiple files, but none of them worked.

I have tried at least three approaches:

folderPath <- "D:/3rd Term/DataAnalysis/finalProject/dataSet/trainData/"
file_list <- list.files(path=folderPath, pattern="*.txt")    
dataSet <- 
  do.call("cbind", 
          lapply(file_list, 
                 function(x) 
                   read.table(paste(folderPath, x, sep=''), 
                              header = TRUE, 
                              stringsAsFactors = FALSE)))

========================================================================================
setwd("D:/3rd Term/DataAnalysis/finalProject/dataSet/trainData/")
files <-list.files()
data <- 0
for (f in files) {

  tempData = scan( f, what="character", sep = "")

  dataSet <- cbind(data,tempData)

} 

=========================================================================================

list_of_files <- list.files(path = "D:/3rd Term/DataAnalysis/finalProject/dataSet/trainData/", recursive = TRUE,
                            pattern = "\\.txt$", 
                            full.names = TRUE)

DT <- rbindlist(sapply(list_of_files, fread, simplify = FALSE),
                use.names = TRUE, idcol = "FileName", fill = TRUE)

I expect the files to be imported as a data frame. I want to use cbind so that I can combine all the .txt files and then make a matrix.

EDIT: I forgot to mention that each .txt file contains userID, rating, and date (which is not important), separated by commas, like this:

1488844,3,2005-09-06
822109,5,2005-05-13
885013,4,2005-10-19
30878,4,2005-12-26

Answer 1

Assuming that the read.table command works on every file individually:

folderPath <- "D:/3rd Term/DataAnalysis/finalProject/dataSet/trainData/"
file_list <- list.files(path = folderPath, pattern = "\\.txt$", full.names = TRUE)
library(dplyr)
# read every file, then stack the resulting data frames by row
df <- lapply(file_list, function(file) {
  read.table(file,
             header = TRUE,
             stringsAsFactors = FALSE)
}) %>%
  bind_rows()
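
The code above stacks the per-file tables by row, whereas the attempts in the question join them by column with cbind. The following is a minimal sketch of why that matters, using two made-up data frames (`movie_a` and `movie_b` are hypothetical names, not part of the original data) and base R only:

```r
# Two hypothetical per-movie tables with different numbers of ratings.
movie_a <- data.frame(userID = c(1488844, 822109, 885013),
                      rating = c(3, 5, 4))
movie_b <- data.frame(userID = c(30878, 822109),
                      rating = c(4, 2))

# cbind() needs equal row counts, so it fails here; capture the message
# instead of stopping the script.
cbind_result <- tryCatch(cbind(movie_a, movie_b),
                         error = function(e) conditionMessage(e))
cbind_result

# Stacking by row works for any number of rows per file.
# do.call(rbind, ...) is a base R counterpart of dplyr::bind_rows()
# when all tables share the same column names.
stacked <- do.call(rbind, list(movie_a, movie_b))
nrow(stacked)
```

Since each movie file is likely to have a different number of ratings, row-wise stacking is probably the safer default here.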

As per the updated question, here is a minimal reproducible example.

The example shows that the initial read.table command can be improved as well:

# create sample file to reproduce problem
writeLines("1488844,3,2005-09-06
822109,5,2005-05-13
885013,4,2005-10-19
30878,4,2005-12-26", "mv_00001.txt", useBytes = TRUE)

file_list <- list.files(path = ".", pattern = "\\.txt$")

# use the same file a couple of times to make setup more realistic
file_list <- c(file_list, file_list, file_list)

# initial answer with improved read-in command
library(dplyr)
df <- lapply(file_list, function(file) {
  read.csv(file, 
           header = FALSE, 
           col.names = c("userID", "rating", "date" ),
           stringsAsFactors = FALSE)
}) %>% 
  bind_rows()

# result
df
#>     userID rating       date
#> 1  1488844      3 2005-09-06
#> 2   822109      5 2005-05-13
#> 3   885013      4 2005-10-19
#> 4    30878      4 2005-12-26
#> 5  1488844      3 2005-09-06
#> 6   822109      5 2005-05-13
#> 7   885013      4 2005-10-19
#> 8    30878      4 2005-12-26
#> 9  1488844      3 2005-09-06
#> 10  822109      5 2005-05-13
#> 11  885013      4 2005-10-19
#> 12   30878      4 2005-12-26

Created on 2019-09-17 by the reprex package (v0.3.0)
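
The combined data frame above no longer records which file each row came from, which a recommender system would probably need. The following is a minimal sketch of one way to keep that information and then build the user-by-movie matrix the question asks for. It assumes that the file name (such as mv_00001.txt) identifies the movie; the `movieID` column, the temporary directory, and the sample contents are all made up for illustration.

```r
# Hypothetical sample files in a temporary directory (names follow the
# mv_00001.txt pattern used in the example; the contents are made up).
data_dir <- tempfile("trainData_")
dir.create(data_dir)
writeLines(c("1488844,3,2005-09-06", "822109,5,2005-05-13"),
           file.path(data_dir, "mv_00001.txt"))
writeLines(c("822109,4,2005-10-19", "30878,2,2005-12-26"),
           file.path(data_dir, "mv_00002.txt"))

file_list <- list.files(data_dir, pattern = "\\.txt$", full.names = TRUE)

# Read each file and tag its rows with a movie ID taken from the file name
# (assumption: one file per movie, named after the movie).
ratings <- do.call(rbind, lapply(file_list, function(file) {
  one_movie <- read.csv(file, header = FALSE,
                        col.names = c("userID", "rating", "date"),
                        stringsAsFactors = FALSE)
  one_movie$movieID <- tools::file_path_sans_ext(basename(file))
  one_movie
}))

# Pivot to a user-by-movie matrix; user/movie pairs without a rating are NA.
rating_matrix <- with(ratings, tapply(rating, list(userID, movieID), mean))
rating_matrix
```

With 17,770 files, a dense matrix like this may not fit in memory, so a sparse representation could be a better fit for the full data set.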

Answer 1 timestamp: 2019-09-16 12:43:01