
How to read multiple .txt files in RStudio and make a data frame?

I want to build a recommender system. I have 17770 .txt files; each file holds the data for one movie and contains a userID and a rating.

I am having trouble importing the data into RStudio.

I searched for many ways to import multiple files, but none of them worked.

I have tried at least three approaches:

folderPath <- "D:/3rd Term/DataAnalysis/finalProject/dataSet/trainData/"
file_list <- list.files(path=folderPath, pattern="*.txt")    
dataSet <- 
  do.call("cbind", 
          lapply(file_list, 
                 function(x) 
                   read.table(paste(folderPath, x, sep=''), 
                              header = TRUE, 
                              stringsAsFactors = FALSE)))

========================================================================================
setwd("D:/3rd Term/DataAnalysis/finalProject/dataSet/trainData/")
files <-list.files()
data <- 0
for (f in files) {

  tempData = scan( f, what="character", sep = "")

  dataSet <- cbind(data,tempData)

} 

=========================================================================================

list_of_files <- list.files(path = "D:/3rd Term/DataAnalysis/finalProject/dataSet/trainData/", recursive = TRUE,
                            pattern = "\\.txt$", 
                            full.names = TRUE)

DT <- rbindlist(sapply(list_of_files, fread, simplify = FALSE),
                use.names = TRUE, idcol = "FileName", fill = TRUE)

I expect the files to be imported as a data frame. I want to use cbind so that I can combine all the .txt files and then build a matrix.

EDIT: I forgot to mention that each .txt file contains userID, rating and date (which is not important), separated by commas, like this:

1488844,3,2005-09-06
822109,5,2005-05-13
885013,4,2005-10-19
30878,4,2005-12-26

Assuming that the read.table command works on every file individually:

folderPath <- "D:/3rd Term/DataAnalysis/finalProject/dataSet/trainData/"
# full.names = TRUE returns complete paths, so no paste() is needed
file_list <- list.files(path = folderPath, pattern = "\\.txt$", full.names = TRUE)
library(dplyr)
# read every file into a data frame, then stack them row-wise
df <- lapply(file_list, function(file) {
  read.table(file,
             header = TRUE,
             stringsAsFactors = FALSE)
}) %>%
  bind_rows()
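A note on why the rows are stacked rather than combined with cbind: the files will generally have different numbers of rows, and cbind on data frames needs matching row counts. The following is a minimal base R sketch of that difference; the two small tables and the names `movie_a`, `movie_b`, `cbind_result` and `stacked` are made up for illustration.

```r
# Two made-up rating tables with different numbers of rows (3 and 2)
movie_a <- data.frame(userID = c(1488844, 822109, 885013), rating = c(3, 5, 4))
movie_b <- data.frame(userID = c(30878, 1488844), rating = c(4, 2))

# cbind() on data frames needs compatible row counts; here it stops with an
# error, which tryCatch() turns into the error message text
cbind_result <- tryCatch(cbind(movie_a, movie_b),
                         error = function(e) conditionMessage(e))

# rbind() stacks the rows instead, which works for any row counts
# as long as the column names match
stacked <- rbind(movie_a, movie_b)
```

Here `cbind_result` ends up holding an error message instead of a data frame, while `stacked` has all five rows.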

Following the updated question, here is a minimal reproducible example.

The example also shows that the initial read.table command can be improved:

# create sample file to reproduce problem
writeLines("1488844,3,2005-09-06
822109,5,2005-05-13
885013,4,2005-10-19
30878,4,2005-12-26", "mv_00001.txt", useBytes = TRUE)

file_list <- list.files(path = ".", pattern = "\\.txt$")

# use the same file a couple of times to make setup more realistic
file_list <- c(file_list, file_list, file_list)

# initial answer with improved read-in command
library(dplyr)
df <- lapply(file_list, function(file) {
  read.csv(file, 
           header = FALSE, 
           col.names = c("userID", "rating", "date" ),
           stringsAsFactors = FALSE)
}) %>% 
  bind_rows()

# result
df
#>     userID rating       date
#> 1  1488844      3 2005-09-06
#> 2   822109      5 2005-05-13
#> 3   885013      4 2005-10-19
#> 4    30878      4 2005-12-26
#> 5  1488844      3 2005-09-06
#> 6   822109      5 2005-05-13
#> 7   885013      4 2005-10-19
#> 8    30878      4 2005-12-26
#> 9  1488844      3 2005-09-06
#> 10  822109      5 2005-05-13
#> 11  885013      4 2005-10-19
#> 12   30878      4 2005-12-26

Created on 2019-09-17 by the reprex package (v0.3.0)
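Since the goal is a matrix for a recommender system, the stacked data frame also needs to record which movie each row belongs to. The following is a rough base R sketch of one way to do that, assuming every file is named after its movie and has no header line. The helper `read_ratings`, the `movieID` column, the temporary folder and the contents of the second sample file are all hypothetical.

```r
# Sketch only: write two small sample files to a temporary folder.
# The rows in mv_00002.txt are invented for illustration.
sample_dir <- file.path(tempdir(), "ratings_sketch")
dir.create(sample_dir, showWarnings = FALSE)
writeLines(c("1488844,3,2005-09-06", "822109,5,2005-05-13"),
           file.path(sample_dir, "mv_00001.txt"))
writeLines(c("1488844,4,2005-10-19", "30878,4,2005-12-26", "885013,2,2005-11-01"),
           file.path(sample_dir, "mv_00002.txt"))

# Hypothetical helper: read one file and tag its rows with the file name
read_ratings <- function(file) {
  ratings <- read.csv(file,
                      header = FALSE,
                      col.names = c("userID", "rating", "date"),
                      stringsAsFactors = FALSE)
  # file name without folder and extension, e.g. "mv_00001"
  ratings$movieID <- tools::file_path_sans_ext(basename(file))
  ratings
}

file_list <- list.files(sample_dir, pattern = "\\.txt$", full.names = TRUE)
all_ratings <- do.call(rbind, lapply(file_list, read_ratings))

# Users in rows, movies in columns; NA where a user has not rated a movie
rating_matrix <- with(all_ratings,
                      tapply(rating, list(userID, movieID), mean))
```

The dense matrix built by tapply is only meant for a small sample like this one. With all 17770 files, a sparse representation would probably be needed.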
