How to read multiple .txt files in RStudio and make a data frame?
I want to build a recommender system. I have 17,770 .txt files; each file holds the data for one movie and contains userID and rating.
I have trouble importing the data into RStudio.
I searched for many ways to import multiple files, but none of them worked. I've tried at least these 3 approaches:
folderPath <- "D:/3rd Term/DataAnalysis/finalProject/dataSet/trainData/"
file_list <- list.files(path=folderPath, pattern="*.txt")
dataSet <-
  do.call("cbind",
          lapply(file_list,
                 function(x)
                   read.table(paste(folderPath, x, sep=''),
                              header = TRUE,
                              stringsAsFactors = FALSE)))
========================================================================================
setwd("D:/3rd Term/DataAnalysis/finalProject/dataSet/trainData/")
files <- list.files()
data <- 0
for (f in files) {
  tempData = scan(f, what="character", sep = "")
  dataSet <- cbind(data, tempData)
}
=========================================================================================
list_of_files <- list.files(path = "D:/3rd Term/DataAnalysis/finalProject/dataSet/trainData/", recursive = TRUE,
                            pattern = "\\.txt$",
                            full.names = TRUE)
DT <- rbindlist(sapply(list_of_files, fread, simplify = FALSE),
                use.names = TRUE, idcol = "FileName", fill = TRUE)
I expect the files to be imported as a data frame. I want to use cbind so that I can combine all the .txt files and then build a matrix.
EDIT: I forgot to mention that each .txt file contains userID, rating and date (the date is not important), separated by commas, like this:
1488844,3,2005-09-06
822109,5,2005-05-13
885013,4,2005-10-19
30878,4,2005-12-26
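Since the date is not needed and the files have no header row, one option is to skip the date column while reading. A minimal sketch for a single file, assuming the comma-separated, header-less layout shown above; the temporary sample file stands in for one real movie file. In read.csv, a colClasses entry of "NULL" tells R to drop that column.

```r
# Sketch: read one rating file and drop the unneeded date column.
# A temporary sample file stands in for one of the real movie files.
sample_file <- tempfile(fileext = ".txt")
writeLines(c("1488844,3,2005-09-06",
             "822109,5,2005-05-13",
             "885013,4,2005-10-19",
             "30878,4,2005-12-26"), sample_file)

ratings <- read.csv(sample_file,
                    header = FALSE,                                # no header row
                    col.names = c("userID", "rating", "date"),
                    colClasses = c("integer", "integer", "NULL"))  # "NULL" skips date
str(ratings)
```

Dropping the date at read time keeps each per-file data frame smaller, which adds up over 17,770 files.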
Assuming that the read.table command works on every file individually:
folderPath <- "D:/3rd Term/DataAnalysis/finalProject/dataSet/trainData/"
file_list <- list.files(path = folderPath, pattern = "\\.txt$", full.names = TRUE)
library(dplyr)
# read every file into its own data frame, then stack them row-wise
df <- lapply(file_list, function(file) {
  read.table(file,
             header = TRUE,
             stringsAsFactors = FALSE)
}) %>%
  bind_rows()
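The for loop in the second attempt only keeps the last file, because dataSet is overwritten on every pass, and cbind glues files side by side as columns instead of stacking them as rows. A minimal base-R sketch of the loop idea that works without dplyr: each file is stored in a pre-allocated list and all pieces are stacked once at the end. It assumes the comma-separated, header-less format from the question; the temporary demo folder and its two sample files are stand-ins for the real trainData directory.

```r
# Sketch: a working version of the for-loop idea, base R only.
# For a runnable demo, two small sample files are written to a temporary
# folder; replace folderPath with your own trainData directory.
folderPath <- file.path(tempdir(), "trainData_loop_demo")
dir.create(folderPath, showWarnings = FALSE)
writeLines(c("1488844,3,2005-09-06", "822109,5,2005-05-13"),
           file.path(folderPath, "mv_00001.txt"))
writeLines(c("885013,4,2005-10-19", "30878,4,2005-12-26"),
           file.path(folderPath, "mv_00002.txt"))

# "\\.txt$" is a regular expression matching names that end in ".txt"
files <- list.files(path = folderPath, pattern = "\\.txt$", full.names = TRUE)

# Pre-allocate one list slot per file instead of growing a data frame
pieces <- vector("list", length(files))
for (i in seq_along(files)) {
  pieces[[i]] <- read.csv(files[i], header = FALSE,
                          col.names = c("userID", "rating", "date"),
                          stringsAsFactors = FALSE)
}

# Stack all pieces once, row-wise (every file has the same columns)
dataSet <- do.call(rbind, pieces)
print(dataSet)
```

Binding once at the end avoids copying a growing data frame on every iteration, which matters with thousands of files.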
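The example below shows that the initial read.table command can be improved as well: the files have no header row and are comma-separated, so read.csv with header = FALSE and explicit column names is a better fit: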
# create sample file to reproduce problem
writeLines("1488844,3,2005-09-06
822109,5,2005-05-13
885013,4,2005-10-19
30878,4,2005-12-26", "mv_00001.txt", useBytes = TRUE)
file_list <- list.files(path = ".", pattern="*.txt")
# use the same file a couple of times to make setup more realistic
file_list <- c(file_list, file_list, file_list)
# initial answer with improved read-in command
library(dplyr)
df <- lapply(file_list, function(file) {
read.csv(file,
header = FALSE,
col.names = c("userID", "rating", "date" ),
stringsAsFactors = FALSE)
}) %>%
bind_rows()
# result
df
#> userID rating date
#> 1 1488844 3 2005-09-06
#> 2 822109 5 2005-05-13
#> 3 885013 4 2005-10-19
#> 4 30878 4 2005-12-26
#> 5 1488844 3 2005-09-06
#> 6 822109 5 2005-05-13
#> 7 885013 4 2005-10-19
#> 8 30878 4 2005-12-26
#> 9 1488844 3 2005-09-06
#> 10 822109 5 2005-05-13
#> 11 885013 4 2005-10-19
#> 12 30878 4 2005-12-26
Created on 2019-09-17 by the reprex package (v0.3.0)
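The question's end goal was a matrix for a recommender system. After the rows are stacked, nothing in df records which movie a rating came from, so a movie ID has to be attached per file before reshaping. A minimal base-R sketch, assuming the movie ID can be taken from the file name (a hypothetical naming scheme like mv_00001.txt, as in the sample above); the helper read_movie and the temporary demo folder are illustrative, not part of the original answer.

```r
# Sketch: tag each rating with its movie, then build a user x movie matrix.
# Assumption: the movie ID is the file name without ".txt" (hypothetical).
folderPath <- file.path(tempdir(), "ratings_matrix_demo")
dir.create(folderPath, showWarnings = FALSE)
writeLines(c("1488844,3,2005-09-06", "822109,5,2005-05-13"),
           file.path(folderPath, "mv_00001.txt"))
writeLines(c("1488844,4,2005-10-19", "30878,4,2005-12-26"),
           file.path(folderPath, "mv_00002.txt"))

file_list <- list.files(path = folderPath, pattern = "\\.txt$", full.names = TRUE)

# Hypothetical helper: read one file and add a movieID column
read_movie <- function(file) {
  ratings <- read.csv(file, header = FALSE,
                      col.names = c("userID", "rating", "date"),
                      stringsAsFactors = FALSE)
  ratings$movieID <- sub("\\.txt$", "", basename(file))
  ratings
}

df <- do.call(rbind, lapply(file_list, read_movie))

# Rows = users, columns = movies; cells a user never rated stay NA
users  <- sort(unique(df$userID))
movies <- sort(unique(df$movieID))
rating_matrix <- matrix(NA_integer_, nrow = length(users), ncol = length(movies),
                        dimnames = list(as.character(users), movies))
# Fill each (user, movie) cell; if a pair repeats, the last rating wins
rating_matrix[cbind(match(df$userID, users), match(df$movieID, movies))] <- df$rating
print(rating_matrix)
```

For the full set of 17,770 movies, a dense matrix has one cell per user and movie pair and can become very large; the sparseMatrix() function from the Matrix package stores only the ratings that exist and may be a better fit.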