How to create a large .csv file from 4 folders containing 100 files each, using R?

I have 4 folders containing 100 files each and want to combine them into one large .csv file using R. The individual files are plain files, not .csv files. I don't know how to do this and can't find any answers online. For reference, my top-level folder Newsgroups (D:/Newsgroups) contains 4 folders, D:/Newsgroups/1 through D:/Newsgroups/4, and each of them contains 100 files (for example D:/Newsgroups/1/100). My main goal with the final .csv file is to create a bag of words.

Here's an attempt at a solution. I made a few assumptions about the files based on what you said.

This part of the code generates the folders and files for the example. The folders are called A, B, C and D, and each contains 100 files. The files are simply named with a number (1 to 100), and each one holds 100 random words.

#-- This code creates the folders and files for the minimal, reproducible example
#-- randomWords() comes from the OpenRepGrid package
if (!require(OpenRepGrid)) {
    install.packages("OpenRepGrid")
    library(OpenRepGrid)
}
folders <- LETTERS[1:4]
for (folder in folders) {
    dir.create(folder)
    for (file in 1:100) {
        words <- randomWords(100)
        sentence <- paste(words, collapse = " ")
        # Each file holds one line of 100 space-separated words
        write.table(sentence, file = paste0(folder, "/", file),
                    row.names = FALSE, col.names = FALSE, quote = FALSE)
    }
}

The second part reads the files. Here, I'm assuming you want the files of each folder in a separate column, so the result has one column per folder and one row per file. I've replaced the two nested for loops with two nested sapply calls, purely for convenience; ?sapply will show you more examples of its use.
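As a small illustration of the nested sapply pattern described above, here is a sketch that builds file paths without reading anything from disk. The names folders, files and paths are placeholders for this toy example only.

```r
# Toy stand-ins for the folder and file names (no files are read here).
folders <- c("A", "B")
files <- c("1", "2", "3")

# The outer sapply walks the folders, the inner one walks the files.
# Because both inputs are character vectors, sapply uses them as names,
# so the result is a character matrix with files as rows and folders as columns.
paths <- sapply(folders, function(folder) {
  sapply(files, function(file) paste0(folder, "/", file))
})

dim(paths)        # 3 rows (files) by 2 columns (folders)
paths["2", "B"]   # "B/2"
```

Each column of the matrix corresponds to one call of the outer function, which is why the result lines up as one column per folder.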

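#-- This code reads the files into a single table
folders2read <- c("A", "B", "C", "D")
files2read <- as.character(1:100)  # file names, as created in the example above

table <- sapply(folders2read, function(folder) {
    sapply(files2read, function(file) {
        fpath <- paste0(folder, "/", file)
        # read.table returns a one-row data frame with one word per column
        words <- read.table(fpath, stringsAsFactors = FALSE)
        paste(unlist(words), collapse = " ")
    })
})
# Result: a character matrix with one row per file and one column per folder
write.csv(table, file = "all_words.csv")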
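For real data such as the D:/Newsgroups layout in the question, the folder and file names are probably better discovered than hard-coded. The following is a minimal sketch under the assumption that every sub-folder contains only plain text files. read_folder_tree is a hypothetical helper name, not part of any package, and it uses readLines rather than read.table so that multi-line files are kept whole.

```r
# Hypothetical helper: read every file under each sub-folder of `root`
# into one data frame with the columns folder, file and text.
read_folder_tree <- function(root) {
  # Immediate sub-folders only, e.g. D:/Newsgroups/1 ... D:/Newsgroups/4
  folders <- list.dirs(root, full.names = TRUE, recursive = FALSE)
  rows <- lapply(folders, function(folder) {
    files <- list.files(folder, full.names = TRUE)
    files <- files[!dir.exists(files)]  # keep regular files only
    # Collapse each file's lines into a single string
    text <- vapply(files, function(f) {
      paste(readLines(f, warn = FALSE), collapse = " ")
    }, character(1), USE.NAMES = FALSE)
    data.frame(folder = rep(basename(folder), length(files)),
               file = basename(files),
               text = text,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Example use (paths taken from the question, output file name is arbitrary):
# all_text <- read_folder_tree("D:/Newsgroups")
# write.csv(all_text, file = "all_words.csv", row.names = FALSE)
```

This returns the long, one-row-per-file layout directly, so no reshaping step is needed afterwards. Files with unusual encodings may need an explicit encoding argument to readLines.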

If you'd like a single column with all the files, you can simply do this:

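#-- Make a tidy table
if (!require(reshape2)) {
    install.packages("reshape2")
    library(reshape2)
}
# Rows of `table` are files and columns are folders, hence the varnames order
table_tidy <- melt(table, varnames = c("file", "folder"), value.name = "text")
write.csv(table_tidy, file = "all_words_tidy.csv")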

This creates a 'tidy' table with one row per text, plus 'file' and 'folder' columns recording where each text came from.
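Since the stated goal is a bag of words, here is a rough base-R sketch of how the text column could be turned into word counts. bag_of_words is a hypothetical helper, and the tokenisation rule (lower-case everything, split on anything that is not a letter a to z) is an assumption that real newsgroup text may need refined.

```r
# Hypothetical helper: turn a character vector of documents into a
# document-term count matrix (one row per document, one column per word).
bag_of_words <- function(texts) {
  # Assumed tokenisation: lower-case, split on runs of non-letters
  tokens <- strsplit(tolower(texts), "[^a-z]+")
  tokens <- lapply(tokens, function(x) x[nzchar(x)])  # drop empty strings
  vocab <- sort(unique(unlist(tokens)))
  counts <- matrix(0L, nrow = length(tokens), ncol = length(vocab),
                   dimnames = list(NULL, vocab))
  for (i in seq_along(tokens)) {
    # match() maps each token to its vocabulary index, tabulate() counts them
    counts[i, ] <- tabulate(match(tokens[[i]], vocab), nbins = length(vocab))
  }
  counts
}

# Example use with the tidy table from above:
# bow <- bag_of_words(as.character(table_tidy$text))
# write.csv(cbind(table_tidy[c("folder", "file")], bow), "bag_of_words.csv", row.names = FALSE)
```

For larger corpora, a dedicated text-mining package with sparse matrices would likely scale better than this dense matrix.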
