How do you combine multiple documents into a single document with topicmodels in R?

I am trying to combine multiple documents of a corpus into a single document using the topicmodels package. I imported my data from several CSV files, each with multiple lines of text. When I import a CSV, however, each line is treated as a document and each CSV is treated as a corpus. What I would like to do is merge all of the documents/lines in each CSV into a single document, so that each CSV represents one document in my corpus. I'm not sure if this is possible. Perhaps it would be easier to read in all of the lines of a CSV as a single text when importing and then create the documents and corpus, but I don't know how to do that either. Below is the code that I have used to import my CSVs:

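library(tm)  # VCorpus() and VectorSource() come from the tm package

# stringsAsFactors = FALSE keeps the text column as character in R < 4.0
file <- read.csv("file.csv", stringsAsFactors = FALSE)
fileCorp <- VCorpus(VectorSource(file$text))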

The rows in the csv look something like this (where each / represents a line break): 'I walked' / 'the dog' / 'at the' / 'park last night'

I would like to combine each of those lines into a single line of text that will serve as a single document in my corpus.

Thanks for the help!

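You can do this by collapsing the rows of each file into one string with paste(..., collapse = " ") and then building a data frame with one row per file: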

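# Stand-ins for two imported CSVs, each with one text fragment per row
file1 <- data.frame(text = c('I walked','the dog','at the','park last night'))
file2 <- data.frame(text = c('He walked','the cat','at the','yesterday'))

# Collapse each file's rows into a single string: one row (document) per file
data.frame(id = c(1, 2), 
           text = c(paste(file1$text, collapse = " "),
                    paste(file2$text, collapse = " ")))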

  id                                    text
1  1 I walked the dog at the park last night
2  2      He walked the cat at the yesterday
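
The same idea can be applied to files on disk and carried through to a corpus. The following is only a sketch under some assumptions: every CSV has a column named text, the helper collapse_csv() and the temporary demo directory are hypothetical names made up for illustration, and the tm package is installed (the corpus step is skipped otherwise).

```r
# Hypothetical helper: collapse the `text` column of one CSV into a single string.
collapse_csv <- function(path, column = "text") {
  rows <- read.csv(path, stringsAsFactors = FALSE)
  paste(rows[[column]], collapse = " ")
}

# Demo data (assumption): two small CSVs written to a temporary directory.
csv_dir <- file.path(tempdir(), "csv_demo")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(data.frame(text = c("I walked", "the dog", "at the", "park last night")),
          file.path(csv_dir, "file1.csv"), row.names = FALSE)
write.csv(data.frame(text = c("He walked", "the cat", "at the", "yesterday")),
          file.path(csv_dir, "file2.csv"), row.names = FALSE)

# One collapsed string per CSV, named after the file it came from.
paths <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
docs <- vapply(paths, collapse_csv, character(1))
names(docs) <- basename(paths)

# Build the corpus: each element of `docs` becomes one document.
if (requireNamespace("tm", quietly = TRUE)) {
  corp <- tm::VCorpus(tm::VectorSource(docs))
  print(length(corp))
}
```

From there, tm::DocumentTermMatrix(corp) should give the document-term matrix that topicmodels::LDA() takes as input, with one row per CSV.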
