
How do you combine multiple documents into a single document with topicmodels in R?

I am currently trying to combine multiple documents of a corpus into a single document using the topicmodels package. I initially imported my data through multiple CSVs, each with multiple lines of text. When I import each CSV, however, each line is treated as a document, and each CSV is treated as a corpus. What I would like to do is merge all of the documents/lines for each CSV into a single document, so that each CSV represents one document in my corpus. I'm not sure if this is possible. Perhaps it would be easier to read in all of the lines of the CSV as a single text when initially importing, and then create the docs and corpus, but I don't know how to do that either. Below is the code that I have used to import my CSVs:

library(tm)  # VCorpus() and VectorSource() come from the tm package
file <- read.csv("file.csv")
fileCorp <- VCorpus(VectorSource(file$text))

The rows in the CSV look something like this (where each / represents a line break): 'I walked' / 'the dog' / 'at the' / 'park last night'

I would like to combine each of those lines into a single line of text that will serve as a single document in my corpus.

Thanks for the help!

Your task can be accomplished with these steps:

file1 <- data.frame(text = c('I walked','the dog','at the','park last night'))
file2 <- data.frame(text = c('He walked','the cat','at the','yesterday'))

data.frame(id = c(1, 2), 
           text = c(paste(file1$text, collapse = " "),
                    paste(file2$text, collapse = " ")))

  id                                    text
1  1 I walked the dog at the park last night
2  2      He walked the cat at the yesterday
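
To go from the collapsed strings to a corpus with one document per CSV, something along these lines might work. This is only a sketch: `collapse_csv` is a hypothetical helper (not part of tm or topicmodels), the column is assumed to be named `text` as in the question, and the two temporary files stand in for your real CSVs.

```r
# Hypothetical helper: read one CSV and join its `text` column into one string.
collapse_csv <- function(path, column = "text") {
  rows <- read.csv(path, stringsAsFactors = FALSE)
  paste(rows[[column]], collapse = " ")
}

# Demo files written to a temp directory so the sketch is self-contained;
# replace `paths` with the paths to your own CSVs.
paths <- file.path(tempdir(), c("file1.csv", "file2.csv"))
write.csv(data.frame(text = c("I walked", "the dog", "at the", "park last night")),
          paths[1], row.names = FALSE)
write.csv(data.frame(text = c("He walked", "the cat", "at the", "yesterday")),
          paths[2], row.names = FALSE)

# One string per CSV, named after the file it came from.
docs <- vapply(paths, collapse_csv, character(1))
names(docs) <- basename(paths)

# Each element of `docs` becomes one document, so the corpus has one
# document per CSV (this step runs only if the tm package is installed).
if (requireNamespace("tm", quietly = TRUE)) {
  corp <- tm::VCorpus(tm::VectorSource(docs))
}
```

Because `VectorSource()` treats each element of a character vector as a separate document, collapsing before building the corpus is what changes the unit from "one line" to "one CSV".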
