
How to transform a list into a corpus in R?

In this question I asked how to split a huge data frame to create a corpus. Thanks to the answer, I was able to create a list from the data frame. My remaining problem was obtaining a corpus from that list, so that I could do some text mining and cluster the data according to the search term.

To solve this problem, I simply applied the as.VCorpus function from the tm package to the list I had created earlier:

library(tm)  # provides as.VCorpus
new_corpus <- as.VCorpus(new_list)

Check that the new object is a corpus:

class(new_corpus)
[1] "VCorpus" "Corpus" 
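For readers who want to try this end to end, here is a minimal sketch. The contents of new_list are not shown in this post, so the list below is made-up sample data, and the sketch takes a different route from the one above: it assumes each list element is a plain character string and builds the corpus with VCorpus(VectorSource(...)) rather than as.VCorpus.

```r
library(tm)

# Hypothetical stand-in for new_list: one character string per document.
sample_list <- list(
  first  = "Text mining with the tm package",
  second = "Clustering documents by search term",
  third  = "Splitting a data frame into a list"
)

# Flatten the list to a character vector and wrap it in a vector source,
# so that each element becomes one document of the corpus.
sample_corpus <- VCorpus(VectorSource(unlist(sample_list)))

class(sample_corpus)             # class vector of the new object
length(sample_corpus)            # number of documents
as.character(sample_corpus[[1]]) # text of the first document
```

This route is only a sketch for plain character data; if the list elements are already tm text documents, the as.VCorpus call shown above is the more direct option.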

I had thus created a "volatile corpus". As the R documentation puts it:

A volatile corpus is fully kept in memory and thus all changes only affect the corresponding R object.
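As a rough illustration of what that means in practice, the sketch below applies a transformation with tm_map and compares the result with the original object. The two sample strings are hypothetical; the point is only that the change lives in the returned in-memory object, while the corpus it was derived from is left as it was.

```r
library(tm)

# Hypothetical sample documents.
docs <- c("Text Mining in R", "Clustering by Search Term")
corpus <- VCorpus(VectorSource(docs))

# tm_map returns a transformed corpus; content_transformer wraps a plain
# character function (here tolower) so it can be applied to each document.
lowered <- tm_map(corpus, content_transformer(tolower))

as.character(lowered[[1]])  # text of the transformed copy
as.character(corpus[[1]])   # text of the original object
```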
