Splitting DocumentTermMatrix in R

I'm trying to build a word-pair prediction function, but I'm having trouble converting a DocumentTermMatrix into a data frame (or similar structure) that the prediction function can use. Here is my working code:

library(tm)

# Tokenizer that emits two-word sequences (bigrams)
BigramTokenizer <- function(x) {
  unlist(lapply(ngrams(words(x), 2), paste, collapse = " "), use.names = FALSE)
}

tdm_pairs <- DocumentTermMatrix(my_corpus, control = list(tokenize = BigramTokenizer))

freq_pairs <- colSums(as.matrix(tdm_pairs))

freq_pairs[100]

abandon contemporary 
               1 

I want to split each word pair and put the result into a data frame, so I can use it in a prediction function. I tried the following:

for (i in 1:10) {
  df <- rbind(df, unlist(strsplit(as.character(freq_pairs)[i], " "))[1])
}

The output is all 1s. I would like the output to be:

 "abandon" "contemporary" "1"

The loop returns 1s because as.character(freq_pairs) converts the counts, not the word pairs; the pairs are stored in names(freq_pairs). You could use the following code to get a data frame. The advantage is that freq_pairs stays numeric and no loop is needed.

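# Split each bigram name into its two words
df <- strsplit(names(freq_pairs), " ")
# Reshape into a two-column data frame, one row per bigram
df <- as.data.frame(matrix(unlist(df),
                           ncol = 2,
                           byrow = TRUE,
                           dimnames = list(seq_along(df), c("word1", "word2"))),
                    stringsAsFactors = FALSE)
# Attach the counts as a numeric column
df <- cbind(df, freq_pairs)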
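
To try the idea without a corpus, here is a minimal base-R sketch. The bigrams and counts in this toy freq_pairs are made up for illustration, and predict_next is a hypothetical helper rather than anything provided by tm; it only suggests one way the resulting data frame might feed a simple lookup-style prediction.

```r
# Toy stand-in for the named count vector produced by colSums() above
# (these bigrams and counts are invented for illustration).
freq_pairs <- c("abandon contemporary" = 1,
                "abandon ship" = 3,
                "contemporary art" = 2)

# The bigrams live in names(freq_pairs); the counts are the values.
parts <- strsplit(names(freq_pairs), " ", fixed = TRUE)

pairs_df <- data.frame(
  word1 = vapply(parts, `[`, character(1), 1),  # first word of each pair
  word2 = vapply(parts, `[`, character(1), 2),  # second word of each pair
  freq  = unname(freq_pairs),                   # counts stay numeric
  stringsAsFactors = FALSE
)

# Hypothetical helper: the most frequent follower of `word`,
# or NA if the word never appears in first position.
predict_next <- function(word, pairs) {
  hits <- pairs[pairs$word1 == word, , drop = FALSE]
  if (nrow(hits) == 0) return(NA_character_)
  hits$word2[which.max(hits$freq)]
}

predict_next("abandon", pairs_df)  # "ship"
```

Keeping freq numeric is what makes which.max() usable here; if the counts had been coerced to character, as in the original loop, the comparison would no longer be meaningful.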
