Splitting a DocumentTermMatrix in R

Question

I'm trying to build a word-pair prediction function, but I'm having trouble converting a DocumentTermMatrix into a data frame (or similar structure) that the prediction function can use. Here is my working code:

library(tm)

# Tokenizer that returns every pair of adjacent words (bigrams) in a document
BigramTokenizer <- function(x) {
  unlist(lapply(ngrams(words(x), 2), paste, collapse = " "), use.names = FALSE)
}

tdm_pairs <- DocumentTermMatrix(my_corpus, control = list(tokenize = BigramTokenizer))

freq_pairs <- colSums(as.matrix(tdm_pairs))

freq_pairs[100]

abandon contemporary 
               1

I'd like to split each pair into its two words and put the result into a data frame so I can use it in a prediction function. I tried the following:

for (i in 1:10){
df <- rbind(df,(unlist(strsplit(as.character(freq_pairs)[i]," "))[1]))
}

The output is all 1's. I would like the output to be:

 "abandon" "contemporary" "1"

Answer 1

Your loop returns 1's because as.character(freq_pairs) converts the counts; the word pairs are stored in names(freq_pairs). You could use the following code to get a data frame. The advantage is that the freq_pairs column stays numeric and no loop is needed.

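# Split each "word1 word2" name into its two words
df <- strsplit(names(freq_pairs), " ")
# Reshape into a two-column data frame with columns word1 and word2
df <- as.data.frame(matrix(unlist(df),
                           ncol = 2,
                           byrow = TRUE,
                           dimnames = list(seq_along(df), c("word1", "word2"))),
                    stringsAsFactors = FALSE)
# Append the counts as a numeric column
df <- cbind(df, freq_pairs)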
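
If you want to try this without building a corpus, here is a minimal sketch on a toy vector. The counts are made up, and `pairs_df` and `predict_next` are hypothetical names used only for illustration; it just shows the same split-the-names idea plus one simple way the resulting data frame might feed a lookup-style prediction.

```r
# Toy stand-in for freq_pairs: a named numeric vector of bigram counts
# (made-up values, not taken from a real corpus).
freq_pairs <- c("abandon contemporary" = 1,
                "abandon ship"         = 3,
                "contemporary art"     = 2)

# The word pairs live in the names; the values are only the counts.
as.character(freq_pairs)[1]   # "1"
names(freq_pairs)[1]          # "abandon contemporary"

# Split the names into a two-column character matrix, one row per bigram.
parts <- matrix(unlist(strsplit(names(freq_pairs), " ")),
                ncol = 2, byrow = TRUE)

# Build the data frame; freq stays numeric.
pairs_df <- data.frame(word1 = parts[, 1],
                       word2 = parts[, 2],
                       freq  = unname(freq_pairs),
                       stringsAsFactors = FALSE)

# Hypothetical helper: return the most frequent second word for a first word,
# or NA if the first word never appears.
predict_next <- function(word, pairs) {
  hits <- pairs[pairs$word1 == word, ]
  if (nrow(hits) == 0) return(NA_character_)
  hits$word2[which.max(hits$freq)]
}

predict_next("abandon", pairs_df)   # "ship"
```

This assumes every name contains exactly two space-separated words, which holds for bigrams built by the tokenizer in the question; ties in `freq` resolve to the first match.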
