Splitting DocumentTermMatrix in R
I'm trying to build a word-pair prediction function, but I'm having trouble converting a DocumentTermMatrix into a data frame (or a similar structure) that the prediction function can use. Here is my working code:
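library(tm)
# Tokenizer that splits each document into bigrams (word pairs)
BigramTokenizer <- function(x) {
  unlist(lapply(ngrams(words(x), 2), paste, collapse = " "), use.names = FALSE)
}
# Document-term matrix whose terms are bigrams
tdm_pairs <- DocumentTermMatrix(my_corpus, control = list(tokenize = BigramTokenizer))
# Total frequency of each bigram across all documents
freq_pairs <- colSums(as.matrix(tdm_pairs))
freq_pairs[100]
# abandon contemporary
#                    1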
I want to split each pair into its two words and put the result in a data frame, so I can use it in a prediction function. I tried the following:
for (i in 1:10){
df <- rbind(df,(unlist(strsplit(as.character(freq_pairs)[i]," "))[1]))
}
The output is all 1s. I would like the output to be:
"abandon" "contemporary" "1"
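You could use the following code to get a data frame. The advantage is that freq_pairs stays numeric and no loop is needed. Note that the word pairs are stored in names(freq_pairs); as.character(freq_pairs) converts only the counts, which is why the loop returned all 1s.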
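# Split each bigram name into its two words
df <- strsplit(names(freq_pairs), " ")
# Reshape into a two-column data frame, one row per bigram
df <- as.data.frame(matrix(unlist(df),
                           ncol = 2,
                           byrow = TRUE,
                           dimnames = list(seq_along(df), c("word1", "word2"))),
                    stringsAsFactors = FALSE)
# Attach the counts as a numeric column
df <- cbind(df, freq_pairs)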
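To try the idea without a corpus, here is a minimal self-contained sketch in base R. The toy freq_pairs vector, the pairs_df name, and the predict_next helper are all assumptions made up for illustration; they are not part of the original question or of the tm package. It shows the same split-by-name approach and one possible way the resulting data frame could feed a simple lookup-style prediction.

```r
# Toy stand-in for freq_pairs (assumed data, not taken from the original corpus)
freq_pairs <- c("abandon contemporary" = 1,
                "abandon ship" = 3,
                "contemporary art" = 2)

# The counts are the values; the bigrams live in the names
as.character(freq_pairs)  # counts only, as strings
names(freq_pairs)         # the bigrams themselves

# Split each bigram into two columns and attach the numeric counts
parts <- do.call(rbind, strsplit(names(freq_pairs), " ", fixed = TRUE))
pairs_df <- data.frame(word1 = parts[, 1],
                       word2 = parts[, 2],
                       freq = unname(freq_pairs),
                       stringsAsFactors = FALSE)

# Hypothetical helper: return the most frequent word2 that follows `word`
predict_next <- function(word, pairs) {
  candidates <- pairs[pairs$word1 == word, ]
  if (nrow(candidates) == 0) return(NA_character_)
  candidates$word2[which.max(candidates$freq)]
}

predict_next("abandon", pairs_df)
```

With this toy data, "abandon" is followed by "ship" three times and by "contemporary" once, so the helper returns "ship". A word that never appears as word1 yields NA.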