Error when using the DocumentTermMatrix function in R

Question

I took 1000 rows of generic text and ran the text-mining steps below. When I build the document-term matrix, the output does not show the word counts.

>def<-read.csv("Defect.csv",header = T)
>docs<-Corpus(VectorSource(def$Summary))
>docs<-tm_map(docs,content_transformer(tolower))
>docs<-tm_map(docs,removeNumbers)
>docs<-tm_map(docs,removeWords,stopwords("english"))
>docs<-tm_map(docs,removePunctuation)
>docs<-tm_map(docs,stripWhitespace)
>docs<-tm_map(docs,stemDocument,language = "english")

>docs[[1]]$content
[1] "access logout access employe separ modul"

>dtm<-DocumentTermMatrix(docs)
>data.matrix(dtm)

Below is the output I got for the DTM:

    Terms
Docs access logout modul separ approv button click display error

I am not getting the word counts in the matrix, and I am not sure what the error could be.

Answer 1

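library(tm)         # Corpus, tm_map, TermDocumentMatrix
library(SnowballC)  # stemmer used by stemDocument

# stringsAsFactors = FALSE keeps Summary as character text on R versions before 4.0
def<-read.csv("Defect.csv",header = TRUE,stringsAsFactors = FALSE)
docs<-Corpus(VectorSource(def$Summary))
# normalise the text: lower-case, drop digits, stop words, punctuation and extra spaces
docs<-tm_map(docs,content_transformer(tolower))
docs<-tm_map(docs,removeNumbers)
docs<-tm_map(docs,removeWords,stopwords("english"))
docs<-tm_map(docs,removePunctuation)
docs<-tm_map(docs,stripWhitespace)
# reduce each word to its stem, e.g. "employee" becomes "employe"
docs<-tm_map(docs,stemDocument,language = "english")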

Note: use TermDocumentMatrix rather than DocumentTermMatrix. It puts terms in rows and documents in columns, so rowSums() gives the total count of each term.

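dtm <- TermDocumentMatrix(docs)          # terms in rows, documents in columns
m <- as.matrix(dtm)                      # dense matrix of counts
v <- sort(rowSums(m),decreasing=TRUE)    # total count per term, most frequent first
d <- data.frame(word = names(v),freq=v)
rownames(d) <- NULL                      # replace term row labels with 1..n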

Now your data frame should look like this:

> head(d,10)
        word freq
1       file  157
2       data  151
3  incorrect  136
4     target  120
5       issu   95
6       tabl   82
7      sourc   69
8     column   63
9        get   61
10   process   56
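
If you would rather keep DocumentTermMatrix, the same totals should be available from the column sums, since a DTM is just the transpose of a TDM. Here is a minimal sketch on a made-up three-document corpus (the `texts` vector is hypothetical and only stands in for `def$Summary`; the preprocessing steps are left out to keep it short):

library(tm)

# Hypothetical toy documents standing in for def$Summary
texts <- c("access logout access module",
           "button click error",
           "error display error access")
docs <- VCorpus(VectorSource(texts))

dtm <- DocumentTermMatrix(docs)   # rows = documents, columns = terms
tdm <- TermDocumentMatrix(docs)   # rows = terms, columns = documents

m_dtm <- as.matrix(dtm)           # dense matrix holding the per-document counts
print(m_dtm)

# Per-term totals: column sums of the DTM match row sums of the TDM
freq_dtm <- sort(colSums(m_dtm), decreasing = TRUE)
freq_tdm <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)

d <- data.frame(word = names(freq_dtm), freq = freq_dtm)
rownames(d) <- NULL
print(d)

Printing `as.matrix(dtm)` is also a quick way to check that the counts are really there before building the frequency table.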

Answer 1: accepted, 2017-09-15 09:58:23