简体   繁体   中英

Text mining using R to count frequency of words

I want to count the occurrence of the word "uncertainty" but only if "economic policy" or "legislation" or words pertaining to policies appear in the same text. Right now, I have come out with a code in R to count the frequency of all words in the text, but it does not discern if the words counted occur in the right context. Do you have any suggestions how to rectify this?

library(tm) #load text mining library
setwd('D:/3_MTICorpus') #sets R's working directory to near where my files are
ae.corpus<-Corpus(DirSource("D:/3_MTICorpus"),readerControl=list(reader=readPlain))
summary(ae.corpus) #check what went in
ae.corpus <- tm_map(ae.corpus, tolower)
ae.corpus <- tm_map(ae.corpus, removePunctuation)
ae.corpus <- tm_map(ae.corpus, removeNumbers)
myStopwords <- c(stopwords('english'), "available", "via")
ae.corpus <- tm_map(ae.corpus, removeWords, myStopwords) # this stopword file is at C:\Users\[username]\Documents\R\win-library\2.13\tm\stopwords 
#library(SnowballC)
#ae.corpus <- tm_map(ae.corpus, stemDocument)

ae.tdm <- DocumentTermMatrix(ae.corpus, control = list(minWordLength = 3))
inspect(ae.tdm)
findFreqTerms(ae.tdm, lowfreq=2)
findAssocs(ae.tdm, "economic",.7)
d<- Dictionary (c("economic", "uncertainty", "policy"))
inspect(DocumentTermMatrix(ae.corpus, list(dictionary = d)))

You can transform your term-document matrix to matrix with 0/1 values

dtm$v[dtm$v > 0] <- 1

dtm <- as.matrix(dtm)

and then you can easily use table

table(tdm[which(rownames(tdm)=='uncertainty'),], tdm[which(rownames(tdm)=='economic_policy'),])

which should produce something like this:

     0  1
  0 105  13
  1  7  5

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM