Convert TDM CSV file into Corpus Format in Text Mining

I am using the tm package for text mining in R. I performed the following steps:

Import the data into R and create the text corpus

dataorg <- read.csv("Report_2014.csv")
corpus <- Corpus(VectorSource(data$Resolution))

Clean the data

mystopwords <- c("through","might","much","had","got","with","these")

cleanset <- tm_map(corpus, removeWords, mystopwords)
cleanset <- tm_map(cleanset, content_transformer(tolower))
cleanset <- tm_map(cleanset, removePunctuation)
cleanset <- tm_map(cleanset, removeNumbers)

Create the term-document matrix

tdm <- TermDocumentMatrix(cleanset)

At this point I export the TDM data to a CSV file in order to clean up the terms manually

write.csv(inspect(tdm), file="tdmfile.csv")

Now the problem is that I want to bring the cleaned TDM CSV file back into R and perform further text analysis, such as clustering and frequency analysis. However, I cannot convert the CSV file back into a corpus format that the tm package functions accept, so I cannot proceed with my text analysis.

It would be really helpful if somebody could show me how to convert the cleaned CSV file into a corpus format that the text analysis functions of the tm package accept.

First read the csv back into R

df<-read.csv("tdmfile.csv")

Then convert the vector (referenced by the column name) into a corpus

corpus<-Corpus(VectorSource(df$column))

If the above doesn't work, try converting the text column to UTF-8 before building the corpus

convert <- iconv(df$column, to="utf-8-mac")
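
Note that an exported TDM holds term counts rather than the original text, so an alternative is to rebuild a term-document matrix from the CSV instead of a corpus. The following is only a minimal sketch, assuming the file was written from as.matrix(tdm) with the terms as row names. The toy documents and the temporary file path are made up for illustration and are not from the question.

```r
library(tm)

# Toy documents standing in for the real Resolution column (illustrative only)
docs <- c("printer driver reinstalled",
          "password reset for user",
          "printer cable replaced")
tdm <- TermDocumentMatrix(VCorpus(VectorSource(docs)))

# Export the full count matrix; the terms end up in the first CSV column
csv_path <- file.path(tempdir(), "tdmfile.csv")
write.csv(as.matrix(tdm), file = csv_path)

# ... manual cleaning of the CSV would happen here ...

# Read back: first column as row names, document ids left unmangled
m <- as.matrix(read.csv(csv_path, row.names = 1, check.names = FALSE))

# Rebuild a TermDocumentMatrix; weightTf marks the entries as raw counts
tdm2 <- as.TermDocumentMatrix(m, weighting = weightTf)

# Frequency analysis and clustering can then run on the rebuilt matrix
freq <- sort(rowSums(as.matrix(tdm2)), decreasing = TRUE)
fit <- hclust(dist(as.matrix(tdm2)))
```

If terms were merged by hand in the CSV, duplicate row names would need to be summed before the read.csv step, since row names must be unique.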

You assign the data to dataorg, but I don't see it used anywhere else in your code (the corpus is built from data$Resolution instead). If you want to convert your CSV file into corpus format, just follow this link:
R text mining documents from CSV file (one row per doc)
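
The linked question is not reproduced here. As a rough sketch of the one-row-per-document idea its title describes, and assuming the text lives in a column named Resolution as in the question, something like the following might work. The example CSV contents and the temporary file path are made up for illustration.

```r
library(tm)

# Write a tiny made-up CSV so the sketch is self-contained
csv_path <- file.path(tempdir(), "report_example.csv")
write.csv(data.frame(Resolution = c("Printer driver reinstalled",
                                    "Password reset for user"),
                     stringsAsFactors = FALSE),
          file = csv_path, row.names = FALSE)

# Keep the text as character rather than factor
dataorg <- read.csv(csv_path, stringsAsFactors = FALSE)

# Each element of the vector (one per CSV row) becomes one document
corpus <- VCorpus(VectorSource(dataorg$Resolution))
```

Here the object read from the file (dataorg) is the same one passed to VectorSource, which avoids the naming mismatch noted above.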
