Using tm package in R to clean the columns in dataframe

I would like to use the tm package to clean the columns of a data frame, i.e. to apply functions such as content_transformer and removePunctuation to a data frame column.

For example, using the data frame below:

df <- data.frame(a=c("I love TEXTMINING","Here I GO, Again!!"))

I would like to use content_transformer to convert df$a to lower case and removePunctuation to strip the punctuation, so that the output looks like this:

                  a
1 i love textmining
2   here i go again

Is there a way to perform the above specifically using the functions in the tm package?

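Here is an example using the tm package. It builds a corpus from the column, applies the transformations with tm_map, and then converts the result back to a character vector: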

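# Data frame from the question
df <- data.frame(a = c("I love TEXTMINING", "Here I GO, Again!!"))

library(tm)
# Build a corpus with one document per element of df$a
corpus <- Corpus(VectorSource(df$a))
corpus <- tm_map(corpus, removeNumbers)                 # optional: drop digits
corpus <- tm_map(corpus, content_transformer(tolower))  # convert to lower case
# corpus <- tm_map(corpus, removeWords, stopwords('english'))  # optional: drop stop words
corpus <- tm_map(corpus, removePunctuation)             # strip punctuation

# Flatten the corpus back into a character vector
answer <- unlist(as.list(corpus))
answer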
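
If the goal is only to clean the column in place, a corpus may not be needed: as far as I know, removePunctuation also accepts a plain character vector, so it can be applied to the column directly. The following is a minimal sketch under that assumption. clean_text is a hypothetical helper name, not part of tm, and stringsAsFactors = FALSE is set so the column is character in older R versions as well.

```r
library(tm)

df <- data.frame(a = c("I love TEXTMINING", "Here I GO, Again!!"),
                 stringsAsFactors = FALSE)

# Hypothetical helper (not part of tm): lower-case the text,
# then strip punctuation with tm's removePunctuation.
clean_text <- function(x) removePunctuation(tolower(as.character(x)))

# Overwrite the column with the cleaned text
df$a <- clean_text(df$a)
df
```

To clean several text columns at once, the same helper could be applied with something like df[] <- lapply(df, clean_text), assuming every column holds text.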
