
Neural Networks handling character data

I am doing a basic sentiment analysis experiment using the neuralnet package (library(neuralnet)).

The structure of my data is as follows:
'data.frame':   4442 obs. of  2 variables:
 $ comment_text: chr  "really briliant app\tit's intuitive and informative giving all the information you could need and seemingly very accurate." "will not connect to gps\tapp does not connect to gps no matter how long i have it on. i have gps set on high ac"| __truncated__ "wish this would interest more with google now to provide weekly or monthly summaries." "useless\tdoes not talk to gps on the phone. 20 minute run no data." ...
 $ rating      : int  5 1 5 1 4 5 4 3 4 5 ...

I am dividing this data into training and testing parts and running the neural network like this:

senti_train <- nnsenti[1:3499, ]
senti_test <- nnsenti[3500:nrow(nnsenti), ]  # the data frame has 4442 rows, so 4443 is out of range
library(neuralnet)
neuralmodel <- neuralnet(rating ~ comment_text, data=senti_train)
plot(neuralmodel)

After running this, I get the following error:

Error in neurons[[i]] %*% weights[[i]] : 
requires numeric/complex matrix/vector arguments

How can I resolve this, given that the text is the important part?

I have tokenized the text data, done some text cleaning using the tm package, and updated my code as follows:

library(tm)
nnsenti$comment_text <- VCorpus(VectorSource(nnsenti$comment_text))

# Text cleaning
nnsenti$comment_text <- tm_map(nnsenti$comment_text,content_transformer(tolower))
nnsenti$comment_text <- tm_map(nnsenti$comment_text, removeNumbers)
nnsenti$comment_text <- tm_map(nnsenti$comment_text, removePunctuation)
nnsenti$comment_text <- tm_map(nnsenti$comment_text, removeWords,stopwords('english'))
nnsenti$comment_text <- tm_map(nnsenti$comment_text, removeWords,c('please','sad')) #Additional words
nnsenti$comment_text <- tm_map(nnsenti$comment_text, stripWhitespace)
senti_train <- nnsenti[1:3499, ]
senti_test <- nnsenti[3500:nrow(nnsenti), ]  # the data frame has 4442 rows, so 4443 is out of range

library(neuralnet)
neuralmodel <- neuralnet(rating ~ comment_text, data=senti_train)

Now I get this error:

Error in model.frame.default(formula.reverse, data) : 
  invalid type (list) for variable 'comment_text'

It seems you are not encoding and standardizing your data. A neural network needs numeric inputs, and it works better when those inputs are scaled to a fixed range (usually [-1, 1] or [0, 1]).
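To make that concrete, here is a minimal base R sketch that checks which columns are numeric and rescales a numeric column into [0, 1]. The data frame `toy` and the helper `rescale01` are hypothetical names used only for illustration; they are not part of the original data or of neuralnet.

```r
# Toy stand-in for the asker's data frame (hypothetical values).
toy <- data.frame(
  comment_text = c("great app", "useless", "works fine"),
  rating = c(5L, 1L, 4L),
  stringsAsFactors = FALSE
)

# neuralnet multiplies inputs by weight matrices, so every column used
# in the formula has to be numeric; a character column is not.
numeric_cols <- sapply(toy, is.numeric)
print(numeric_cols)

# Min-max scaling maps a numeric column into the range [0, 1].
# (Hypothetical helper; assumes the column is not constant.)
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))
toy$rating_scaled <- rescale01(toy$rating)
print(toy$rating_scaled)
```

The character column shows up as non-numeric, which is what the `%*%` error in the question is complaining about.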

You can turn the text into numbers using one-hot encoding, for example one 0/1 column per word. Normalize numeric values such as ratings by dividing them by the maximum possible value: if the maximum rating is 10, divide all ratings by 10.
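A rough sketch of that idea, assuming a simple word-presence (bag-of-words) encoding built with base R. The data frame `toy`, the `w_` column prefix, and `hidden = 2` are illustrative assumptions, not the asker's setup; on the real data you would likely build the matrix with tm's DocumentTermMatrix and drop rare words first.

```r
# Toy stand-in for the asker's data (hypothetical values).
toy <- data.frame(
  comment_text = c("great app very accurate", "useless does not connect",
                   "great and accurate", "does not connect useless app"),
  rating = c(5, 1, 5, 1),
  stringsAsFactors = FALSE
)

# Split each comment into lower-case word tokens.
tokens <- strsplit(tolower(toy$comment_text), "[^a-z]+")
vocab <- sort(unique(unlist(tokens)))

# One row per comment, one 0/1 column per vocabulary word.
features <- t(sapply(tokens, function(words) as.numeric(vocab %in% words)))
# Prefix the names so words such as "not" or "if" stay valid in a formula.
colnames(features) <- paste0("w_", vocab)

# Scale the target by the largest rating so it lies in (0, 1].
train <- data.frame(features, rating = toy$rating / max(toy$rating))

# Build the formula explicitly from the numeric column names.
fml <- as.formula(paste("rating ~", paste(colnames(features), collapse = " + ")))

# Fit only if the neuralnet package is installed.
if (requireNamespace("neuralnet", quietly = TRUE)) {
  set.seed(1)
  model <- neuralnet::neuralnet(fml, data = train, hidden = 2)
}
```

Every column passed to neuralnet is now numeric, so neither of the two errors from the question should occur. Remember to multiply predictions by the same maximum rating to get back to the original scale.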
