
ValueError: dimension mismatch when predicting new values in sentiment analysis

I am relatively new to machine learning and am trying to do sentiment analysis.

The Type column holds the sentiment of each tweet (positive, negative, or neutral, encoded as 0, 1, and 2). The Tweet column holds the tweet text.

I am trying to predict the sentiment of a new set of tweets as 0, 1, or 2.

When I ran the code below, I got a dimension mismatch error.

import pandas as pd
train_tweets = pd.read_csv("tweets_type.csv")
from sklearn.model_selection import train_test_split

y = train_tweets.Type
X = train_tweets.Tweet

train_X, test_X, train_y, test_y = train_test_split(X, y, random_state=1)

from sklearn.feature_extraction.text import CountVectorizer

vect = CountVectorizer()

vect.fit(train_X)
train_X_dtm = vect.transform(train_X)

test_X_dtm = vect.transform(test_X)
test_X_dtm

from sklearn.naive_bayes import MultinomialNB
nb = MultinomialNB()

%time nb.fit(train_X_dtm, train_y)

# make class predictions for test_X_dtm
y_pred_class = nb.predict(test_X_dtm)

# calculate accuracy of class predictions
from sklearn import metrics
from sklearn.metrics import classification_report, confusion_matrix
metrics.accuracy_score(test_y, y_pred_class)

march_tweets = pd.read_csv("march_data.csv")
X = march_tweets.Tweet
vect.fit(X)
train_new_dtm = vect.transform(X)

new_pred_class = nb.predict(train_new_dtm)

The error I am getting is here:

[image: screenshot of the error; per the title, ValueError: dimension mismatch]

I would be grateful for any help.

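It turns out my mistake was fitting the vectorizer on X after I had already fitted it on train_X. Refitting rebuilds the vocabulary from the new tweets, so the transformed matrix no longer has the number of columns the model was trained on. Once the vectorizer is fitted there is no need to fit it again; new data only needs transform. I removed this line and it worked: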

vect.fit(X)
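To make the fix concrete, here is a minimal sketch of the fit-once, transform-many pattern. The toy tweets and labels are hypothetical stand-ins for tweets_type.csv and march_data.csv, and the variable names are chosen for illustration only. The last few lines refit a separate vectorizer on the new tweets purely to show how the column count changes, which is what seems to have triggered the error.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data; the real data would come from the CSV files.
train_texts = ["good movie", "bad movie", "okay film", "great film", "awful plot"]
train_labels = [0, 1, 2, 0, 1]
new_texts = ["great movie tonight", "awful weather"]

# Learn the vocabulary ONCE, from the training tweets only.
vect = CountVectorizer()
train_dtm = vect.fit_transform(train_texts)

nb = MultinomialNB()
nb.fit(train_dtm, train_labels)

# For new tweets, call transform only: words outside the training
# vocabulary are ignored, and the column count stays the same.
new_dtm = vect.transform(new_texts)
n_train_features = train_dtm.shape[1]
n_new_features = new_dtm.shape[1]
new_pred = nb.predict(new_dtm)

# For contrast: a vectorizer fitted on the new tweets builds a different
# vocabulary, so its matrix has a different number of columns.
refit_dtm = CountVectorizer().fit_transform(new_texts)
n_refit_features = refit_dtm.shape[1]

print(n_train_features, n_new_features, n_refit_features)
print(new_pred)
```

Another option, if it suits the workflow, is to wrap the vectorizer and classifier in sklearn.pipeline.Pipeline (or make_pipeline), so that calling predict on raw text always reuses the vocabulary learned during fit.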
