I've trained a sentiment analysis classifier on TripAdvisor's textual review datasets. It predicts the rating of an input review based on its sentiment. Training and testing both work fine.
However, when I load the classifier in a new .ipynb file and try to run a prediction on a review, I get
NotFittedError: The TF-IDF vectorizer is not fitted
This is the detailed error:
---------------------------------------------------------------------------
NotFittedError Traceback (most recent call last)
/var/folders/rn/vqtp35xn15zd9d5scq3rxsth0000gn/T/ipykernel_71297/777236349.py in <module>
----> 1 prediction(test_str,HotelModel1000)
/var/folders/rn/vqtp35xn15zd9d5scq3rxsth0000gn/T/ipykernel_71297/1165328373.py in prediction(text, model)
4 cw = clean_string(text)
5 cw = tokenize(cw)
----> 6 cw = tfidf_vectorizer.transform([cw])
7 result = model.predict(cw)
8 print("Expected rating:",int(result))
~/opt/anaconda3/lib/python3.7/site-packages/sklearn/feature_extraction/text.py in transform(self, raw_documents, copy)
1869 Tf-idf-weighted document-term matrix.
1870 """
-> 1871 check_is_fitted(self, msg='The TF-IDF vectorizer is not fitted')
1872
1873 # FIXME Remove copy parameter support in 0.24
~/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
71 FutureWarning)
72 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 73 return f(**kwargs)
74 return inner_f
75
~/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py in check_is_fitted(estimator, attributes, msg, all_or_any)
1018
1019 if not attrs:
-> 1020 raise NotFittedError(msg % {'name': type(estimator).__name__})
1021
1022
NotFittedError: The TF-IDF vectorizer is not fitted
Here is my code to predict:
HotelModel = pickle.load(open('./models/TripAdvisorHotels_SVM_Model1000(2).pickle','rb'))
test_str = input('')
prediction(test_str,HotelModel)
Here is the prediction() function I called:
tfidf_vectorizer = TfidfVectorizer(max_features=5000,ngram_range=(2,2))
def prediction(text,model):
    cw = clean_string(text)
    cw = tokenize(cw)
    cw = tfidf_vectorizer.transform([cw])
    result = model.predict(cw)
    print("Expected rating:",int(result))
    print("\nThe confidence of the prediction is:",model.predict_proba(cw)[0][int(result)-1])
As mentioned in the comment, you have correctly loaded the trained model from the pickle file.
HotelModel = pickle.load(open('./models/TripAdvisorHotels_SVM_Model1000(2).pickle','rb'))
It can make predictions because you saved the fitted version.
Similarly, tfidf_vectorizer also needs to be the fitted version. In prediction() you create a brand-new TfidfVectorizer that has never been fitted, so transform() raises NotFittedError.
You have to pickle the fitted tfidf_vectorizer in the training notebook, then load it from the pickle file to use it.
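A minimal sketch of that save-and-load round trip, assuming a toy training set and a linear SVC in place of your real data and model. The file names, the temporary directory, and the variables train_texts and train_ratings are hypothetical; clean_string() and tokenize() are left out for brevity.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

# Hypothetical toy data standing in for the cleaned TripAdvisor reviews.
train_texts = [
    "great hotel friendly staff",
    "friendly staff clean room",
    "dirty room rude staff",
    "rude staff terrible hotel",
]
train_ratings = [5, 5, 1, 1]

# --- training notebook ---
tfidf_vectorizer = TfidfVectorizer(max_features=5000, ngram_range=(2, 2))
X_train = tfidf_vectorizer.fit_transform(train_texts)  # the vectorizer is fitted here
model = SVC(kernel="linear").fit(X_train, train_ratings)

model_dir = Path(tempfile.mkdtemp())  # hypothetical location for the pickles
with open(model_dir / "tfidf_vectorizer.pickle", "wb") as f:
    pickle.dump(tfidf_vectorizer, f)  # save the FITTED vectorizer
with open(model_dir / "model.pickle", "wb") as f:
    pickle.dump(model, f)

# --- new notebook ---
with open(model_dir / "tfidf_vectorizer.pickle", "rb") as f:
    loaded_vectorizer = pickle.load(f)
with open(model_dir / "model.pickle", "rb") as f:
    loaded_model = pickle.load(f)

# Use transform() only; calling fit() again would build a different vocabulary.
features = loaded_vectorizer.transform(["friendly staff clean room"])
predicted = loaded_model.predict(features)
print("Expected rating:", int(predicted[0]))
```

In your own prediction() you would then call transform() on the loaded vectorizer instead of creating a new TfidfVectorizer.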
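If you are using an SVM-based model, keep an eye on the vectorizer's feature length when fine-tuning: the loaded vectorizer must produce the same number of features the model was trained on.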
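One way to keep the vectorizer and the model from drifting apart is to fit and pickle them together as a single scikit-learn Pipeline. This is a sketch of an alternative, not what the question's code does; the toy texts and ratings are hypothetical, and the pipeline is round-tripped through pickle.dumps()/pickle.loads() in memory rather than through a file.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy data standing in for the cleaned reviews.
texts = [
    "great hotel friendly staff",
    "friendly staff clean room",
    "dirty room rude staff",
    "rude staff terrible hotel",
]
ratings = [5, 5, 1, 1]

# The vectorizer and the classifier are fitted together and saved as one object.
pipeline = make_pipeline(
    TfidfVectorizer(max_features=5000, ngram_range=(2, 2)),
    SVC(kernel="linear"),
)
pipeline.fit(texts, ratings)

# Round trip through pickle, as you would between notebooks.
restored = pickle.loads(pickle.dumps(pipeline))

# make_pipeline names each step after its lowercased class name.
vocab_size = len(restored.named_steps["tfidfvectorizer"].vocabulary_)
model_width = restored.named_steps["svc"].n_features_in_
print("Vectorizer features:", vocab_size, "| model expects:", model_width)

# The pipeline takes raw text; the transform step runs internally.
rating = restored.predict(["friendly staff clean room"])
print("Expected rating:", int(rating[0]))
```

Because the restored pipeline carries its own fitted vectorizer, the feature count seen by the SVM always matches the one it was trained with.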