
NotFittedError: The TF-IDF vectorizer is not fitted

I've trained a sentiment analysis classifier on a dataset of TripAdvisor text reviews. It predicts a review's rating from its sentiment. Training and testing both work fine.

However, when I loaded the classifier in a new .ipynb file and tried to predict the rating of a review, I got this error:

NotFittedError: The TF-IDF vectorizer is not fitted

This is the detailed error:

---------------------------------------------------------------------------
NotFittedError                            Traceback (most recent call last)
/var/folders/rn/vqtp35xn15zd9d5scq3rxsth0000gn/T/ipykernel_71297/777236349.py in <module>
----> 1 prediction(test_str,HotelModel1000)

/var/folders/rn/vqtp35xn15zd9d5scq3rxsth0000gn/T/ipykernel_71297/1165328373.py in prediction(text, model)
      4     cw = clean_string(text)
      5     cw = tokenize(cw)
----> 6     cw = tfidf_vectorizer.transform([cw])
      7     result = model.predict(cw)
      8     print("Expected rating:",int(result))

~/opt/anaconda3/lib/python3.7/site-packages/sklearn/feature_extraction/text.py in transform(self, raw_documents, copy)
   1869             Tf-idf-weighted document-term matrix.
   1870         """
-> 1871         check_is_fitted(self, msg='The TF-IDF vectorizer is not fitted')
   1872 
   1873         # FIXME Remove copy parameter support in 0.24

~/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
     71                           FutureWarning)
     72         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 73         return f(**kwargs)
     74     return inner_f
     75 

~/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py in check_is_fitted(estimator, attributes, msg, all_or_any)
   1018 
   1019     if not attrs:
-> 1020         raise NotFittedError(msg % {'name': type(estimator).__name__})
   1021 
   1022 

NotFittedError: The TF-IDF vectorizer is not fitted

Here is my code to predict:

HotelModel = pickle.load(open('./models/TripAdvisorHotels_SVM_Model1000(2).pickle','rb'))
test_str = input('')
prediction(test_str,HotelModel)

Here is the prediction() function I called:

tfidf_vectorizer = TfidfVectorizer(max_features=5000,ngram_range=(2,2))

def prediction(text,model):
    cw = clean_string(text)
    cw = tokenize(cw)
    cw = tfidf_vectorizer.transform([cw])
    result = model.predict(cw)
    print("Expected rating:",int(result)) 
    print("\nThe confidence of the prediction is:",model.predict_proba(cw)[0][int(result)-1])

As mentioned in the comments, you have correctly loaded the trained model from the pickle file:

HotelModel = pickle.load(open('./models/TripAdvisorHotels_SVM_Model1000(2).pickle','rb'))

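The model can predict because you saved it after it was fitted.

The tfidf_vectorizer needs the same treatment. In the new notebook you construct a fresh TfidfVectorizer and never fit it, so transform() raises NotFittedError.

Pickle the fitted tfidf_vectorizer in the training notebook, then load it from that pickle in the new notebook and call transform() on the loaded object.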
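A minimal sketch of that round trip follows. The toy corpus and the file name tfidf_vectorizer.pickle are assumptions for illustration; in practice you would fit on the same cleaned training reviews used for the classifier and save next to the model file.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.exceptions import NotFittedError
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus standing in for the cleaned TripAdvisor reviews (assumption).
train_texts = [
    "the room was clean and the staff were friendly",
    "the room was dirty and the staff were rude",
    "great location and friendly staff",
]

# Training notebook: fit the vectorizer, then save the FITTED object.
tfidf_vectorizer = TfidfVectorizer(max_features=5000, ngram_range=(2, 2))
tfidf_vectorizer.fit(train_texts)

# Hypothetical path; a temp directory keeps the sketch self-contained.
vectorizer_path = Path(tempfile.mkdtemp()) / "tfidf_vectorizer.pickle"
with open(vectorizer_path, "wb") as f:
    pickle.dump(tfidf_vectorizer, f)

# Prediction notebook: load the fitted vectorizer instead of constructing a new one.
with open(vectorizer_path, "rb") as f:
    loaded_vectorizer = pickle.load(f)

features = loaded_vectorizer.transform(["the staff were friendly"])

# A freshly constructed vectorizer reproduces the error from the question.
fresh_raised = False
try:
    TfidfVectorizer(max_features=5000, ngram_range=(2, 2)).transform(
        ["the staff were friendly"]
    )
except NotFittedError:
    fresh_raised = True
```

Inside prediction(), the only change needed is to call transform() on the loaded object rather than on the module-level TfidfVectorizer(...) that was never fitted.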

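If you are using an SVM-based model, keep an eye on the vectorizer's output length (the number of features) when fine-tuning: it has to match the number of features the model was trained on.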
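One way to check the feature count is sketched below. It assumes a recent scikit-learn release in which fitted estimators expose n_features_in_; the toy data, LinearSVC as the SVM, and the helper name features_match are all illustrative choices, not taken from the question.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy reviews and ratings (assumption, for illustration only).
texts = [
    "the room was clean and the staff were friendly",
    "the room was dirty and the staff were rude",
    "great location and very friendly staff",
    "terrible location and very rude staff",
]
ratings = [5, 1, 5, 1]

vectorizer = TfidfVectorizer(max_features=5000, ngram_range=(2, 2))
X = vectorizer.fit_transform(texts)
model = LinearSVC().fit(X, ratings)


def features_match(vectorizer, model):
    """Return True if the vectorizer's vocabulary size equals the model's input width."""
    return len(vectorizer.vocabulary_) == model.n_features_in_


# A vectorizer refitted with different settings no longer lines up with the model.
other_vectorizer = TfidfVectorizer(max_features=3, ngram_range=(2, 2)).fit(texts)

same_ok = features_match(vectorizer, model)
other_ok = features_match(other_vectorizer, model)
```

If the check fails, model.predict() will reject the transformed input, so changing max_features or ngram_range means retraining the model and saving both objects again.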

