I implemented an SVM model that classifies text into two categories. The model was trained and tested on the data.csv data set. Now I want to use this model on live data, so I persisted it with joblib (which builds on Python's pickle). First I saved the model:
import joblib
joblib.dump(clf, "model.pkl")
Then I loaded the model:
classifer = joblib.load("model.pkl")
Then I passed the following text to be classified:
new_observation = "this news should be in one category"
classifer.predict([new_observation])
But running this raises an error:
ValueError: could not convert string to float: 'this news should be in one category'
I followed this page on how to save and load a trained model: https://scikit-learn.org/stable/modules/model_persistence.html
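The ValueError arises because the saved SVC was fitted on TF-IDF vectors, so predict() expects a numeric feature matrix rather than raw text. One way to make the saved object accept raw strings is to persist a scikit-learn Pipeline that chains the vectorizer and the classifier. This is a minimal sketch, not the question's setup: the toy corpus, the labels, the linear kernel, and the file name pipeline.pkl are made up for illustration, and the NLTK pre-processing is left out.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy data standing in for the rows of data.csv
texts = [
    "dengue cases soaring in the city",
    "hospital reports new dengue outbreak",
    "football team wins the final match",
    "cricket match ends in a draw",
]
labels = ["health", "health", "sports", "sports"]

# The pipeline stores the fitted vectorizer together with the classifier,
# so raw strings are vectorized with the training vocabulary at predict time
pipeline = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))
pipeline.fit(texts, labels)

# Hypothetical file location for the saved pipeline
path = os.path.join(tempfile.mkdtemp(), "pipeline.pkl")
joblib.dump(pipeline, path)
loaded = joblib.load(path)

# predict() expects an iterable of documents, so the single string goes in a list
prediction = loaded.predict(["dengue soaring in the region"])
print(prediction)
```

Because the vectorizer travels inside the pickled object, the live-data code cannot accidentally use a different vocabulary from the one the SVC was trained on.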
EDIT
Here is the code I used to create the SVM model:
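import string
import pandas as pd
from nltk.corpus import stopwords
from nltk.stem.snowball import SnowballStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

data = pd.read_csv('data1.csv', encoding='cp1252')

# Build the stop-word set and the stemmer once instead of once per word
STOP_WORDS = set(stopwords.words('english'))
stemmer = SnowballStemmer("english")

def pre_process(text):
    # Remove punctuation, drop English stop words, then stem the remaining words
    text = text.translate(str.maketrans('', '', string.punctuation))
    text = [word for word in text.split() if word.lower() not in STOP_WORDS]
    words = ""
    for i in text:
        words += stemmer.stem(i) + " "
    return words

textFeatures = data['textForCategorized'].copy()
textFeatures = textFeatures.apply(pre_process)
# Pass 'english' as stop_words by keyword: positionally it would land in `input`,
# and recent scikit-learn versions only accept keyword arguments here
vectorizer = TfidfVectorizer(stop_words='english')
features = vectorizer.fit_transform(textFeatures)
features_train, features_test, labels_train, labels_test = train_test_split(
    features, data['class'], test_size=0.3, random_state=111)
svc = SVC(kernel='sigmoid', gamma=1.0)
clf = svc.fit(features_train, labels_train)
prediction = svc.predict(features_test)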
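The fitted vectorizer holds the vocabulary and IDF weights the SVC was trained on, so it has to be saved along with the model for live data to be vectorized the same way. A minimal sketch, assuming a made-up toy corpus in place of data1.csv, the NLTK pre-processing left out, and an arbitrary file name model_bundle.pkl:

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

# Hypothetical stand-in for the pre-processed 'textForCategorized' column
texts = [
    "dengue cases soaring in the city",
    "hospital reports new dengue outbreak",
    "football team wins the final match",
    "cricket match ends in a draw",
]
labels = [0, 0, 1, 1]

# Same kernel settings as in the question
vectorizer = TfidfVectorizer()
features = vectorizer.fit_transform(texts)
clf = SVC(kernel="sigmoid", gamma=1.0).fit(features, labels)

# Save the vectorizer and the classifier together in one file
path = os.path.join(tempfile.mkdtemp(), "model_bundle.pkl")
joblib.dump({"vectorizer": vectorizer, "model": clf}, path)

# Later, in the live-data process: load both and reuse the training vocabulary
bundle = joblib.load(path)
new_features = bundle["vectorizer"].transform(["dengue soaring"])
prediction = bundle["model"].predict(new_features)
```

Storing both objects in one dictionary keeps them from drifting apart; saving them to two separate files works just as well as long as they are always loaded as a pair.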
After training the model, this is how I tried to give it input:
joblib.dump(clf, "model.pkl")
classifer = joblib.load("model.pkl")
new_observation = "This news should be in one category"
classifer.predict(new_observation)
EDIT
joblib.dump(clf, "model.pkl")
classifer = joblib.load("model.pkl")
textFeature = "Dengue soaring in ......"
textFeature = pre_process(textFeature)
classifer.predict(textFeature.encode())
The code above loads the model, pre-processes the input text, and asks the model for a prediction, but it still raises an error:
ValueError: could not convert string to float: b'dengu soar '
You should pre-process new_observation before feeding it to the model. In your case, you've only pre-processed textFeatures for training; you must repeat the same pre-processing steps for new_observation too:
1. Apply the pre_process() function to new_observation.
2. Use the vectorizer fitted on the training data (call transform, not fit_transform) to transform the output obtained from pre_process(new_observation).
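The two steps above can be sketched as follows. To keep the example self-contained, simple_pre_process is a hypothetical stand-in for the question's pre_process() (it only lowercases and strips punctuation, with no NLTK stop words or stemming), and the toy corpus, labels, and linear kernel are made up. Note that transform() expects an iterable of documents, so the single pre-processed string is wrapped in a list.

```python
import string

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC


def simple_pre_process(text):
    # Hypothetical stand-in for pre_process(): lowercase and strip punctuation only
    return text.lower().translate(str.maketrans("", "", string.punctuation))


train_texts = [
    "Dengue cases are soaring!",
    "New dengue outbreak reported.",
    "The team won the final.",
    "Cricket match ends in a draw.",
]
train_labels = ["health", "health", "sports", "sports"]

# Training: pre-process, then fit the vectorizer and the classifier
vectorizer = TfidfVectorizer()
features_train = vectorizer.fit_transform([simple_pre_process(t) for t in train_texts])
clf = SVC(kernel="linear").fit(features_train, train_labels)

# Step 1: apply the same pre-processing to the new observation
new_observation = "This news should be in one category"
cleaned = simple_pre_process(new_observation)

# Step 2: transform (not fit_transform) with the already-fitted vectorizer,
# passing a one-element list so the result has exactly one row
new_features = vectorizer.transform([cleaned])
prediction = clf.predict(new_features)
```

Passing the bare string instead, as in vectorizer.transform(cleaned), makes scikit-learn raise a ValueError because it expects an iterable of raw text documents, not a single string.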
I had the same issue and resolved it by resizing the feature matrix of the single input string to match the shape of the training data.
Complete code:
import joblib
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

joblib.dump(clf, "model.pkl")
classifer = joblib.load("model.pkl")
textFeature = "Dengue soaring in ......"
vocabulary = pre_process(textFeature)
vocabulary_df = pd.Series(vocabulary)
# Feature extraction using TfidfVectorizer
# (this fits a new vocabulary on the single input text)
vectorizer = TfidfVectorizer(stop_words='english')
test_ = vectorizer.fit_transform(vocabulary_df.values)
# Pad the sparse matrix in place to the number of training features so predict() accepts it
test_.resize(1, features_train.shape[1])
classifer.predict(test_)