Wrong prediction with SVC classifier in scikit-learn?
I built my own corpus and split off a training text file that looks like this:
POS|This film was awesome, highly recommended
NEG|I did not like this film
NEU|I went to the movies
POS|this film is very interesting, i liked a lot
NEG|the film was very boring i did not like it
NEU|the cinema is big
NEU|the cinema was dark
For testing, I have another text review that is unlabeled:
I did not like this film
Then I do the following:
import pandas as pd
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.svm import SVC

# Load the labeled training reviews (one "label|review" pair per line)
trainingdata = pd.read_csv('/Users/user/Desktop/training.txt',
                           header=None, sep='|', names=['labels', 'movies_reviews'])

# Hash word bigrams into a 7-dimensional feature space
vect = HashingVectorizer(analyzer='word', ngram_range=(2,2), lowercase=True, n_features=7)
X = vect.fit_transform(trainingdata['movies_reviews'])
y = trainingdata['labels']

# Load and vectorize the unlabeled test review
TestText = pd.read_csv('/Users/user/Desktop/testing.txt',
                       header=None, names=['test_opinions'])
test = vect.transform(TestText['test_opinions'])

# Train an SVC with default parameters and predict the test label
svm = SVC()
svm.fit(X, y)
prediction = svm.predict(test)
print(prediction)
The prediction is:
['NEU']
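So my question is: why is this prediction wrong? Is it a problem with the code, with the features, or with the classification algorithm? I experimented a bit, and when I removed the last review from the training text file I realized that the classifier always predicts the label of the last element in that file. Any ideas on how to fix this?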
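SVMs are very sensitive to parameter settings. You will need to run a grid search to find the right values. I tried training two kinds of naive Bayes on your dataset and got perfect accuracy on the training set: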
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer, HashingVectorizer

# First option: GaussianNB
vect = HashingVectorizer(analyzer='word', ngram_range=(2,2), lowercase=True)
X = vect.fit_transform(trainingdata['movies_reviews'])
y = trainingdata['labels']
nb = GaussianNB().fit(X.toarray(), y)  # input needs to be dense
nb.predict(X.toarray()) == y

# Second option: MultinomialNB (input must be non-negative, so use CountVectorizer instead)
vect = CountVectorizer(analyzer='word', ngram_range=(2,2), lowercase=True)
X = vect.fit_transform(trainingdata['movies_reviews'])
y = trainingdata['labels']
nb = MultinomialNB().fit(X, y)
nb.predict(X) == y
In both cases the output is:
Out[33]:
0 True
1 True
2 True
3 True
4 True
5 True
6 True
Name: labels, dtype: bool
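The grid search suggested above is not shown in the answer, so here is a minimal sketch of what it could look like with scikit-learn's `GridSearchCV`. Several things here are assumptions rather than part of the original post: the training reviews are inlined instead of read from `training.txt`, the vectorizer is a `CountVectorizer` with unigrams and bigrams, the parameter values in the grid are purely illustrative, and `cv=2` is chosen only because the POS and NEG classes have two examples each. With seven reviews the cross-validation scores are very noisy, so treat this as a template for a larger corpus rather than a tuned model.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Training data copied from the question, inlined so the sketch is self-contained.
trainingdata = pd.DataFrame(
    [
        ("POS", "This film was awesome, highly recommended"),
        ("NEG", "I did not like this film"),
        ("NEU", "I went to the movies"),
        ("POS", "this film is very interesting, i liked a lot"),
        ("NEG", "the film was very boring i did not like it"),
        ("NEU", "the cinema is big"),
        ("NEU", "the cinema was dark"),
    ],
    columns=["labels", "movies_reviews"],
)

# Vectorizer and classifier live in one pipeline, so every CV fold
# rebuilds the vocabulary from its own training split only.
pipeline = Pipeline([
    ("vect", CountVectorizer(analyzer="word", ngram_range=(1, 2), lowercase=True)),
    ("svm", SVC()),
])

# Illustrative grid (assumed values, not tuned recommendations).
param_grid = [
    {"svm__kernel": ["linear"], "svm__C": [0.1, 1, 10, 100]},
    {"svm__kernel": ["rbf"], "svm__C": [0.1, 1, 10, 100],
     "svm__gamma": ["scale", 0.01, 0.1, 1]},
]

# cv=2 because the smallest classes (POS, NEG) have only two examples each.
search = GridSearchCV(pipeline, param_grid, cv=2)
search.fit(trainingdata["movies_reviews"], trainingdata["labels"])

# Best parameter combination found, then a prediction from the refit model.
print(search.best_params_)
prediction = search.predict(["I did not like this film"])
print(prediction)
```

Because `GridSearchCV` refits the best pipeline on the full training set by default, `search.predict` can be called directly on raw text.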