Sklearn SVC with the MNIST dataset: digit 5 always wrong?

Question

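I've built a very simple SVC to classify MNIST digits. For some reason the classifier keeps incorrectly predicting the digit 5, yet on all the other digits it doesn't miss a single one. Does anyone know whether I might have set something up wrong, or whether it really is that bad at predicting 5?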

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import datasets
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

data = datasets.load_digits()
images = data.images
targets = data.target

# Split into train and test sets
images_train, images_test, imlabels_train, imlabels_test = train_test_split(images, targets, test_size=.2, shuffle=False)


# Re-shape data so that it's 2D
images_train = np.reshape(images_train, (np.shape(images_train)[0], 64))
images_test = np.reshape(images_test, (np.shape(images_test)[0], 64))


svm_classifier = SVC(gamma='auto').fit(images_train, imlabels_train)

number_correct_svc = 0
preds = []

for label_index in range(len(imlabels_test)):

    pred = svm_classifier.predict(images_test[label_index].reshape(1,-1))
    if pred[0] == imlabels_test[label_index]:
        number_correct_svc += 1

    preds.append(pred[0])

print("Support Vector Classifier...")
print(f"\tPercent correct for all test data: {100*number_correct_svc/len(imlabels_test)}%")

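# sklearn's signature is confusion_matrix(y_true, y_pred); with the arguments in
# this order, rows are predicted labels and columns are true labels.
confusion_matrix(preds,imlabels_test)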

Here is the resulting confusion matrix:

array([[22,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0, 15,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0, 15,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0, 21,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0, 21,  0,  0,  0,  0,  0],
       [13, 21, 20, 16, 16, 37, 23, 20, 31, 16],
       [ 0,  0,  0,  0,  0,  0, 14,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0, 16,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  2,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0, 21]], dtype=int64)

I've been reading the sklearn page for SVC but can't work out what I'm doing wrong.

Update:

I tried SVC(gamma='scale') and it seems much more reasonable. I'd still like to know why 'auto' doesn't work, though. With 'scale':

array([[34,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0, 36,  0,  0,  0,  0,  0,  0,  1,  0],
       [ 0,  0, 35,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0, 27,  0,  0,  0,  0,  0,  1],
       [ 1,  0,  0,  0, 34,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  2,  0, 37,  0,  0,  0,  1],
       [ 0,  0,  0,  0,  0,  0, 37,  0,  0,  0],
       [ 0,  0,  0,  2,  0,  0,  0, 35,  0,  1],
       [ 0,  0,  0,  6,  1,  0,  0,  1, 31,  1],
       [ 0,  0,  0,  0,  2,  0,  0,  0,  1, 33]], dtype=int64)

Answer 1

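The second question is the easier one to address. In the RBF kernel, K(x, x') = exp(-gamma * ||x - x'||^2), gamma controls how "wiggly" the decision boundary is. What do we mean by "wiggly"? The higher gamma is, the faster each support vector's influence decays with distance, so the SVM's decision boundary follows the individual training points more precisely.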
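A minimal NumPy sketch of that decay. It assumes two made-up 64-pixel vectors on the 0..16 scale that differ by 4 levels in every pixel, so their squared distance is 64 * 16 = 1024. It compares gamma = 1/64 (the 'auto' value for 64 features) with an arbitrary smaller gamma of 1/1024 chosen only for illustration; rbf_kernel_value is a hypothetical helper, not a scikit-learn function.

```python
import numpy as np


def rbf_kernel_value(x, x_prime, gamma):
    """Hypothetical helper: RBF kernel K(x, x') = exp(-gamma * ||x - x'||^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(x_prime, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))


# Two made-up 64-pixel images on the 0..16 scale, 4 levels apart in every pixel,
# so the squared Euclidean distance between them is 64 * 4**2 = 1024.
x = np.zeros(64)
x_prime = np.full(64, 4.0)

# 1/64 is the 'auto' value for 64 features; 1/1024 is an illustrative smaller gamma.
for gamma in (1 / 64, 1 / 1024):
    print(f"gamma={gamma:.6f}  K(x, x')={rbf_kernel_value(x, x_prime, gamma):.3e}")
```

With gamma = 1/64 the kernel value is exp(-16), roughly 1.1e-07, so the two images barely influence each other; with gamma = 1/1024 it is exp(-1), roughly 0.37.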

If gamma='scale' (the default) is passed, SVC uses 1 / (n_features * X.var()) as the value of gamma;

if gamma='auto', it uses 1 / n_features.
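The two formulas can be checked with a small NumPy sketch. svc_gamma is a hypothetical helper that only restates the documented formulas (it is not part of scikit-learn's API), and the 2x2 arrays are toy data: one on the 0..16 pixel scale, and the same values divided by 16.

```python
import numpy as np


def svc_gamma(X, mode):
    """Hypothetical helper restating SVC's documented gamma formulas."""
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    if mode == "scale":
        return 1.0 / (n_features * X.var())  # variance over all entries of X
    if mode == "auto":
        return 1.0 / n_features
    raise ValueError(f"unknown mode: {mode!r}")


# Toy data on the 0..16 pixel scale (values are made up for illustration).
X_raw = np.array([[0.0, 16.0], [16.0, 0.0]])
# The same data rescaled to [0, 1], as is common for MNIST.
X_unit = X_raw / 16.0

for name, X in (("0..16", X_raw), ("0..1", X_unit)):
    print(name, "scale:", svc_gamma(X, "scale"), "auto:", svc_gamma(X, "auto"))
```

On the 0..16 toy data 'scale' gives 1/128 against 1/2 for 'auto', but once the values are in [0, 1] it gives 2 against 1/2, so which setting produces the larger gamma depends entirely on how the pixels are scaled.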

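Which of the two is larger depends on X.var(). For MNIST rescaled to [0, 1] the variance is below 1, which would make 'scale' the larger value. But datasets.load_digits() is not MNIST: it is the smaller 8x8 UCI digits set, whose pixels are integers from 0 to 16, so X.var() is well above 1. Here, then, the second case ('auto', 1/64 ≈ 0.016) gives the higher gamma, and that is the problem rather than an advantage. Squared distances between these 64-pixel images are large, so exp(-gamma * ||x - x'||^2) is close to 0 for any test image that is not very near a training image. For those images each one-vs-one decision function reduces to roughly its intercept, so they all receive the same votes and the same label, which is the row full of 5s in the first confusion matrix. The much smaller gamma chosen by 'scale' gives a smoother boundary that generalizes, hence the far more reasonable second matrix.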
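To check this empirically, the sketch below refits both settings on the question's split and reports accuracy plus the most frequent predicted label. It assumes scikit-learn is installed, uses digits.data (the already-flattened form of data.images) and predicts the whole test array in one call instead of row by row. Exact numbers may vary across scikit-learn versions; the expectation, based on the matrices above, is that the 'auto' model piles many predictions onto a single label.

```python
from collections import Counter

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# load_digits: 1797 8x8 images, pixel values are integers 0..16
digits = datasets.load_digits()
X = digits.data  # shape (n_samples, 64), same as reshaping data.images
y = digits.target

# Same deterministic split as in the question (last 20% held out)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

results = {}
for gamma in ("auto", "scale"):
    clf = SVC(gamma=gamma).fit(X_train, y_train)
    preds = clf.predict(X_test)  # predict the whole test set at once
    accuracy = (preds == y_test).mean()
    most_common_label, count = Counter(preds).most_common(1)[0]
    results[gamma] = (accuracy, most_common_label, count)
    print(f"gamma={gamma!r}: accuracy={accuracy:.3f}, "
          f"most frequent prediction={most_common_label} ({count}/{len(preds)})")
```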

Answer 1: 0 votes, accepted, 2020-03-29 01:12:14