Weights and biases for binary classification in scikit-learn

Question

We are using MLPClassifier from sklearn.neural_network and analyzing the biases and weights that the classifier produces.

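There is a problem when we have binary data, i.e. when only two values are allowed: the last layer then has dimension 1 instead of 2. In every other case, the shapes of the biases and weights match the number of output values.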

import numpy as np
from sklearn.neural_network import MLPClassifier

# np.array rather than np.matrix: recent scikit-learn versions reject np.matrix input
binary_classifier = MLPClassifier().fit(np.array([[0.], [1.]]), np.array([0, 1]))
other_classifier = MLPClassifier().fit(np.array([[0.], [1.], [2.]]), np.array([0, 1, 2]))

# Note that the last dimension below is 1
print(binary_classifier.intercepts_[-1].shape, binary_classifier.coefs_[-1].shape)
# Note that the last dimension below is 3
print(other_classifier.intercepts_[-1].shape, other_classifier.coefs_[-1].shape)

Output:

(1,) (100, 1)
(3,) (100, 3)
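The difference in shape goes together with a difference in output activation, which the fitted model exposes through its public `n_outputs_` and `out_activation_` attributes. The sketch below (with `max_iter` and `random_state` chosen arbitrarily to keep the runs quiet and repeatable) shows that the binary model uses a single sigmoid unit, while the three-class model uses one softmax unit per class:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

X2 = np.array([[0.0], [1.0]])
X3 = np.array([[0.0], [1.0], [2.0]])

binary_clf = MLPClassifier(max_iter=2000, random_state=0).fit(X2, np.array([0, 1]))
multi_clf = MLPClassifier(max_iter=2000, random_state=0).fit(X3, np.array([0, 1, 2]))

# Binary target: one output unit with a logistic (sigmoid) activation
print(binary_clf.n_outputs_, binary_clf.out_activation_)  # 1 logistic
# Three classes: one output unit per class with a softmax activation
print(multi_clf.n_outputs_, multi_clf.out_activation_)  # 3 softmax

# predict_proba still returns two columns for the binary model: [1 - p, p]
print(binary_clf.predict_proba(X2).shape)  # (2, 2)
```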

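Mathematically this is valid, and I assume it is an optimization, but we lose generality. Is there a simple way to stop scikit-learn from doing this? Otherwise, how can we transform the weights and biases so that their dimensions match the number of values?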
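The reason the single-unit layer loses nothing is the identity softmax([0, z])[1] = e^z / (1 + e^z) = sigmoid(z): a sigmoid on one logit z gives the same probabilities as a two-way softmax whose class-0 logit is fixed at 0. A minimal sketch of the transformation asked about, assuming the default `activation="relu"` for the hidden layers, pads the last layer with a zero column for class 0 and checks the result against `predict_proba`. The zero column is an arbitrary choice: because softmax is shift-invariant, any pair of logits whose difference is z (for example -z/2 and z/2) works equally well.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.array([[0.0], [1.0]])
clf = MLPClassifier(max_iter=2000, random_state=0).fit(X, np.array([0, 1]))

def softmax(z):
    # Subtract the row max for numerical stability; softmax is shift-invariant
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Forward pass through the hidden layers (assumes the default relu activation)
a = X
for W, b in zip(clf.coefs_[:-1], clf.intercepts_[:-1]):
    a = np.maximum(a @ W + b, 0.0)

# Original single-unit output layer: W1 is (100, 1), b1 is (1,)
W1, b1 = clf.coefs_[-1], clf.intercepts_[-1]

# Expand to one column per class: class 0 gets a constant logit of 0,
# class 1 keeps the original logit z, so softmax([0, z])[1] == sigmoid(z)
W2 = np.hstack([np.zeros_like(W1), W1])        # (100, 2)
b2 = np.concatenate([np.zeros_like(b1), b1])   # (2,)

probs = softmax(a @ W2 + b2)
print(W2.shape, b2.shape)  # (100, 2) (2,)
print(np.allclose(probs, clf.predict_proba(X)))  # True
```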

Answer 1

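The class labels of a neural network need to be one-hot encoded, and MLPClassifier does this internally (via LabelBinarizer). For two classes, that internal encoding produces a single 0/1 column, which is why the output layer has only one unit. If you explicitly pass a one-hot encoded target, you get the output you want: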

import numpy as np
from sklearn.neural_network import MLPClassifier

# Now one-hot encoded: one column per class
binary_classifier = MLPClassifier().fit(np.array([[0.], [1.]]), np.array([[1, 0], [0, 1]]))
# NOT encoded
other_classifier = MLPClassifier().fit(np.array([[0.], [1.], [2.]]), np.array([0, 1, 2]))

# Note that the last dimension below is 2
print(binary_classifier.intercepts_[-1].shape, binary_classifier.coefs_[-1].shape)
# Note that the last dimension below is 3
print(other_classifier.intercepts_[-1].shape, other_classifier.coefs_[-1].shape)

Output:

(2,) (100, 2)
(3,) (100, 3)
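One thing worth checking with this approach: scikit-learn classifies a 2-D 0/1 target as a multilabel problem, not a two-class one. The sketch below (with `max_iter` and `random_state` chosen arbitrarily) uses the public `type_of_target` helper and the fitted `out_activation_` attribute to show that the two output units each get an independent sigmoid rather than a shared softmax, and that `predict` returns an indicator matrix instead of class labels. The weights therefore have the requested shape, but the model is not the same as the single-unit binary model.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.utils.multiclass import type_of_target

X = np.array([[0.0], [1.0]])
Y = np.array([[1, 0], [0, 1]])  # one-hot target, as in the answer

print(type_of_target(Y))  # multilabel-indicator

clf = MLPClassifier(max_iter=2000, random_state=0).fit(X, Y)
print(clf.intercepts_[-1].shape, clf.coefs_[-1].shape)  # (2,) (100, 2)

# Each output unit has its own sigmoid, not a shared softmax
print(clf.out_activation_)  # logistic
# predict() returns a 0/1 indicator matrix, one column per output unit
print(clf.predict(X).shape)  # (2, 2)
```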

For more on how to perform this preprocessing step, see the OneHotEncoder documentation in scikit-learn.
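A minimal sketch of that preprocessing step, assuming integer labels in a 1-D array `y`: OneHotEncoder expects 2-D input, so the labels are reshaped into a column, and its default sparse output is densified with `.toarray()`. Mapping predictions back to the original labels through the argmax of `predict_proba` is one possible choice, not the only one; `labels` is just an illustrative name.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import OneHotEncoder

X = np.array([[0.0], [1.0]])
y = np.array([0, 1])

# OneHotEncoder works on a 2-D array, so turn the labels into a column.
# fit_transform returns a sparse matrix by default; .toarray() densifies it.
encoder = OneHotEncoder()
Y = encoder.fit_transform(y.reshape(-1, 1)).toarray()

clf = MLPClassifier(max_iter=2000, random_state=0).fit(X, Y)
print(clf.coefs_[-1].shape)  # (100, 2)

# Map back to the original labels by taking the most likely column
labels = encoder.categories_[0][clf.predict_proba(X).argmax(axis=1)]
print(labels)
```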

Answer 1 (accepted, 2017-10-02 16:10:16)