Python - Difference in confusion matrix dimensions

Question

I have a question about confusion matrices. I use cross-validation to split 148 instances into two arrays, test and train. Then I call this function:

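from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import GaussianNB

def GenerateResult(x_train, x_test, y_train, y_test):
    clf = OneVsRestClassifier(GaussianNB())
    clf.fit(x_train, y_train)
    predictions = clf.predict(x_test)
    accuracy = accuracy_score(y_test, predictions)
    confusion_mtrx = confusion_matrix(y_test, predictions)
    return accuracy, confusion_mtrx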

This is the KFold loop, from which I call the function above:

for train_idx, test_idx in pf.split(x_array):
    x_train, x_test = x_array[train_idx], x_array[test_idx]
    y_train, y_test = y_array[train_idx], y_array[test_idx]
    acc, confusion = GenerateResult(x_train, x_test, y_train, y_test)
    results['First'].append(acc)
    confusion_dict['First'].append(confusion)

Then I collect the results and compute the mean:

np_gausian = np.asarray(results['gaussian'])
print("[First] Mean: {}".format(np.mean(np_gausian)))

print(confusion_dict['gaussian'])

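Here is my problem. My 148 instances have 4 classes in the output, and when I run this loop with KFold, I get two confusion matrices of different sizes. The first confusion matrix is 3x3: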

[[36  1  1]
 [15 17  1]
 [ 0  0  3]]

The second is 4x4:

[[ 0  2  0  0]
 [ 0 41  2  0]
 [ 0 12 16  0]
 [ 0  0  1  0]]

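I think the problem comes from how my 148 instances are distributed across the classes:

Class 1 - 2 instances
Class 2 - 81 instances
Class 3 - 61 instances
Class 4 - 4 instances
All classes - 148 instances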

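What should I do? How can I sum up the confusion matrices? What happens if I change the number of splits in KFold? I tried using Pandas, but I don't know how to do it. Please help, I am using scikit-learn.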

Answer 1

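As @KRKirov pointed out in the comments, the reason is that KFold cross-validation splits the data into folds, and the test set of a given fold may not contain every class.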

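For example, if class 1 does not appear in y_test and is never predicted in predictions either, confusion_matrix will infer that only three classes are present in the data and build the matrix accordingly.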
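To make the inference concrete, here is a minimal sketch with two made-up folds (the arrays are hypothetical, not the asker's data). In fold A, class 1 is missing from both arrays, so the matrix comes out 3x3. In fold B, all four classes appear at least once, so it comes out 4x4.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical fold A: class 1 is absent from both y_true and y_pred.
y_true_a = np.array([2, 2, 3, 3, 4])
y_pred_a = np.array([2, 3, 3, 3, 4])

# Hypothetical fold B: all four classes appear at least once.
y_true_b = np.array([1, 2, 3, 3, 4])
y_pred_b = np.array([2, 2, 3, 2, 3])

# Without a labels argument, the classes are inferred per call.
cm_a = confusion_matrix(y_true_a, y_pred_a)
cm_b = confusion_matrix(y_true_b, y_pred_b)

print(cm_a.shape, cm_b.shape)
```

The two shapes differ only because of which classes happened to land in each fold, which matches the 3x3 and 4x4 matrices in the question.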

You can force confusion_matrix to use all classes by setting the labels parameter:

labels : array, shape = [n_classes], optional

 List of labels to index the matrix. This may be used to reorder or select a subset of labels. If none is given, those that appear at least once in y_true or y_pred are used in sorted order.

Like this:

confusion_mtrx = confusion_matrix(y_test, predictions, 
                                 labels = np.unique(y_array))

You need to pass y_array, or the unique labels from y_array, into the GenerateResult() function.
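Once every fold yields a matrix of the same shape, the matrices can be added element-wise, which addresses the "how do I sum them" part of the question. The following is only a sketch under stated assumptions: the feature data is synthetic (only the class counts are taken from the question), n_splits=2 and the shuffle settings are guesses, and generate_result with its extra labels argument is a hypothetical variant of the asker's function.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import KFold
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import GaussianNB


def generate_result(x_train, x_test, y_train, y_test, labels):
    """Hypothetical variant of GenerateResult that takes the full label set."""
    clf = OneVsRestClassifier(GaussianNB())
    clf.fit(x_train, y_train)
    predictions = clf.predict(x_test)
    acc = accuracy_score(y_test, predictions)
    # Passing labels fixes the matrix to len(labels) x len(labels) in every fold.
    cm = confusion_matrix(y_test, predictions, labels=labels)
    return acc, cm


# Synthetic stand-in data; only the class counts come from the question.
rng = np.random.default_rng(0)
y_array = np.repeat([1, 2, 3, 4], [2, 81, 61, 4])
x_array = rng.normal(size=(y_array.size, 3)) + y_array[:, None]

labels = np.unique(y_array)
kf = KFold(n_splits=2, shuffle=True, random_state=0)  # assumed settings

accuracies, matrices = [], []
for train_idx, test_idx in kf.split(x_array):
    acc, cm = generate_result(
        x_array[train_idx], x_array[test_idx],
        y_array[train_idx], y_array[test_idx],
        labels,
    )
    accuracies.append(acc)
    matrices.append(cm)

# Element-wise sum over folds; valid because all matrices share one shape.
total_cm = np.sum(matrices, axis=0)
print("Mean accuracy:", np.mean(accuracies))
print(total_cm)
```

Because each instance lands in exactly one test fold, the entries of the summed matrix add up to the full 148 instances, and each row sum equals that class's total count. Changing n_splits changes which folds miss a class, but not the shape of the matrices once labels is fixed.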
