Confusion matrix returns a single number
I found an issue with the scikit-learn confusion matrix. I use confusion_matrix inside a KFold loop, and when y_true and y_pred are 100% correct, confusion_matrix returns a single number. This breaks my confusion matrix variable, because I add the result from confusion_matrix in each fold. Does anyone have a solution for this?
Here is my code:
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import KFold
from sklearn.naive_bayes import MultinomialNB

model = MultinomialNB()
kf = KFold(n_splits=10)
cf = np.array([[0, 0], [0, 0]])
for train_index, test_index in kf.split(x):
    x_train, x_test = x[train_index], x[test_index]
    y_train, y_test = y[train_index], y[test_index]
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)
    cf += confusion_matrix(y_test, y_pred)
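For reference, the failure mode can be reproduced in isolation. This is a minimal sketch (my own illustration, not part of the original post); the broadcasting shown below is presumably how the running total gets corrupted:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# With only one class present in y_true and y_pred,
# confusion_matrix returns a 1x1 array, not 2x2.
cm = confusion_matrix([1, 1, 1], [1, 1, 1])
print(cm.shape)  # (1, 1)

# Adding a 1x1 array to a 2x2 running total does not raise an error;
# NumPy broadcasts it, silently adding the count to every cell.
cf = np.array([[0, 0], [0, 0]])
cf += cm
print(cf)  # every cell becomes 3 instead of a valid confusion matrix
```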
Thank You 谢谢
The cleanest way is probably to pass a list of all possible classes in as the labels argument. Here is an example that shows the issue and its resolution (based on spoofed data for the truth and predictions).
from sklearn.metrics import confusion_matrix
import numpy as np

y_test = np.array([1, 1, 1, 1, 1, 0, 0])
y_pred = np.array([0, 1, 1, 1, 1, 0, 0])
labels = np.unique(y_test)

cf = np.array([[0, 0], [0, 0]])
for indices in [[0, 1, 2, 3], [1, 2, 3], [1, 2, 3, 4, 5, 6]]:
    cm1 = confusion_matrix(y_test[indices], y_pred[indices])
    cm2 = confusion_matrix(y_test[indices], y_pred[indices], labels=labels)
    print(cm1.shape == (2, 2), cm2.shape == (2, 2))
In the first subset, both classes appear; but in the second subset, only one class appears, so the cm1 matrix is not of size (2,2) (it comes out as (1,1)). But note that by indicating all potential classes in labels, cm2 is always OK.
If you already know that the labels can only be 0 or 1, you could just assign labels=[0,1], but using np.unique will be more robust.
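Applied to the loop from the question, the fix is a one-line change to the confusion_matrix call. A runnable sketch with synthetic stand-in data (the X and y below are made up for illustration, not the asker's data):

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import KFold
from sklearn.naive_bayes import MultinomialNB

# Synthetic stand-in data: 20 samples, 2 classes.
X = np.array([[1, 3]] * 10 + [[0, 10]] * 10)
y = np.array([1] * 10 + [0] * 10)

labels = np.unique(y)  # all classes present in the full dataset
model = MultinomialNB()
kf = KFold(n_splits=5)
cf = np.zeros((2, 2), dtype=int)

for train_index, test_index in kf.split(X):
    x_train, x_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)
    # labels= guarantees a (2, 2) matrix even when a fold
    # contains only one class, so += always works.
    cf += confusion_matrix(y_test, y_pred, labels=labels)

print(cf)
```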
You can first check whether all pred_values are equal to true_values. If that is the case, just increment the [0][0] and [1][1] cells of your confusion matrix by the class counts from true_values (or pred_values).
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import KFold
from sklearn.naive_bayes import MultinomialNB

X = pd.DataFrame({'f1': [1]*10 + [0]*10,
                  'f2': [3]*10 + [10]*10}).values
y = np.array([1]*10 + [0]*10)

model = MultinomialNB()
kf = KFold(n_splits=5)
cf = np.array([[0, 0], [0, 0]])
for train_index, test_index in kf.split(X):
    x_train, x_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)
    if all(y_test == y_pred):                   # perfect prediction on this fold
        cf[0][0] += sum(y_pred == 0)            # increment by number of 0 values
        cf[1][1] += sum(y_pred == 1)            # increment by number of 1 values
    else:
        cf += confusion_matrix(y_test, y_pred)  # otherwise add the full matrix
Result of print(cf):
>> [[10  0]
   [ 0 10]]
Be careful about overfitting.