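Receiving "Classification metrics can't handle a mix of multiclass and multiclass-multioutput targets" from confusion_matrix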

Question

After I cross-validated my training datasets, I began to have trouble with the confusion matrix. My X_train shape is (835, 5) and my y_train shape is (835,). I cannot use this method when my data is mixed. Otherwise, the modules before it were working perfectly. My code is written below. How do I set up the training data to work with the confusion_matrix method?

cross_validate/cross_val_score module

from sklearn import linear_model  # needed for linear_model.Lasso below
from sklearn.model_selection import cross_validate
from sklearn.model_selection import cross_val_score
lasso = linear_model.Lasso()
cross_validate_results = cross_validate(lasso, X_train, y_train, return_train_score=True)
sorted(cross_validate_results.keys())
cross_validate_results['test_score']
print(cross_val_score(lasso, X_train, y_train))

confusion_matrix module

from sklearn.metrics import confusion_matrix

confusion_matrix(y_train, X_train)

Error

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-83-78f76b6bc798> in <module>()
      1 from sklearn.metrics import confusion_matrix
      2 
----> 3 confusion_matrix(y_test, X_test)

~\Anaconda3\lib\site-packages\sklearn\metrics\classification.py in confusion_matrix(y_true, y_pred, labels, sample_weight)
    248 
    249     """
--> 250     y_type, y_true, y_pred = _check_targets(y_true, y_pred)
    251     if y_type not in ("binary", "multiclass"):
    252         raise ValueError("%s is not supported" % y_type)

~\Anaconda3\lib\site-packages\sklearn\metrics\classification.py in _check_targets(y_true, y_pred)
     79     if len(y_type) > 1:
     80         raise ValueError("Classification metrics can't handle a mix of {0} "
---> 81                          "and {1} targets".format(type_true, type_pred))
     82 
     83     # We can't have more than one value on y_type => The set is no more needed

ValueError: Classification metrics can't handle a mix of multiclass and multiclass-multioutput targets

print shape of arrays module

print(X_train.shape)
print(y_train.shape)
(835, 5)
(835,)

UPDATE: I am now receiving this error: ValueError: Found input variables with inconsistent numbers of samples: [356, 209]

When I run confusion_matrix(y_train, y_pred):

from sklearn.metrics import confusion_matrix

confusion_matrix(y_train, y_pred)

Full error

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-46-3caf00cb052f> in <module>()
      1 from sklearn.metrics import confusion_matrix
      2 
----> 3 confusion_matrix(y_train, y_pred)

~\Anaconda3\lib\site-packages\sklearn\metrics\classification.py in confusion_matrix(y_true, y_pred, labels, sample_weight)
    248 
    249     """
--> 250     y_type, y_true, y_pred = _check_targets(y_true, y_pred)
    251     if y_type not in ("binary", "multiclass"):
    252         raise ValueError("%s is not supported" % y_type)

~\Anaconda3\lib\site-packages\sklearn\metrics\classification.py in _check_targets(y_true, y_pred)
     69     y_pred : array or indicator matrix
     70     """
---> 71     check_consistent_length(y_true, y_pred)
     72     type_true = type_of_target(y_true)
     73     type_pred = type_of_target(y_pred)

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_consistent_length(*arrays)
    202     if len(uniques) > 1:
    203         raise ValueError("Found input variables with inconsistent numbers of"
--> 204                          " samples: %r" % [int(l) for l in lengths])
    205 
    206 

ValueError: Found input variables with inconsistent numbers of samples: [356, 209]

Answer 1

You need to pass y to the confusion matrix, not X ( http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html ). Ideally, you would reserve a proportion of your data as a test set using sklearn's train_test_split ( http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html ) and use your model to predict y based on the test set. Then you would use

confusion_matrix(y_test, y_pred)

to calculate the confusion matrix. If there is no test set, you would still use the predict method of your classifier with X_train in order to get y_pred. In this case, you pass y_train as the true labels and y_pred as the predicted labels to the confusion matrix, e.g.

confusion_matrix(y_train, y_pred)
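To see why passing X instead of predicted labels produces the "mix of multiclass and multiclass-multioutput targets" message, it can help to inspect how scikit-learn classifies each argument. The sketch below uses small made-up arrays (not the asker's data) and the type_of_target helper from sklearn.utils.multiclass; it assumes integer-valued features, as the original error message suggests.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Hypothetical stand-ins for the asker's arrays
y_true = np.array([0, 1, 2, 1])                   # 1-D labels, shape (4,)
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])    # 2-D integer features, shape (4, 2)

y_kind = type_of_target(y_true)   # a 1-D array with more than two classes
X_kind = type_of_target(X)        # a 2-D array is treated as several outputs per sample

print(y_kind)
print(X_kind)
```

Because the two target types differ, confusion_matrix refuses to compare them. Both arguments should be 1-D label arrays of the same length.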

Looking at your code again, your estimator is a regression model ( http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html#sklearn.linear_model.Lasso ), i.e. it predicts numerical values, and you are trying to use a confusion matrix with it, which is used for assessing the performance of classification models, i.e. how well labels have been predicted. So you ought to consider metrics other than confusion_matrix for your problem.
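If you do stay with Lasso, regression metrics are the natural replacement. The following is only a sketch on synthetic data (the coefficients, noise level, and alpha value are arbitrary assumptions, not taken from the question), using mean_squared_error and r2_score from sklearn.metrics.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data standing in for the asker's dataset
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 5))
true_coef = np.array([1.5, -2.0, 0.0, 0.5, 3.0])   # arbitrary, for illustration only
y = X @ true_coef + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=23)

lasso = Lasso(alpha=0.01)          # alpha chosen arbitrarily for this sketch
lasso.fit(X_train, y_train)
y_pred = lasso.predict(X_test)     # continuous predictions, same length as y_test

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("MSE:", mse)
print("R^2:", r2)
```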

Since you have now decided to use knn, try the following first before dealing with cross-validation.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix

# Assuming your target column is y, otherwise use the appropriate column name
X = df.drop(['y'], axis=1).values.astype('float')
y = df['y'].values.astype('float') # assuming you have label encoded your target variable

X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=23, stratify=y)

knn = KNeighborsClassifier()
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

cm = confusion_matrix(y_test, y_pred)
print(cm)
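Once the single train/test split above works, one possible way to combine cross-validation with a confusion matrix is cross_val_predict, which returns one out-of-fold prediction per sample. This is a sketch on the bundled iris dataset rather than your data, and the choice of cv=5 is an assumption.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

# Iris is used only as a stand-in classification dataset
X, y = load_iris(return_X_y=True)

knn = KNeighborsClassifier()
# Each sample is predicted by a model that did not see it during fitting
y_pred_cv = cross_val_predict(knn, X, y, cv=5)

# Both arguments are 1-D and have the same length, as confusion_matrix requires
cm = confusion_matrix(y, y_pred_cv)
print(cm)
```

The "inconsistent numbers of samples" error from the update is raised whenever the two arguments differ in length, for example when predictions made on one split are compared with the labels of another. Comparing y_true.shape with y_pred.shape before calling confusion_matrix catches this early.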
