ValueError: Classification metrics can't handle a mix of multilabel-indicator and continuous-multioutput targets (sklearn)

Question

I am using the random forest classifier algorithm to predict which of 5 classes each of my samples belongs to. However, after making the predictions I cannot evaluate my model properly because of the multiple classes. I saw in another post that it was necessary to use np.argmax(y_pred, axis=1), but I don't really understand what it does, how to use it, or whether it is even required in my case. Can you please help me?

import numpy as np
import pandas as pd
from sklearn import metrics
from keras.utils import to_categorical
import sklearn as sk
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

X = pd.read_csv('/Users/lottie/desktop/1.csv', header=None)
Y = pd.read_csv('/Users/lottie/desktop/2.csv', header=None)

X.drop([0,0], inplace=True)
Y.drop([0,0], inplace=True)
del X[0]
del Y[0]

Y_encoded = list()
for i in Y.loc[0:,1] :
    if i == 'BRCA' : Y_encoded.append(0)
    if i == 'KIRC' : Y_encoded.append(1)
    if i == 'COAD' : Y_encoded.append(2)
    if i == 'LUAD' : Y_encoded.append(3)
    if i == 'PRAD' : Y_encoded.append(4)
Y_bis = to_categorical(Y_encoded)


X_train, X_test, y_train, y_test = train_test_split(X, Y_bis, test_size=0.30, random_state=42)

regressor = RandomForestRegressor(n_estimators=20, random_state=0)
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)


print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))
print(accuracy_score(y_test, y_pred))

Answer 1

You are using RandomForestRegressor. This model is for continuous targets (such as the price of a house), whereas your output is not continuous: it is a set of classes.
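To see where the two names in the error message come from, the sketch below uses scikit-learn's type_of_target helper on toy arrays. The arrays are made up for illustration and only mimic the shapes involved: a one-hot target like the output of to_categorical, and real-valued per-class scores like a regressor would return.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Toy one-hot ground truth (3 samples, 3 classes), similar in shape to to_categorical output
y_true_onehot = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])

# Toy regressor-style output: one real-valued score per class and sample
y_pred_scores = np.array([[0.9, 0.1, 0.0], [0.2, 0.7, 0.1], [0.05, 0.15, 0.8]])

true_kind = type_of_target(y_true_onehot)
pred_kind = type_of_target(y_pred_scores)

print(true_kind)  # kind of target the metrics see for y_test
print(pred_kind)  # kind of target the metrics see for y_pred
```

The classification metrics refuse to compare these two kinds of target, which is the mix the ValueError complains about.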

If you have classes, you have to use RandomForestClassifier. You also have to encode your output as numbers, one number per class. Then, when you predict, you obtain the number of the predicted class.
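A minimal sketch of that change is shown below. It assumes synthetic data from make_classification as a stand-in for the CSV files in the question, so the numbers it prints say nothing about the real dataset. It also shows what np.argmax(..., axis=1) is for: turning one-hot rows back into a single integer class per sample. With integer labels from the start (such as Y_encoded in the question) that step is not needed.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Assumption: synthetic 5-class data standing in for the question's CSV files
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=10, n_classes=5, random_state=42
)

# If the labels were already one-hot encoded, argmax over axis 1
# recovers the integer class index of each row
y_onehot = np.eye(5)[y]
y_labels = np.argmax(y_onehot, axis=1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y_labels, test_size=0.30, random_state=42
)

# A classifier trained on integer labels predicts integer labels
classifier = RandomForestClassifier(n_estimators=20, random_state=0)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

cm = confusion_matrix(y_test, y_pred)
acc = accuracy_score(y_test, y_pred)
print(cm)
print(classification_report(y_test, y_pred))
print(acc)
```

Because both y_test and y_pred are now one-dimensional arrays of class numbers, the three metric calls accept them without the ValueError.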

Answer 1 posted 2021-01-11 08:33:50
