ValueError: Classification metrics can't handle a mix of multilabel-indicator and continuous-multioutput targets (sklearn)
I use the random forest classifier algorithm to predict which of 5 different classes my samples belong to. However, after making the predictions I cannot evaluate my model, because the metrics raise the ValueError in the title. I saw in another post that it was necessary to use np.argmax(y_pred, axis=1), but I don't really understand what it is for, how to use it, or even whether it is required in my case. Can you please help me?
import numpy as np
import pandas as pd
from sklearn import metrics
from keras.utils import to_categorical
import sklearn as sk
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
X = pd.read_csv('/Users/lottie/desktop/1.csv', header=None)
Y = pd.read_csv('/Users/lottie/desktop/2.csv', header=None)
X.drop([0,0], inplace=True)
Y.drop([0,0], inplace=True)
del X[0]
del Y[0]
Y_encoded = list()
for i in Y.loc[0:,1]:
    if i == 'BRCA': Y_encoded.append(0)
    if i == 'KIRC': Y_encoded.append(1)
    if i == 'COAD': Y_encoded.append(2)
    if i == 'LUAD': Y_encoded.append(3)
    if i == 'PRAD': Y_encoded.append(4)
Y_bis = to_categorical(Y_encoded)
X_train, X_test, y_train, y_test = train_test_split(X, Y_bis, test_size=0.30, random_state=42)
regressor = RandomForestRegressor(n_estimators=20, random_state=0)
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)
print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))
print(accuracy_score(y_test, y_pred))
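A minimal reproduction of the error, assuming scikit-learn is installed and using made-up arrays in place of the CSV data. `sklearn.utils.multiclass.type_of_target` shows the label type each metric infers: one-hot targets (the shape `to_categorical` produces) are `multilabel-indicator`, while a regressor's fractional per-column outputs are `continuous-multioutput`, and the classification metrics refuse to compare the two.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.utils.multiclass import type_of_target

# Hypothetical one-hot targets for 3 samples and 5 classes
# (same layout as the output of to_categorical)
y_test = np.eye(5)[[0, 3, 1]]

# Hypothetical regressor output: averaged scores, not 0/1 labels
y_pred = np.array([
    [0.85, 0.05, 0.00, 0.10, 0.00],
    [0.10, 0.00, 0.15, 0.70, 0.05],
    [0.05, 0.60, 0.20, 0.10, 0.05],
])

print(type_of_target(y_test))  # multilabel-indicator
print(type_of_target(y_pred))  # continuous-multioutput

try:
    accuracy_score(y_test, y_pred)
except ValueError as err:
    # Same "can't handle a mix of ..." message as in the question title
    print(err)
```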
You are using RandomForestRegressor. This model is for continuous targets (such as the price of a house), and if you have classes, your output is not continuous. Given the one-hot Y_bis, the regressor averages the target rows across its trees and returns fractional scores, which the classification metrics reject.
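This is where the `np.argmax(y_pred, axis=1)` trick mentioned in the question comes from: if you keep the one-hot targets and a model that outputs one score per class, taking the index of the largest value in each row turns both arrays back into integer class labels, which the metrics accept. A sketch with made-up arrays (not the asker's data); switching to a classifier is the cleaner fix.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical one-hot ground truth (3 samples, 5 classes) and per-class scores
y_test = np.eye(5)[[0, 3, 1]]
y_pred = np.array([
    [0.85, 0.05, 0.00, 0.10, 0.00],
    [0.10, 0.00, 0.15, 0.70, 0.05],
    [0.05, 0.20, 0.60, 0.10, 0.05],
])

# argmax along axis=1 gives, for each row, the column index of its largest value,
# i.e. the class number
true_labels = np.argmax(y_test, axis=1)  # [0 3 1]
pred_labels = np.argmax(y_pred, axis=1)  # [0 3 2]

print(accuracy_score(true_labels, pred_labels))
print(confusion_matrix(true_labels, pred_labels, labels=list(range(5))))
```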
If you have classes, you have to use RandomForestClassifier.
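A sketch of the classifier-based setup, assuming synthetic data from `make_classification` in place of the CSV files. The targets are a 1-D array of class numbers (no `to_categorical`), so `predict` also returns class numbers and the three metrics from the question work unchanged.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the CSV data: 500 samples, 5 classes labelled 0..4
X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
                           n_classes=5, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42)

clf = RandomForestClassifier(n_estimators=20, random_state=0)
clf.fit(X_train, y_train)     # y_train is 1-D: one class number per sample
y_pred = clf.predict(X_test)  # predictions are 1-D class numbers too

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
print(accuracy_score(y_test, y_pred))
```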
Encode your output as one number per class (scikit-learn classifiers also accept the string labels directly), and pass it as a single 1-D column such as Y_encoded rather than the one-hot Y_bis. Then, when you predict, you will obtain the number of the predicted class.
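For the encoding step, `sklearn.preprocessing.LabelEncoder` can replace the manual `if` chain: it maps each class name to a number, and `inverse_transform` maps predicted numbers back to names. The five labels are the ones from the question; the sample values are made up. Note that `LabelEncoder` numbers the names in sorted order, so its codes differ from the hand-written 0 to 4 mapping.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical column of class names, as read from the label CSV
names = np.array(['BRCA', 'KIRC', 'COAD', 'LUAD', 'PRAD', 'BRCA'])

encoder = LabelEncoder()
codes = encoder.fit_transform(names)  # one integer per sample
print(encoder.classes_)  # ['BRCA' 'COAD' 'KIRC' 'LUAD' 'PRAD']
print(codes)             # [0 2 1 3 4 0]

# A classifier trained on `codes` predicts integers; map them back to names
predicted_codes = np.array([2, 0, 4])
print(encoder.inverse_transform(predicted_codes))  # ['KIRC' 'BRCA' 'PRAD']
```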