I am trying to get a confusion matrix from my XGBoost model and compute the accuracy. However, my confusion matrix is incomplete: it is missing the entire FALSE row and looks like this:
y_pred 0 1
TRUE 526 482
Therefore, I cannot compute the accuracy. Here is my code:
# Splitting the dataset into the training set and test set
dataset$Good.Bad.Stock = factor(dataset$Good.Bad.Stock, levels = c(0,1))
training_set = dataset[1:2740,]
test_set = dataset[2741:3748,]
# Replace missing labels with 0 (note: the earlier as.factor() conversion is
# overwritten here, so the labels stay in their original type)
data = replace(training_set$Good.Bad.Stock, is.na(training_set$Good.Bad.Stock), 0)
data
# Fitting XGBoost to the Training set
classifier_XGB = xgboost(data = as.matrix(training_set[-63]),
label = data,
nrounds = 15,
objective = "binary:logistic")
# Predicting the Test set results
pred_data=as.matrix(test_set[-63])
y_pred = predict(classifier_XGB, pred_data)
y_pred = (y_pred > 0.5)
# Making the Confusion Matrix
cm_XGB = table(y_pred, test_set$Good.Bad.Stock)
cm_XGB
# Evaluate Model
accuracy_XGB = (cm_XGB[1,1] + cm_XGB[2,2]) / (cm_XGB[1,1] + cm_XGB[2,2] + cm_XGB[1,2] + cm_XGB[2,1])
print(accuracy_XGB)
Thank you for the help!
I didn't run the code, but I suspect the problem is here:
y_pred = (y_pred > 0.5)
Print y_pred before that line, and you will probably see that every predicted probability is above 0.5. In that case every prediction becomes TRUE, so table() produces only a single row and the FALSE cells never appear.
This is probably caused by a badly configured model (read more about the xgboost parameters) or a highly imbalanced dataset (although the test set does not look imbalanced).
Edit: Of course, you also have to make sure that your response variable is typed as a factor, and that the objective function is set to binary ("binary:logistic"). As I said, I highly recommend reading some introductory posts about xgboost: https://www.analyticsvidhya.com/blog/2016/01/xgboost-algorithm-easy-steps/ https://cran.r-project.org/web/packages/xgboost/vignettes/discoverYourData.html
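Independently of why the model predicts only one class, you can make the confusion matrix always come out 2x2 by coercing the predictions to a factor with both levels before calling table(). Here is a minimal sketch with made-up numbers (not your data) where, as in your case, every probability ends up above 0.5:

```r
# Hypothetical ground truth and predicted probabilities (all above 0.5,
# mimicking the situation in the question)
y_true <- factor(c(0, 0, 1, 1, 1), levels = c(0, 1))
y_prob <- c(0.9, 0.8, 0.7, 0.6, 0.95)

# Force both levels onto the predictions so table() keeps empty rows
y_pred <- factor(as.integer(y_prob > 0.5), levels = c(0, 1))

cm <- table(Predicted = y_pred, Actual = y_true)
print(cm)   # 2x2, with a zero row for class 0 instead of a missing row

# mean() over the element-wise comparison avoids indexing cells
# (cm[2,2] etc.) that may not exist when a class is never predicted
accuracy <- mean(y_pred == y_true)
print(accuracy)
```

With the forced levels, your accuracy formula (or simply mean(y_pred == y_true)) works even when the model collapses to a single class.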