
confusionMatrix returns "Error: `data` and `reference` should be factors with the same levels" instead of a confusion matrix

I have a CSV file containing estimated probabilities and actual results. I want to create a confusion matrix using a threshold of 0.5 for the estimated probabilities, but I keep getting the error message 'Error: `data` and `reference` should be factors with the same levels.' What's wrong? See the code below.

This is the code I tried:

# TURN PROBS INTO CLASSES AND DISPLAY FREQUENCIES

p_class = ifelse(probs_truth$estimated > 0.5, 1, 0)
table(p_class)

# CALCULATING CONFUSION MATRIX

predicted = p_class
actual = probs_truth$truth

library(caret)
result = confusionMatrix(data=predicted, reference=actual)
print(result)

I expected a confusion matrix table to be returned.

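The following code works for me; I hope it helps. confusionMatrix() expects both `data` and `reference` to be factors, but the `p_class` vector returned by ifelse() is numeric (and a truth column read from a CSV is probably numeric as well), which triggers the error. I made a small dataset that I assume resembles your data.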

library(data.table)
probs_truth <- data.table(estimated = c(0.5, 0.3, 0.7, 0.8, 0.1), actual = c(1, 0, 0, 1, 0))

I added a column, 'estimated2', that holds the classes produced by your ifelse statement.

probs_truth$estimated2 = ifelse (probs_truth$estimated > 0.5, 1, 0)

Then I made sure both 'estimated2' and 'actual' are factors.

probs_truth$estimated2 <- as.factor(probs_truth$estimated2)
probs_truth$actual <- as.factor(probs_truth$actual)

head(probs_truth)
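One caveat with as.factor(): it only creates levels for values that actually occur. If, say, no probability exceeds the threshold, the predicted factor ends up with a single level and no longer shares its level set with the reference. A minimal sketch of a way around that, using base R only and made-up probabilities (the vectors below are hypothetical, not your data), is to name the levels explicitly:

```r
# Hypothetical data: every estimated probability is below the 0.5 threshold
estimated <- c(0.2, 0.3, 0.4, 0.1, 0.45)
actual    <- c(1, 0, 0, 1, 0)

p_class <- ifelse(estimated > 0.5, 1, 0)

# as.factor() only sees the values that occur, so class "1" is missing here
pred_auto <- as.factor(p_class)
print(levels(pred_auto))

# Naming the levels up front keeps both factors on the same level set
predicted <- factor(p_class, levels = c(0, 1))
reference <- factor(actual,  levels = c(0, 1))
print(levels(predicted))
print(identical(levels(predicted), levels(reference)))
```

Whether this matters for your file depends on the predictions; with both classes present in each column, as.factor() and factor(..., levels = c(0, 1)) give the same result.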

library(caret)
result = confusionMatrix (data=probs_truth$estimated2, reference=probs_truth$actual)
print(result)
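As a sanity check on the caret output, the same counts can be reproduced with base R's table(). This is only a sketch on the toy data from above, rebuilt as plain vectors so it runs without data.table or caret; the names `predicted`, `reference`, `cm` and `accuracy` are mine.

```r
# Same toy values as the small dataset above
estimated <- c(0.5, 0.3, 0.7, 0.8, 0.1)
actual    <- c(1, 0, 0, 1, 0)

# Threshold at 0.5 and convert both vectors to factors with identical levels
predicted <- factor(ifelse(estimated > 0.5, 1, 0), levels = c(0, 1))
reference <- factor(actual, levels = c(0, 1))

# Rows are the predicted classes, columns are the true classes
cm <- table(Prediction = predicted, Reference = reference)
print(cm)

# Accuracy is the share of observations on the diagonal
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```

Note that 0.5 itself is classified as 0 here because the comparison is strictly greater than; use >= if you want the boundary value to count as class 1.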
