My data source: https://www.kaggle.com/mlg-ulb/creditcardfraud. The dataset is highly imbalanced: the positive class (frauds) accounts for 0.172% of all transactions.
I was using the PRROC package to get the AUC of the ROC curve. Here is my random forest code:
rf.model <- randomForest(Class ~ ., data = training, ntree = 2000, nodesize = 20)
rf_pred <- predict(rf.model, test,type="prob")
As expected, rf_pred should contain the probability of each class. Then I used the following code:
fg_rf <- rf_pred[test$Class==1]
bg_rf <- rf_pred[test$Class==0]
roc_rf <- roc.curve(scores.class0 = fg_rf,scores.class1 = bg_rf,curve = T)
However, the ROC curve did not turn out as I expected. The same problem occurred for the PR curve. Is it because of the high class imbalance? And assuming rf_pred returns the probabilities of 0/1, how can I make fg_rf equal to the probability of Class = 1? Is my code:
fg_rf <- rf_pred[test$Class==1]
correct?
Looking at your head(rf_pred) results, it is obvious that your predict function returns (hard) classes (i.e. 0/1), and not probability scores, probably due to your type="pro" typo (it should be type="prob").
The scores.class0 and scores.class1 arguments of the roc.curve function should be probability scores, and not hard class predictions.
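To illustrate why hard labels distort the curve, here is a minimal base-R sketch. It does not use PRROC; `auc_rank` is a hypothetical helper that computes the AUC from ranks (the Mann-Whitney formulation), and all scores are made up for illustration. With hard 0/1 labels there are only two distinct score values, so the ROC curve has a single interior point and the AUC changes.

```r
# Rank-based (Mann-Whitney) AUC: the probability that a randomly chosen
# positive scores higher than a randomly chosen negative (ties count 1/2).
# auc_rank is a hypothetical helper, not part of PRROC.
auc_rank <- function(pos, neg) {
  r <- rank(c(pos, neg))              # average ranks handle ties
  n_pos <- length(pos)
  n_neg <- length(neg)
  (sum(r[seq_len(n_pos)]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Made-up probability scores, for illustration only
pos_prob <- c(0.9, 0.6, 0.4)          # scores of the true positives
neg_prob <- c(0.5, 0.3, 0.2, 0.1)     # scores of the true negatives
auc_prob <- auc_rank(pos_prob, neg_prob)

# The same predictions collapsed to hard 0/1 labels at a 0.5 threshold
pos_hard <- as.numeric(pos_prob > 0.5)
neg_hard <- as.numeric(neg_prob > 0.5)
auc_hard <- auc_rank(pos_hard, neg_hard)

print(c(prob = auc_prob, hard = auc_hard))
```

The two values differ because thresholding throws away the ordering among points that fall on the same side of the cutoff.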
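Correct the typo in predict and you should be fine. Two details are worth checking. First, with type="prob", predict returns a matrix with one column per class, so select the column for class 1 explicitly. Second, in PRROC, scores.class0 takes the scores of the positive-class points and scores.class1 those of the negative-class points, so the fraud cases belong in scores.class0:
# assumes Class is a factor with levels "0" and "1", so the matrix has a column named "1"
rf_pred <- predict(rf.model, test, type = "prob")
fg_rf <- rf_pred[test$Class == 1, "1"]  # P(Class = 1) for the true frauds
bg_rf <- rf_pred[test$Class == 0, "1"]  # P(Class = 1) for the legitimate transactions
roc_rf <- roc.curve(scores.class0 = fg_rf, scores.class1 = bg_rf, curve = TRUE)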
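One more thing to watch when indexing the output of predict(..., type = "prob"): it is a matrix with one row per test case and one column per class, so a bare logical index does not pick a single column. The sketch below uses a toy matrix as a stand-in for rf_pred; the values, the `truth` vector, and the column names "0" and "1" are assumptions for illustration.

```r
# Toy stand-in for the matrix returned by predict(..., type = "prob"):
# one row per test case, one column per class level (made-up values).
probs <- matrix(c(0.9, 0.1,
                  0.2, 0.8,
                  0.7, 0.3,
                  0.4, 0.6),
                ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("0", "1")))
truth <- c(0, 1, 0, 1)                # hypothetical true labels

# A logical vector without a comma treats the matrix as one long vector
# and recycles the index, so values from both columns get mixed together.
mixed <- probs[truth == 1]

# Row index plus column name keeps only P(class = 1) for the selected rows.
fg <- probs[truth == 1, "1"]          # scores of the true positives
bg <- probs[truth == 0, "1"]          # scores of the true negatives

print(mixed)
print(fg)
print(bg)
```

Here `mixed` contains entries from both the "0" and "1" columns, while `fg` and `bg` contain only class-1 probabilities, which is what a scoring function expects.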