How to use the PRROC package to get the AUC of the ROC & PR curves for a random forest in R

Question

My data source: https://www.kaggle.com/mlg-ulb/creditcardfraud The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.

I am using the PRROC package to get the AUC of the ROC curve. Here is my random forest code:

rf.model <- randomForest(Class ~ ., data = training, ntree = 2000, nodesize = 20)
rf_pred <- predict(rf.model, test, type="prob")

As expected, rf_pred should return the probability of each class. Then I used the following code:

fg_rf <- rf_pred[test$Class==1]
bg_rf <- rf_pred[test$Class==0]
roc_rf <- roc.curve(scores.class0 = fg_rf,scores.class1 = bg_rf,curve = T)

However, the ROC curve turned out not to be what I expected. The same problem occurred for the PR curve. Is it because of the high class imbalance? And assuming rf_pred returns the probabilities of 0/1, how can I make fg_rf equal to the probability of class = 1? Is my code fg_rf <- rf_pred[test$Class==1] correct?

Answer 1

Looking at your head(rf_pred) results, it is obvious that your predict function returns hard classes (i.e. 0/1) and not probability scores, probably due to your type="pro" typo (it should be type="prob").

The scores.class0 & scores.class1 arguments of the roc.curve function should be probability scores, not hard class predictions.
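To illustrate what a per-observation probability score looks like, here is a minimal base R sketch. The matrix prob_mat and the vector truth are made-up stand-ins for the output of predict(..., type="prob") and for test$Class, and the column names "0" and "1" assume those are the class levels. A type="prob" prediction is a matrix with one column per class, so one column has to be picked before the scores are split by true class:

```r
# Hypothetical stand-in for predict(rf.model, test, type = "prob"):
# one row per observation, one column per class level.
prob_mat <- matrix(c(0.9, 0.2, 0.7, 0.4,   # column "0": P(class 0)
                     0.1, 0.8, 0.3, 0.6),  # column "1": P(class 1)
                   ncol = 2,
                   dimnames = list(NULL, c("0", "1")))

# Hypothetical true labels, standing in for test$Class
truth <- c(0, 1, 0, 1)

# Keep a single score per observation: the probability of class 1
score_class1 <- prob_mat[, "1"]

# Split the scores by true class
fg <- score_class1[truth == 1]  # scores of the actual positives
bg <- score_class1[truth == 0]  # scores of the actual negatives
```

Indexing the whole matrix with a logical vector and no column, as in prob_mat[truth == 1], treats the matrix as one long vector and mixes values from both columns, which is why the column is selected first here.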

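Correct the typo in predict and you should be fine, but two more things need attention. With type="prob", predict returns a matrix with one column per class, so select the class 1 column (named "1" if your class levels are "0" and "1") before splitting the scores. Also, in PRROC scores.class0 takes the scores of the positive (foreground) class and scores.class1 those of the negative (background) class, so the class 1 scores belong in scores.class0:

rf_pred <- predict(rf.model, test, type = "prob")
fg_rf <- rf_pred[test$Class == 1, "1"]
bg_rf <- rf_pred[test$Class == 0, "1"]
roc_rf <- roc.curve(scores.class0 = fg_rf, scores.class1 = bg_rf, curve = TRUE)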
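Since the question also asks about the PR curve, here is a minimal sketch of getting both AUC values from PRROC. It runs on simulated scores rather than the credit card data, and the names labels and scores are hypothetical. It uses the calling form in which all scores go into scores.class0 and the 0/1 labels go into weights.class0, which, as far as the PRROC documentation describes it, marks the observations with weight 1 as the positive class:

```r
library(PRROC)

set.seed(1)

# Simulated, imbalanced data (hypothetical, not the credit card set):
# 50 positives that tend to score high, 950 negatives that tend to score low
labels <- c(rep(1, 50), rep(0, 950))
scores <- c(rbeta(50, 5, 2), rbeta(950, 2, 5))

# All scores in scores.class0, 0/1 labels in weights.class0
roc_obj <- roc.curve(scores.class0 = scores, weights.class0 = labels, curve = TRUE)
pr_obj  <- pr.curve(scores.class0 = scores, weights.class0 = labels, curve = TRUE)

roc_auc <- roc_obj$auc          # area under the ROC curve
pr_auc  <- pr_obj$auc.integral  # area under the PR curve

print(roc_auc)
print(pr_auc)
```

A quick sanity check on any such call: when the positives really do receive higher scores, the ROC AUC should come out above 0.5, and a value below 0.5 suggests the two classes were passed the wrong way round. With heavy class imbalance the PR AUC is usually the more informative number, and its baseline for a random classifier is the fraction of positives rather than 0.5. Calling plot(roc_obj) or plot(pr_obj) draws the corresponding curve.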
