How to use the PRROC package to get the AUC of the ROC and PR curves for a random forest in R

My data source: https://www.kaggle.com/mlg-ulb/creditcardfraud. The dataset is highly imbalanced: the positive class (frauds) accounts for 0.172% of all transactions.

I am using the PRROC package to get the AUC of the ROC curve. Here is my random forest code:

rf.model <- randomForest(Class ~ ., data = training, ntree = 2000, nodesize = 20)
rf_pred <- predict(rf.model, test, type = "prob")

So, as expected, rf_pred should return the probability of each class: [image: head(rf_pred) output] Then I used the following code:

fg_rf <- rf_pred[test$Class==1]
bg_rf <- rf_pred[test$Class==0]
roc_rf <- roc.curve(scores.class0 = fg_rf,scores.class1 = bg_rf,curve = T)

However, the ROC curve did not turn out as I expected: [image: ROC curve plot] The same problem occurred for the PR curve. Is it because of the high class imbalance? And assuming rf_pred returns the probabilities of classes 0 and 1, how can I make fg_rf equal to the probability of class 1? Is my code fg_rf <- rf_pred[test$Class==1] correct?

Looking at your head(rf_pred) results, it is obvious that your predict function returns hard classes (i.e. 0/1) rather than probability scores, probably because of your type="pro" typo (it should be type="prob").

The scores.class0 and scores.class1 arguments of the roc.curve function should be probability scores, not hard class predictions.
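To illustrate why hard class predictions are a problem here, the following is a minimal sketch on simulated data. The names `labels`, `prob` and `hard` and the simulated scores are made up for illustration and are not taken from the question. It compares the ROC AUC computed from continuous scores with the AUC computed from the same scores after thresholding at 0.5:

```r
library(PRROC)

set.seed(1)
# Simulated ground truth (assumption): 1 = rare positive class, 0 = negative class
labels <- rbinom(5000, 1, 0.05)
# Simulated continuous scores in (0, 1), higher on average for positives
prob <- plogis(rnorm(5000, mean = ifelse(labels == 1, 1, -1)))
# Hard class predictions: the same scores thresholded at 0.5
hard <- as.numeric(prob > 0.5)

# Scores of the positive class go to scores.class0, negatives to scores.class1
roc_prob <- roc.curve(scores.class0 = prob[labels == 1],
                      scores.class1 = prob[labels == 0])
roc_hard <- roc.curve(scores.class0 = hard[labels == 1],
                      scores.class1 = hard[labels == 0])

roc_prob$auc  # based on the full ranking of the scores
roc_hard$auc  # based on the single operating point left after thresholding
```

With hard 0/1 predictions only one threshold is left, so the curve collapses to two straight segments through a single point, which may be what an unexpectedly angular ROC plot is showing.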

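Correct the typo in predict and you should be fine, but two further points are worth checking. With type="prob", predict returns a matrix with one column per class, so select the class 1 column before splitting by the true label; indexing the whole matrix with a logical vector mixes values from both columns. Also, the PRROC documentation describes scores.class0 as the scores of the positive (foreground) class, so the Class == 1 scores belong in scores.class0:

rf_pred <- predict(rf.model, test, type = "prob")
fg_rf <- rf_pred[test$Class == 1, "1"]  # assumes Class is a factor with levels 0 and 1
bg_rf <- rf_pred[test$Class == 0, "1"]
roc_rf <- roc.curve(scores.class0 = fg_rf, scores.class1 = bg_rf, curve = TRUE)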
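Since the question also asks about the PR curve, here is a hedged end-to-end sketch on synthetic data. The columns `x1` and `x2` and the simulated `Class` are stand-ins invented for illustration, not the Kaggle data. It assumes `Class` is a factor with levels 0 and 1 (otherwise randomForest fits a regression and no probability matrix is returned), and it follows the PRROC documentation in passing the positive-class scores as scores.class0:

```r
library(randomForest)
library(PRROC)

set.seed(42)
# Synthetic, imbalanced stand-in for the credit card data (hypothetical columns)
n <- 2000
x1 <- rnorm(n)
x2 <- rnorm(n)
Class <- factor(rbinom(n, 1, plogis(-3 + 2 * x1)), levels = c(0, 1))
dat <- data.frame(x1 = x1, x2 = x2, Class = Class)
training <- dat[1:1400, ]
test <- dat[1401:n, ]

# Class is a factor, so this is a classification forest
rf.model <- randomForest(Class ~ ., data = training, ntree = 200)

# type = "prob" returns a matrix with one column per class level ("0" and "1")
rf_pred <- predict(rf.model, test, type = "prob")

# Keep only the class-1 probability, then split by the true label
fg_rf <- rf_pred[test$Class == "1", "1"]  # scores of true positives
bg_rf <- rf_pred[test$Class == "0", "1"]  # scores of true negatives

roc_rf <- roc.curve(scores.class0 = fg_rf, scores.class1 = bg_rf, curve = TRUE)
pr_rf <- pr.curve(scores.class0 = fg_rf, scores.class1 = bg_rf, curve = TRUE)

roc_rf$auc          # area under the ROC curve
pr_rf$auc.integral  # area under the PR curve
```

Calling plot(roc_rf) or plot(pr_rf) afterwards draws the corresponding curve. On heavily imbalanced data the PR AUC is usually the more informative of the two numbers, since the ROC AUC can look high even when precision on the rare class is poor.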
