
Writing a function in R to plot ROC curve using pROC

I'm trying to write a function that plots ROC curves for several different scoring systems I use to predict an outcome.

I have a data frame data_all with columns "score_1" and "Threshold.2000". I can generate the ROC curve I want with the following:

plot.roc(data_all$Threshold.2000, data_all$score_1)

My goal is to generate ROC curves for a number of different outcomes (e.g. Threshold.1000) and scores (score_1, score_2, etc.), but for now I'm only trying to set it up for different scores. My function is as follows:

roc_plot <- function(dataframe_of_interest, score_of_interest) {
    plot.roc(dataframe_of_interest$Threshold.2000, dataframe_of_interest$score_of_interest)
}

I get the following error: Error in roc.default(x, predictor, plot = TRUE, ...): No valid data provided.

I'd be very grateful if someone could spot why my function doesn't work. I'm a Python coder and fairly new to R, and I haven't had much luck with the various things I've tried. Thanks very much.

EDIT: Here is the same example with mtcars so it's reproducible:

library(pROC)
data(mtcars)
plot.roc(mtcars$vs, mtcars$mpg) # --> makes correct graph
roc_plot <- function(dataframe_of_interest, score_of_interest) {
    plot.roc(dataframe_of_interest$vs, dataframe_of_interest$score_of_interest)
}
roc_plot(mtcars, mpg)

Outcome: Error in roc.default(x, predictor, plot = TRUE, ...): No valid data provided.

Here's one solution that works as desired (i.e. it lets the user specify different values for score_of_interest):

library(pROC)
data(mtcars)

plot.roc(mtcars$vs, mtcars$mpg) # --> makes correct graph

# expects `score_of_interest` to be a string, i.e. a quoted column name
roc_plot <- function(dataframe_of_interest, score_of_interest) {
    plot.roc(dataframe_of_interest$vs, dataframe_of_interest[, score_of_interest])
}

roc_plot(mtcars, 'mpg')
roc_plot(mtcars, 'cyl')
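The version above expects a quoted column name. If you would rather call it with a bare name, as the question's roc_plot call did, one option is to capture the unevaluated argument with substitute() and turn it into a string with deparse(). This is a minimal sketch; roc_plot_bare is a hypothetical name, and like the function above it still hard-codes vs as the outcome.

```r
library(pROC)
data(mtcars)

# Hypothetical variant: accepts a bare column name, roc_plot_bare(mtcars, mpg),
# as well as a quoted one, roc_plot_bare(mtcars, "cyl").
roc_plot_bare <- function(dataframe_of_interest, score_of_interest) {
    # substitute() returns the argument as written by the caller: a symbol for
    # a bare name, or a character string if a quoted name was passed
    arg <- substitute(score_of_interest)
    score_name <- if (is.character(arg)) arg else deparse(arg)
    # [[ ]] evaluates score_name, so it looks up the column it names
    plot.roc(dataframe_of_interest$vs, dataframe_of_interest[[score_name]])
}

roc_plot_bare(mtcars, mpg)
roc_plot_bare(mtcars, "cyl")
```

One trade-off of this style: if you store the name in a variable first (s <- "mpg"; roc_plot_bare(mtcars, s)), deparse() yields "s" rather than "mpg", so the lookup returns NULL again. The string-based version does not have that problem, which is why it is usually the easier one to use in loops.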

Note that your error was not caused by an incorrect column name; it was caused by using `$` to index the data.frame with a name stored in a variable. Notice what happens with a simpler function:

foo <- function(x, col_name) {
    head(x$col_name)
}
foo(mtcars, mpg)
## NULL

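This returns NULL because `$` does not evaluate what comes after it: x$col_name looks for a column literally named "col_name", which mtcars does not have. So in your original function, dataframe_of_interest$score_of_interest looked for a column called "score_of_interest", and you were actually feeding plot.roc a NULL.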
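The difference between `$` and `[[` can be checked directly in base R. In this small sketch, col_name mirrors the argument in the foo example; nothing here depends on pROC.

```r
data(mtcars)

col_name <- "mpg"

# `$` does not evaluate what follows it: this asks for a column literally
# called "col_name", which mtcars does not have, so the result is NULL
print(is.null(mtcars$col_name))                   # TRUE

# `[[` does evaluate its argument, so it looks up the column whose name is
# the *value* of col_name, i.e. "mpg"
print(identical(mtcars[[col_name]], mtcars$mpg))  # TRUE

# Confirm that `$` really uses the literal name: once a column called
# "col_name" exists, mtcars_copy$col_name finds it
mtcars_copy <- mtcars
mtcars_copy$col_name <- 1                         # recycled to every row
print(head(mtcars_copy$col_name))
```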

There are several ways to extract a column from a data.frame when the column name is stored in an object (which is what happens when you pass it as an argument to a function). Perhaps the easiest is to remember that a data.frame behaves like a 2D array, so we can use the familiar object[i, j] syntax: leave i empty to ask for all rows and specify the column j by name, e.g. mtcars[, 'mpg']. This still works if we assign the string 'mpg' to an object:

x <- 'mpg'
mtcars[, x]
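For a plain data.frame, mtcars[, x] and mtcars[[x]] return the same vector when x names a single column. The sketch below checks that in base R. One caveat worth knowing: if the data is a tibble rather than a data.frame, tbl[, x] returns a one-column tibble instead of a vector, so [[x]] is the safer choice when you are not sure which class you will receive.

```r
data(mtcars)
x <- "mpg"

# [, x]: all rows, the column named by x; a single column is dropped to a vector
by_brackets <- mtcars[, x]
# [[x]]: list-style extraction of one column, a vector for a data.frame
by_double <- mtcars[[x]]

print(identical(by_brackets, by_double))  # TRUE: both are the numeric mpg column

# Asking for more than one column with [, ] keeps the data.frame shape
two_cols <- mtcars[, c("mpg", "cyl")]
print(class(two_cols))                     # "data.frame"
```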

So that's how I produced my solution. Going a step further, it's easy to see how you could supply both a score_of_interest and a threshold_of_interest:

roc_plot2 <- function(dataframe_of_interest, threshold_of_interest, score_of_interest) {
    plot.roc(dataframe_of_interest[, threshold_of_interest], 
             dataframe_of_interest[, score_of_interest])
}

roc_plot2(mtcars, 'vs', 'mpg')
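The question's longer-term goal was to compare several scores and outcomes. One possible extension is sketched below, assuming a reasonably recent pROC (the quiet argument of roc() suppresses its messages about levels and direction). It builds one roc object per score with roc(), overlays the curves with plot(..., add = TRUE), and returns the objects so auc() can be called on each. The name roc_overlay and the choice of mpg, hp and wt as scores are illustrative only.

```r
library(pROC)
data(mtcars)

# Hypothetical helper: one ROC curve per score, all for the same binary outcome,
# overlaid on a single plot. Column names are passed as strings.
roc_overlay <- function(dataframe_of_interest, threshold_of_interest, scores_of_interest) {
    response <- dataframe_of_interest[[threshold_of_interest]]
    rocs <- lapply(scores_of_interest, function(score) {
        roc(response, dataframe_of_interest[[score]], quiet = TRUE)
    })
    names(rocs) <- scores_of_interest

    cols <- seq_along(rocs)  # default palette colours 1, 2, 3, ...
    for (i in seq_along(rocs)) {
        # the first curve creates the plot; later ones are drawn on top of it
        plot(rocs[[i]], col = cols[i], add = i > 1)
    }
    legend("bottomright", legend = scores_of_interest, col = cols, lwd = 2)

    invisible(rocs)
}

rocs <- roc_overlay(mtcars, "vs", c("mpg", "hp", "wt"))
aucs <- sapply(rocs, function(r) as.numeric(auc(r)))  # AUC per score
print(aucs)
```

To cover several outcomes as well, you could call it once per outcome, e.g. for (t in c("vs", "am")) roc_overlay(mtcars, t, c("mpg", "hp", "wt")). Returning the roc objects rather than only drawing them keeps them available afterwards for auc() or for pairwise comparisons with roc.test().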

