
How to plot a ROC curve using the ROCR package in R, *with only a classification contingency table*


I have a contingency table from which the true positive rate, false positive rate, etc. can all be computed. I have 500 replications, and therefore 500 tables. However, I cannot generate prediction data giving the estimated probability and the true outcome for each individual case. How can I get a curve without the individual-level data? Below is the package example I am working from:

## computing a simple ROC curve (x-axis: fpr, y-axis: tpr)
library(ROCR)
data(ROCR.simple)
pred <- prediction( ROCR.simple$predictions, ROCR.simple$labels)
perf <- performance(pred,"tpr","fpr")
plot(perf)    

You cannot generate the full ROC curve from a single contingency table, because a contingency table provides only a single sensitivity/specificity pair (for whatever cutoff on the predicted values was used to generate that table).
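
To make this concrete, here is a minimal base-R sketch (the helper `single_table_roc` is a hypothetical name, not part of ROCR). It assumes a 2x2 table with the true class ("0"/"1") in the rows and the prediction ("FALSE"/"TRUE") in the columns. One table yields exactly one (FPR, TPR) point; joining it to the fixed corners (0, 0) and (1, 1) is the only "curve" it supports, and the area under that curve is (1 + TPR - FPR) / 2, i.e. the mean of sensitivity and specificity.

```r
# Hypothetical helper (not part of ROCR): the single ROC point implied by one
# 2x2 contingency table with rows = true class ("0", "1") and
# columns = prediction ("FALSE", "TRUE").
single_table_roc <- function(tab) {
  tpr <- tab["1", "TRUE"] / sum(tab["1", ])  # sensitivity
  fpr <- tab["0", "TRUE"] / sum(tab["0", ])  # 1 - specificity
  # Area under the polyline (0,0) -> (fpr,tpr) -> (1,1),
  # which equals the mean of sensitivity and specificity.
  auc <- (1 + tpr - fpr) / 2
  c(fpr = fpr, tpr = tpr, auc = auc)
}

# Example input: counts typed in by hand (86/14 negatives, 29/21 positives).
tab <- matrix(c(86, 29, 14, 21), nrow = 2,
              dimnames = list(c("0", "1"), c("FALSE", "TRUE")))
single_table_roc(tab)
```

Here that gives FPR 0.14, TPR 0.42 and an area of 0.64; with 500 tables at the same cutoff you would get 500 such single points, not a curve.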

If you had many contingency tables generated with different cutoffs, you would be able to approximate the ROC curve (essentially, it would be a linear interpolation between the sensitivity/specificity values in your contingency tables). As an example, let's consider predicting whether a flower in the iris dataset is versicolor using logistic regression:
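
The linear interpolation mentioned above can be sketched in base R with `approx()`. This is an illustration under stated assumptions, not a ROCR feature: `interp_roc` is a hypothetical helper, it takes one (FPR, TPR) pair per table, and it adds the (0, 0) and (1, 1) corners that every ROC curve passes through.

```r
# Sketch (hypothetical helper): linearly interpolate an ROC curve from a few
# (fpr, tpr) points, e.g. ones computed from tables at different cutoffs.
interp_roc <- function(fpr, tpr, grid = seq(0, 1, by = 0.01)) {
  # Anchor the curve at the corners every ROC curve passes through.
  fpr <- c(0, fpr, 1)
  tpr <- c(0, tpr, 1)
  o <- order(fpr, tpr)
  # ties = max keeps the upper end of any vertical segment (repeated fpr).
  approx(fpr[o], tpr[o], xout = grid, ties = max)$y
}

# One interior point at (0.14, 0.42), evaluated on a few FPR values.
interp_roc(fpr = 0.14, tpr = 0.42, grid = c(0, 0.07, 0.57, 1))
```

The more cutoffs your tables cover, the more interior points feed the interpolation and the closer it can follow the true curve.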

iris$isv <- as.numeric(iris$Species == "versicolor")
mod <- glm(isv~Sepal.Length+Sepal.Width, data=iris, family="binomial")

We could use the standard ROCR code to compute the ROC curve for this model:

library(ROCR)
pred1 <- prediction(predict(mod), iris$isv)
perf1 <- performance(pred1,"tpr","fpr")
plot(perf1)
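
For intuition about why the individual predictions matter, here is a rough base-R sketch of the kind of computation behind an empirical ROC curve (an illustration only, not ROCR's actual code; `empirical_roc` is a hypothetical name). Every distinct score is a candidate cutoff and each cutoff contributes one (FPR, TPR) point, whereas a single contingency table corresponds to just one of those cutoffs.

```r
# Sketch of an empirical ROC computation from individual scores and 0/1
# labels, using base R only (not ROCR's implementation).
empirical_roc <- function(scores, labels) {
  cutoffs <- sort(unique(scores), decreasing = TRUE)
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  # At each cutoff, predict positive when score >= cutoff.
  tpr <- sapply(cutoffs, function(cut) sum(scores >= cut & labels == 1) / n_pos)
  fpr <- sapply(cutoffs, function(cut) sum(scores >= cut & labels == 0) / n_neg)
  # An infinite cutoff (nothing predicted positive) gives the (0, 0) corner.
  data.frame(cutoff = c(Inf, cutoffs), fpr = c(0, fpr), tpr = c(0, tpr))
}

roc <- empirical_roc(scores = c(0.9, 0.8, 0.7, 0.6), labels = c(1, 0, 1, 0))
roc
```

`plot(roc$fpr, roc$tpr, type = "l")` would draw the resulting curve.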


Now let's assume that, instead of mod, all we have is contingency tables for a number of cutoff values on the predicted probabilities:

probs <- predict(mod, type="response")  # predicted probabilities from the model
# One table per cutoff 0, 0.1, ..., 1: rows = true class, columns = predicted TRUE/FALSE
tables <- lapply(seq(0, 1, .1), function(x) table(iris$isv, factor(probs >= x, levels=c(FALSE, TRUE))))

# Predict TRUE if the predicted probability is at least 0
tables[[1]]
#     FALSE TRUE
#   0     0  100
#   1     0   50

# Predict TRUE if the predicted probability is at least 0.5
tables[[6]]
#     FALSE TRUE
#   0    86   14
#   1    29   21

# Predict TRUE if the predicted probability is at least 1
tables[[11]]
#     FALSE TRUE
#   0   100    0
#   1    50    0
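
As an aside, if you only need the plotted points rather than a ROCR object, each table can be turned directly into one (FPR, TPR) point. This is a base-R sketch, not the approach taken in the rest of this answer; `tables_to_roc` and `make_tab` are hypothetical helpers, and the example tables are the three printed above, rebuilt by hand with the same row/column layout.

```r
# Hypothetical helper: convert a list of 2x2 tables (rows = true class
# "0"/"1", columns = prediction "FALSE"/"TRUE") into ROC points, one per table.
tables_to_roc <- function(tables) {
  pts <- t(sapply(tables, function(tab) {
    c(fpr = tab["0", "TRUE"] / sum(tab["0", ]),
      tpr = tab["1", "TRUE"] / sum(tab["1", ]))
  }))
  # Sort along the FPR axis so the points can be joined into a curve.
  pts[order(pts[, "fpr"], pts[, "tpr"]), , drop = FALSE]
}

# Hypothetical helper to rebuild a table from its four counts.
make_tab <- function(neg_false, neg_true, pos_false, pos_true) {
  matrix(c(neg_false, pos_false, neg_true, pos_true), nrow = 2,
         dimnames = list(c("0", "1"), c("FALSE", "TRUE")))
}
example_tables <- list(make_tab(0, 100, 0, 50),   # cutoff 0
                       make_tab(86, 14, 29, 21),  # cutoff 0.5
                       make_tab(100, 0, 50, 0))   # cutoff 1
pts <- tables_to_roc(example_tables)
pts
```

With the real `tables` list, `plot(tables_to_roc(tables), type = "b")` should draw the same piecewise-linear curve without going through ROCR.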

From one table to the next, some predictions change from TRUE to FALSE because of the increased cutoff, and by comparing column 1 (the FALSE predictions) of successive tables we can determine how many of these are true negatives and how many are false negatives. Iterating through our ordered list of contingency tables, we can create fake predicted-value/outcome pairs to pass to ROCR, constructed so that they reproduce the sensitivity/specificity of each contingency table.

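# For each pair of consecutive cutoffs, count the negatives (row 1) and
# positives (row 2) whose prediction flipped from TRUE to FALSE, and give them
# all the fake score idx; observations that flip at higher cutoffs get higher scores.
fake.info <- do.call(rbind, lapply(1:(length(tables)-1), function(idx) {
  true.neg <- tables[[idx+1]][1,1] - tables[[idx]][1,1]
  false.neg <- tables[[idx+1]][2,1] - tables[[idx]][2,1]
  if (true.neg <= 0 && false.neg <= 0) {
    return(NULL)  # no predictions changed between these two cutoffs
  } else {
    return(data.frame(fake.pred=idx,
                      outcome=rep(c(0, 1), times=c(true.neg, false.neg))))
  }
}))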
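
The same fake data can also be built without the explicit loop by differencing the FALSE-column counts. This is an alternative sketch (the function name `fake_from_tables` is hypothetical); it assumes the tables are ordered by increasing cutoff, so the counts in column 1 never decrease (`rep()` would stop with an error on a negative count). The example tables are the cutoff-0, 0.5 and 1 tables from above, typed in as plain matrices.

```r
# Hypothetical vectorised version of the fake-data construction above.
fake_from_tables <- function(tables) {
  neg_false <- sapply(tables, function(tab) tab[1, 1])  # negatives predicted FALSE
  pos_false <- sapply(tables, function(tab) tab[2, 1])  # positives predicted FALSE
  tn_new <- diff(neg_false)  # negatives flipping to FALSE between cutoffs
  fn_new <- diff(pos_false)  # positives flipping to FALSE between cutoffs
  idx <- seq_along(tn_new)
  data.frame(fake.pred = c(rep(idx, tn_new), rep(idx, fn_new)),
             outcome = rep(c(0, 1), times = c(sum(tn_new), sum(fn_new))))
}

example_tables <- list(matrix(c(0, 0, 100, 50), nrow = 2),   # cutoff 0
                       matrix(c(86, 29, 14, 21), nrow = 2),  # cutoff 0.5
                       matrix(c(100, 50, 0, 0), nrow = 2))   # cutoff 1
fake <- fake_from_tables(example_tables)
table(fake$outcome, fake$fake.pred >= 2)
```

Thresholding the fake scores at 2 reproduces the cutoff-0.5 table (14 false positives, 21 true positives), which is exactly the property the fake data is meant to preserve.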

Now we can pass the fake predictions to ROCR as usual:

pred2 <- prediction(fake.info$fake.pred, fake.info$outcome)
perf2 <- performance(pred2,"tpr","fpr")
plot(perf2)


Basically, what we have done is linearly interpolate between the points we do have on the ROC curve. With contingency tables for many cutoffs, you could approximate the true ROC curve more closely; without a wide range of cutoffs, you can't hope to reproduce the full ROC curve accurately.
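
If a single summary number is wanted, the area under this interpolated curve can be computed with the trapezoidal rule from the (FPR, TPR) points alone. A minimal base-R sketch, assuming one point per table and adding the (0, 0) and (1, 1) corners; `trapezoid_auc` is a hypothetical helper, not a ROCR function.

```r
# Sketch (hypothetical helper): trapezoidal area under the piecewise-linear
# ROC curve through the given (fpr, tpr) points plus the (0,0) and (1,1) corners.
trapezoid_auc <- function(fpr, tpr) {
  fpr <- c(0, fpr, 1)
  tpr <- c(0, tpr, 1)
  o <- order(fpr, tpr)
  fpr <- fpr[o]
  tpr <- tpr[o]
  # Sum of trapezoid areas between consecutive points.
  sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
}

trapezoid_auc(fpr = 0.14, tpr = 0.42)               # one table: 0.64
trapezoid_auc(fpr = c(0.2, 0.5), tpr = c(0.6, 0.9)) # two tables
```

With no interior points the result is 0.5 (the diagonal), and each extra table at a new cutoff adds a vertex to the polyline.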


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM