Evaluating a statistical model in R

Question

I have a very big data set (ds). One of its columns is popularity, a factor with levels 'High' / 'Low'.

I split the data 70% / 30% to create a training set (ds_tr) and a test set (ds_te).
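The question does not show how the split was made. One common way is to sample 70% of the row indices at random. The sketch below does this on a small simulated stand-in for ds; the columns x1 and x2, the example URLs and the seed are assumptions for illustration only.

```r
# Hypothetical stand-in for ds: simulated predictors x1, x2, an identifier
# column url, and a factor response popularity with levels "High" / "Low"
set.seed(42)
n <- 1000
ds <- data.frame(
  url = paste0("http://example.com/", seq_len(n)),
  x1  = rnorm(n),
  x2  = runif(n)
)
ds$popularity <- factor(ifelse(ds$x1 + rnorm(n) > 0, "High", "Low"))

# Draw 70% of the row indices for training; the remaining rows form the test set
train_idx <- sample(seq_len(nrow(ds)), size = round(0.7 * nrow(ds)))
ds_tr <- ds[train_idx, ]
ds_te <- ds[-train_idx, ]

c(train = nrow(ds_tr), test = nrow(ds_te))
```

Sampling indices (rather than taking the first 70% of rows) avoids any ordering effects in the original data; set.seed() only makes the split reproducible.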

I created the following model using logistic regression:

mdl <- glm(formula = popularity ~ . - url, family = "binomial", data = ds_tr)

Then I computed the predicted probabilities (I will do the same for ds_te):

y_hat <- predict(mdl, newdata = ds_tr, type = 'response')  # predict.glm takes the rows to score via newdata; 'response' gives probabilities
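A minimal sketch of scoring both sets, on simulated data (x1 and x2 are hypothetical stand-ins for the real predictors). Two points are worth noting. First, with a factor response, glm models the probability of the second factor level, which is "Low" here because levels sort alphabetically. Second, this sketch drops url from the training data before fitting instead of writing . - url. If url is a character or factor column, a formula that merely subtracts it can still carry it into the model frame, and predicting on ds_te with unseen url values may then fail with a "new levels" error.

```r
# Simulated stand-in for the question's data (hypothetical columns x1, x2)
set.seed(1)
n <- 500
ds <- data.frame(url = paste0("u", seq_len(n)), x1 = rnorm(n), x2 = rnorm(n))
ds$popularity <- factor(ifelse(ds$x1 - ds$x2 + rnorm(n) > 0, "High", "Low"))
train_idx <- sample(seq_len(n), size = 0.7 * n)
ds_tr <- ds[train_idx, ]
ds_te <- ds[-train_idx, ]

# Drop the identifier before fitting so url never enters the model frame
mdl <- glm(popularity ~ ., family = "binomial",
           data = subset(ds_tr, select = -url))

# newdata supplies the rows to score; extra columns such as url are ignored
y_hat_tr <- predict(mdl, newdata = ds_tr, type = "response")
y_hat_te <- predict(mdl, newdata = ds_te, type = "response")

# The probabilities refer to the second factor level ("Low" here)
levels(ds$popularity)
summary(y_hat_te)
```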

I want to find the precision and the recall that correspond to a cutoff threshold of 0.5, so I did:

library(ROCR)
pred <- prediction(y_hat, ds_tr$popularity)
perf <- performance(pred, "prec", "rec")

The result is a performance object holding many values:

str(perf)

Formal class 'performance' [package "ROCR"] with 6 slots
  ..@ x.name      : chr "Recall"
  ..@ y.name      : chr "Precision"
  ..@ alpha.name  : chr "Cutoff"
  ..@ x.values    :List of 1
  .. ..$ : num [1:27779] 0.00 7.71e-05 7.71e-05 1.54e-04 2.31e-04 ...
  ..@ y.values    :List of 1
  .. ..$ : num [1:27779] NaN 1 0.5 0.667 0.75 ...
  ..@ alpha.values:List of 1
  .. ..$ : num [1:27779] Inf 0.97 0.895 0.89 0.887 ...

How do I find the specific precision and recall values that correspond to a cutoff threshold of 0.5?

Answer 1

Access the slots of the performance object (by combining @ with list indexing, [[1]]).

Create a data frame with every cutoff and its corresponding precision and recall:

probab.cuts <- data.frame(cut=perf@alpha.values[[1]], prec=perf@y.values[[1]], rec=perf@x.values[[1]])

You can view all the associated values:

probab.cuts
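To see what these rows mean, here is a base-R sketch that rebuilds a table of the same shape from toy scores and labels (both hypothetical). It assumes ROCR's convention of starting at an infinite cutoff and counting a case as predicted positive when its score is at or above the cutoff. This also explains the NaN in the first row of the question's output: at cutoff Inf nothing is predicted positive, so precision is 0/0 and recall is 0.

```r
# Toy scores and labels (hypothetical); "Low" is treated as the positive class
# because it is the second factor level, mirroring what glm models
score  <- c(0.9, 0.8, 0.8, 0.6, 0.4, 0.3)
actual <- factor(c("Low", "Low", "High", "Low", "High", "High"),
                 levels = c("High", "Low"))
pos <- actual == levels(actual)[2]

# Cutoffs: Inf first, then the distinct scores in decreasing order
cuts <- c(Inf, sort(unique(score), decreasing = TRUE))

# For each cutoff, predict positive when score >= cutoff and count TP / FP
curve <- t(sapply(cuts, function(k) {
  predicted_pos <- score >= k
  tp <- sum(predicted_pos & pos)
  fp <- sum(predicted_pos & !pos)
  c(cut = k, prec = tp / (tp + fp), rec = tp / sum(pos))
}))
curve <- as.data.frame(curve)
curve
```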

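To select the values for a cutoff of 0.5: ROCR lists the cutoffs in decreasing order and predicts a case as positive when its score is at or above the cutoff, so the last row with cut > 0.5 gives the precision and recall of the rule y_hat > 0.5: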

tail(probab.cuts[probab.cuts$cut > 0.5,], 1)
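A small check, on hypothetical toy scores, that the row picked this way matches precision and recall computed directly from the rule score > 0.5. The table is rebuilt in base R under the assumed ROCR rule (positive when score >= cutoff), so the sketch runs without ROCR installed.

```r
# Toy scores and logical labels (TRUE = positive class); hypothetical values
score <- c(0.9, 0.8, 0.8, 0.6, 0.4, 0.3)
pos   <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)

# Same shape as probab.cuts: decreasing cutoffs, positive if score >= cut
cuts <- c(Inf, sort(unique(score), decreasing = TRUE))
probab.cuts <- data.frame(
  cut  = cuts,
  prec = sapply(cuts, function(k) sum(score >= k & pos) / sum(score >= k)),
  rec  = sapply(cuts, function(k) sum(score >= k & pos) / sum(pos))
)

# Last (i.e. smallest) cutoff that is still above 0.5
row_05 <- tail(probab.cuts[probab.cuts$cut > 0.5, ], 1)
row_05

# The same numbers computed directly from the rule score > 0.5
pred <- score > 0.5
c(prec = sum(pred & pos) / sum(pred), rec = sum(pred & pos) / sum(pos))
```

Both give precision 0.75 and recall 1 for this toy data, because no score lies strictly between 0.5 and the selected cutoff of 0.6.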

Manual check:

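tab <- table(ds_tr$popularity, y_hat > 0.5)  # rows: actual class, columns: y_hat > 0.5
# Column-major indices, positive class = second factor level: tab[2] = FN, tab[3] = FP, tab[4] = TP
tab[4]/(tab[4]+tab[2]) # recall = TP / (TP + FN)
tab[4]/(tab[4]+tab[3]) # precision = TP / (TP + FP)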
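The linear indices above assume the table is 2 x 2. If no score exceeds the cutoff (or every score does), table() drops the empty column and tab[4] becomes NA. A slightly more defensive sketch follows. It uses a hypothetical helper, prec_rec_at(), that fixes both dimensions of the table, indexes cells by name, and treats the second factor level as the positive class, as glm does.

```r
# Hypothetical helper: precision and recall at a given cutoff, with the
# second factor level of `actual` treated as the positive class
prec_rec_at <- function(actual, score, cutoff = 0.5) {
  actual <- factor(actual)
  positive <- levels(actual)[2]
  # Fix the levels of both margins so the table is always 2 x 2
  tab <- table(actual    = factor(actual == positive, levels = c(FALSE, TRUE)),
               predicted = factor(score > cutoff, levels = c(FALSE, TRUE)))
  tp <- tab["TRUE", "TRUE"]
  fp <- tab["FALSE", "TRUE"]
  fn <- tab["TRUE", "FALSE"]
  c(precision = tp / (tp + fp), recall = tp / (tp + fn))
}

# Usage on toy data (hypothetical values)
score  <- c(0.9, 0.8, 0.8, 0.6, 0.4, 0.3)
actual <- factor(c("Low", "Low", "High", "Low", "High", "High"))
prec_rec_at(actual, score, 0.5)
prec_rec_at(actual, score, 0.95)  # nothing predicted positive: precision is NaN
```

In the question's setting this would be called as prec_rec_at(ds_tr$popularity, y_hat, 0.5).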

Answer 1: accepted, 2016-01-03 22:29:27

评估R中的统计模型

问题描述

1 个解决方案

解决方案1 1 已采纳 2016-01-03 22:29:27

解决方案1
1 已采纳 2016-01-03 22:29:27