
What is the best way to train your model while favoring recall/precision?

I have a binary classification problem, and my dataset is composed of 5% positive labels. I'm training my model using TensorFlow. Here are my results during training:

Step 3819999: loss = 0.22 (0.004 sec)
Accuracy = 0.955; Recall = 0.011; Precision = 0.496

Step 3820999: loss = 0.21 (0.003 sec)
Accuracy = 0.955; Recall = 0.011; Precision = 0.496

Step 3821999: loss = 0.15 (0.003 sec)
Accuracy = 0.955; Recall = 0.011; Precision = 0.496

Step 3822999: loss = 0.15 (0.003 sec)
Accuracy = 0.955; Recall = 0.011; Precision = 0.496

What are the main strategies to improve recall? Changing the dataset by adding more positive labels might solve the problem, but it seems odd to alter the problem's reality...

In my view, there should be a way to favor "true positives" over "false negatives", but I can't seem to find one.

You should use the "weighted cross entropy" instead of the classic CE. From the TensorFlow documentation:

This is like sigmoid_cross_entropy_with_logits() except that pos_weight allows one to trade off recall and precision by up- or down-weighting the cost of a positive error relative to a negative error. The usual cross-entropy cost is defined as:

targets * -log(sigmoid(logits)) + (1 - targets) * -log(1 - sigmoid(logits))

A value pos_weight > 1 decreases the false negative count, hence increasing the recall. Conversely, setting pos_weight < 1 decreases the false positive count and increases the precision. This can be seen from the fact that pos_weight is introduced as a multiplicative coefficient for the positive targets term in the loss expression:

targets * -log(sigmoid(logits)) * pos_weight + (1 - targets) * -log(1 - sigmoid(logits))
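A minimal usage sketch (assuming TensorFlow 2.x; the pos_weight of 19 is just a starting guess taken from your ~95/5 negative-to-positive ratio and should be tuned on a validation set):

import tensorflow as tf

# Negative-to-positive class ratio as an initial pos_weight (assumption; tune it).
pos_weight = 19.0

def weighted_loss(labels, logits):
    # labels: float tensor of 0/1 targets; logits: raw (pre-sigmoid) scores.
    per_example = tf.nn.weighted_cross_entropy_with_logits(
        labels=labels, logits=logits, pos_weight=pos_weight)
    return tf.reduce_mean(per_example)

# Toy check: one positive and one negative example.
labels = tf.constant([1.0, 0.0])
logits = tf.constant([-0.5, 0.3])
print(weighted_loss(labels, logits).numpy())

Increasing pos_weight makes each missed positive more expensive, which pushes the model toward higher recall at the cost of precision.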
