
What is the best way to train your model while favoring recall/precision?

I have a binary classification problem, and my dataset is composed of 5% positive labels. I'm training my model using TensorFlow. Here are my results during training:

Step 3819999: loss = 0.22 (0.004 sec)
Accuracy = 0.955; Recall = 0.011; Precision = 0.496

Step 3820999: loss = 0.21 (0.003 sec)
Accuracy = 0.955; Recall = 0.011; Precision = 0.496

Step 3821999: loss = 0.15 (0.003 sec)
Accuracy = 0.955; Recall = 0.011; Precision = 0.496

Step 3822999: loss = 0.15 (0.003 sec)
Accuracy = 0.955; Recall = 0.011; Precision = 0.496

What are the main strategies to improve recall? Changing the dataset by adding more positive labels might solve the problem, but it seems odd to alter the problem's reality...

In my view, there should be a way to favor "true positives" over "false negatives", but I can't seem to find one.

You should use the "weighted cross entropy" instead of the classic CE. From the TensorFlow documentation:

This is like sigmoid_cross_entropy_with_logits() except that pos_weight allows one to trade off recall and precision by up- or down-weighting the cost of a positive error relative to a negative error. The usual cross-entropy cost is defined as:

targets * -log(sigmoid(logits)) + (1 - targets) * -log(1 - sigmoid(logits))

A value pos_weight > 1 decreases the false negative count, hence increasing the recall. Conversely, setting pos_weight < 1 decreases the false positive count and increases the precision. This can be seen from the fact that pos_weight is introduced as a multiplicative coefficient for the positive targets term in the loss expression:

targets * -log(sigmoid(logits)) * pos_weight + (1 - targets) * -log(1 - sigmoid(logits))
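A minimal usage sketch (assuming TensorFlow 2.x; the pos_weight of 19 is just a starting guess taken from your ~95/5 negative-to-positive ratio and should be tuned on a validation set):

import tensorflow as tf

# Negative-to-positive class ratio as an initial pos_weight (assumption; tune it).
pos_weight = 19.0

def weighted_loss(labels, logits):
    # labels: float tensor of 0/1 targets; logits: raw (pre-sigmoid) scores.
    per_example = tf.nn.weighted_cross_entropy_with_logits(
        labels=labels, logits=logits, pos_weight=pos_weight)
    return tf.reduce_mean(per_example)

# Toy check: one positive and one negative example.
labels = tf.constant([1.0, 0.0])
logits = tf.constant([-0.5, 0.3])
print(weighted_loss(labels, logits).numpy())

Increasing pos_weight makes each missed positive more expensive, which pushes the model toward higher recall at the cost of precision.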
