
What is the best way to train your model while favoring recall or precision?

I have a binary classification problem and my dataset is composed of 5% positive labels. I'm training my model using TensorFlow. Here are my results during training:

Step 3819999: loss = 0.22 (0.004 sec)
Accuracy = 0.955; Recall = 0.011; Precision = 0.496

Step 3820999: loss = 0.21 (0.003 sec)
Accuracy = 0.955; Recall = 0.011; Precision = 0.496

Step 3821999: loss = 0.15 (0.003 sec)
Accuracy = 0.955; Recall = 0.011; Precision = 0.496

Step 3822999: loss = 0.15 (0.003 sec)
Accuracy = 0.955; Recall = 0.011; Precision = 0.496

What are the main strategies to improve recall? Changing the dataset and adding more positive labels may solve the problem, but it seems odd to change the problem's reality...

From my point of view, there should be a way to favour "true positives" over "false negatives", but I can't seem to find one.

You should use weighted cross entropy (tf.nn.weighted_cross_entropy_with_logits) instead of the classic cross entropy. From the TensorFlow documentation:

This is like sigmoid_cross_entropy_with_logits() except that pos_weight allows one to trade off recall and precision by up- or down-weighting the cost of a positive error relative to a negative error. The usual cross-entropy cost is defined as:

targets * -log(sigmoid(logits)) + (1 - targets) * -log(1 - sigmoid(logits))

A value pos_weight > 1 decreases the false negative count, hence increasing the recall. Conversely, setting pos_weight < 1 decreases the false positive count and increases the precision. This can be seen from the fact that pos_weight is introduced as a multiplicative coefficient for the positive-targets term in the loss expression:

targets * -log(sigmoid(logits)) * pos_weight + (1 - targets) * -log(1 - sigmoid(logits))
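As a minimal sketch of how this would slot into your training code (the dummy labels/logits and the pos_weight value below are illustrative assumptions; in practice the logits come from your network):

import tensorflow as tf

# Hypothetical example tensors: `logits` stands in for the raw (pre-sigmoid)
# output of your network, `labels` holds the 0/1 targets.
labels = tf.constant([[1.0], [0.0], [0.0], [1.0]])
logits = tf.constant([[0.3], [-1.2], [0.8], [-0.5]])

# pos_weight > 1 up-weights errors on positive examples; the value 19.0 here
# is only an assumption for a 5%-positive dataset, not a recommendation.
pos_weight = 19.0

# Weighted per-example loss, replacing the unweighted sigmoid cross entropy.
per_example_loss = tf.nn.weighted_cross_entropy_with_logits(
    labels=labels, logits=logits, pos_weight=pos_weight)

# Reduce to a scalar for the optimizer, exactly as with the unweighted loss.
loss = tf.reduce_mean(per_example_loss)

A common starting point for pos_weight is the ratio of negative to positive examples (here roughly 0.95 / 0.05 ≈ 19); from there, tune it on a validation set until you reach the precision/recall trade-off you need.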
