
Reducing false positives in ML models

Is there a nice way to enforce a limit on the false positives while training an ML model?

Let's suppose you start with a balanced dataset with two classes. You develop an ML model for binary classification. As the task is easy, the output distributions will peak at 0 and 1 respectively, overlapping around 0.5. However, what you really care about is that your false positive rate stays sustainable and cannot exceed a certain amount. So ideally, for pred > 0.8 you would only want to see one class.
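To make that constraint concrete, here is a minimal sketch (assuming hypothetical NumPy arrays y_true of 0/1 labels and y_pred of predicted probabilities) that measures the false positive rate when the model's score exceeds a chosen cutoff such as 0.8:

import numpy as np

def fpr_above_threshold(y_true, y_pred, threshold=0.8):
    # Samples the model confidently calls positive
    positives = y_pred > threshold
    # False positives: flagged as positive but actually class 0
    fp = np.sum(positives & (y_true == 0))
    # True negatives: class 0 samples not flagged
    tn = np.sum(~positives & (y_true == 0))
    return fp / (fp + tn) if (fp + tn) > 0 else 0.0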

At the moment I'm weighting the two classes to penalise errors on class "0".

history = model.fit(..., class_weight={0:5, 1:1}, ...)
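For context, a self-contained sketch of this class-weighted training (assuming Keras/TensorFlow; the toy model and synthetic data are hypothetical, only the class_weight argument mirrors the call above) could look like:

import numpy as np
import tensorflow as tf

# Hypothetical toy data: 1000 samples, 20 features, roughly balanced 0/1 labels
X = np.random.rand(1000, 20).astype("float32")
y = np.random.randint(0, 2, size=1000)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Penalise mistakes on class 0 five times more heavily than on class 1
history = model.fit(X, y, epochs=10, batch_size=32,
                    class_weight={0: 5.0, 1: 1.0}, verbose=0)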

As expected, it does decrease the FPR in the region pred > 0.8, and of course it worsens the recall of class 1.

I'm wondering if there are other ways to enforce this.

Thank you

Depending on your problem, you can consider a one-class classification SVM. This article can be useful: https://towardsdatascience.com/outlier-detection-with-one-class-svms-5403a1a1878c . The article also shows why one-class classification can be preferable to some other classical techniques, such as oversampling/undersampling or class weighting. But of course it depends on the problem you want to solve.
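As an illustration of that suggestion, here is a minimal sketch using scikit-learn's OneClassSVM (the data and the nu value are hypothetical; the model is trained only on the "normal" class and flags everything else as an outlier):

import numpy as np
from sklearn.svm import OneClassSVM

# Hypothetical data: train only on samples from the "normal" class
X_normal = np.random.randn(500, 2)
# Mixed test set: some normal points plus points far from the training cloud
X_test = np.vstack([np.random.randn(50, 2), np.random.randn(10, 2) + 4.0])

# nu roughly upper-bounds the fraction of training points treated as outliers
clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
clf.fit(X_normal)

# predict returns +1 for inliers (the learned class) and -1 for outliers
labels = clf.predict(X_test)
print("Flagged as outliers:", np.sum(labels == -1))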
