
Is there a right way to programmatically prevent a brief wrong recognition (in an object detection app) from triggering an action?

Context

I'm building an app which performs real-time object detection through the camera module of the device. The render looks like the image below.
[image: enter image description here]
Let's say I try to recognize an apple. Most of the time the app will recognize an apple. However, sometimes the app will recognize the wrong fruit (let's say a lemon) on a few camera frames.

Goal

As the recognition of a fruit triggers an action in my code, my goal is to programmatically prevent a brief wrong recognition from triggering an action, and to take into account only the majority result.

What I've tried

I tried this: if the same fruit is recognized several frames in a row, I assume the result is the right one. But as my device processes image recognition several times per second, even a wrong guess can be recognized several times in a row, which leads to the wrong action.

Question

Are there any known techniques for avoiding this behavior?

I feel like you've already answered your own question. In general, interpreting a model's inference is its own tuning step. For example, you know that in logistic regression tasks the threshold does NOT have to be 0.5. In fact, it's quite common to vary the threshold to see what the recall and precision are at various thresholds, and then pick a threshold that works for your business/product problem. (Fraud detection might favor high recall if you never want to miss any fraud, or high precision if you don't want to annoy users with lots of false positives.)
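To make the threshold sweep concrete, here is a minimal sketch in plain Python. The scores and labels are made-up illustration data, and `precision_recall` is a hypothetical helper, not part of any library.

```python
def precision_recall(scores, labels, threshold):
    """Return (precision, recall) when scores >= threshold count as positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    # Convention assumed here: with nothing to count, report 1.0.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall


# Made-up model scores and ground-truth labels (1 = positive), illustration only.
scores = [0.95, 0.80, 0.70, 0.55, 0.40, 0.30, 0.20]
labels = [1, 1, 0, 1, 0, 1, 0]

for t in (0.25, 0.5, 0.75):
    p, r = precision_recall(scores, labels, t)
    print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f}")
```

On this toy data, raising the threshold trades recall for precision, which is the trade-off you would tune for your product.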

In video, as you know, this broad concept extends to multiple frames. You now have to tune the hyperparameters "how many frames in total?" and "how many frames voting [apple]?"

If you are analyzing fruit going down a conveyor belt one by one, you know each piece of fruit will be in frame for X seconds, and you are shooting at 60 fps, maybe you want 60 * X frames. And maybe you want 90% of the frames to agree.
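The arithmetic for the conveyor example can be sketched as follows. The value X = 2 seconds is an assumption for illustration, and `voting_window` is a hypothetical helper.

```python
import math


def voting_window(fps, seconds_in_frame, agreement):
    """Return (total_frames, votes_needed) for one object passing the camera."""
    total_frames = int(fps * seconds_in_frame)
    # The small epsilon guards against float round-off pushing ceil up by one.
    votes_needed = math.ceil(total_frames * agreement - 1e-9)
    return total_frames, votes_needed


# Assumed values: 60 fps, X = 2 seconds in frame, 90% agreement.
total, needed = voting_window(fps=60, seconds_in_frame=2, agreement=0.9)
print(total, needed)  # 120 108
```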

You'll want to visualize how often your detector "flips" detections so you can make a business/product judgment call on what your threshold ought to be.
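Before plotting anything, a quick way to quantify flips is to count label changes and run lengths in a recorded sequence of per-frame predictions. This is only a sketch on a made-up sequence; `count_flips` and `run_lengths` are hypothetical helper names.

```python
def count_flips(labels):
    """Count how often the predicted label changes between consecutive frames."""
    return sum(1 for prev, cur in zip(labels, labels[1:]) if prev != cur)


def run_lengths(labels):
    """Return (label, length) for each run of identical consecutive labels."""
    runs = []
    for label in labels:
        if runs and runs[-1][0] == label:
            runs[-1] = (label, runs[-1][1] + 1)
        else:
            runs.append((label, 1))
    return runs


# Made-up per-frame predictions while the camera looks at an apple.
frames = ["apple"] * 5 + ["lemon"] * 3 + ["apple"] * 6 + ["lemon"] + ["apple"] * 4

print(count_flips(frames))  # 4
print(run_lengths(frames))

# Longest run of the wrong label: a "same label N frames in a row" rule
# would need N larger than this to ignore the glitch.
print(max(n for label, n in run_lengths(frames) if label == "lemon"))  # 3
```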

This answer hasn't been very helpful in giving you a bright-line rule, but I hope it's helpful in suggesting that there is in fact NO bright-line rule. You have to understand the problem to set the key hyperparameters:

  1. For each frame, is top-1 accuracy sufficient, or do I need [.75] or higher confidence?
  2. How many frames get to vote? Say [100].
  3. How many matching votes are necessary to trigger an actual signal? Maybe it's [85].
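The three steps above can be sketched as a sliding-window vote. This is a minimal illustration, not a definitive implementation: the class name `MajorityVoteFilter`, its parameters, and the choice to treat low-confidence frames as abstentions are all assumptions. The defaults mirror the bracketed example values.

```python
from collections import Counter, deque


class MajorityVoteFilter:
    """Hard-vote filter over the most recent frames (hypothetical helper)."""

    def __init__(self, window=100, votes_needed=85, min_confidence=0.75):
        self.votes_needed = votes_needed
        self.min_confidence = min_confidence
        self.window = deque(maxlen=window)  # old frames drop off automatically

    def update(self, label, confidence):
        """Add one frame's top-1 result; return the winning label or None."""
        # Step 1: a low-confidence frame abstains (stored as None).
        self.window.append(label if confidence >= self.min_confidence else None)
        # Steps 2 and 3: count the votes in the window and check the quorum.
        counts = Counter(v for v in self.window if v is not None)
        if not counts:
            return None
        winner, votes = counts.most_common(1)[0]
        return winner if votes >= self.votes_needed else None


# Small window so the behaviour is easy to follow by hand.
demo = MajorityVoteFilter(window=10, votes_needed=8)
stream = [("apple", 0.9)] * 7 + [("lemon", 0.9)] + [("apple", 0.9)] * 2
for label, conf in stream:
    print(label, "->", demo.update(label, conf))
```

In this demo the single lemon frame never reaches the quorum, and the action would fire only once eight of the last ten frames agree on apple.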

The above algo assumes you take a hardmax after step 1. Another option would be to just average the class scores over all 100 frames and pick a threshold. That's kind of a soft-label version of the above algo.
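A sketch of that soft-label variant, under a few assumptions: each frame yields a dict of class probabilities, the filter stays silent until the window is full, and `SoftVoteFilter` and its parameters are hypothetical names.

```python
from collections import deque


class SoftVoteFilter:
    """Average per-class probabilities over recent frames (hypothetical helper)."""

    def __init__(self, window=100, threshold=0.85):
        self.threshold = threshold
        self.window = deque(maxlen=window)

    def update(self, probs):
        """probs maps label -> probability for one frame; return a label or None."""
        self.window.append(probs)
        # Assumption: emit nothing until a full window of frames is available.
        if len(self.window) < self.window.maxlen:
            return None
        labels = {label for frame in self.window for label in frame}
        means = {
            label: sum(frame.get(label, 0.0) for frame in self.window) / len(self.window)
            for label in labels
        }
        if not means:
            return None
        best = max(means, key=means.get)
        return best if means[best] >= self.threshold else None


# Small window and made-up probabilities, for illustration only.
soft = SoftVoteFilter(window=4, threshold=0.7)
for _ in range(3):
    print(soft.update({"apple": 0.9, "lemon": 0.1}))
print(soft.update({"apple": 0.2, "lemon": 0.8}))
```

Unlike the hard vote, a frame that is only slightly unsure still contributes most of its weight to the leading class, so the result depends on how well calibrated the model's probabilities are.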


 