
Is there a right way to programmatically prevent a brief wrong recognition (in an object detection app) from triggering an action?

Original 2021-12-07 15:25:30 2 1 android/ tensorflow/ object-detection/ image-recognition/ object-recognition

Context

I'm building an app which performs real-time object detection through the camera module of the device. The rendering looks like the image below.

Let's say I try to recognize an apple: most of the time the app will recognize an apple. However, sometimes the app will recognize the wrong fruit (let's say a lemon) for a few camera frames.

Goal

As the recognition of a fruit triggers an action in my code, my goal is to programmatically prevent a brief wrong recognition from triggering an action, and to take only the majority result into account.

What I've tried

I tried this: if the same fruit is recognized several frames in a row, I assume the result is the right one. But as my device processes image recognition several times per second, even a wrong guess can be recognized several times in a row, which leads to the wrong action.

Question

Are there any known techniques for avoiding this behavior?

1 answer

I feel like you've already answered your own question. In general, interpreting a model's inference is its own tuning step. In logistic regression tasks, for example, the decision threshold does NOT have to be 0.5. In fact, it's quite common to vary the threshold, see what the recall and precision are at each value, and pick the one that works for your business/product problem. (Fraud detection might favor high recall if you never want to miss any fraud, or high precision if you don't want to annoy users with lots of false positives.)
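To make the threshold idea concrete, here is a minimal sketch of a threshold sweep. The function name `precision_recall` and the scores and labels are made up for illustration; they are not from your detector.

```python
def precision_recall(scores, labels, threshold):
    """Return (precision, recall) when predicting positive for score >= threshold."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    # Convention chosen here: an empty denominator counts as a perfect score.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Hypothetical per-frame scores for "apple" and ground truth (1 = really an apple).
scores = [0.95, 0.80, 0.70, 0.60, 0.40, 0.30]
labels = [1, 1, 0, 1, 0, 1]

for t in (0.3, 0.5, 0.75):
    p, r = precision_recall(scores, labels, t)
    print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f}")
```

On this toy data, raising the threshold trades recall for precision, which is the trade-off you would tune against your own product requirements.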

In video, this broad concept is extended to multiple frames, as you know. You now have to tune the hyperparameters "how many frames in total?" and "how many frames voting [apple]?"

If you are analyzing fruit going down a conveyor belt one by one, and you know each piece of fruit will be in frame for X seconds while you are shooting at 60 fps, maybe you want 60 * X frames. And maybe you want 90% of the frames to agree.
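As a quick worked example of that arithmetic, assuming a hypothetical helper `window_params` and a fruit that stays in frame for 2 seconds:

```python
def window_params(fps, seconds_in_frame, agreement_percent):
    """Return (total_frames, votes_needed) for a voting window.

    agreement_percent is an integer (e.g. 90) so the rounding stays exact.
    """
    total_frames = round(fps * seconds_in_frame)
    # Integer ceiling of total_frames * agreement_percent / 100.
    votes_needed = (total_frames * agreement_percent + 99) // 100
    return total_frames, votes_needed


total, needed = window_params(fps=60, seconds_in_frame=2, agreement_percent=90)
print(total, needed)  # 120 frames in the window, 108 of them must agree
```

On a phone the effective rate is usually the inference rate rather than the camera frame rate, so `fps` here should be whatever your pipeline actually delivers.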

You'll want to visualize how often your detector "flips" detections so you can make a business/product judgement call on what your threshold ought to be.
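Before plotting anything, a rough way to quantify the flipping is to log the per-frame top-1 label and look at how often it changes and how long each run lasts. This is only a sketch; `count_flips`, `run_lengths`, and the label sequence are illustrative.

```python
from itertools import groupby


def count_flips(labels):
    """Number of times the label changes between consecutive frames."""
    return sum(1 for prev, cur in zip(labels, labels[1:]) if prev != cur)


def run_lengths(labels):
    """Collapse the sequence into (label, consecutive_frame_count) runs."""
    return [(label, len(list(group))) for label, group in groupby(labels)]


# Hypothetical log of per-frame detections.
frames = ["apple"] * 5 + ["lemon"] * 2 + ["apple"] * 3

print(count_flips(frames))
print(run_lengths(frames))
```

The length of the longest wrong run is a useful lower bound for the window: a "several frames in a row" rule has to require more frames than that to avoid the false trigger.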

This answer hasn't been very helpful in giving you a bright line rule here, but I hope it's helpful in suggesting that there is in fact NO bright line rule. You have to understand the problem to set the key hyperparameters:

1. For each frame, is top-1 accuracy sufficient, or do I need [.75] or higher confidence?
2. How many frames get to vote? Say [100].
3. How many agreeing votes are necessary to trigger an actual signal? Maybe it's [85].
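Those three hyperparameters map fairly directly onto a sliding-window vote. The sketch below is one possible way to write it, not a reference implementation; the class name `FrameVoter` and its parameters are my own, and the defaults are just the bracketed values above.

```python
from collections import Counter, deque


class FrameVoter:
    """Sliding-window hard vote over per-frame top-1 detections."""

    def __init__(self, window_size=100, votes_needed=85, min_confidence=0.75):
        self.votes_needed = votes_needed
        self.min_confidence = min_confidence
        # deque with maxlen drops the oldest frame automatically.
        self.window = deque(maxlen=window_size)

    def update(self, label, confidence):
        """Record one frame; return the winning label or None if nothing qualifies."""
        # Low-confidence frames still occupy a slot but vote for nothing.
        vote = label if confidence >= self.min_confidence else None
        self.window.append(vote)
        counts = Counter(v for v in self.window if v is not None)
        if not counts:
            return None
        winner, n = counts.most_common(1)[0]
        return winner if n >= self.votes_needed else None


voter = FrameVoter(window_size=10, votes_needed=8)
stream = [("apple", 0.9)] * 7 + [("lemon", 0.9)] * 2 + [("apple", 0.9)]
results = [voter.update(label, conf) for label, conf in stream]
print(results[-1])  # the two lemon frames never reach 8 votes
```

In the app you would call `update` once per inference result and fire the action only when the returned label is not `None` and differs from the last label you acted on.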

The above algorithm assumes you take a hardmax after step 1. Another option would be to just average the class scores over all 100 frames and apply a threshold to the average. That's kind of a soft-label version of the above algorithm.
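A minimal sketch of that soft variant, assuming each frame gives you a dict of per-class probabilities (the class name `SoftVoter` and the numbers are illustrative):

```python
from collections import deque


class SoftVoter:
    """Average per-class probabilities over a sliding window, then threshold."""

    def __init__(self, window_size=100, threshold=0.85):
        self.threshold = threshold
        self.window = deque(maxlen=window_size)

    def update(self, probs):
        """probs maps label -> probability for one frame; returns a label or None."""
        self.window.append(probs)
        if len(self.window) < self.window.maxlen:
            return None  # wait for a full window before deciding
        labels = set().union(*self.window)
        means = {
            label: sum(p.get(label, 0.0) for p in self.window) / len(self.window)
            for label in labels
        }
        best = max(means, key=means.get)
        return best if means[best] >= self.threshold else None


soft = SoftVoter(window_size=4, threshold=0.7)
good = {"apple": 0.9, "lemon": 0.1}
bad = {"apple": 0.2, "lemon": 0.8}
outputs = [soft.update(f) for f in (good, good, good, bad)]
print(outputs[-1])  # one bad frame only pulls the apple average down to about 0.725
```

Compared with the hard vote, this lets a few very confident frames outweigh several borderline ones, which may or may not be what you want.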


The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.


solution1 0 2021-12-15 19:04:44
