
Some questions about the required 300x300 input of the quantized Mobilenet-SSD V2

Asked 2020-04-05 12:17:47. Tags: tensorflow, machine-learning, deep-learning, computer-vision, mobilenet

I want to retrain the quantized Mobilenet-SSD V2 model, so I downloaded the unlabeled folder from COCO. This model requires an input size of 300x300, but I succeeded in retraining it once on pictures of a different size and it worked (poorly, but it worked). Also, the code that uses the retrained model resizes the input from the camera to 500x500, and that works too. So my question is: why is the required input documented as 300x300 if the model works with other sizes too? Do I need to resize the whole dataset to 300x300 before I label it? I know the model applies convolutions to the input, so I don't think the size really matters (correct me if I'm wrong). As far as I know, the convolution keeps sliding until it reaches the end of the input.

Thanks for helping!

1 Answer

If I understand correctly, you are using the TF Object Detection API. A given model, such as mobilenet-v2-ssd, contains 3 main blocks: [Preprocessing (normalizing and resizing)] --> [Detector (backbone + detection heads)] --> [Postprocessing (bbox decoding + NMS)]

When they talk about the required input, it is for the detector. The checkpoint itself contains the full pipeline, which means that the preprocessing unit will do the work for you, so there is no need to resize the images to 300x300 beforehand.
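To see why the detector is tied to one input size even though convolutions can slide over any image, here is a rough sketch of the feature-map arithmetic. It assumes SAME padding (a stride-s layer maps n to ceil(n / s)), detection heads at backbone strides 16 and 32 followed by four extra stride-2 layers, and 3 anchors per cell on the first map and 6 on the others. These numbers are assumptions about the usual ssd_mobilenet_v2 setup, not something read from your checkpoint, and the function names are made up for illustration.

```python
import math


def ssd_feature_map_sizes(input_size, backbone_strides=(16, 32), extra_layers=4):
    """Spatial sizes of the SSD feature maps for a square input.

    Assumes SAME padding, so a layer with stride s maps n -> ceil(n / s).
    The strides and the number of extra layers are assumed defaults.
    """
    sizes = [math.ceil(input_size / s) for s in backbone_strides]
    for _ in range(extra_layers):
        # Each extra SSD layer halves the previous map (stride 2).
        sizes.append(math.ceil(sizes[-1] / 2))
    return sizes


def count_anchors(sizes, anchors_per_cell=(3, 6, 6, 6, 6, 6)):
    """Total number of anchor boxes over all feature maps (assumed layout)."""
    return sum(n * n * a for n, a in zip(sizes, anchors_per_cell))


for size in (300, 500):
    maps = ssd_feature_map_sizes(size)
    print(size, maps, count_anchors(maps))
```

Under these assumptions, a different input size changes the grid sizes and the number of anchors, so the object scales the detector sees no longer match what it was trained on. That would only matter if the resizer were bypassed; with the full pipeline every image is brought to 300x300 first.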

If for some reason you intend to feed the input directly to the detector yourself, you have to apply the same preprocessing that was used in training.

BTW: in the training config file ( https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/ssd_mobilenet_v2_coco.config ) you can see the resize that was defined: image_resizer { fixed_shape_resizer { height: 300 width: 300 } }. The normalization is the mobilenet normalization (changing the dynamic range of the input from [0,255] to [-1,1]).
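If you do end up feeding the detector directly, the two preprocessing steps could look roughly like this in NumPy. This is only a sketch: the helper names are hypothetical, the nearest-neighbour resize is a simplification standing in for whatever interpolation the fixed_shape_resizer actually uses, and the float [-1,1] input assumes you are not feeding a uint8 quantized interpreter.

```python
import numpy as np


def resize_nearest(image, height=300, width=300):
    """Nearest-neighbour resize of an HxWxC array (simplified stand-in
    for the fixed_shape_resizer; the real interpolation may differ)."""
    src_h, src_w = image.shape[:2]
    rows = np.arange(height) * src_h // height  # source row for each output row
    cols = np.arange(width) * src_w // width    # source column for each output column
    return image[rows[:, None], cols]


def mobilenet_normalize(image):
    """Map pixel values from [0, 255] to [-1, 1]."""
    return image.astype(np.float32) * (2.0 / 255.0) - 1.0


def preprocess(image):
    """Resize, normalize, and add a batch dimension: 1 x 300 x 300 x C."""
    return mobilenet_normalize(resize_nearest(image))[np.newaxis, ...]


# Example: a fake 500x500 RGB camera frame.
frame = np.random.randint(0, 256, size=(500, 500, 3), dtype=np.uint8)
batch = preprocess(frame)
print(batch.shape, batch.dtype)
```

With the full pipeline none of this is needed, since the preprocessing block already does both steps.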