
What is the difference between a base network and detection network in Deep learning?

I recently started working on object-detection algorithms, and I usually encounter models that have a base network such as LeNet or PVA-Net and then a different architecture or model for detection. But I never understood what these base and detection networks contribute, or how to choose a particular model as the base or detection network.

Assume that you are building a model for object detection.

A CNN object-detection model (for simplicity, let's take SSD) consists of a base network that performs feature extraction, and detection modules that take the features extracted by the base network and generate the outputs: the classes of the detected objects and the coordinates of their boxes, i.e. the center (x, y), height (h), and width (w) of each predicted box.
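To make the split concrete, here is a minimal sketch of that two-part structure. The shapes and numbers (a 21-class SSD-style head with 4 anchors per cell, an 8x downsampling backbone) are illustrative assumptions, and the random projections stand in for learned convolutions so the example stays self-contained:

```python
import numpy as np

rng = np.random.default_rng(0)

def base_network(image):
    """Stand-in for a CNN backbone (e.g. VGG in SSD): maps an image to a
    spatial feature map. A real backbone uses learned convolutions; here a
    fixed random projection keeps the sketch self-contained."""
    h, w, c = image.shape
    pooled = image[::8, ::8, :]                 # crude 8x spatial downsampling
    weights = rng.normal(size=(c, 64))          # 3 input channels -> 64 features
    return pooled @ weights                     # shape (h//8, w//8, 64)

def detection_head(features, num_classes=21, anchors_per_cell=4):
    """Stand-in for the detection module: at every feature-map cell it
    predicts, per anchor, class scores plus a box as (cx, cy, w, h)."""
    fh, fw, fc = features.shape
    w_cls = rng.normal(size=(fc, anchors_per_cell * num_classes))
    w_box = rng.normal(size=(fc, anchors_per_cell * 4))
    cls = (features @ w_cls).reshape(fh, fw, anchors_per_cell, num_classes)
    box = (features @ w_box).reshape(fh, fw, anchors_per_cell, 4)
    return cls, box

image = rng.normal(size=(128, 128, 3))
features = base_network(image)                  # (16, 16, 64)
cls_scores, boxes = detection_head(features)
print(cls_scores.shape, boxes.shape)            # (16, 16, 4, 21) (16, 16, 4, 4)
```

The point of the sketch is only the data flow: the backbone turns the image into features, and the head turns features into per-anchor class scores and box offsets, which is exactly the division the answer describes.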

For the base network, we usually take a network such as ResNet or VGG that has already been pre-trained on a large dataset like ImageNet, with the hope that it will produce a good set of features for the detection layers (or at least that its parameters will need little tuning during training, which helps the model converge sooner).

For the detection modules, it depends on which kind of method you want to use: one-stage methods (SSD, RetinaNet, YOLO, and so on) or two-stage methods (Faster R-CNN, Mask R-CNN, etc.). There is a trade-off between accuracy and speed among these methods, which is an important criterion when choosing a detection module.
