TensorFlow 2 Object Detection API: batch non-max suppression in a trained Faster R-CNN network on TPU does not seem to work. Is this a bug?

I followed the new TensorFlow 2 Object Detection API documentation to train a Faster R-CNN detector using transfer learning on a Google Cloud Platform TPU. After training completed, I downloaded the result to my workstation and exported the model using the TensorFlow 2 exporter ('object_detection/exporter_main_v2.py'). I followed the official instructions and set up the environment locally (macOS Catalina, TensorFlow 2.2, Python 3.6, etc.).

However, the non-max suppression (NMS) part of the inference pipeline does not seem to be working: there are cases where bounding boxes of different classes overlap almost completely. I debugged the code to make sure that the Object Detection API implementation of NMS (the batch_multiclass_non_max_suppression method in object_detection/core/post_processing.py) is called in the inference pipeline for the Faster R-CNN model. It is called twice, as expected for the Faster R-CNN architecture at inference time.

The instructions I used for GCP's AI Platform TPU are the ones on the official Object Detection API page: link. I changed the training parameters to use a TPU runtime version and Python version that are supported on GCP, because the versions in the official example are not supported. Instead I used:

    gcloud ai-platform jobs submit training `whoami`_object_detection_`date +%m_%d_%Y_%H_%M_%S` \
        --job-dir=gs://${MODEL_DIR} \
        --package-path ./object_detection \
        --module-name object_detection.model_main_tf2 \
        --runtime-version 2.2 \
        --python-version 3.7 \
        --scale-tier BASIC_TPU \
        --region us-central1 \
        -- \
        --use_tpu true \
        --model_dir=gs://${MODEL_DIR} \
        --pipeline_config_path=gs://${PIPELINE_CONFIG_PATH}

The dataset I used for training was the Pets example from the official Object Detection API page: link. However, I exported it using the TensorFlow 2 Object Detection API methods for consistency.

The pre-trained network I used was the Faster R-CNN ResNet101 V1 1024x1024 model trained on TPU.

The configuration file I used was faster_rcnn_resnet101_v1_1024x1024_coco17_tpu-8.config for TPU training. I changed the number of classes to 37. I also changed the batch size to batch_size: 32 because the job was crashing on GCP with TPU v2. I changed the checkpoint type to fine_tune_checkpoint_type: "detection", and the only data augmentation I used was random_horizontal_flip.

The official TensorFlow 2 Object Detection model zoo reports results for TPU-trained architectures other than SSD. However, the official Object Detection TPU compatibility guide states that only SSD is currently supported and that non-max suppression is not.

Why is NMS not working?

I think that's because the batch_multiclass_non_max_suppression method performs class-aware NMS (or at least that is how I understood it). This means that NMS is applied to each class separately: among all the boxes of the same class whose IoU with each other exceeds a threshold, only the box with the highest score is retained. Boxes of different classes never suppress each other, so they can overlap almost completely.
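The difference can be illustrated with a small plain-Python sketch. This is not the Object Detection API implementation; the function names, the box format (ymin, xmin, ymax, xmax), and the sample detections are all assumptions made up for illustration.

```python
# Illustrative sketch only: a plain-Python greedy NMS, NOT the code in
# object_detection/core/post_processing.py. All names here are hypothetical.

def iou(a, b):
    """Intersection over union of two boxes given as (ymin, xmin, ymax, xmax)."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(detections, iou_threshold):
    """Visit boxes by descending score; keep a box only if it does not
    overlap an already kept box by more than iou_threshold."""
    kept = []
    for det in sorted(detections, key=lambda d: d["score"], reverse=True):
        if all(iou(det["box"], k["box"]) <= iou_threshold for k in kept):
            kept.append(det)
    return kept


def class_aware_nms(detections, iou_threshold):
    """Run NMS separately per class, so different classes never compete."""
    kept = []
    for label in sorted({d["label"] for d in detections}):
        same_class = [d for d in detections if d["label"] == label]
        kept.extend(greedy_nms(same_class, iou_threshold))
    return kept


def class_agnostic_nms(detections, iou_threshold):
    """Run NMS once over all boxes, ignoring the class labels."""
    return greedy_nms(detections, iou_threshold)


# Hypothetical detections: three nearly identical boxes, two classes.
detections = [
    {"box": (0.10, 0.10, 0.60, 0.60), "score": 0.90, "label": "beagle"},
    {"box": (0.11, 0.10, 0.61, 0.60), "score": 0.75, "label": "basset_hound"},
    {"box": (0.12, 0.10, 0.60, 0.59), "score": 0.40, "label": "beagle"},
]

aware = class_aware_nms(detections, iou_threshold=0.6)
agnostic = class_agnostic_nms(detections, iou_threshold=0.6)
print(len(aware), len(agnostic))
```

With these made-up inputs, the class-aware variant keeps two boxes (the best "beagle" and the only "basset_hound") even though they overlap almost completely, while the class-agnostic variant keeps only the single highest-scoring box.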

I think you want class-agnostic NMS ( use_class_agnostic_nms: True ), which applies suppression across all boxes regardless of class. Moreover, if you want a single class per detection, you should also set max_classes_per_detection: 1 .
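As a rough sketch, these two options would go in the batch_non_max_suppression block of the pipeline config. The placement under second_stage_post_processing and the surrounding threshold values below are assumptions based on typical Faster R-CNN configs; keep whatever values your own config file already has.

```
second_stage_post_processing {
  batch_non_max_suppression {
    score_threshold: 0.0           # placeholder, keep your existing value
    iou_threshold: 0.6             # placeholder, keep your existing value
    max_detections_per_class: 100  # placeholder, keep your existing value
    max_total_detections: 300      # placeholder, keep your existing value
    use_class_agnostic_nms: true   # suppress across classes
    max_classes_per_detection: 1   # one class label per kept box
  }
  score_converter: SOFTMAX
}
```

The model most likely has to be re-exported with exporter_main_v2.py after editing the config, since the post-processing settings are built into the exported graph.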
