
Tensorflow Object Detection API multi-class error

I am building an 11-class object detector using the Faster R-CNN model, with the image_resizer in the config capped at 300x400. I cannot go any higher because a CUDA OOM error appears: the GPU is a 1050 Ti (4 GB version), which leaves roughly 3800-3900 MB of memory available for training.

I followed erishima's steps and adapted them using the Pet's scripts and Dati Tran's script to generate the TFRecord files.

The steps were as follows:

  1. Create the labels for the categories using labelImg.
  2. Use the name field in labelImg to annotate the class of the image file.
  3. Create a CSV file and extract the filename, class, xmin, ymin, xmax, ymax from the XML file. (Custom Script)
  4. Create a train and test/eval CSV from the main CSV file.
  5. Generate the train and test TFRecord files referenced in the config file. (Dati Tran's script, modified to suit my needs)
  6. Modify faster_rcnn_config without touching the hyper-parameters.
  7. Create a label_map.pbtxt file whose entries correspond to the class names, with ids starting from 1 as stated in many other answers on this topic.
  8. Start training the model via the stated method.
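
Steps 3 and 7 above can be sketched roughly as follows. This is only an illustration, not the custom script used in the question: it assumes labelImg saved Pascal VOC style XML, and the function names (`voc_xml_to_rows`, `build_class_ids`, `label_map_text`) and the sample annotation are hypothetical. Deriving the ids from a single sorted class list is one way to keep label_map.pbtxt and the ids written into the TFRecord files in agreement.

```python
import xml.etree.ElementTree as ET


def voc_xml_to_rows(xml_text):
    """Return one (filename, class, xmin, ymin, xmax, ymax) tuple per object."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        rows.append((
            filename,
            obj.findtext("name"),
            int(float(box.findtext("xmin"))),
            int(float(box.findtext("ymin"))),
            int(float(box.findtext("xmax"))),
            int(float(box.findtext("ymax"))),
        ))
    return rows


def build_class_ids(class_names):
    """Map every distinct class name to an integer id, starting at 1."""
    return {name: i for i, name in enumerate(sorted(set(class_names)), start=1)}


def label_map_text(class_ids):
    """Render the mapping in the item { id / name } layout of label_map.pbtxt."""
    items = []
    for name, idx in sorted(class_ids.items(), key=lambda kv: kv[1]):
        items.append("item {\n  id: %d\n  name: '%s'\n}\n" % (idx, name))
    return "\n".join(items)


# Hypothetical annotation with two objects of different classes.
SAMPLE_XML = """<annotation>
  <filename>img_001.jpg</filename>
  <object>
    <name>cup</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
  <object>
    <name>bottle</name>
    <bndbox><xmin>150</xmin><ymin>30</ymin><xmax>200</xmax><ymax>180</ymax></bndbox>
  </object>
</annotation>"""

rows = voc_xml_to_rows(SAMPLE_XML)
class_ids = build_class_ids(row[1] for row in rows)
print(rows)
print(label_map_text(class_ids))
```

When the TFRecord script converts a class name to an integer, looking it up in the same `class_ids` dictionary (and failing loudly on an unknown name) avoids a silent mismatch between the records and the label map.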

The dataset is custom, and the number of images per class varies from 300 to 2500. The annotations do not record the orientation of the object or how difficult it is to detect, although the images cover every possible angle of each object.

The problem appeared after I had trained to a loss value of 0.002 over 217k steps: a single class was enveloping the objects of all the other classes, whether I ran the detector on video or on images. I have not run the eval.py script because it takes too long on this setup, so I cannot see the mAP for each class. I assume that information would be redundant anyway, as the problem is probably in the dataset preparation method or in the dataset itself.

When I retrained from scratch for 60k steps, the problem persisted, but with a different class enveloping all the others.

The warnings shown were:

  • A warning that the sparse index tensor is going to take a lot of memory. Can I change the code so that this does not appear, and possibly save some precious memory?
  • Wanted [x,?,?,y], got [x,y,z,a,b] instead. This one stops the training. I got it twice while training up to 217k steps. I have no idea where it originates; probably the dataset.

If someone can give me even a hint toward a proper fix, I would highly appreciate it.

I believe you have a class imbalance. I had a similar problem in the past.

Analyze your dataset and make sure the number of images per class is of a similar order of magnitude across classes.
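
A minimal sketch of such an analysis, assuming the annotations are available as (filename, class) pairs like the ones in the CSV from the question. The function names and sample rows are hypothetical; with real data the rows could be read from the CSV using the standard `csv` module.

```python
from collections import defaultdict


def images_per_class(rows):
    """Count the distinct images that contain at least one box of each class."""
    files = defaultdict(set)
    for filename, class_name in rows:
        files[class_name].add(filename)
    return {name: len(names) for name, names in files.items()}


def imbalance_ratio(counts):
    """Ratio between the largest and the smallest class."""
    return max(counts.values()) / min(counts.values())


# Hypothetical rows; a.jpg holds two cups but is counted once for "cup".
sample_rows = [
    ("a.jpg", "cup"),
    ("a.jpg", "cup"),
    ("b.jpg", "cup"),
    ("c.jpg", "cup"),
    ("c.jpg", "bottle"),
]

counts = images_per_class(sample_rows)
for name, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(name, n)
print("max/min ratio:", imbalance_ratio(counts))
```

For the dataset described in the question (300 to 2500 images per class) this ratio would be roughly 8, which is close to a full order of magnitude.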

