
TensorFlow Object Detection API - Out of Memory

I am using the TensorFlow Object Detection API to train my own object detector. I downloaded faster_rcnn_inception_v2_coco_2018_01_28 from the model zoo ( here ), and made my own dataset (train.record (~221 MB), test.record, and the label map) to fine-tune it.

But when I run it:

python train.py --logtostderr --pipeline_config_path=/home/username/Documents/Object_Detection/training/faster_rcnn_inception_v2_coco_2018_01_28/pipeline.config --train_dir=/home/username/Documents/Object_Detection/training/

the process is killed during the "filling up shuffle buffer" operation, which looks like an OOM problem (16 GB of RAM)...

2018-06-07 12:02:51.107021: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:94] Filling up shuffle buffer (this may take a while): 410 of 2048
Process stopped

Is there a way to reduce the shuffle buffer size? What impact does its size have?
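The only related knob I could spot is the shuffle_buffer_size field of the input reader (the default appears to be 2048, which matches the log above). Assuming my version of input_reader.proto actually exposes it (I have not verified this), it would presumably go under train_input_reader like this:

shuffle_buffer_size: 256  # untested; just to cap the number of buffered records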

Then, I added some swap (115 GB swap + 16 GB RAM) and the "filling up shuffle buffer" op finished, but my training ate all the RAM and swap after step 4, whereas my train.record is only about 221 MB!

I already added these lines to my pipeline.config > train_config:

batch_size: 1
batch_queue_capacity: 10
num_batch_queue_threads: 8
prefetch_queue_capacity: 9

and these to my pipeline.config > train_input_reader:

queue_capacity: 2
min_after_dequeue: 1
num_readers: 1

following this post.

I know my images are very (very, very) large: 25 MB each. But since I only used 9 images to build my train.record (just to check that my installation went well), it should not be this memory-hungry, right?

Any other idea about why it uses so much RAM?

(BTW, I only use the CPU.)

The number of images is not the problem. The problem is your input image resolution (set in your .config file). You need to change the height and width values here (your .config file will have a similar block):

image_resizer {
  # TODO(shlens): Only fixed_shape_resizer is currently supported for NASNet
  # featurization. The reason for this is that nasnet.py only supports
  # inputs with fully known shapes. We need to update nasnet.py to handle
  # shapes not known at compile time.
  fixed_shape_resizer {
    height: 1200
    width: 1200
  }
}

Set the width and height to smaller values and you will be able to train without problems.
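Note that the faster_rcnn_inception_v2_coco_2018_01_28 pipeline from the model zoo normally ships with a keep_aspect_ratio_resizer rather than a fixed_shape_resizer (if I remember the stock config correctly, with min_dimension: 600 and max_dimension: 1024). In that case the block to shrink would look more like this sketch; the values below are just an illustration:

image_resizer {
  keep_aspect_ratio_resizer {
    # smaller dimensions mean smaller resized tensors and less memory use
    min_dimension: 300
    max_dimension: 512
  }
}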
