
Batch processing mode in Caffe - no performance gains

Original post: 2015-09-11 11:32:43. Tags: c++, image-processing, deep-learning, caffe

Following on this thread, I reimplemented my image processing code to send in 10 images at once (i.e. I now have the num property of the input blob set to 100 instead of 10).

However, processing this batch takes 10 times as long as before, which means I got no performance increase.

Is that expected, or did I do something wrong?

I am running Caffe in CPU mode. Unfortunately GPU mode is not an option for me.

1 answer

Update: Caffe now natively supports processing multiple images in parallel when using multiple GPUs. Although it seems relatively simple to implement based on the current GPU-parallelism implementation, there is at the moment no similar support for parallel processing on multiple CPUs.

Since the main difficulty in implementing parallelism is the synchronization you need during training, if you just want to process your images in parallel (as opposed to training the model), you can load several copies of the same network into memory (through Python with multiprocessing, or C++ with multithreading) and process each image on a different network. This is simple and quite effective, especially if you load the networks once and then process a large number of images. Nevertheless, GPUs are much faster :)
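A minimal C++ sketch of the "one network copy per thread" approach, assuming every copy is loaded once from the same model files. ToyNet is a hypothetical stand-in for a real network (in Caffe that would be something like a caffe::Net<float> constructed from your prototxt with CopyTrainedLayersFrom() applied), so the example compiles without Caffe. ClassifyParallel is likewise a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical stand-in for a loaded network. Forward() writes into a
// per-instance buffer (like a net writing into its own blobs), so a single
// instance must not be shared between threads.
class ToyNet {
 public:
  explicit ToyNet(float weight) : weight_(weight) {}

  // "Classifies" one image by returning a weighted sum of its pixels.
  float Forward(const std::vector<float>& image) {
    scratch_ = image;  // mutable per-instance state
    float sum = std::accumulate(scratch_.begin(), scratch_.end(), 0.0f);
    return weight_ * sum;
  }

 private:
  float weight_;
  std::vector<float> scratch_;
};

// Runs every image through one of num_threads independent network copies.
// Image i is handled by thread i % num_threads; results keep input order.
std::vector<float> ClassifyParallel(
    const std::vector<std::vector<float>>& images, std::size_t num_threads,
    float weight) {
  assert(num_threads > 0);
  std::vector<float> results(images.size());
  // Load all network copies once, up front (the expensive step in practice).
  std::vector<ToyNet> nets(num_threads, ToyNet(weight));
  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < num_threads; ++t) {
    workers.emplace_back([&, t] {
      // Thread t only touches nets[t] and its own result slots: no locking.
      for (std::size_t i = t; i < images.size(); i += num_threads) {
        results[i] = nets[t].Forward(images[i]);
      }
    });
  }
  for (auto& w : workers) w.join();
  return results;
}
```

Because no state is shared between the copies, there is nothing to synchronize except the final join, which is why this is much easier than parallelizing training. The trade-off is memory: each copy holds its own weights and activations.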

Caffe doesn't process multiple images in parallel. The only saving you get from batching several images is in the time it takes to transfer the image data into and out of Caffe's framework, which can be significant when working with the GPU.
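To make this concrete, here is a simple cost model, an assumption for illustration rather than a measurement of Caffe: each call pays a fixed overhead (data transfer, framework setup), and the images in the batch are then processed one after another. BatchTimeMs and PerImageMs are hypothetical helper names.

```cpp
#include <cassert>

// Hypothetical cost model: a batch of n images costs a fixed per-call
// overhead plus n times the per-image compute time, because the images
// are assumed to be processed serially.
double BatchTimeMs(int n, double overhead_ms, double per_image_ms) {
  assert(n > 0);
  return overhead_ms + n * per_image_ms;
}

// Average cost per image when images are sent in batches of n.
// Batching only amortizes the fixed overhead; the compute term is unchanged.
double PerImageMs(int n, double overhead_ms, double per_image_ms) {
  return BatchTimeMs(n, overhead_ms, per_image_ms) / n;
}
```

With made-up numbers (5 ms overhead, 50 ms of compute per image), batching 10 images lowers the per-image cost only from 55 ms to 50.5 ms. When the overhead is small relative to compute, as it tends to be in CPU mode, a batch 10 times larger takes roughly 10 times longer, which matches what the question observed.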

IIRC there have been several attempts to make Caffe process images in parallel, but most focus on the GPU implementation (cuDNN, CUDA streams, etc.), with few attempts to add parallelism to the CPU code (OpenBLAS's multithreaded mode, or simply running on multiple threads). Of those, I believe only the cuDNN option is currently part of the stable version of Caffe, and it obviously requires a GPU. You can look at one of the pull requests on this topic on Caffe's GitHub page and see if it works for you, but note that it might cause compatibility issues with your current version.
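The OpenBLAS route parallelizes inside a single operation rather than across images: Caffe's CPU layers hand much of their work (e.g. convolutions lowered to matrix multiplies) to the BLAS library, and a multithreaded BLAS splits each multiply across cores. The sketch below is only a toy illustration of that idea, splitting the output rows of a matrix multiply across threads; it is not OpenBLAS's actual algorithm, and Matrix and ParallelMatMul are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Row-major dense matrix stored in a flat vector.
struct Matrix {
  std::size_t rows, cols;
  std::vector<float> data;
  Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}
  float& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
  float at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// C = A * B, with the rows of C split into contiguous chunks, one per thread.
// Each thread writes a disjoint set of rows, so no locking is needed.
Matrix ParallelMatMul(const Matrix& a, const Matrix& b, std::size_t num_threads) {
  assert(a.cols == b.rows && num_threads > 0);
  Matrix c(a.rows, b.cols);
  std::size_t chunk = (a.rows + num_threads - 1) / num_threads;  // ceil division
  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < num_threads; ++t) {
    std::size_t begin = t * chunk;
    std::size_t end = std::min(a.rows, begin + chunk);
    if (begin >= end) break;  // more threads than rows: nothing left to do
    workers.emplace_back([&a, &b, &c, begin, end] {
      for (std::size_t i = begin; i < end; ++i)
        for (std::size_t k = 0; k < a.cols; ++k)
          for (std::size_t j = 0; j < b.cols; ++j)
            c.at(i, j) += a.at(i, k) * b.at(k, j);
    });
  }
  for (auto& w : workers) w.join();
  return c;
}
```

In practice you would not write this yourself: with OpenBLAS, the number of threads it uses is usually controlled through the OPENBLAS_NUM_THREADS environment variable.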

This is one such version that I've used in the past, though it's no longer maintained: https://github.com/BVLC/caffe/pull/439

I also noticed in the last comment of the above pull request that this pull request speeds up the CPU code somewhat as well, though I've never tried it myself: https://github.com/BVLC/caffe/pull/2610


The technical posts on this site are published under the CC BY-SA 4.0 license. If you reprint them, please credit the site URL or the original address. For any questions, please contact: yoyou2525@163.com.


Accepted answer (score 4), posted 2015-09-11 20:10:48
