
Batch processing mode in Caffe - no performance gains

Original post: 2015-09-11 11:32:43 | Tags: c++, image-processing, deep-learning, caffe

Following on from this thread, I reimplemented my image processing code to send in 10 images at once (i.e., I now have the num property of the input blob set to 100 instead of 10).

However, processing this batch takes 10 times longer than before, which means I did not get any performance increase.

Is that reasonable, or did I do something wrong?

I am running Caffe in CPU mode. Unfortunately, GPU mode is not an option for me.

1 Answer

Update: Caffe now natively supports parallel processing of multiple images when using multiple GPUs. Though it seems relatively simple to implement based on the current implementation of GPU parallelism, at the moment there's no similar support for parallel processing on multiple CPUs.

The main problem with implementing parallelism is the syncing you need during training. So if you just want to process your images in parallel (as opposed to training the model), you could load several copies of the same network into memory (whether through Python with multiprocessing or C++ with multi-threading) and process each image on a different network. It would be simple and quite effective, especially if you load the networks once and then process a large number of images. Nevertheless, GPUs are much faster :)
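The idea of one network copy per thread can be sketched in C++ roughly as follows. This is only a minimal illustration, not Caffe code: `DummyNet` and `ProcessInParallel` are hypothetical names, and `DummyNet::Forward` just sums pixel values so the sketch compiles without Caffe. With the real library, each thread would presumably construct its own net from the same model definition and weights instead.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for one loaded network copy.
// Forward() only sums the pixels so the sketch runs without Caffe.
struct DummyNet {
    float Forward(const std::vector<float>& image) const {
        float sum = 0.0f;
        for (float px : image) sum += px;
        return sum;
    }
};

// Processes all images with `num_workers` threads. Each thread owns its
// own network copy, so no synchronization is needed during inference.
std::vector<float> ProcessInParallel(
        const std::vector<std::vector<float>>& images,
        std::size_t num_workers) {
    assert(num_workers > 0);
    std::vector<float> results(images.size());
    std::vector<std::thread> workers;
    for (std::size_t w = 0; w < num_workers; ++w) {
        workers.emplace_back([&images, &results, w, num_workers] {
            DummyNet net;  // created once per thread, then reused
            // Strided split: worker w handles images w, w + N, w + 2N, ...
            for (std::size_t i = w; i < images.size(); i += num_workers) {
                results[i] = net.Forward(images[i]);  // distinct slot per image
            }
        });
    }
    for (std::thread& t : workers) t.join();
    return results;
}
```

Calling `ProcessInParallel(images, 4)` returns one result per image, in input order. The cost of this approach is memory: every worker holds a full copy of the network, which is why loading the copies once and reusing them across many images matters.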

Caffe doesn't process multiple images in parallel. The only saving you get by batch processing several images is in the time it takes to transfer the image data to and from Caffe's framework, which could be significant when dealing with the GPU.
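A rough way to see why the larger batch takes about 10 times longer is a simple cost model. The symbols are assumptions for illustration, not anything Caffe defines or that was measured here: $c$ is a fixed per-call transfer overhead, $t$ is the compute time per image, and $n$ is the batch size.

```latex
T(n) \approx c + n\,t,
\qquad
\frac{T(100)}{T(10)} = \frac{c + 100\,t}{c + 10\,t} \approx 10
\quad \text{when } c \ll t
```

Under this model, batching only amortizes $c$. In CPU mode $c$ is presumably small compared with the compute time, so the total time grows almost linearly with the number of images, which matches what the question reports.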

IIRC there have been several attempts to make Caffe process images in parallel, but most focus on the GPU implementation (cuDNN, CUDA streams, etc.), with few attempts to add parallelism to the CPU code (OpenBLAS's multithreaded mode, or simply running on multiple threads). Of those, I believe only the cuDNN option is currently part of the stable version of Caffe, but it obviously requires a GPU. You can try one of the pull requests about this matter on Caffe's GitHub page and see if it works for you, but note that it might cause compatibility issues with your current version.

This is one such version that I've used in the past, though it's no longer maintained: https://github.com/BVLC/caffe/pull/439

I've also noticed, in the last comment on the above pull request, that this pull request brings some speed-up to the CPU code as well, though I've never tried it myself: https://github.com/BVLC/caffe/pull/2610