How to create a TBB Task Scheduler for OpenCV multicore image processing? (C++)

Question

I am learning to work with OpenCV and TBB. I need to learn how to process images on multiple cores, because I have a multicore CPU and want my programs to make use of it.

I have read the article "The Foundations for Scalable Multi-core Software in Intel® Threading Building Blocks" in the Intel® Technology Journal (the PDF is available at http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.71.8289&rep=rep1&type=pdf ).

They use Fibonacci number calculation as an example of multiprocessing. There is a similar Fibonacci example among the examples in the TBB package (see ParallelTaskFib). The only problem is that the calculation is so simple that it puts little load on the CPU, so when you run it in parallel on small numbers with a low CutOff it is not very efficient, because the overhead dominates. So to learn to work with TBB I need a more practical example from image processing. In my concept I would like to use the TBB Task Scheduler. I started with the class FibTask and the function ParallelFib, which I renamed and whose arguments I changed to work with vectors of images. The basic principle of the design should stay untouched. The Fibonacci example creates only two children, called a and b. Now the problem is that I am not sure whether I can use more than two children in one task function (the execute() method of matTask). So I have tried to add more children, more pointers and more spawn_and_wait_for_all() calls... At this stage I have not written any image processing functions, because I want to ask whether this design is correct and whether it would cause performance problems. It is not finished. I will wait for your suggestions to fix possible mistakes in my concept.

My basic idea is to apply a filter such as Gaussian blur to lena.jpg. First I would pass the number of threads. I have 8 cores, so I can pass at most 8 threads. I plan to split the Lena image into 8 strips of the same size and copy their pixels into vectors (8 basic vectors); then they should be blurred. In the next stage I need to create another 7-8 images that overlap the margins of the 8 sections, and repeat only the blurring on them. Finally, one more pass is needed for the area that may be left over at the end of the image (the remainder of source_image.rows()/8).
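
The strip arithmetic described above can be sketched as follows. This is only an illustration under assumptions: RowRange, outputRows and inputRows are hypothetical names that do not come from OpenCV or TBB, and the sketch assumes that the last strip simply absorbs the leftover rows and that each strip reads a few extra "halo" rows (the kernel radius, for example 1 for a 3x3 kernel) instead of using separate overlay images.

```cpp
#include <algorithm>
#include <cassert>

// Half-open row range [begin, end). Illustrative helper type, not an OpenCV type.
struct RowRange {
    int begin;
    int end;
};

// Rows that strip `i` (0-based) out of `strips` is responsible for WRITING.
// The last strip absorbs the remainder when `rows` is not divisible by `strips`.
RowRange outputRows(int rows, int strips, int i) {
    assert(rows > 0 && strips > 0 && i >= 0 && i < strips);
    const int height = rows / strips;
    const int begin = i * height;
    const int end = (i == strips - 1) ? rows : begin + height;
    return {begin, end};
}

// Rows that strip `i` needs to READ: its output rows extended by `halo` rows
// on each side and clamped to the image, so the filter sees its neighbours.
RowRange inputRows(int rows, int strips, int i, int halo) {
    assert(halo >= 0);
    const RowRange out = outputRows(rows, strips, i);
    return {std::max(0, out.begin - halo), std::min(rows, out.end + halo)};
}
```

For a 512-row image and 8 strips, outputRows(512, 8, 3) gives rows 192 to 256, and inputRows(512, 8, 3, 1) widens that to rows 191 to 257. Because every strip writes only its own output rows, the strips never overlap on the destination side, which may make the extra overlay pass unnecessary.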

The main thing I need to solve (and do not know how) is how to stop the infinite loop. Should I create different classes and different methods for 1) copying, 2) blurring, 3) cropping and 4) pasting? Or can I do everything (copy + blur) in one call? This is where my case differs from the Fibonacci example: that code did the same thing in every task, but I need to do several different things... So what should the logic be, how should I order the steps, and how should I name the functions?

An easier solution would be to use 8 strips of the same size, and then 7-8 overlay areas.

The code below prints no errors, but it is not supposed to return a correct result, because it is just a temporary concept.

#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/highgui/highgui.hpp"
#include <iostream>
#include <stdlib.h>
#include <stdio.h>

#include "tbb/task.h"
#include "tbb/task_scheduler_init.h"

#define CutOff 12

using namespace cv;

void SerialAction(int n){};

/**

**/
class matTask: public tbb::task {
public:
    int n;
    const int offset;
    std::vector<cv::Mat> main_layers;
    std::vector<cv::Mat> overlay_layers;

    matTask( std::vector<cv::Mat>main_layers_, std::vector<cv::Mat> overlay_layers_, int n_, const int offset_ ) :
        main_layers(main_layers_),
        overlay_layers(overlay_layers_),
        n(n_), offset(offset_)
        {}

        task* execute() {
        if( n<CutOff ) {
             SerialAction(n);
            } 
        else {
            // Main layers - copy regions
            matTask& a = *new( allocate_child() )
                matTask(main_layers,overlay_layers,n,0);
            matTask& b = *new( allocate_child() )
                matTask(main_layers,overlay_layers,n-1,0);
            matTask& c = *new( allocate_child() )
                matTask(main_layers,overlay_layers,n-2,0);
            matTask& d = *new( allocate_child() )
                matTask(main_layers,overlay_layers,n-3,0);
            matTask& e = *new( allocate_child() )
                matTask(main_layers,overlay_layers,n-4,0);
            matTask& f = *new( allocate_child() )
                matTask(main_layers,overlay_layers,n-5,0);
            matTask& g = *new( allocate_child() )
                matTask(main_layers,overlay_layers,n-6,0);
            matTask& h = *new( allocate_child() )
                matTask(main_layers,overlay_layers,n-7,0);

            spawn_and_wait_for_all( a );
            spawn_and_wait_for_all( b );
            spawn_and_wait_for_all( c );
            spawn_and_wait_for_all( d );
            spawn_and_wait_for_all( e );
            spawn_and_wait_for_all( f );
            spawn_and_wait_for_all( g );
            spawn_and_wait_for_all( h );
            // In the case of effect:
            // Overlay layers

            matTask& ab = *new( allocate_child() )
                matTask(main_layers,overlay_layers,n,offset);
            matTask& bc = *new( allocate_child() )
                matTask(main_layers,overlay_layers,n-1,offset);
            matTask& cd = *new( allocate_child() )
                matTask(main_layers,overlay_layers,n-2,offset);
            matTask& de = *new( allocate_child() )
                matTask(main_layers,overlay_layers,n-2,offset);
            matTask& ef = *new( allocate_child() )
                matTask(main_layers,overlay_layers,n-2,offset);
            matTask& gh = *new( allocate_child() )
                matTask(main_layers,overlay_layers,n-2,offset);

            // ... + crop .. depends on size of kernel

            set_ref_count(8);
            spawn( b );
            spawn_and_wait_for_all( a );
        }
    return NULL;
    }
};
void ParallelAction( std::vector<cv::Mat> main, std::vector<cv::Mat> overlays, int n, const int offset ) {
    matTask& a = *new(tbb::task::allocate_root())
    matTask(main, overlays, n,offset);
    tbb::task::spawn_root_and_wait(a);
}

int main( int argc, char** argv )
{       
    int threads = 8;

    std::vector<cv::Mat> main_layers;
    std::vector<cv::Mat> overlays;

    cv:: Mat sourceImg;
    sourceImg = imread( "../../data/lena.jpg");
    if ( sourceImg.empty() )
        return -1;

    const int offset = (int) sourceImg.rows / threads;


    cv::setNumThreads(0);
    ParallelAction(main_layers, overlays, threads, offset );

    // GaussianBlur( src, dst, Size(3,3), 0, 0, BORDER_DEFAULT );

    return 0;
}

Edit: Reaction to Anton's answer. If I overload operator(), when exactly is operator() applied? Also, is it possible to add other methods to ApplyFoo? When () is overloaded, it seems there can be only one method.

void Foo(float a){};

class ApplyFoo {
    float *const my_a;  
public:
    void operator()( const tbb::blocked_range<size_t>& r ) const {
        float *a = my_a;
        for( size_t i=r.begin(); i!=r.end(); ++i ) 
           Foo(a[i]);
    }
    ApplyFoo( float a[] ) :
        my_a(a) // initiate my_a
    {}
};
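
The calling convention behind a body class like ApplyFoo can be illustrated without TBB. In the sketch below, Range and forEachChunk are hypothetical, serial stand-ins (not TBB APIs) for a range type and a parallel algorithm, and ScaleAndClamp is an invented body class. It only aims to show that the algorithm, not the user, calls operator() once per sub-range, and that the body class can carry additional helper methods.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a range type such as tbb::blocked_range<size_t>.
struct Range {
    std::size_t first;
    std::size_t last;
    std::size_t begin() const { return first; }
    std::size_t end() const { return last; }
};

// Hypothetical, serial stand-in for a parallel algorithm: it splits [0, n)
// into chunks of at most `grain` elements and calls body(chunk) per chunk.
// A real scheduler would issue these calls from several worker threads.
template <typename Body>
void forEachChunk(std::size_t n, std::size_t grain, const Body& body) {
    assert(grain > 0);
    for (std::size_t b = 0; b < n; b += grain) {
        const std::size_t e = (b + grain < n) ? b + grain : n;
        body(Range{b, e});  // this is the point where operator() is applied
    }
}

// A body class may have any number of helper methods besides operator().
class ScaleAndClamp {
    float* const data_;
    float scale(float v) const { return 2.0f * v; }
    float clamp(float v) const { return v > 10.0f ? 10.0f : v; }
public:
    explicit ScaleAndClamp(float* data) : data_(data) {}
    void operator()(const Range& r) const {
        for (std::size_t i = r.begin(); i != r.end(); ++i)
            data_[i] = clamp(scale(data_[i]));
    }
};
```

A call such as forEachChunk(v.size(), 2, ScaleAndClamp(v.data())) processes a std::vector<float> v two elements at a time. With TBB, the analogous call would pass the body object to tbb::parallel_for together with a tbb::blocked_range.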

Answer 1

The article you point to is from 2007! It's awfully outdated (though still relevant since TBB keeps all the source compatibility). The tbb::task interface is considered low-level and it is not that convenient for application development. Please refer to tbb::parallel_for, tbb::parallel_invoke, and in particular to tbb::task_group, which has direct support for cancellation.

Solution 1 (score 3, accepted, 2016-06-30 16:31:34)
