
Adjusting granularity in tbb parallel_pipeline

The task for the pipeline is as follows:

  1. sequentially read a huge number (10-15k) of ~100-200 MB compressed files
  2. decompress each file in parallel
  3. deserialize each decompressed file in parallel
  4. process the resulting deserialized objects and compute some values based on all objects (mean, median, groupings, etc.)

When I get a decompressed file's memory buffer, the serialized blocks go one after another, so I'd like to pass them to the next filter in the same manner or, at least, adjust this process by packing the serialized blocks into groups of some size and passing those. However (as I understand it), tbb_pipeline makes me pass a pointer to a buffer with ALL serialized blocks, because each filter has to take a pointer and return a pointer.

Using a concurrent queue to accumulate packs of serialized objects defeats the purpose of using tbb_pipeline, as I understand it. Moreover, the constness of operator() in filters doesn't allow me to keep my own intermediate "task pool" (nevertheless, if each thread had its own local copy of storage for "tasks" and could just cut the right pieces from it, that would be great).

Primary question: Is there some way to "adjust" granularity in this situation? (i.e. some filter gets a pointer to all serialized objects and passes small packs of objects to the next filter)

Reformatting (splitting, etc.) the input files is almost impossible.

Secondary question: When I accumulate processing results, I don't really care about any kind of order; I only need aggregate statistics. Can I use a parallel filter instead of serial_out_of_order, accumulate the processing results for each thread somewhere, and then just merge them?

However (as I understand it) tbb_pipeline makes me pass pointer to buffer with ALL serialized blocks because each filter has to get pointer and return pointer.

First, I think it's better to use the more modern, type-safe form of the pipeline: parallel_pipeline. It does not prescribe passing any specific pointer to any specific data. You just specify which data of which type is needed for the next stage to be able to process it. So it's rather a matter of how your first filter partitions the data to be processed by the following filters.

Primary question: Is there some way to "adjust" granularity in this situation? (ie some filter gets pointer to all serialized objects and passes to the next filter small pack of objects)

You can safely embed one parallel algorithm into another in order to change the granularity for some stages. E.g. on the top level, the 1st pipeline goes through the file list; the 2nd pipeline reads big blocks of a file on the nested level; and finally, an innermost pipeline breaks the big blocks down into smaller ones for some of the 2nd-level stages. See a general example of nesting below.

Secondary question: Can I use parallel filter instead of serial_out_of_order and accumulate results of processing for each thread somewhere, and then just merge them?

Yes, you can always use a parallel filter if it does not modify shared data. For example, you can use tbb::combinable in order to collect thread-specific partial sums and then combine them.

but nevertheless if each thread had its own local copy of storage for "tasks" and just cut right pieces from it, it would be great

Yes, they do. Each thread has its own local pool of tasks.


General example of nested parallel_pipelines

parallel_pipeline( 2/*only two files at once*/,
    make_filter<void,std::string>(
        filter::serial,
        [&](flow_control& fc)-> std::string {
            if( !files.empty() ) {
                std::string filename = files.front();
                files.pop();
                return filename;
             } else {
                fc.stop();
                return "stop";
            }
        }    
    ) &
    make_filter<std::string,void>(
        filter::parallel,
        [](std::string s) {

            // a nested pipeline
            parallel_pipeline( 1024/*token limit for the nested pipeline*/,
                make_filter<void,char>(
                    filter::serial,
                    [&s](flow_control& fc)-> char {
                        if( !s.empty() ) {
                            char c = s.back();
                            s.pop_back();
                            return c;
                         } else {
                            fc.stop();
                            return 0;
                        }
                    }    
                ) &
                make_filter<char,void>(
                    filter::parallel,
                    [](char c) {
                        putc(c, stdout);
                    } 
                )
            );
        } 
    )
);
