What's the best way to parallelize this?

I have a file composed of some 800,000 lines. Each line is formed of an id, a code, and data, with the fields separated by a TAB.

3445    aaaa    Some data here for instance
89002   aree    Some other data

As a pure exercise to get acquainted with OpenCL, I decided to parse this file using OpenCL. Each work item goes through a single line and processes it. Each line is 4000 characters long.

__kernel void parse_line(
            __global const char * lines,   // IN
            __global unsigned * id,        // OUT
            __global char * code,          // OUT
            __global char * data           // OUT
        )
{
   // parse the line to extract id, code and data
}
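For concreteness, a minimal sketch of what such a kernel body could look like, assuming the lines are padded to a fixed width, the ids are plain decimal ASCII, and the output buffers hold one fixed-size slot per line; LINE_LEN, CODE_LEN, and DATA_LEN are hypothetical constants, not part of the original code:

#define LINE_LEN 4000   // fixed line width, per the question
#define CODE_LEN 8      // hypothetical maximum code length
#define DATA_LEN 256    // hypothetical maximum data length

__kernel void parse_line(
            __global const char * lines,   // IN: one fixed-width line per work item
            __global unsigned * id,        // OUT
            __global char * code,          // OUT
            __global char * data           // OUT
        )
{
    size_t gid = get_global_id(0);
    __global const char * line = lines + gid * LINE_LEN;
    int i = 0;

    // Field 1: decimal id, up to the first TAB.
    unsigned n = 0;
    while (i < LINE_LEN && line[i] != '\t')
        n = n * 10u + (unsigned)(line[i++] - '0');
    id[gid] = n;
    ++i;                                   // skip the TAB

    // Field 2: the code, up to the second TAB.
    int c = 0;
    while (i < LINE_LEN && line[i] != '\t' && c < CODE_LEN - 1)
        code[gid * CODE_LEN + c++] = line[i++];
    code[gid * CODE_LEN + c] = '\0';
    ++i;                                   // skip the TAB

    // Field 3: the rest of the line is the data.
    int d = 0;
    while (i < LINE_LEN && line[i] != '\n' && d < DATA_LEN - 1)
        data[gid * DATA_LEN + d++] = line[i++];
    data[gid * DATA_LEN + d] = '\0';
}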

Given that CL_DEVICE_MAX_WORK_GROUP_SIZE is 1024, I can't have more than 1024 work items in a single work-group. Nor can I pump the entire file into GPU memory (CL_DEVICE_MAX_MEM_ALLOC_SIZE is only 268,353,536 bytes).
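Both limits can be read back with clGetDeviceInfo; a minimal query, assuming a valid cl_device_id is already in hand:

#include <CL/cl.h>
#include <stdio.h>

// Print the two device limits discussed above.
void print_limits(cl_device_id dev)
{
    size_t max_wg_size;
    cl_ulong max_alloc;
    clGetDeviceInfo(dev, CL_DEVICE_MAX_WORK_GROUP_SIZE,
                    sizeof(max_wg_size), &max_wg_size, NULL);
    clGetDeviceInfo(dev, CL_DEVICE_MAX_MEM_ALLOC_SIZE,
                    sizeof(max_alloc), &max_alloc, NULL);
    printf("max work-group size: %zu\n", max_wg_size);
    printf("max allocation:      %llu bytes\n", (unsigned long long)max_alloc);
}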

A first idea could be to parse a first batch of 1024 lines, then a second batch, and so on, keeping the kernel's task to process a single line. I could also rewrite the kernel so that instead of parsing one line it parses 16; the 1024 work items would then handle some 16,384 lines.
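A sketch of that second variant, where each work item loops over several consecutive lines; LINES_PER_ITEM is a hypothetical tuning constant and the per-line parsing itself is elided:

#define LINES_PER_ITEM 16

__kernel void parse_lines(
            __global const char * lines,   // IN
            __global unsigned * id,        // OUT
            __global char * code,          // OUT
            __global char * data,          // OUT
            const unsigned num_lines       // lines actually present in this batch
        )
{
    size_t first = get_global_id(0) * LINES_PER_ITEM;
    for (unsigned k = 0; k < LINES_PER_ITEM; ++k) {
        size_t row = first + k;
        if (row >= num_lines)
            break;                         // the last work item may get a short tail
        // ... parse line `row` exactly as in the single-line kernel,
        // reading from lines + row * LINE_LEN and writing id[row], etc.
    }
}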

As mentioned, I am pretty new to OpenCL, so I am really looking for advice on how best to do this.

OpenCL wouldn't have been my first choice for text processing, though there is probably some set of problems for which it makes sense. Can you decompose the entire algorithm into steps and see where the bottleneck is (are you going to do anything with the data after parsing the file)? Moving those strings over various buses to be reduced later is likely suboptimal; reduce them at the earliest opportunity. It looks like you're not even reducing them, just splitting the stream while keeping the data as character strings?

If parsing and converting the values is indeed the bottleneck, then I would recommend continuing your experiment of breaking the large file down into blocks that fit into memory.
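As a sketch of that batching approach, launching one work item per line and streaming batches through pre-allocated buffers; the queue, kernel, and buffers are assumed to be created elsewhere and sized for LINES_PER_BATCH fixed-width lines, and all names here are illustrative:

#include <CL/cl.h>
#include <stdio.h>

#define LINE_LEN        4000               // fixed line width, per the question
#define LINES_PER_BATCH 16384              // hypothetical batch size

// Stream the file through the device one batch at a time.
void parse_file_in_batches(FILE * f,
                           cl_command_queue queue, cl_kernel kernel,
                           cl_mem lines_buf, cl_mem id_buf,
                           cl_mem code_buf, cl_mem data_buf,
                           char * host_lines, unsigned * host_ids)
{
    size_t nlines;
    while ((nlines = fread(host_lines, LINE_LEN, LINES_PER_BATCH, f)) > 0) {
        // Upload one batch of fixed-width lines.
        clEnqueueWriteBuffer(queue, lines_buf, CL_TRUE, 0,
                             nlines * LINE_LEN, host_lines, 0, NULL, NULL);

        // Launch one work item per line in this batch.
        clSetKernelArg(kernel, 0, sizeof(cl_mem), &lines_buf);
        clSetKernelArg(kernel, 1, sizeof(cl_mem), &id_buf);
        clSetKernelArg(kernel, 2, sizeof(cl_mem), &code_buf);
        clSetKernelArg(kernel, 3, sizeof(cl_mem), &data_buf);
        clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &nlines, NULL,
                               0, NULL, NULL);

        // Read the parsed ids back; code/data would be read the same way.
        clEnqueueReadBuffer(queue, id_buf, CL_TRUE, 0,
                            nlines * sizeof(unsigned), host_ids,
                            0, NULL, NULL);
    }
}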

Is the bottleneck the reading of the file or the parsing? If it's the reading, there is not much you can do besides storing the file on a faster medium. If it's the parsing, you could read the whole file into an array or std::vector and then use threads, where each thread parses a part of the array/vector.
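A minimal sketch of that CPU-side alternative, assuming the whole file fits in memory; the file name and the Record layout are illustrative, not taken from the question:

#include <algorithm>
#include <fstream>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

struct Record { unsigned id; std::string code; std::string data; };

// Split one TAB-separated line into its three fields.
static Record parse_line_cpu(const std::string & line)
{
    std::istringstream in(line);
    Record r;
    std::string id_field;
    std::getline(in, id_field, '\t');
    r.id = static_cast<unsigned>(std::stoul(id_field));
    std::getline(in, r.code, '\t');
    std::getline(in, r.data);              // the rest of the line
    return r;
}

int main()
{
    // Read the whole file into memory first.
    std::ifstream file("input.tsv");       // hypothetical file name
    std::vector<std::string> lines;
    for (std::string line; std::getline(file, line); )
        lines.push_back(std::move(line));

    // Each thread parses an interleaved share of the lines.
    std::vector<Record> records(lines.size());
    const unsigned nthreads = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < nthreads; ++t)
        pool.emplace_back([&, t] {
            for (std::size_t i = t; i < lines.size(); i += nthreads)
                records[i] = parse_line_cpu(lines[i]);
        });
    for (auto & th : pool) th.join();
}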
