
What’s the best way to parallelize this?

I have a file of some 800,000 lines. Each line consists of an id, a code and data, with each field separated by a TAB.

3445    aaaa    Some data here for instance
89002   aree    Some other data

As a pure exercise to get acquainted with OpenCL, I decided to parse this file using OpenCL. Each work item goes through a single line and processes it. Each line is 4000 characters long.

__kernel void parse_line(
            __global const char * lines,   // IN
            __global unsigned * id,        // OUT
            __global char * code,          // OUT
            __global char * data           // OUT
        )
{
   // parse the line to extract id, code and data
}
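
For concreteness, here is a rough sketch of what the kernel body could look like. This is only an illustration: it assumes each line occupies a fixed 4000-character, zero-padded slot, and CODE_LEN/DATA_LEN are hypothetical upper bounds I made up for the output fields.

#define LINE_LEN 4000   // fixed, zero-padded slot per line (assumption)
#define CODE_LEN 16     // hypothetical maximum length of the code field
#define DATA_LEN 3968   // hypothetical maximum length of the data field

__kernel void parse_line(
            __global const char * lines,   // IN
            __global unsigned * id,        // OUT
            __global char * code,          // OUT
            __global char * data           // OUT
        )
{
   size_t gid = get_global_id(0);
   __global const char * line = lines + gid * LINE_LEN;

   // 1. parse the numeric id up to the first TAB (no error handling)
   unsigned value = 0;
   size_t i = 0;
   while (line[i] != '\t')
       value = value * 10 + (unsigned)(line[i++] - '0');
   id[gid] = value;
   ++i;                                    // skip the TAB

   // 2. copy the code field up to the second TAB
   size_t c = 0;
   while (line[i] != '\t' && c < CODE_LEN - 1)
       code[gid * CODE_LEN + c++] = line[i++];
   code[gid * CODE_LEN + c] = '\0';
   while (line[i] != '\t') ++i;            // skip any truncated remainder
   ++i;                                    // skip the TAB

   // 3. copy the rest of the slot as data
   size_t d = 0;
   while (i < LINE_LEN && line[i] != '\0' && d < DATA_LEN - 1)
       data[gid * DATA_LEN + d++] = line[i++];
   data[gid * DATA_LEN + d] = '\0';
}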

Given that CL_DEVICE_MAX_WORK_GROUP_SIZE is 1024, I can't have more than 1024 work items at the same time. Nor can I pump the entire file into GPU memory (CL_DEVICE_MAX_MEM_ALLOC_SIZE is only 268,353,536 bytes).
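For reference, both limits can be queried at run time with clGetDeviceInfo; a minimal sketch, assuming device is the cl_device_id already selected:

#include <CL/cl.h>
#include <cstdio>

void print_limits(cl_device_id device)
{
    size_t   max_wg_size = 0;
    cl_ulong max_alloc   = 0;
    clGetDeviceInfo(device, CL_DEVICE_MAX_WORK_GROUP_SIZE,
                    sizeof(max_wg_size), &max_wg_size, NULL);
    clGetDeviceInfo(device, CL_DEVICE_MAX_MEM_ALLOC_SIZE,
                    sizeof(max_alloc), &max_alloc, NULL);
    printf("max work-group size:   %zu\n", max_wg_size);
    printf("max buffer allocation: %llu bytes\n", (unsigned long long)max_alloc);
}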

A first idea could be to parse a first batch of 1024 lines, then a second one, and so on, keeping the kernel responsible for processing one single line. I could also rewrite the kernel so that instead of parsing one line it parses 16; the 1024 work items would then handle some 16,384 lines.
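For the first option, the host-side loop I have in mind looks roughly like this. It is only a sketch: it assumes the context, command queue, kernel and the four device buffers have already been created, and the BATCH/LINE_LEN constants simply mirror the numbers above.

#include <CL/cl.h>
#include <algorithm>
#include <cstring>
#include <fstream>
#include <string>
#include <vector>

constexpr size_t LINE_LEN = 4000;   // fixed slot per line (from above)
constexpr size_t BATCH    = 1024;   // lines sent to the device per launch

void parse_file_in_batches(const char *path,
                           cl_command_queue queue,
                           cl_kernel kernel,
                           cl_mem lines_buf,   // BATCH * LINE_LEN chars
                           cl_mem id_buf,      // BATCH unsigned ints
                           cl_mem code_buf,    // BATCH * CODE_LEN chars
                           cl_mem data_buf)    // BATCH * DATA_LEN chars
{
    std::ifstream in(path);
    std::vector<char> host_lines(BATCH * LINE_LEN);
    std::vector<unsigned> host_ids(BATCH);

    auto launch = [&](size_t n) {
        // upload n zero-padded lines, run one work item per line, read back ids
        clEnqueueWriteBuffer(queue, lines_buf, CL_TRUE, 0, n * LINE_LEN,
                             host_lines.data(), 0, NULL, NULL);
        clSetKernelArg(kernel, 0, sizeof(cl_mem), &lines_buf);
        clSetKernelArg(kernel, 1, sizeof(cl_mem), &id_buf);
        clSetKernelArg(kernel, 2, sizeof(cl_mem), &code_buf);
        clSetKernelArg(kernel, 3, sizeof(cl_mem), &data_buf);
        size_t global = n;
        clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, NULL,
                               0, NULL, NULL);
        clEnqueueReadBuffer(queue, id_buf, CL_TRUE, 0, n * sizeof(unsigned),
                            host_ids.data(), 0, NULL, NULL);
        // ... read code_buf and data_buf the same way and consume the results
    };

    std::string line;
    size_t count = 0;
    while (std::getline(in, line)) {
        // copy each line into its fixed-size, zero-padded slot
        std::memset(&host_lines[count * LINE_LEN], 0, LINE_LEN);
        std::memcpy(&host_lines[count * LINE_LEN], line.data(),
                    std::min(line.size(), LINE_LEN - 1));
        if (++count == BATCH) {
            launch(count);
            count = 0;
        }
    }
    if (count > 0)
        launch(count);   // flush the final, partially filled batch
}

Passing NULL as the local work size lets the runtime choose a work-group size within CL_DEVICE_MAX_WORK_GROUP_SIZE.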

As mentioned, I am pretty new to OpenCL, so I am really looking for advice on how best to do this.

OpenCL wouldn't have been my first choice for text processing. Though, there is probably some set of problems for which it makes sense. Can you decompose the entire algorithm into steps and see what the bottleneck is (are you going to do anything with the data after parsing the file?)? Moving those strings over various buses to be reduced later is likely suboptimal. Reduce them at the earliest opportunity. It looks like you're not even reducing them, just splitting the stream, but keeping the data as character strings?

If indeed parsing and converting the values is the bottleneck, then I would recommend continuing your experiment of breaking down the large file into blocks which can fit into memory.

Is the bottleneck the reading of the file or the parsing? If it's the reading, there is not much you can do besides storing the file on a faster medium. If it's the parsing, you could read the whole file into an array or std::vector and then use threads, with each thread parsing a part of the array/vector.
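A rough sketch of that approach, assuming the whole file fits in host memory and that each line really does contain two TABs; parse_one_line is just a hypothetical stand-in for whatever you do with the three fields:

#include <algorithm>
#include <string>
#include <thread>
#include <vector>

// Hypothetical per-line parser: split id, code and data on the TAB separators.
void parse_one_line(const std::string &line)
{
    const size_t tab1 = line.find('\t');
    const size_t tab2 = line.find('\t', tab1 + 1);
    const unsigned long id  = std::stoul(line.substr(0, tab1));
    const std::string  code = line.substr(tab1 + 1, tab2 - tab1 - 1);
    const std::string  data = line.substr(tab2 + 1);
    (void)id; (void)code; (void)data;   // use the fields as needed
}

void parse_parallel(const std::vector<std::string> &lines, unsigned num_threads)
{
    std::vector<std::thread> workers;
    const size_t chunk = (lines.size() + num_threads - 1) / num_threads;

    for (unsigned t = 0; t < num_threads; ++t) {
        const size_t begin = t * chunk;
        const size_t end   = std::min(lines.size(), begin + chunk);
        if (begin >= end)
            break;
        // each thread parses its own contiguous slice of the vector
        workers.emplace_back([&lines, begin, end] {
            for (size_t i = begin; i < end; ++i)
                parse_one_line(lines[i]);
        });
    }
    for (auto &w : workers)
        w.join();
}

std::thread::hardware_concurrency() is a reasonable starting point for num_threads.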
