
parallel for is slower than sequential for

My program should compute all distinct rotations of words and texts in parallel.

If you do not know what this means: Rotations of "BANANA" are

  • BANANA
  • ANANAB
  • NANABA
  • ANABAN
  • NABANA
  • ABANAN

(Each rotation simply moves the first letter of the previous one to the end.)

vector<string> rotate_sequentiell( string* word )
{
    vector<string> all_rotations;

    for ( unsigned int i = 0; i < word->size(); i++ )
    {
        string rotated = word->substr( i ) + word->substr( 0,i );
        all_rotations.push_back( rotated );
    }

    if ( verbose ) { printVec(&all_rotations, "Rotations"); }

    return all_rotations;
}

We should be able to make this parallel. Instead of moving just one letter to the end at a time, each rotation can be computed directly. For example, take BANANA and move the first two letters "BA" to the end to get NANABA, which is the third entry in the list above.

I implemented it like this:

vector<string> rotate_parallel( string* word )
{
    vector<string> all_rotations( word->size() );

    #pragma omp parallel for
    for ( unsigned int i = 0; i < word->size(); i++ )
    {
        string rotated = word->substr( i ) + word->substr( 0,i );
        all_rotations[i] = rotated;
    }

    if ( verbose ) { printVec(&all_rotations, "Rotations"); }

    return all_rotations;
}

I size the vector to the number of possible rotations up front and use #pragma omp parallel for, so it should do what I think it does.

To test these functions, I have a 40 KB text file that is meant to be "rotated". I want all the distinct rotations of this large text.

What happens now is that the sequential version takes about 4.3 seconds and the parallel version takes about 6.5 seconds.

Why is that so? What am I doing wrong?

This is how I measure time:

clock_t start, finish;
start = clock();
bwt_encode_parallel( &glob_word, &seperator );
finish = clock();
cout << "Time (seconds): "
     << ((double)(finish - start))/CLOCKS_PER_SEC;

I compile my code with

g++ -O3 -g -Wall -lboost_regex -fopenmp -fmessage-length=0

The parallel version has two sources of additional work compared to the sequential version: (1) the overhead of starting the threads, and (2) coordination and locking between the threads.

The impact of (1) should diminish as the data set grows larger, and it probably can't account for 2 seconds anyway, but it does set a lower limit on how small a job is worth parallelizing.

In your case, (2) is probably caused mostly by OpenMP assigning iterations to the threads and by the different threads allocating memory for the two intermediate substrings and the final string "rotated". The memory allocation routine probably has to acquire a global lock before it can reserve a piece of the heap for you.
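
To make point (2) concrete: each iteration of the loop in the question allocates two substr temporaries and the concatenated result. The following is a minimal sketch of one way to cut that down to a single allocation per rotation. The helper name rotate_into_preallocated is hypothetical and not part of the original code, and whether it helps in practice depends on the allocator and the machine.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): writes each rotation
// straight into its destination string instead of going through two
// substr() temporaries and a concatenated temporary.
std::vector<std::string> rotate_into_preallocated(const std::string& word)
{
    const std::size_t n = word.size();
    std::vector<std::string> all_rotations(n);

    // Without -fopenmp the pragma is ignored and the loop runs sequentially.
    #pragma omp parallel for schedule(static)
    for (long long i = 0; i < static_cast<long long>(n); ++i)
    {
        const std::size_t k = static_cast<std::size_t>(i);
        std::string& out = all_rotations[k];
        out.reserve(n);                          // reserve the full length once
        out.assign(word, k, std::string::npos);  // tail: word[k..n)
        out.append(word, 0, k);                  // head: word[0..k)
    }
    return all_rotations;
}
```

Each thread only touches its own element of all_rotations, so no extra synchronization is needed beyond what the allocator does internally.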

Preallocating the final storage in a single thread and telling OpenMP to run the parallel loop in large blocks (2048 iterations per chunk) tilts the result in favor of parallel execution. I get about 700 ms for the single-threaded version and 330 ms for the multithreaded version with the code below:

 #include <cstdio>
 #include <ctime>
 #include <string>
 #include <vector>

 int main() {
   enum {SZ = 40960};
   std::string word;
   word.resize(SZ);
   for (int i = 0; i < SZ; i++) {
     word[i] = (i & 127) + 1;  // fill the word with non-zero dummy data
   }
   std::vector<std::string> all_rotations(SZ);
   clock_t start, finish;
   start = clock();
   // Preallocate the storage for every result in a single thread.
   for (int i = 0; i < (int)word.size(); i++) {
     all_rotations[i].reserve(SZ);
   }
   // Hand out iterations in blocks of 2048 per chunk.
   #pragma omp parallel for schedule (static, 2048)
   for (int i = 0; i < (int)word.size(); i++) {
     std::string rotated = word.substr(i) + word.substr(0, i);
     all_rotations[i] = rotated;
   }
   finish = clock();
   printf("Time (seconds): %0.3lf\n", ((double)(finish - start))/CLOCKS_PER_SEC);
   return 0;
 }
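
A separate point about the measurement itself: on Linux and other POSIX systems, clock() reports the CPU time consumed by the whole process, summed over all threads, rather than elapsed wall-clock time. A loop that finishes sooner on several cores can therefore still show a larger clock() value. The sketch below records both values so they can be compared. The Timing struct and the measure helper are hypothetical names introduced here for illustration; in OpenMP code, omp_get_wtime() is another way to read wall-clock time.

```cpp
#include <chrono>
#include <ctime>

// Hypothetical result type: CPU time and wall-clock time for one run.
struct Timing
{
    double cpu_seconds;   // from std::clock(), process CPU time on POSIX
    double wall_seconds;  // from std::chrono::steady_clock, elapsed time
};

// Hypothetical helper: runs `work` once and reports both kinds of time.
template <typename F>
Timing measure(F work)
{
    const std::clock_t cpu_start = std::clock();
    const auto wall_start = std::chrono::steady_clock::now();

    work();

    const auto wall_end = std::chrono::steady_clock::now();
    const std::clock_t cpu_end = std::clock();

    Timing t;
    t.cpu_seconds = static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    t.wall_seconds = std::chrono::duration<double>(wall_end - wall_start).count();
    return t;
}
```

If the wall-clock figure for the parallel loop is lower while the CPU figure is higher, the threads are doing more total work but finishing sooner.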

Lastly, when you need the results of the Burrows-Wheeler transform, you don't necessarily want N copies of a string that contains N characters. It would save space and processing time to treat the string as a ring buffer and read each rotation from a different offset in the buffer.
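
The ring-buffer idea can be sketched as follows, assuming the goal is the last column of the sorted rotations as used by the Burrows-Wheeler transform. All function names here are hypothetical and introduced only for illustration. Rotations are identified by their start offset and characters are read with modular indexing, so no rotation is copied; the comparison is the naive character-by-character one, not an optimized suffix-sorting algorithm.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Character k of the rotation starting at `offset`, read directly from
// the original string treated as a ring buffer. No copy is made.
inline char rotation_char(const std::string& word, std::size_t offset, std::size_t k)
{
    return word[(offset + k) % word.size()];
}

// Materialize a single rotation only when it is really needed (e.g. printing).
inline std::string rotation_at(const std::string& word, std::size_t offset)
{
    return word.substr(offset) + word.substr(0, offset);
}

// Sort the start offsets of the rotations instead of the rotations themselves.
std::vector<std::size_t> sorted_rotation_offsets(const std::string& word)
{
    std::vector<std::size_t> offsets(word.size());
    std::iota(offsets.begin(), offsets.end(), std::size_t{0});
    std::sort(offsets.begin(), offsets.end(),
              [&word](std::size_t a, std::size_t b) {
                  for (std::size_t k = 0; k < word.size(); ++k) {
                      // Compare as unsigned char, like std::string does.
                      const unsigned char ca = rotation_char(word, a, k);
                      const unsigned char cb = rotation_char(word, b, k);
                      if (ca != cb) return ca < cb;
                  }
                  return false;  // the two rotations are identical
              });
    return offsets;
}

// Last character of every rotation, in sorted rotation order.
std::string last_column(const std::string& word)
{
    std::string out;
    out.reserve(word.size());
    for (std::size_t offset : sorted_rotation_offsets(word)) {
        out.push_back(rotation_char(word, offset, word.size() - 1));
    }
    return out;
}
```

For a word of N characters this stores N offsets rather than N strings of length N, which is the space saving described above.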
