OpenMP: parallelize std::map iteration

There are some posts about this issue, but none of them satisfies me. I don't have OpenMP 3.0 support and I need to parallelize an iteration over a map. I want to know whether this solution would work or not:

auto element = myMap.begin();

#pragma omp parallel for shared(element)
for(int i = 0 ; i < myMap.size() ; ++i){
 MyKeyObject * current_first = nullptr;
 MyValueObject * current_second = nullptr;
#pragma omp critical
{
    current_first = element->first;
    current_second = element->second;
    ++element;
}

// Here I can use 'current' as in a usual loop
}

So I am using the for loop only to make sure that each thread handles roughly the same number of map elements. Is that a correct guess, or would it fail?

PS: I am working in Visual Studio 2012, so if you have a hint about how to make my compiler support OpenMP 3.0, that would also solve my problem.

This is not a direct answer to your question, but I will try to save you some of the future bad "OpenMP with Visual Studio" experience.

The Microsoft C/C++ compiler only supports OpenMP 2.0. There is no way to make it support OpenMP 3.0 or higher: OpenMP is built into the compiler core rather than shipped as an add-on package (unless someone comes up with an external source-to-source transformation engine), and Microsoft seems uninterested in extending OpenMP support while it pushes its own solutions (see below). You should therefore either get the Intel C/C++ Compiler, which integrates with Visual Studio, or use a standalone compiler such as GCC or the PGI C/C++ compiler.

If you are developing specifically for Windows, you might want to abandon OpenMP and use the Concurrency Runtime, specifically PPL, instead. PPL comes with Visual Studio 2012 and newer and provides data- and task-parallel equivalents of some STL algorithms. The one you are interested in is concurrency::parallel_for_each(), the parallel version of std::for_each(). It works with forward iterators, although not as efficiently as with random-access iterators. You have to make sure that processing one element of the map takes at least a thousand instructions, otherwise the parallelisation won't be beneficial.
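For illustration, here is a minimal sketch of what the PPL call could look like on a std::map. The map type, the function name sum_with_ppl and the per-element function foo are assumptions made up for this example, and foo is far too cheap to benefit from parallelism in practice. The std::for_each branch is only a serial fallback so the snippet also compiles with non-Microsoft compilers.

```cpp
#include <algorithm>
#include <atomic>
#include <map>
#include <utility>
#ifdef _MSC_VER
#include <ppl.h>   // Concurrency Runtime / PPL, ships with Visual C++
#endif

// Hypothetical per-element work; real work should be much heavier than this.
static long foo(int key, int value) { return static_cast<long>(key) * value; }

// Hypothetical helper: sums foo(key, value) over all elements of the map.
long sum_with_ppl(const std::map<int, int>& myMap) {
    std::atomic<long> total(0);
    auto body = [&total](const std::pair<const int, int>& kv) {
        total += foo(kv.first, kv.second);  // atomic add, safe across threads
    };
#ifdef _MSC_VER
    // Parallel version of std::for_each; accepts the map's iterators.
    concurrency::parallel_for_each(myMap.begin(), myMap.end(), body);
#else
    // Serial fallback for compilers without PPL (assumption of this sketch).
    std::for_each(myMap.begin(), myMap.end(), body);
#endif
    return total.load();
}
```

The atomic accumulator is only there to keep the sketch short. With real workloads each call would normally write to its own element or to per-thread storage, so that threads do not contend on one variable.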

If you aim for cross-platform compatibility, then Intel Threading Building Blocks (Intel TBB for short) is the alternative to PPL. It provides the tbb::parallel_do() algorithm, which is specifically designed to work with forward iterators. The same warning about the amount of work per map element applies.

Your method will work, since you access and advance the shared object element only inside a critical section. Whether or not this is good for performance is something you will have to test. Here is an alternative method you may want to consider. Let me call this the "fast-forward" method.

Let's assume you want to do this in parallel

for(auto element = myMap.begin(); element !=myMap.end(); ++element) {
    foo(element->first, element->second);
}

You can do this with OpenMP 2.0

#pragma omp parallel
{
    size_t cnt = 0;
    int ithread = omp_get_thread_num();
    int nthreads = omp_get_num_threads();
    for(auto element = myMap.begin(); element !=myMap.end(); ++element, cnt++) {
        if(cnt%nthreads != ithread) continue;
        foo(element->first, element->second);
    }
}
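As a self-contained sketch of the same idea, the version below wraps the loop in a function and collects a result. The map type, the name sum_fast_forward and the trivial foo are assumptions for the example only. The _OPENMP guards are there so the code also builds, and runs serially, when OpenMP is switched off.

```cpp
#include <cstddef>
#include <map>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for the per-element work called foo above.
static long foo(int key, int value) { return static_cast<long>(key) * value; }

// Fast-forward traversal: every thread walks the whole map but only
// processes the elements whose position matches its thread number.
long sum_fast_forward(const std::map<int, int>& myMap) {
    long total = 0;
    #pragma omp parallel
    {
#ifdef _OPENMP
        const int ithread = omp_get_thread_num();
        const int nthreads = omp_get_num_threads();
#else
        const int ithread = 0;   // serial fallback when OpenMP is disabled
        const int nthreads = 1;
#endif
        long local = 0;          // per-thread partial result, not shared
        std::size_t cnt = 0;
        for (std::map<int, int>::const_iterator it = myMap.begin();
             it != myMap.end(); ++it, ++cnt) {
            if (cnt % nthreads != static_cast<std::size_t>(ithread)) continue;
            local += foo(it->first, it->second);
        }
        #pragma omp critical
        total += local;          // one short critical section per thread
    }
    return total;
}
```

Each thread enters the critical section once, at the end, rather than once per element.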

Every thread runs through all myMap.size() iterators, but each thread calls foo only about myMap.size()/num_threads times. Your method runs through only myMap.size()/num_threads iterators per thread, but it requires a critical section on every iteration.

The fast-forward method is efficient as long as the time to "fast-forward" through nthreads iterators is much less than the time for foo, i.e.:

nthreads*time(++elements) << time(foo)

If, however, the time for foo is on the order of the time to iterate, and foo is reading/writing memory, then foo is likely memory-bandwidth bound and won't scale with the number of threads anyway.

Your approach will not work, because of a conceptual problem and a few bugs.

  1. [bug] you will always miss the first element, since the first thing that you do is increment the elements iterator.
  2. [bug] all threads will iterate over the whole map, since the elements iterator is not shared. BTW, it's not clear what the shared variable 'part' is in your code.
  3. If you make element shared, then the code that accesses it outside of the critical section will see whatever it currently points to, regardless of the thread. You will end up processing some elements more than once and some not at all.

There is no easy way to parallelize access to a map using an iterator, since the map iterator is not random-access. You may want to split the keys up manually and then use different parts of the key set on different threads.
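One way to do the manual split, sketched here under assumptions, is to collect the map's iterators into a std::vector first (a variation on collecting the keys that avoids a second lookup) and then run an ordinary index-based parallel for over that vector. The map type, the name sum_by_index and the trivial foo are hypothetical. The loop index is a signed int because OpenMP 2.0 requires that.

```cpp
#include <map>
#include <vector>

// Hypothetical per-element work for this sketch.
static long foo(int key, int value) { return static_cast<long>(key) * value; }

// Collect the iterators serially, then process them by index in parallel.
long sum_by_index(const std::map<int, int>& myMap) {
    typedef std::map<int, int>::const_iterator Iter;

    // Serial pass: a vector gives the random access that the map lacks.
    std::vector<Iter> items;
    items.reserve(myMap.size());
    for (Iter it = myMap.begin(); it != myMap.end(); ++it) {
        items.push_back(it);
    }

    const int n = static_cast<int>(items.size());  // signed index for OpenMP 2.0
    long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; ++i) {
        total += foo(items[i]->first, items[i]->second);
    }
    return total;
}
```

The extra serial pass costs one traversal and one vector of iterators, which pays off only when the work per element is large compared with a single iterator increment. The map must not be modified while the loop runs.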
