
Parallelism with C++ unordered_map

I have an unordered_map of type std::unordered_map<std::string, int64_t> named sMap. It contains a number of strings and a 'weight' associated with each of them. I want to find the strings with the N largest weights.

If I wanted to do this using a single thread, I think I could create a priority queue of pairs like this

std::priority_queue<
    std::pair<std::string, int64_t>,
    std::vector<std::pair<std::string, int64_t>>,
    std::function<bool(std::pair<std::string, int64_t>&,
            std::pair<std::string, int64_t>&)>> prQ(comparePair);

and just go through the whole unordered_map, inserting elements into prQ while capping its length at N.
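
Here is a minimal single-threaded sketch of that idea, assuming a comparePair that orders pairs by weight (the bounded min-heap keeps the N largest elements seen so far; the name topN is made up for illustration):

#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, int64_t>;

// Orders pairs so the *smallest* weight ends up on top of the
// queue, i.e. the queue behaves as a min-heap on the weight.
bool comparePair(const Entry& a, const Entry& b) {
    return a.second > b.second;
}

std::vector<Entry> topN(const std::unordered_map<std::string, int64_t>& sMap,
                        std::size_t N) {
    std::priority_queue<Entry, std::vector<Entry>,
                        std::function<bool(const Entry&, const Entry&)>>
        prQ(comparePair);
    for (const auto& e : sMap) {
        if (prQ.size() < N) {
            prQ.emplace(e.first, e.second);
        } else if (e.second > prQ.top().second) {
            prQ.pop();                        // evict the current weakest
            prQ.emplace(e.first, e.second);
        }
    }
    std::vector<Entry> result;
    while (!prQ.empty()) {
        result.push_back(prQ.top());          // ascending by weight
        prQ.pop();
    }
    return result;
}

Since the top of the queue is always the weakest of the current N candidates, each map element costs at most O(log N).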

I want to achieve the same using multiple threads. I was thinking of assigning each thread to work on a few elements of the unordered_map to create a local priority queue of length N which can be merged into a global one at the end.

The problem I am facing right now is that the iterator I get from unordered_map::begin() does not work with the + operator. At least, that is the error I am getting: error: no match for 'operator+' (operand types are 'std::unordered_map<std::basic_string<char>, long int>::iterator {aka std::__detail::_Node_iterator<std::pair<const std::basic_string<char>, long int>, false, true>}' and 'int'). Thus, I cannot really specify a range of elements to be worked on by a particular thread. The [] operator takes a key, as expected, not an offset.

Essentially, I can't seem to find a way to have a data parallel loop that would work with only a few elements per thread. How can I solve this problem using multiple threads then?

EDIT: @Brian Vandberg asked me to supply a simplified example of the code that generates the error I was talking about.

std::unordered_map<std::string, int64_t> sMap;
// Initialize sMap values
int start = 0, end = 2;
// Fails to compile: unordered_map's iterator is a forward iterator,
// so 'sMap.begin() + start' has no operator+.
for(auto i = sMap.begin() + start; i != sMap.begin() + end; ++i) {
    std::cout << i->first << "\t" << i->second << "\n";
}

First, I'm not sure that I'd go with a priority queue for this problem (either single-threaded, or for the part performed by a specific thread). The standard library has nth_element, which you can use to find the nth element in linear time. After that, finding which elements are larger is also linear time.

You might consider this approach if speed is the problem, and yours if size is (nth_element will effectively force you to create a copy of the data). In this solution you iterate over the map (or part of it) and push_back only the weights into a vector, on which you run nth_element. In the second stage, loop over the map again and choose the elements whose weight is at least the threshold you found.
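
A sketch of that two-pass idea; the name topNByThreshold is mine, and it assumes 1 <= N <= sMap.size(). Ties at the threshold weight can return slightly more than N strings, which you can trim afterwards:

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

std::vector<std::string>
topNByThreshold(const std::unordered_map<std::string, int64_t>& sMap,
                std::size_t N) {
    // Pass 1: copy out the weights and find the N-th largest one.
    std::vector<int64_t> weights;
    weights.reserve(sMap.size());
    for (const auto& e : sMap)
        weights.push_back(e.second);

    // After this call, weights[N-1] is the N-th largest weight and
    // everything before it is at least as large (linear time).
    std::nth_element(weights.begin(), weights.begin() + (N - 1),
                     weights.end(), std::greater<int64_t>());
    const int64_t threshold = weights[N - 1];

    // Pass 2: collect the strings at or above that threshold.
    std::vector<std::string> result;
    for (const auto& e : sMap)
        if (e.second >= threshold)
            result.push_back(e.first);
    return result;
}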


Suppose you have the loop:

std::size_t j = 0;
for(const auto &e: sMap)
{
    // Thread i (0-based) of k handles only those elements whose
    // running index j falls in its residue class modulo k.
    if(++j % k != i)
        continue;
    // Rest of code goes here.
}

Then if you use it for the ith thread out of k, it will partition the elements between the threads. Moreover, while all threads iterate over the same elements (if only to skip most of them), they do so in parallel.
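
For example, here is a hypothetical worker built around exactly this loop, which collects thread i's local candidates and trims them to its top m (the name collectCandidates and the nth_element trim are my additions, not part of the original answer):

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

void collectCandidates(const std::unordered_map<std::string, int64_t>& sMap,
                       std::size_t i, std::size_t k, std::size_t m,
                       std::vector<std::pair<std::string, int64_t>>& out) {
    std::size_t j = 0;
    for (const auto& e : sMap) {
        if (++j % k != i)
            continue;                         // another thread's element
        out.emplace_back(e.first, e.second);
    }
    if (out.size() > m) {
        // Keep only this thread's m largest candidates (linear time).
        std::nth_element(out.begin(), out.begin() + (m - 1), out.end(),
                         [](const auto& a, const auto& b) {
                             return a.second > b.second;
                         });
        out.resize(m);
    }
}

Each thread writes to its own output vector, so no locking is needed during the scan.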


Each thread can generate its candidates for the largest m elements; then choose the largest m among the km candidates using the method above (with nth_element) or any other method.
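
Continuing the sketch above, the merge step might look like this (parallelTopM is a made-up name; it launches k threads running the hypothetical collectCandidates worker and then keeps the m largest of the km candidates):

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

std::vector<std::pair<std::string, int64_t>>
parallelTopM(const std::unordered_map<std::string, int64_t>& sMap,
             std::size_t k, std::size_t m) {
    std::vector<std::vector<std::pair<std::string, int64_t>>> locals(k);
    std::vector<std::thread> threads;
    threads.reserve(k);

    // One thread per residue class; each fills its own local vector.
    for (std::size_t i = 0; i < k; ++i)
        threads.emplace_back(collectCandidates, std::cref(sMap),
                             i, k, m, std::ref(locals[i]));
    for (auto& t : threads)
        t.join();

    // Concatenate the per-thread candidates and keep the m largest.
    std::vector<std::pair<std::string, int64_t>> all;
    for (auto& v : locals)
        all.insert(all.end(), v.begin(), v.end());
    if (all.size() > m) {
        std::nth_element(all.begin(), all.begin() + (m - 1), all.end(),
                         [](const auto& a, const auto& b) {
                             return a.second > b.second;
                         });
        all.resize(m);
    }
    return all;
}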

It's interesting to ask at what size of sMap this will yield any speedup in practice.
