C++ - Multithreading takes longer with more threads

Question

I'm making a parallel password cracker for an assignment. When I launch more than one thread, cracking takes longer the more threads I add. What is the problem here?

Secondly, which resource-sharing techniques should I use for the best performance? I'm required to use either mutexes, atomic operations or barriers, and also either semaphores, condition variables or channels. Mutexes seem to slow my program down quite drastically.

Here is an example of my code for context:

std::mutex mtx;
std::condition_variable cv;

void run()
{
  std::unique_lock<std::mutex> lck(mtx);
  ready = true;
  cv.notify_all();
}

crack()
{
  std::lock_guard<std::mutex> lk(mtx);
  ...do cracking stuff
}

main()
{
  ....

  std::thread *t = new std::thread[uiThreadCount];

  for(int i = 0; i < uiThreadCount; i++)
  {
    t[i] = std::thread(crack, params);
  }

  run();

  for(int i = 0; i < uiThreadCount; i++)
  {
    t[i].join();
  }

}

Answer 1

When writing multi-threaded code, it's generally a good idea to share as few resources as possible between threads, so you can avoid having to synchronize with a mutex or an atomic.

There are a lot of different ways to do password cracking, so I'll give a slightly simpler example. Let's say you have a hash function, and a hash, and you're trying to guess what input produces the hash (this is basically how a password would get cracked).

We can write the cracker like this. It'll take the hash function and the password hash, check a range of values, and invoke the callback function if it found a match.

auto cracker = [](auto passwdHash, auto hashFunc, auto min, auto max, auto callback) {
    for(auto i = min; i < max; i++) {
        auto output = hashFunc(i); 
        if(output == passwdHash) {
             callback(i);
        }
    }
};

Now, we can write a parallel version. This version only has to synchronize when it finds a match, which is pretty rare.

auto parallel_cracker = [](auto passwdHash, auto hashFunc, auto min, auto max, int num_threads) {
    // Get a vector of threads
    std::vector<std::thread> threads;
    threads.reserve(num_threads);

    // Make a vector of all the matches it discovered
    using input_t = decltype(min); 
    std::vector<input_t> matches; 
    std::mutex match_lock;

    // Whenever a match is found, this function gets called
    auto callback = [&](input_t match) {
        std::unique_lock<std::mutex> _lock(match_lock); 
        std::cout << "Found match: " << match << '\n';
        matches.push_back(match); 
    };

    for(int i = 0; i < num_threads; i++) {
        auto sub_min = min + ((max - min) * i) / num_threads;
        auto sub_max = min + ((max - min) * (i + 1)) / num_threads;
        // Each thread scans its own sub-range; store the thread so it can be joined below
        threads.push_back(std::thread(cracker, passwdHash, hashFunc, sub_min, sub_max, callback));
    }

    // Join all the threads
    for(auto& thread : threads) {
        thread.join(); 
    }
    return matches; 
};
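To try the idea end to end, here is a minimal sketch that compiles on its own. It is not the code above verbatim: it uses concrete types instead of generic lambdas, and `toy_hash` and `crack_range` are hypothetical names introduced for illustration. The hash is a toy stand-in, not a real password hash.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Toy stand-in for a real hash function (an assumption for this sketch only).
std::uint64_t toy_hash(std::uint64_t x) {
    x ^= x >> 33;
    x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33;
    return x;
}

// Splits [min, max) into num_threads sub-ranges and scans them in parallel.
// The mutex is only taken when a match is found, so threads rarely contend.
std::vector<std::uint64_t> crack_range(std::uint64_t target_hash,
                                       std::uint64_t min, std::uint64_t max,
                                       int num_threads) {
    assert(num_threads > 0 && min <= max);
    std::vector<std::uint64_t> matches;
    std::mutex match_lock;
    std::vector<std::thread> threads;
    threads.reserve(num_threads);

    for (int i = 0; i < num_threads; ++i) {
        std::uint64_t sub_min = min + (max - min) * i / num_threads;
        std::uint64_t sub_max = min + (max - min) * (i + 1) / num_threads;
        threads.emplace_back([&, sub_min, sub_max] {
            for (std::uint64_t guess = sub_min; guess < sub_max; ++guess) {
                if (toy_hash(guess) == target_hash) {
                    std::lock_guard<std::mutex> guard(match_lock);
                    matches.push_back(guess);
                }
            }
        });
    }

    for (auto& t : threads) {
        t.join();
    }
    return matches;
}
```

Calling `crack_range(toy_hash(123456), 0, 1000000, 4)` searches the first million inputs with four threads. Timing that call with different thread counts is a simple way to check that adding threads actually helps on your machine.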

Answer 2

Yes, this is not surprising given the way it's written: by locking a mutex at the beginning of your thread function (crack) and holding it for the whole function, you effectively make the threads run sequentially.

I understand you want a "synchronous start" of the threads (that seems to be the intent behind the condition variable cv), but you don't use it properly. Without a call to one of its wait methods, cv.notify_all() is useless: it does not do what you intended, and your threads simply run one after another.

Calling wait() on the std::condition_variable in crack() is imperative: it releases mtx (which the thread has just acquired through its lock lk, and which must be a std::unique_lock for wait() to accept it) and blocks the thread until cv.notify_all() is called. When wait() returns, the thread holds mtx again, so every other thread stays blocked on mtx. If you really want parallel execution, you then need to unlock mtx.

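Here is how your crack thread should look:

void crack()
{
  std::unique_lock<std::mutex> lk(mtx);
  // Wait until run() has set ready; the predicate also handles
  // spurious wakeups and a notify_all() that fired before this wait
  cv.wait(lk, [] { return ready; });
  lk.unlock();  // release mtx so the threads can crack in parallel

  ...do cracking stuff

}

By the way, the ready flag in run() is never read in your original code. Rather than dropping it, use it as the wait predicate as shown: a bare cv.wait(lk) can wake up spuriously, and it blocks forever if notify_all() runs before the thread starts waiting.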
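For a version of the start signal that compiles on its own, the waiting and notifying can be wrapped in a small helper. This is only a sketch under assumptions: `StartGate` and `run_workers` are hypothetical names, and the worker body is a placeholder for the real cracking code. The wait checks a flag as its predicate, so a notification sent before a worker starts waiting is not lost.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: a one-shot "start gate" that workers wait on.
class StartGate {
public:
    // Blocks until open() has been called. The predicate guards against
    // spurious wakeups and against open() running before wait() is reached.
    void wait() {
        std::unique_lock<std::mutex> lk(mtx_);
        cv_.wait(lk, [this] { return ready_; });
    }  // lk is released here, so workers continue without holding the mutex

    // Sets the flag under the mutex, then wakes every waiting worker.
    void open() {
        {
            std::lock_guard<std::mutex> lk(mtx_);
            ready_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex mtx_;
    std::condition_variable cv_;
    bool ready_ = false;
};

// Starts num_threads workers, releases them together, and returns how many ran.
int run_workers(int num_threads) {
    assert(num_threads > 0);
    StartGate gate;
    std::vector<int> done(num_threads, 0);  // one slot per thread, nothing shared
    std::vector<std::thread> threads;
    threads.reserve(num_threads);

    for (int i = 0; i < num_threads; ++i) {
        threads.emplace_back([&gate, &done, i] {
            gate.wait();
            done[i] = 1;  // placeholder for the cracking work, done without a lock
        });
    }

    gate.open();
    for (auto& t : threads) {
        t.join();
    }

    int finished = 0;
    for (int d : done) {
        finished += d;
    }
    return finished;
}
```

The mutex is held only long enough to check or set the flag, so once the gate opens the workers do not serialize on it.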

On "I'm required to use either mutexes, atomic operations or barriers ...": different tools and techniques are good for different things, so the question is too general to answer as asked.
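As one possible combination of the required primitives, the sketch below pairs an atomic flag (so workers can stop early without locking) with a mutex and a condition variable (so the caller is woken once a result is published or every worker has finished). It is an illustration under assumptions, not a recommendation for every cracker: `toy_hash` and `crack_first` are hypothetical names, and the hash is a toy stand-in.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Toy stand-in for a real hash function (an assumption for this sketch only).
std::uint64_t toy_hash(std::uint64_t x) { return x * 2654435761ULL + 17; }

// Returns the first input found in [min, max) whose hash equals target,
// or -1 if no input in the range matches.
std::int64_t crack_first(std::uint64_t target, std::uint64_t min,
                         std::uint64_t max, int num_threads) {
    assert(num_threads > 0 && min <= max);
    std::atomic<bool> found{false};  // polled in the hot loop, no lock needed
    std::mutex mtx;                  // protects result and remaining
    std::condition_variable cv;      // wakes the waiting caller
    std::int64_t result = -1;
    int remaining = num_threads;

    std::vector<std::thread> threads;
    threads.reserve(num_threads);
    for (int i = 0; i < num_threads; ++i) {
        std::uint64_t lo = min + (max - min) * i / num_threads;
        std::uint64_t hi = min + (max - min) * (i + 1) / num_threads;
        threads.emplace_back([&, lo, hi] {
            // Stop as soon as any thread has reported a match.
            for (std::uint64_t g = lo;
                 g < hi && !found.load(std::memory_order_relaxed); ++g) {
                if (toy_hash(g) == target) {
                    found.store(true, std::memory_order_relaxed);
                    std::lock_guard<std::mutex> guard(mtx);
                    result = static_cast<std::int64_t>(g);
                }
            }
            {
                std::lock_guard<std::mutex> guard(mtx);
                --remaining;
            }
            cv.notify_one();
        });
    }

    {
        // Sleep until a result is published or all workers are done.
        std::unique_lock<std::mutex> lk(mtx);
        cv.wait(lk, [&] { return result != -1 || remaining == 0; });
    }
    for (auto& t : threads) {
        t.join();
    }
    return result;
}
```

The hot loop touches only the atomic flag; the mutex is taken at most twice per thread, which keeps contention low regardless of how large the search range is.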

2 answers

solution1
1 2019-04-29 21:51:49

solution2
0 2019-04-29 21:43:36
