
How can I raise a matrix to a power with multiple threads?

I am trying to raise a matrix to a power using multiple threads, but I am not very good with threads. I read the number of threads from the keyboard; it is in the range [1, matrix height]. Then I do the following:

unsigned period = ceil((double)A.getHeight() / threadNum);
unsigned prev = 0, next = period;
for (unsigned i(0); i < threadNum; ++i) {
    threads.emplace_back(&power<long long>, std::ref(result), std::ref(A), std::ref(B), prev, next, p);

    if (next + period > A.getHeight()) {
        prev = next;
        next = A.getHeight();
    }
    else {
        prev = next;
        next += period;
    }
}

It was easy for me to multiply one matrix by another with multiple threads, but here the problem is synchronization between steps. For example, if I need to raise A to the power of 3, computing A^2 is one step, and I have to wait for all the threads to finish that step before moving on to A^2*A. How can I make my threads wait for that? I'm using std::thread.

After the first reply was posted, I realized I forgot to mention that I want to create those threads only once, not recreate them for each multiplication step.

I would suggest using std::condition_variable.

The algorithm would be something like this:

  1. Split the matrix into N parts for N threads.

  2. Each thread calculates its part of the resulting matrix for a single multiplication.

  3. It then increments an atomic threads_finished counter using fetch_add and waits on a shared condition variable.

  4. The last thread to finish (fetch_add() + 1 == thread count) notifies all threads that they can now continue processing (a counter-based sketch follows this list).

  5. Profit.
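
A minimal sketch of the barrier described in steps 3-4, using an atomic counter plus a generation number so it can be reused for every multiplication step. The class and member names here are illustrative, not from the question; C++20's std::barrier provides the same thing ready-made, and the example in the edit below achieves it with per-thread flags instead:

#include <atomic>
#include <condition_variable>
#include <mutex>

// Reusable barrier: every worker calls arrive_and_wait() after finishing its
// sub-matrix; the last one to arrive wakes all the others for the next step.
class step_barrier {
    std::mutex m;
    std::condition_variable cv;
    std::atomic<unsigned> threads_finished{0};
    unsigned generation = 0;   // which multiplication step we are on
    const unsigned count;      // total number of worker threads
public:
    explicit step_barrier(unsigned thread_count) : count(thread_count) {}

    void arrive_and_wait() {
        std::unique_lock<std::mutex> lock(m);
        const unsigned gen = generation;
        if (threads_finished.fetch_add(1) + 1 == count) { // last thread of this step
            threads_finished = 0;
            ++generation;
            cv.notify_all();                              // everyone may continue
        } else {
            cv.wait(lock, [&] { return gen != generation; });
        }
    }
};

Each worker would call arrive_and_wait() after finishing its block of the current product instead of returning, so the threads are created only once.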

Edit: Here is an example of how to stop the threads:

#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <vector>
#include <algorithm>
#include <atomic>

void sync_threads(std::condition_variable & cv, std::mutex & mut, std::vector<int> & threads, const int idx) {
    std::unique_lock<std::mutex> lock(mut);
    threads[idx] = 1; 
    if(std::find(threads.begin(),threads.end(),0) == threads.end()) {
        for(auto & i: threads)
            i = 0;
        cv.notify_all();
    } else {
        while(threads[idx])
            cv.wait(lock);
    }
}

int main(){

    std::vector<std::thread> threads;

    std::mutex mut;
    std::condition_variable cv;

    int max_threads = 10;
    std::vector<int> thread_wait(max_threads,0);

    for(int i = 0; i < max_threads; i++) {
        threads.emplace_back([&,i](){
                std::cout << "Thread "+ std::to_string(i)+" started\n";
                sync_threads(cv,mut,thread_wait,i);
                std::cout << "Continuing thread " + std::to_string(i) + "\n";
                sync_threads(cv,mut,thread_wait,i);
                std::cout << "Continuing thread for second time " + std::to_string(i) + "\n";

            });
    }

    for(auto & i: threads)
        i.join();
}

The interesting part is here:

void sync_threads(std::condition_variable & cv, std::mutex & mut, std::vector<int> & threads, const int idx) {
    std::unique_lock<std::mutex> lock(mut); // Lock because we want to modify cv
    threads[idx] = 1; // Set my idx to 1, so we know we are sleeping
    if(std::find(threads.begin(),threads.end(),0) == threads.end()) {
        // I'm the last thread, wake up everyone
        for(auto & i: threads)
            i = 0;
        cv.notify_all();
    } else { //I'm not the last thread - sleep until all are finished
        while(threads[idx]) // In loop so, if we wake up unexpectedly, we go back to sleep. (Thanks for pointing that out Yakk)
            cv.wait(lock);
    }
}
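
To map this onto the original problem, each worker can loop over the exponent steps and hit the barrier twice per step: once after computing its rows of the product, and once after those rows have been copied back into the accumulator, so that no thread starts the next step on half-updated data. A rough sketch that reuses sync_threads from above; Matrix, multiply_rows and copy_rows are placeholders for the question's own types and helpers:

// acc starts as a copy of base (i.e. A^1) and ends up as base^p.
// Each thread owns rows [rowBegin, rowEnd) of the matrices.
void power_worker(Matrix& acc, Matrix& scratch, const Matrix& base, unsigned p,
                  unsigned rowBegin, unsigned rowEnd,
                  std::condition_variable& cv, std::mutex& mut,
                  std::vector<int>& flags, int idx) {
    for (unsigned step = 1; step < p; ++step) {
        multiply_rows(scratch, acc, base, rowBegin, rowEnd); // our rows of acc * base
        sync_threads(cv, mut, flags, idx);  // wait until every block of the product is ready
        copy_rows(acc, scratch, rowBegin, rowEnd);           // write our rows back into acc
        sync_threads(cv, mut, flags, idx);  // wait until acc is fully updated before the next step
    }
}

The threads are created once and loop internally, which is what the edit to the question asks for.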

Here is a mass_thread_pool:

#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <future>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// launches n threads all doing task F with an index:
template<class F>
struct mass_thread_pool {
  F f;
  std::vector< std::thread > threads;
  std::condition_variable cv;
  std::mutex m;
  size_t task_id = 0;
  size_t finished_count = 0;
  std::unique_ptr<std::promise<void>> task_done;
  std::atomic<bool> finished;

  void task( F f, size_t n, size_t cur_task ) {
    //std::cout << "Thread " << n << " launched" << std::endl;
    do {
      f(n);
      std::unique_lock<std::mutex> lock(m);

      if (finished)
        break;

      ++finished_count;
      if (finished_count == threads.size())
      {
        //std::cout << "task set finished" << std::endl;
        task_done->set_value();
        finished_count = 0;
      }
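      // Sleep until kick() publishes a new task_id (or the pool is shutting down);
      // cur_task is updated inside the predicate so each task set runs exactly once per kick.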
      cv.wait(lock,[&]{if (finished) return true; if (cur_task == task_id) return false; cur_task=task_id; return true;});
    } while(!finished);
    //std::cout << finished << std::endl;
    //std::cout << "Thread " << n << " finished" << std::endl;
  }

  mass_thread_pool() = delete;
  mass_thread_pool(F fin):f(fin),finished(false) {}
  mass_thread_pool(mass_thread_pool&&)=delete; // address is part of identity

  std::future<void> kick( size_t n ) {
    //std::cout << "kicking " << n << " threads off.  Prior count is " << threads.size() << std::endl;
    std::future<void> r;
    {
      std::unique_lock<std::mutex> lock(m);
      ++task_id;
      task_done.reset( new std::promise<void>() );
      finished_count = 0;
      r = task_done->get_future();
      while (threads.size() < n) {
        size_t i = threads.size();
        threads.emplace_back( &mass_thread_pool::task, this, f, i, task_id );
      }
      //std::cout << "count is now " << threads.size() << std::endl;
    }
    cv.notify_all();
    return r;
  }
  ~mass_thread_pool() {
    //std::cout << "destroying thread pool" << std::endl;
    {
      // Set the flag under the mutex so a worker between its predicate
      // check and its wait() cannot miss the final notification.
      std::unique_lock<std::mutex> lock(m);
      finished = true;
    }
    cv.notify_all();
    for (auto&& t:threads) {
      //std::cout << "joining thread" << std::endl;
      t.join();
    }
    //std::cout << "destroyed thread pool" << std::endl;
  }
};

You construct it with a task, and then you kick(77) to launch 77 copies of that task (each with a different index).

kick returns a std::future<void>. You must wait on this future to know that all of the tasks have finished.

Then you can either destroy the thread pool, or call kick(77) again to relaunch the task.
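
As a rough illustration of that pattern (a make_pool helper is not shown above, so this constructs the pool directly; the lambda and the counts are placeholders):

#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> data(10, 0);
    auto task = [&](std::size_t i) { data[i] += 1; };  // each index touches only its own slot

    mass_thread_pool<decltype(task)> pool(task);       // construct the pool with the task

    for (int pass = 0; pass < 3; ++pass)
        pool.kick(data.size()).wait();                 // run one pass and wait for all of it

    for (int v : data) std::cout << v << ' ';          // prints: 3 3 3 3 3 3 3 3 3 3
    std::cout << '\n';
}   // the pool's destructor joins the worker threads here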

The idea is that the function object you pass to mass_thread_pool has access to both your input and output data (say, the matrices you want to multiply, or pointers to them). Each kick causes it to call your function once for each index. You are in charge of turning indexes into offsets or whatever.

Live example where I use it to add 1 to an entry in another vector. Between iterations, we swap vectors. It does 2000 iterations, launches 10 threads, and calls the lambda 20000 times.

Note the auto&& pool = make_pool( lambda ) bit. Use of auto&& is required -- as the thread pool has pointers into itself, I disabled both move and copy construct on a mass thread pool. If you really need to pass it around, create a unique pointer to the thread pool.

I ran into some issues with std::promise resetting, so I wrapped it in a unique_ptr. That may not be required.

Trace statements I used to debug it are commented out.

Calling kick with a different n may or may not work. Definitely calling it with a smaller n will not work the way you expect (it will ignore the n in that case).

No processing is done until you call kick . kick is short for "kick off".

...

In the case of your problem, what I'd do is make a multiplier object that owns a mass_thread_pool.

The multiplier has pointers to 3 matrices (a, b and out). Each of the n subtasks generates some subsection of out.

You pass 2 matrices to the multiplier; it points out at a local matrix, points a and b at the passed-in matrices, does a kick, then a wait, then returns the local matrix.

For powers, you use the above multiplier to build a power-of-two tower, while multiply-accumulating based off the bits of the exponent into your result (again using the above multiplier).
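
A sketch of that exponent loop, assuming a hypothetical Matrix type and a Multiplier wrapper whose multiply() does the kick-and-wait described above (all names here are illustrative, except getHeight() which comes from the question):

Matrix power(Matrix const& A, unsigned p, Multiplier& mul) {
    Matrix result = Matrix::identity(A.getHeight()); // assumed identity-matrix helper
    Matrix base = A;                                 // current level of the power-of-two tower
    while (p > 0) {
        if (p & 1)                                   // this bit of the exponent is set
            result = mul.multiply(result, base);     // multiply-accumulate into the result
        p >>= 1;
        if (p)                                       // skip the last, unused squaring
            base = mul.multiply(base, base);         // next power of two: A^2, A^4, A^8, ...
    }
    return result;
}

This needs only O(log p) multiplications instead of p - 1.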

A fancier version of the above could allow queuing up of multiplications and std::future<Matrix>s (and multiplications of future matrices).

I would start with a simple decomposition:

  • matrix multiplication gets multithreaded
  • matrix exponentiation calls the multiplication several times.

Something like this:

Mat multithreaded_multiply(Mat const& left, Mat const& right) {...}

Mat power(Mat const& M, int n)
{
    // Handle degenerate cases here (n = 0, 1)

    // Regular loop
    Mat intermediate = M;
    for (int i = 2; i <= n; ++i)
    {
        intermediate = multithreaded_multiply(M, intermediate);
    }
    return intermediate;
}

For waiting on a std::thread, you have the join() method.
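
A sketch of multithreaded_multiply along those lines, splitting the output rows across threads and joining them before returning. Mat, height(), width() and operator()(row, col) are assumed accessors, not the question's exact API:

#include <algorithm>
#include <thread>
#include <vector>

Mat multithreaded_multiply(Mat const& left, Mat const& right) {
    Mat out(left.height(), right.width());
    // The question reads the thread count from the keyboard; it could be passed in instead.
    unsigned threadNum = std::max(1u, std::thread::hardware_concurrency());
    threadNum = std::min(threadNum, left.height());

    // Each worker fills rows [rowBegin, rowEnd) of the output.
    auto worker = [&](unsigned rowBegin, unsigned rowEnd) {
        for (unsigned i = rowBegin; i < rowEnd; ++i)
            for (unsigned j = 0; j < right.width(); ++j) {
                long long sum = 0;
                for (unsigned k = 0; k < left.width(); ++k)
                    sum += left(i, k) * right(k, j);
                out(i, j) = sum;
            }
    };

    std::vector<std::thread> threads;
    unsigned period = (left.height() + threadNum - 1) / threadNum; // rows per thread, rounded up
    for (unsigned t = 0; t < threadNum; ++t) {
        unsigned begin = t * period;
        unsigned end = std::min(begin + period, left.height());
        if (begin >= end) break;                 // no rows left for this thread
        threads.emplace_back(worker, begin, end);
    }
    for (auto& th : threads)
        th.join();                               // wait for every slice of the product
    return out;
}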

Not a programming but a math answer: every square matrix has so-called "eigenvalues" and "eigenvectors", such that M * E_i = lambda_i * E_i, where M is the matrix, E_i is an eigenvector and lambda_i is the corresponding eigenvalue, which is just a (possibly complex) number. It follows that M^n * E_i = lambda_i^n * E_i, so you only need the nth power of a number instead of a matrix. If the matrix is diagonalizable (for example, if it is symmetric, in which case the eigenvectors are even orthogonal), the eigenvectors form a basis, i.e. any vector V = sum_i a_i * E_i, and then M^n * V = sum_i a_i * lambda_i^n * E_i. Depending on your problem this might speed things up significantly.
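
In matrix form this is the usual diagonalization identity (assuming M is diagonalizable, with the eigenvectors as the columns of P):

M = P \Lambda P^{-1}, \qquad M^n = P \Lambda^n P^{-1}, \qquad \Lambda^n = \operatorname{diag}(\lambda_1^n, \ldots, \lambda_k^n)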
