How do I raise a matrix to a power with multiple threads?

Question

I am trying to raise a matrix to a power using multiple threads, but I am not very experienced with threads. I read the number of threads from the keyboard; that number is in the range [1, matrix height]. Then I do the following:

unsigned period = ceil((double)A.getHeight() / threadNum);
unsigned prev = 0, next = period;
for (unsigned i(0); i < threadNum; ++i) {
    threads.emplace_back(&power<long long>, std::ref(result), std::ref(A), std::ref(B), prev, next, p);

    if (next + period > A.getHeight()) {
        prev = next;
        next = A.getHeight();
    }
    else {
        prev = next;
        next += period;
    }
}

Multiplying one matrix by another with multiple threads was easy, but here each step depends on the previous one. For example, to raise A to the power of 3, computing A^2 is one step, and I have to wait for all the threads to finish that step before moving on to A^2*A. How can I make my threads wait for that? I'm using std::thread.

After the first reply was posted, I realized I forgot to mention that I want to create the threads only once, and not recreate them for each multiplication step.

Answer 1

I would suggest using condition_variable.

The algorithm would be something like this:

Split the matrix into N parts for N threads.
Each thread calculates its part of the resulting matrix for a single multiplication.
Then it increments an atomic threads_finished counter using fetch_add and waits on a shared condition variable.
The last thread to finish (fetch_add()+1 == thread count) notifies all threads that they can now continue processing.
Profit.

Edit: Here is an example of how to stop threads until all of them reach the same point:

#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <vector>
#include <algorithm>
#include <atomic>
#include <string> // std::to_string

void sync_threads(std::condition_variable & cv, std::mutex & mut, std::vector<int> & threads, const int idx) {
    std::unique_lock<std::mutex> lock(mut);
    threads[idx] = 1; 
    if(std::find(threads.begin(),threads.end(),0) == threads.end()) {
        for(auto & i: threads)
            i = 0;
        cv.notify_all();
    } else {
        while(threads[idx])
            cv.wait(lock);
    }
}

int main(){

    std::vector<std::thread> threads;

    std::mutex mut;
    std::condition_variable cv;

    int max_threads = 10;
    std::vector<int> thread_wait(max_threads,0);

    for(int i = 0; i < max_threads; i++) {
        threads.emplace_back([&,i](){
                std::cout << "Thread "+ std::to_string(i)+" started\n";
                sync_threads(cv,mut,thread_wait,i);
                std::cout << "Continuing thread " + std::to_string(i) + "\n";
                sync_threads(cv,mut,thread_wait,i);
                std::cout << "Continuing thread for second time " + std::to_string(i) + "\n";

            });
    }

    for(auto & i: threads)
        i.join();
}

The interesting part is here:

void sync_threads(std::condition_variable & cv, std::mutex & mut, std::vector<int> & threads, const int idx) {
    std::unique_lock<std::mutex> lock(mut); // Lock because the flags are shared and cv.wait needs the lock
    threads[idx] = 1; // Set my idx to 1, so we know we are sleeping
    if(std::find(threads.begin(),threads.end(),0) == threads.end()) {
        // I'm the last thread, wake up everyone
        for(auto & i: threads)
            i = 0;
        cv.notify_all();
    } else { //I'm not the last thread - sleep until all are finished
        while(threads[idx]) // In a loop so that, if we wake up unexpectedly, we go back to sleep. (Thanks to Yakk for pointing that out.)
            cv.wait(lock);
    }
}
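To connect this to the matrix power from the question, here is a minimal sketch in which the worker threads are created once and meet at a barrier after every multiplication step. It is not the code from this answer: the barrier uses a generation counter instead of the per-thread flag vector, and Mat, Barrier and parallel_power are hypothetical names chosen for illustration. It assumes a square matrix and p >= 1.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical matrix type for illustration: a dense vector of rows.
using Mat = std::vector<std::vector<long long>>;

// Reusable barrier: the last thread to arrive bumps the generation
// counter and wakes everyone else.
class Barrier {
public:
    explicit Barrier(std::size_t count) : count_(count), waiting_(0), generation_(0) {}
    void wait() {
        std::unique_lock<std::mutex> lock(mut_);
        const std::size_t gen = generation_;
        if (++waiting_ == count_) {
            waiting_ = 0;
            ++generation_;
            cv_.notify_all();
        } else {
            // The predicate guards against spurious wakeups.
            cv_.wait(lock, [&] { return gen != generation_; });
        }
    }
private:
    std::mutex mut_;
    std::condition_variable cv_;
    std::size_t count_, waiting_, generation_;
};

// Computes a^p (p >= 1) for a square matrix; threads are created only once.
Mat parallel_power(const Mat& a, unsigned p, unsigned thread_count) {
    assert(p >= 1 && thread_count >= 1 && !a.empty());
    const std::size_t n = a.size();
    Mat cur = a, next(n, std::vector<long long>(n, 0));
    Barrier barrier(thread_count);

    auto worker = [&](unsigned id) {
        for (unsigned step = 1; step < p; ++step) {
            // Rows are interleaved across threads: id, id + T, id + 2T, ...
            for (std::size_t r = id; r < n; r += thread_count)
                for (std::size_t c = 0; c < n; ++c) {
                    long long sum = 0;
                    for (std::size_t k = 0; k < n; ++k) sum += cur[r][k] * a[k][c];
                    next[r][c] = sum;
                }
            barrier.wait();                    // every row of next is written
            if (id == 0) std::swap(cur, next); // one thread publishes the result
            barrier.wait();                    // swap is done before the next step
        }
    };

    std::vector<std::thread> threads;
    for (unsigned id = 0; id < thread_count; ++id) threads.emplace_back(worker, id);
    for (auto& t : threads) t.join();
    return cur;
}
```

Two barrier waits per step are used so that no thread starts reading cur for the next step while thread 0 is still swapping the buffers.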

Answer 2

Here is a mass_thread_pool:

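#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <future>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// launches n threads all doing task F with an index: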
template<class F>
struct mass_thread_pool {
  F f;
  std::vector< std::thread > threads;
  std::condition_variable cv;
  std::mutex m;
  size_t task_id = 0;
  size_t finished_count = 0;
  std::unique_ptr<std::promise<void>> task_done;
  std::atomic<bool> finished;

  void task( F f, size_t n, size_t cur_task ) {
    //std::cout << "Thread " << n << " launched" << std::endl;
    do {
      f(n);
      std::unique_lock<std::mutex> lock(m);

      if (finished)
        break;

      ++finished_count;
      if (finished_count == threads.size())
      {
        //std::cout << "task set finished" << std::endl;
        task_done->set_value();
        finished_count = 0;
      }
      cv.wait(lock,[&]{if (finished) return true; if (cur_task == task_id) return false; cur_task=task_id; return true;});
    } while(!finished);
    //std::cout << finished << std::endl;
    //std::cout << "Thread " << n << " finished" << std::endl;
  }

  mass_thread_pool() = delete;
  mass_thread_pool(F fin):f(fin),finished(false) {}
  mass_thread_pool(mass_thread_pool&&)=delete; // address is part of identity

  std::future<void> kick( size_t n ) {
    //std::cout << "kicking " << n << " threads off.  Prior count is " << threads.size() << std::endl;
    std::future<void> r;
    {
      std::unique_lock<std::mutex> lock(m);
      ++task_id;
      task_done.reset( new std::promise<void>() );
      finished_count = 0;
      r = task_done->get_future();
      while (threads.size() < n) {
        size_t i = threads.size();
        threads.emplace_back( &mass_thread_pool::task, this, f, i, task_id );
      }
      //std::cout << "count is now " << threads.size() << std::endl;
    }
    cv.notify_all();
    return r;
  }
  ~mass_thread_pool() {
    //std::cout << "destroying thread pool" << std::endl;
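    {
      // Set the flag under the mutex so a worker cannot miss the wakeup
      // between checking its wait predicate and going to sleep.
      std::unique_lock<std::mutex> lock(m);
      finished = true;
    }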
    cv.notify_all();
    for (auto&& t:threads) {
      //std::cout << "joining thread" << std::endl;
      t.join();
    }
    //std::cout << "destroyed thread pool" << std::endl;
  }
};

You construct it with a task, and then you call kick(77) to launch 77 copies of that task (each with a different index).

kick returns a std::future<void>. You must wait on this future for all of the tasks to be finished.

Then you can either destroy the thread pool, or call kick(77) again to relaunch the task.

The idea is that the function object you pass to mass_thread_pool has access to both your input and output data (say, the matrices you want to multiply, or pointers to them). Each kick causes it to call your function once for each index. You are in charge of turning indexes into offsets or whatever.

Live example where I use it to add 1 to an entry in another vector. Between iterations, we swap vectors. This does 2000 iterations, launches 10 threads, and calls the lambda 20000 times.

Note the auto&& pool = make_pool( lambda ) bit. Use of auto&& is required: as the thread pool has pointers into itself, I disabled both move and copy construction on a mass thread pool. If you really need to pass it around, create a unique pointer to the thread pool.

I ran into some issues with resetting std::promise, so I wrapped it in a unique_ptr. That may not be required.

Trace statements I used to debug it are commented out.

Calling kick with a different n may or may not work. Calling it with a smaller n will definitely not work the way you expect (it will ignore the n in that case).

No processing is done until you call kick. kick is short for "kick off".

...

In the case of your problem, what I'd do is make a multiplier object that owns a mass_thread_pool.

The multiplier has pointers to 3 matrices (a, b and out). Each of the n subtasks generates some subsection of out.

You pass 2 matrices to the multiplier; it points out at a local matrix and a and b at the passed-in matrices, does a kick, then a wait, then returns the local matrix.

For powers, you use the above multiplier to build a power-of-two tower, while multiply-accumulating based off the bits of the exponent into your result (again using the above multiplier).

A fancier version of the above could allow queuing up multiplications and std::future<Matrix>s (and multiplications of future matrices).
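The power-of-two tower mentioned in this answer is usually called exponentiation by squaring. The sketch below shows only that part, with a plain single-threaded multiply standing in for the thread-pool multiplier. Mat, multiply_seq, identity and power_by_squaring are hypothetical names, and square matrices are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical matrix type for illustration: a dense vector of rows.
using Mat = std::vector<std::vector<long long>>;

// Sequential stand-in for the multithreaded multiplier described above.
Mat multiply_seq(const Mat& a, const Mat& b) {
    assert(!a.empty() && a[0].size() == b.size());
    Mat out(a.size(), std::vector<long long>(b[0].size(), 0));
    for (std::size_t r = 0; r < a.size(); ++r)
        for (std::size_t k = 0; k < b.size(); ++k)
            for (std::size_t c = 0; c < b[0].size(); ++c)
                out[r][c] += a[r][k] * b[k][c];
    return out;
}

Mat identity(std::size_t n) {
    Mat id(n, std::vector<long long>(n, 0));
    for (std::size_t i = 0; i < n; ++i) id[i][i] = 1;
    return id;
}

// Computes m^exponent with O(log exponent) multiplications.
Mat power_by_squaring(const Mat& m, unsigned exponent) {
    Mat result = identity(m.size());
    Mat square = m; // holds m^(2^k) on the k-th iteration: the "tower"
    while (exponent > 0) {
        if (exponent & 1u)                 // this bit of the exponent is set,
            result = multiply_seq(result, square); // so accumulate m^(2^k)
        exponent >>= 1;
        if (exponent > 0)
            square = multiply_seq(square, square);
    }
    return result;
}
```

For an exponent of 10 this performs 5 multiplications instead of the 9 a straight loop needs, and every one of them could be handed to a parallel multiplier.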

Answer 3

I would start with a simple decomposition:

matrix multiplication gets multithreaded
matrix exponent calls the multiplication several times.

Something like this:

Mat multithreaded_multiply(Mat const& left, Mat const& right) {...}

Mat power(Mat const& M, int n)
{
    // Handle degenerate cases here (n = 0, 1)

    // Regular loop
    Mat intermediate = M;
    for (int i = 2; i <= n; ++i) 
    {
        intermediate = multithreaded_multiply(M, intermediate);
    }
    return intermediate;
}

To wait for a std::thread, you have the method join().
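One possible body for multithreaded_multiply, as a rough sketch: each thread computes a contiguous block of rows and the caller joins them all before returning. Mat as a vector of rows, the multiply_rows helper and the extra thread_count parameter are assumptions made for this example. Note that this version creates new threads on every call, which the question's follow-up wanted to avoid.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical matrix type for illustration: a dense vector of rows.
using Mat = std::vector<std::vector<long long>>;

// Computes rows [begin, end) of out = left * right.
void multiply_rows(Mat& out, const Mat& left, const Mat& right,
                   std::size_t begin, std::size_t end) {
    const std::size_t inner = right.size();
    const std::size_t cols = right[0].size();
    for (std::size_t r = begin; r < end; ++r)
        for (std::size_t c = 0; c < cols; ++c) {
            long long sum = 0;
            for (std::size_t k = 0; k < inner; ++k)
                sum += left[r][k] * right[k][c];
            out[r][c] = sum;
        }
}

Mat multithreaded_multiply(const Mat& left, const Mat& right, unsigned thread_count) {
    assert(!left.empty() && !right.empty() && left[0].size() == right.size());
    assert(thread_count >= 1);
    const std::size_t rows = left.size();
    Mat out(rows, std::vector<long long>(right[0].size(), 0));
    // Ceiling division, so every row belongs to exactly one chunk.
    const std::size_t chunk = (rows + thread_count - 1) / thread_count;

    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < rows; begin += chunk) {
        const std::size_t end = std::min(begin + chunk, rows);
        workers.emplace_back(multiply_rows, std::ref(out), std::cref(left),
                             std::cref(right), begin, end);
    }
    // join() blocks until each worker is done, so out is complete on return.
    for (auto& w : workers) w.join();
    return out;
}
```

Because each thread writes to its own rows of out, no mutex is needed around the writes.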

Answer 4

Not a programming answer but a math one: every square matrix has a set of so-called "eigenvalues" and "eigenvectors", such that M * E_i = lambda_i * E_i. M is the matrix, E_i is an eigenvector, and lambda_i is the corresponding eigenvalue, which is just a (possibly complex) number. So M^n * E_i = lambda_i^n * E_i, and you need only the nth power of a number instead of a matrix. If the matrix is diagonalizable, its eigenvectors form a basis (one that can be chosen orthogonal when the matrix is symmetric), i.e. any vector V = sum_i a_i * E_i. So M^n * V = sum_i a_i * lambda_i^n * E_i. Depending on your problem this might speed things up significantly.
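Assuming M is a diagonalizable k x k matrix, the same idea can be written in matrix form. P (the matrix whose columns are the eigenvectors E_i) and D (the diagonal matrix of eigenvalues) are notation introduced here, not part of the answer above:

```latex
M = P D P^{-1}
\quad\Longrightarrow\quad
M^n = \left(P D P^{-1}\right)^n = P D^n P^{-1},
\qquad
D^n = \operatorname{diag}\!\left(\lambda_1^n, \ldots, \lambda_k^n\right).
```

The inner factors P^{-1} P cancel in the product, which is why only the diagonal entries need to be raised to the nth power. For integer matrices such as the long long ones in the question, the eigenvalues are generally not integers, so this route yields floating-point approximations rather than exact results.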

4 answers

Answer 1
Score 2, accepted, 2015-04-10 12:09:02

Answer 2
Score 1, 2015-04-10 14:47:30

Answer 3
Score 0, 2015-04-10 11:33:25

Answer 4
Score -1, 2015-04-10 12:53:00
