Matrix multiplication using multiple threads

Question

So I am trying to compute (M by N matrix) times (N by 1 vector) operations with threads into a resulting vector. The question in my book says that I should think about how many threads to use, and I assume since the result matrix should be M by 1 then I should use M threads, one for each set of operations. M is height, and N is width.

To create the threads I use

thread* myThreads = new thread[height];

Then I call the MatrixMultThreads function i times. At the end I join all the threads.

    for (int i = 0; i < height; i++)
    {
        myThreads[i] = thread(MatrixMultThreads, my2DArray, vector, height, width);
    }
    for (int i = 0; i < height; i++)
    {
        myThreads[i].join();
    }

What I am having trouble figuring out is how should I sum up all the resulting values in the correct order. How would I tell each specific thread what to do. I was thinking, maybe I should create a global variable step_i and set it to 0, then each time the function is called I can iterate that variable. then since I can pass the width of the array, I go through each step_i and add arr[i][j] * vector[j]

Answer 1

You can store the results of the rows dot the Nx1 vector in a Mx1 vector and then do the sum.

By the way, you would be much better using OpenMP for such a problem, it would automatize most of your threads managements according to the number of cores on your machine, since here you might spawn a lot of threads:

https://www.openmp.org/

http://www.bowdoin.edu/~ltoma/teaching/cs3225-GIS/fall17/Lectures/openmp.html

Answer 2

What I am having trouble figuring out is how should I sum up all the resulting values in the correct order.

They can be summed out-of-order, which is why this is a good problem to solve with multi-threading. If ordering matters to a specific problem, you can't improve it with multithreading (to be clear, if any sub-problem can be solved out-of-order then that sub-problem is a potential candidate for multithreading).

One solution to your problem is to set up a solution vector at the call site, then pass the corresponding element by reference (also the MatrixMultiply function needs to know which problem it's solving):

void MatrixMultiply(const Array2d& matrix, 
       const vector<int>& vec, int row, int& solution);

// ...

vector<int> result(height);

for (int i = 0; i < height; i++)
{
    threads[i] = thread(MatrixMultiply, array2d, array1d, i, result[i]);
}

Your 2D array should really provide info on its height and width without having to pass these values explicitly.

BONUS INFO :

We could make this solution much more OOP in a way that you'll want to reuse for future problems (and some experienced programmers seem to miss this trick for using arrays):

MatrixMultiply function is really similar to a dot-product function:

template <typename V1, typename V2>
auto DotProduct(const V1& vec1, const V2& vec2)
{
    auto result = vec1[0] * vec2[0];

    for (size_t i = 1; i < vec1.size(); ++i)
        result += vec1[i] * vec2[i];

    return result;
}

template <typename V1, typename V2, typename T>
auto DotProduct(const V1& vec1, const V2& vec2, T& result)
{
    result = DotProduct(vec1, vec2);
}

(The above allows the vectors to be any objects that uses size() and [] methods as expected.)

We can write a wrapper class around std::vector that can be used by our array class to handle all the indexing for us; like this:

template <typename T, typename A>
class SubVector
{
    const typename std::vector<T,A>::iterator m_it;
    const size_t m_size, m_interval_size;

public:

    SubVector (std::vector<T,A>& v, size_t start, size_t sub_size, size_t i_size = 1)
        : m_it(v.begin() + start), m_size(sub_size), m_interval_size(i_size)
    {}
    
    auto size () const
    {
        return m_size;
    }
    
    const T& operator [] (size_t i) const
    {
        return it[i*m_interval_size];
    }

    T& operator [] (size_t i)
    {
        return it[i*m_interval_size];
    }
};

Then you could use this in some kind of Vectorise method in your array; like this:

template <typename T, typename A = std::allocator<T>>
class Array2D
{
    std::vector<T,A> m_data;

    size_t m_width, m_height;

public:

    // your normal methods  

    auto VectoriseRow(int r) const
    {
        return SubVector(m_data, r*m_width, m_width);
    }

    auto VectoriseColumn(int c) const
    {
        return SubVector(m_data, c, m_height, m_width);
    }
}

(Note: We could add the Vectorise feature to std::array or boost::multi_array by just writing a wrapper around them, which makes our array class more generic and saves us from having to do all the work. boost actually has this sort of feature inbuilt with array_view .)

Now our call site can be like so:

vector<int> result(height);

for (int i = 0; i < height; i++)
{
    threads[i] = thread(DotProduct, array2d.VectoriseRow(i), array1d, result[i]);
}

This might seem like a more verbose way of solving the original problem (because it is), but if you use multi-dimensional arrays in your coding you'll find you no longer have to write multi-array-specific functions, or handle ugly indices for sub-problems (even in 1D problems, like Mean of Means). When dealing with those sorts of problems, you'll invariably want to reuse something like the above code.

Matrix multiplication using multiple threads

Question

2 answers

solution1
1 2020-09-24 04:31:13

solution2
1 ACCPTED 2020-09-24 04:51:10

Matrix multiplication using multiple threads

Question

2 answers

solution1 1 2020-09-24 04:31:13

solution2 1 ACCPTED 2020-09-24 04:51:10

solution1
1 2020-09-24 04:31:13

solution2
1 ACCPTED 2020-09-24 04:51:10