I have some serial code that does a matrix-vector multiply, with the matrix represented as a std::vector<std::vector<double>> and the vector as a std::vector<double>:
void mat_vec_mult(const std::vector<std::vector<double>> &mat, const std::vector<double> &vec,
                  std::vector<std::vector<double>> *result, size_t beg, size_t end) {
    // multiply a matrix by a pre-transposed column vector; returns a column vector
    for (auto i = beg; i < end; i++) {
        (*result)[i] = {std::inner_product(mat[i].begin(), mat[i].end(), vec.begin(), 0.0)};
    }
}
I would like to parallelize it using OpenMP, which I am trying to learn. From here, I arrived at the following:
void mat_vec_mult_parallel(const std::vector<std::vector<double>> &mat, const std::vector<double> &vec,
                           std::vector<std::vector<double>> *result, size_t beg, size_t end) {
    // multiply a matrix by a pre-transposed column vector; returns a column vector
    #pragma omp parallel
    {
        #pragma omp for nowait
        for (auto i = beg; i < end; i++) {
            (*result)[i] = {std::inner_product(mat[i].begin(), mat[i].end(), vec.begin(), 0.0)};
        }
    }
}
This approach has not resulted in any speedup. I would appreciate any help in choosing the correct OpenMP directives.
There are several things that could explain why you see no performance improvement. With the limited information you give, the most likely culprits are these:

- Matrix-vector multiplication is memory-bound: each matrix element is loaded from memory once and used for a single multiply-add, so once one or two cores saturate the memory bandwidth, adding more threads gains nothing.
- Parallelization overhead: if the matrix is small, the cost of creating and synchronizing the thread team can exceed the work being distributed across it.
- Build and run configuration: the code must be compiled with OpenMP enabled (e.g. -fopenmp for GCC/Clang) and with optimization turned on, and the number of threads must actually be greater than one at run time.

These are not the only possible reasons for a lack of scalability, but they are the most likely culprits here.