Assigning to std::vector<std::vector<double>> in parallel
I have some serial code that does a matrix-vector multiply, with the matrix and the vector represented as std::vector<std::vector<double>> and std::vector<double>, respectively:
void mat_vec_mult(const std::vector<std::vector<double>> &mat, const std::vector<double> &vec,
                  std::vector<std::vector<double>> *result, size_t beg, size_t end) {
    // multiply a matrix by a pre-transposed column vector; returns a column vector
    for (auto i = beg; i < end; i++) {
        (*result)[i] = {std::inner_product(mat[i].begin(), mat[i].end(), vec.begin(), 0.0)};
    }
}
I would like to parallelize it using OpenMP, which I am trying to learn. From here, I got to the following:
void mat_vec_mult_parallel(const std::vector<std::vector<double>> &mat, const std::vector<double> &vec,
                           std::vector<std::vector<double>> *result, size_t beg, size_t end) {
    // multiply a matrix by a pre-transposed column vector; returns a column vector
    #pragma omp parallel
    {
        #pragma omp for nowait
        for (auto i = beg; i < end; i++) {
            (*result)[i] = {std::inner_product(mat[i].begin(), mat[i].end(), vec.begin(), 0.0)};
        }
    }
}
This approach has not resulted in any speedup; I would appreciate any help in choosing the correct OpenMP directives.
Several things could explain why you are not seeing any performance improvement. The most likely ones are these:
These are not the only reasons that could explain a lack of scalability, but with the limited information you give, I think they are the most likely culprits.
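As a minimal sketch of two things that may be worth ruling out (these are assumptions on my part, not confirmed causes for your case): first, that OpenMP is actually enabled at compile time (for example with `-fopenmp` on GCC or Clang), since otherwise the pragmas are silently ignored; second, that each iteration is not paying for a possible heap allocation by assigning a one-element vector. The names `mat_vec_mult_flat` and `openmp_enabled` are hypothetical, and the flat `std::vector<double>` result is a deliberate change from the original signature.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical variant of the question's function: it writes one double per row
// into a flat result vector instead of assigning a one-element inner vector.
void mat_vec_mult_flat(const std::vector<std::vector<double>> &mat,
                       const std::vector<double> &vec,
                       std::vector<double> &result,
                       std::size_t beg, std::size_t end) {
    // The caller is assumed to have sized result to cover rows [beg, end).
    assert(beg <= end && end <= mat.size() && end <= result.size());

    // Each iteration writes only result[i], so iterations are independent.
    // If OpenMP is not enabled, the pragma is ignored and the loop runs serially.
    #pragma omp parallel for schedule(static)
    for (std::size_t i = beg; i < end; i++) {
        result[i] = std::inner_product(mat[i].begin(), mat[i].end(), vec.begin(), 0.0);
    }
}

// Reports whether this translation unit was compiled with OpenMP enabled.
// _OPENMP is the macro an OpenMP-enabled compiler defines.
bool openmp_enabled() {
#ifdef _OPENMP
    return true;
#else
    return false;
#endif
}
```

If `openmp_enabled()` returns false, the parallel version is really the serial one. Even with OpenMP enabled, a matrix-vector product does very little arithmetic per byte loaded, so on small matrices the thread start-up cost can outweigh the gain, and on large ones memory bandwidth may limit the speedup.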