OpenMP parallelization with a vector of vectors

Question

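I have a fixed-size 2D matrix of size W x H, where each element is a std::vector. The data is stored in a vector of vectors with linearized indexing. I am trying to find a way to fill the output vector in parallel. Here is some code that shows what I am trying to do.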

#include <cmath>
#include <cstdlib>
#include <chrono>
#include <iostream>
#include <mutex>
#include <vector>
#include <omp.h>

struct Vector2d
{
    double x;
    double y;
};

double generate(double range_min, double range_max)
{
    double val = (double)rand() / RAND_MAX;
    return range_min + val * (range_max - range_min);
}

int main(int argc, char** argv)
{
    (void)argc;
    (void)argv;

    // generate input data
    std::vector<Vector2d> points;
    size_t num = 10000000;
    size_t w = 100;
    size_t h = 100;

    for (size_t i = 0; i < num; ++i)
    {
        Vector2d point;
        point.x = generate(0, w);
        point.y = generate(0, h);
        points.push_back(point);
    }

    // output
    std::vector<std::vector<Vector2d> > output(num, std::vector<Vector2d>());
    std::mutex mutex;

    auto start = std::chrono::system_clock::now();

    #pragma omp parallel for
    for (size_t i = 0; i < num; ++i)
    {
        const Vector2d point = points[i];
        size_t x = std::floor(point.x);
        size_t y = std::floor(point.y);
        size_t id = y * w + x;
        mutex.lock();
        output[id].push_back(point);
        mutex.unlock();
    }

    auto end = std::chrono::system_clock::now();
    std::chrono::duration<double> elapsed_seconds = end - start;
    std::cout << "elapsed time: " << elapsed_seconds.count() << "s\n";

    return 0;
}

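The problem is that the code is much slower with OpenMP enabled. I found some examples that use a reduction to fill a std::vector, but I don't know how to adapt that to a vector of vectors. Any help is appreciated, thanks!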

Answer 1

There are a few things you can do to improve performance:

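I would preallocate the inner vectors that hold the Vector2d objects, because every time you push_back a new Vector2d and exceed the capacity of the std::vector, it reallocates. So if you don't mind the Vector2d elements being initialized inside the std::vector, I would just use: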

// Caution: this allocates num * num elements, so it is only feasible for small num.
std::vector<std::vector<Vector2d> > output(num,
               std::vector<Vector2d>(num, Vector2d(/*whatever goes in here*/)));

Then, inside the for loop, you can access the elements of the inner vector through operator[], which lets you get rid of the lock.

#pragma omp parallel for
for (size_t i = 0; i < num; ++i)
{
    const Vector2d point = points[i];
    size_t x = std::floor(point.x);
    size_t y = std::floor(point.y);
    size_t id = y * w + x;
    // Each iteration writes only to its own slot i, so no lock is needed.
    output[id][i] = point;
}
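If exact per-cell sizes are wanted rather than `num` slots in every cell, one option is to count the points per cell first and then hand out slots through an atomic counter. The following is only a sketch under assumptions that are not in the post: `bin_exact` and `cell_of` are hypothetical names, the grid has `w * h` cells, and every point satisfies `0 <= x < w` and `0 <= y < h`. Whether it beats the mutex version would have to be measured.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vector2d
{
    double x;
    double y;
};

// Hypothetical helper: linear cell index of a point (assumes the point is in range).
static std::size_t cell_of(const Vector2d& p, std::size_t w)
{
    return static_cast<std::size_t>(std::floor(p.y)) * w
         + static_cast<std::size_t>(std::floor(p.x));
}

std::vector<std::vector<Vector2d> > bin_exact(
    const std::vector<Vector2d>& points, std::size_t w, std::size_t h)
{
    assert(w > 0 && h > 0);
    const std::size_t cells = w * h;

    // Pass 1 (serial): count how many points fall into each cell.
    std::vector<std::size_t> count(cells, 0);
    for (std::size_t i = 0; i < points.size(); ++i)
    {
        const std::size_t id = cell_of(points[i], w);
        assert(id < cells);
        ++count[id];
    }

    // Allocate every cell at its final size, so nothing reallocates later.
    std::vector<std::vector<Vector2d> > output(cells);
    for (std::size_t id = 0; id < cells; ++id)
        output[id].resize(count[id]);

    // Pass 2 (parallel): each point claims a unique slot in its cell.
    std::vector<std::size_t> cursor(cells, 0);
    std::size_t* cur = cursor.data();

    #pragma omp parallel for
    for (std::size_t i = 0; i < points.size(); ++i)
    {
        const std::size_t id = cell_of(points[i], w);
        std::size_t slot;
        // Atomically read the cursor and advance it; no mutex involved.
        #pragma omp atomic capture
        slot = cur[id]++;
        output[id][slot] = points[i];
    }
    return output;
}
```

The order of points inside a cell depends on thread scheduling. Without OpenMP enabled the pragmas are ignored and the function simply runs serially.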

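I am not sure, though, that preallocating every element and writing through operator[] does what you need. Otherwise, you can reserve storage for each std::vector<Vector2d>, which brings you back to your initial loop: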

std::vector<std::vector<Vector2d> > output(num, std::vector<Vector2d>());
// Reserve capacity up front so push_back never reallocates.
for (size_t i = 0; i < num; ++i) {
    output[i].reserve(num);
}

#pragma omp parallel for
for (size_t i = 0; i < num; ++i)
{
    const Vector2d point = points[i];
    size_t x = std::floor(point.x);
    size_t y = std::floor(point.y);
    size_t id = y * w + x;
    // push_back changes the vector's size, so it still needs the lock.
    mutex.lock();
    output[id].push_back(point);
    mutex.unlock();
}

This means you get rid of the vector reallocations, but you still have the mutex...
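One way to drop the mutex from the binning loop, closer in spirit to the reduction the question mentions, is to let each thread fill a private grid and merge the grids once at the end. The sketch below is not taken from the answer: `bin_points` is a hypothetical helper, and it assumes that the output only needs `w * h` cells, that points outside the grid can be skipped, and that the order of points within a cell does not matter.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vector2d
{
    double x;
    double y;
};

// Hypothetical helper: bins points into a w x h grid using one private
// grid per thread, so the binning loop itself needs no lock.
std::vector<std::vector<Vector2d> > bin_points(
    const std::vector<Vector2d>& points, std::size_t w, std::size_t h)
{
    assert(w > 0 && h > 0);
    const std::size_t cells = w * h;
    std::vector<std::vector<Vector2d> > output(cells);

    #pragma omp parallel
    {
        // Private to the current thread; nothing here is shared.
        std::vector<std::vector<Vector2d> > local(cells);

        #pragma omp for nowait
        for (std::size_t i = 0; i < points.size(); ++i)
        {
            const Vector2d& point = points[i];
            // Skip points outside the grid (this also rejects NaN).
            if (!(point.x >= 0.0 && point.x < static_cast<double>(w) &&
                  point.y >= 0.0 && point.y < static_cast<double>(h)))
                continue;
            const std::size_t x = static_cast<std::size_t>(std::floor(point.x));
            const std::size_t y = static_cast<std::size_t>(std::floor(point.y));
            local[y * w + x].push_back(point);
        }

        // Merge step: one thread at a time appends its private grid.
        #pragma omp critical
        {
            for (std::size_t id = 0; id < cells; ++id)
                output[id].insert(output[id].end(),
                                  local[id].begin(), local[id].end());
        }
    }
    return output;
}
```

The critical section is entered once per thread instead of once per point, which is where the saving would come from. The price is one extra copy of the binned data per thread, so memory use grows with the thread count.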

Solution 1
0 2018-01-13 11:29:41
