OpenMP parallel multiplication is slower than sequential multiplication

Question

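I'm learning OpenMP and trying to do a simple task: A[r][c] * X[c] = B[r] (matrix-vector multiplication). The problem is that the sequential code is faster than the parallel one, and I don't know why! My code: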

#include <omp.h>
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <sys/time.h>
#include <sys/types.h>


// Defined variables
#define row_matriz_A 80000
#define col_matriz_A 800
#define THREADS_NUM 4

// FUNCTION - GENERATE MATRICES
void gerarMatrizes(int r, int c, int mA[], int vX[], int vB[]){...}

// FUNCTION - SEQUENTIAL MULTIPLICATION
void multSequencial(int r, int c, int mA[], int vX[], int vB[]){
    // Variables
    int i, j, offset, sum;                        
    struct timeval tv1,tv2;  
    double t1, t2;        
    // Begin Time
    gettimeofday(&tv1, NULL);
    t1 = (double)(tv1.tv_sec) + (double)(tv1.tv_usec)/ 1000000.00;
    for(i = 0; i < r; i++){
        sum = 0;
        for(j = 0; j < c; j++){
            offset = i * c + j;
            sum += mA[offset] * vX[j];
        }
        vB[i] = sum;
    }
    // End time
    gettimeofday(&tv2, NULL);
    t2 = (double)(tv2.tv_sec) + (double)(tv2.tv_usec)/ 1000000.00;
    printf("\nSequential execution time: %lf seconds.\n", (t2 - t1));
    return;
}

// FUNCTION - PARALLEL MULTIPLICATION WITH OpenMP
void matvecHost(int r, int c, int mA[], int vX[], int vB[]){
    // Variables
    int tID, i, j, offset, sum;
    struct timeval tv1, tv2;
    double t1, t2;
    // Init vB
    for(i = 0; i < r; i++) vB[i] = 0;
    // BEGIN Time
    gettimeofday(&tv1, NULL);
    t1 = (double)(tv1.tv_sec) + (double)(tv1.tv_usec)/ 1000000.00;
    omp_set_num_threads(THREADS_NUM);
    #pragma omp parallel private(tID, i, j) shared(mA, vB, vX)
    {
        tID = omp_get_thread_num();     
        #pragma omp for
            for(i = 0; i < r; i++){
                sum = 0;
                for(j = 0; j < c; j++){
                    offset = i * c + j;
                    sum += mA[offset] * vX[j];
                }
                vB[i] = sum;
            }
    }
    // End time
    gettimeofday(&tv2, NULL);
    t2 = (double)(tv2.tv_sec) + (double)(tv2.tv_usec)/ 1000000.00;
    printf("\nOpenMP execution time: %lf seconds.\n", (t2 - t1));
    return;
}

// FUNCTION - MAIN
int main(int argc, char * argv[]) {
    int row, col;
    row = row_matriz_A;
    col = col_matriz_A;
    int *matrizA = (int *)calloc(row * col, sizeof(int));
    int *vectorX = (int *)calloc(col * 1, sizeof(int));
    int *vectorB = (int *)calloc(row * 1, sizeof(int));
    gerarMatrizes(row, col, matrizA, vectorX, vectorB);                    
    multSequencial(row, col, matrizA, vectorX, vectorB);
    matvecHost(row, col, matrizA, vectorX, vectorB);
    return 0;
}

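Solutions I already tried that did not work:

Using collapse on my for loops
Increasing the number of rows and columns
Increasing the number of threads (a teacher recommended using number of threads == number of physical threads)
Using malloc instead of m[i][j]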

EDIT - ANSWER

I changed my parallel block according to the correct answer:

#pragma omp parallel private(i, j, sum) shared(mA, vB, vX)
{
    #pragma omp for
        for(i = 0; i < r; i++){
            sum = 0;
            for(j = 0; j < c; j++){
                sum += mA[i * c + j] * vX[j];
            }
            vB[i] = sum;
        }
}

I still have some doubts:

If I declare i, j and sum inside the parallel block, are they automatically private? Does that make my code faster?

Answer 1

You have a race condition on sum and offset: they are shared between the threads instead of being thread-private.
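One way to make this kind of mistake impossible to miss is OpenMP's default(none) clause, which forces every variable used inside the parallel region to be given an explicit data-sharing attribute. The sketch below assumes a compiler with OpenMP enabled (e.g. gcc -fopenmp); the function name matvec_default_none is hypothetical, and it otherwise keeps the question's function-top declarations.

```c
/* Hypothetical helper: vB = mA * vX for a row-major r x c int matrix.
 * default(none) requires every variable referenced in the region to be listed,
 * so leaving sum or offset out of private() is a compile-time error
 * instead of a silent race. */
void matvec_default_none(int r, int c, const int mA[], const int vX[], int vB[])
{
    int i, j, offset, sum;

    #pragma omp parallel for default(none) private(i, j, offset, sum) shared(r, c, mA, vX, vB)
    for (i = 0; i < r; i++) {
        sum = 0;                      /* private: one accumulator per thread */
        for (j = 0; j < c; j++) {
            offset = i * c + j;       /* private: no cross-thread overwrites */
            sum += mA[offset] * vX[j];
        }
        vB[i] = sum;                  /* each row is written by exactly one thread */
    }
}
```

If sum is dropped from the private list, an OpenMP-enabled compiler rejects the region, because sum no longer has any data-sharing attribute. Without OpenMP the pragma is ignored and the loop runs sequentially.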

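This likely also explains the slowdown: on x86, the CPU actually works hard to make accesses to shared variables "work". That means the cache line holding offset and sum has to be flushed and handed over to another core after every (!) write. So all threads are frantically writing to the same variables, but each of them has to wait until the write from the previous thread (on a different core) has been flushed and has arrived in its own local cache again. And of course it produces completely nonsensical results.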

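I don't know why you declare all your variables at the beginning of the function; that is very prone to exactly this kind of mistake. If you had declared i, j, sum and offset (and the unused tID) in the smallest possible scope, you would never have run into this problem, because they would then automatically be thread-private.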
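A minimal sketch of the question's matvecHost rewritten along these lines, assuming a C99 compiler with OpenMP enabled (e.g. gcc -fopenmp). The thread count is left to the runtime (for example via the OMP_NUM_THREADS environment variable) instead of calling omp_set_num_threads, and the timing message is only illustrative.

```c
#include <stdio.h>
#include <sys/time.h>

/* Same interface as the question's matvecHost, but the loop variables and the
 * accumulator are declared in the smallest possible scope. Variables declared
 * inside the parallel loop body are private to each thread, and the loop
 * variable of an omp for is private as well, so nothing races. */
void matvecHost(int r, int c, const int mA[], const int vX[], int vB[])
{
    struct timeval tv1, tv2;
    gettimeofday(&tv1, NULL);

    #pragma omp parallel for
    for (int i = 0; i < r; i++) {
        int sum = 0;                          /* one copy per thread */
        for (int j = 0; j < c; j++) {
            sum += mA[i * c + j] * vX[j];     /* no shared offset variable needed */
        }
        vB[i] = sum;                          /* each row written by exactly one thread */
    }

    gettimeofday(&tv2, NULL);
    printf("OpenMP execution time: %f seconds\n",
           (tv2.tv_sec - tv1.tv_sec) + (tv2.tv_usec - tv1.tv_usec) / 1e6);
}
```

Declaring the variables inside the loop should not by itself be faster than listing them correctly in private(...); the two are equivalent. What removes the slowdown is that the threads no longer write to the same shared variables. Without OpenMP the pragma is ignored and the function computes the same result sequentially.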

Answer 1: accepted (2019-11-29 13:57:17)
