C with OpenMP: Matrix times vector

I tried to implement a matrix-times-vector product with a lot of loops, and I want to speed up the process.

Here's my code:

#include <stdio.h>
#include <time.h>
#include <omp.h>

int main()
{
    int i, j, n, a[719][719], b[719], c[719];

    clock_t start = clock();

    n = 100; //Max 719

    printf("Matrix A\n");

    for (i = 0; i < n; ++i) {
        for (j = 0; j < n; ++j) {
            a[i][j] = 10;
            printf("%d ", a[i][j]);
        }
        printf("\n");
    }

    printf("\nMatrix B\n");

    #pragma omp parallel private(i) shared(b)
    {
        #pragma omp for
        for (i = 0; i < n; ++i) {
            b[i] = 5;
            printf("%d\n", b[i]);
        }
    }

    printf("\nA * B\n");

    #pragma omp parallel private(i) shared(c)
    {
        #pragma omp for
        for (i = 0; i < n; ++i) {
            c[i] = 0;
        }
    }

    #pragma omp parallel private(i,j) shared(n,a,b,c)
    {
        #pragma omp for schedule(dynamic)
        for (i = 0; i < n; ++i) {
            for (j = 0; j < n; ++j) {
                c[i] += b[j] * a[j][i];
            }
        }
    }


    #pragma omp parallel private(i) shared(c)
    {
        #pragma omp for
        for (i = 0; i < n; ++i) {
            printf("%d\n", c[i]);
        }
    }

    clock_t stop = clock();
    double elapsed = (double)(stop - start) / CLOCKS_PER_SEC;
    printf("\nTime elapsed: %.5f\n", elapsed);

    return 0;
}

I also think there are a lot of inefficient parts in this code. I would appreciate it if anyone could rework it into an efficient version and speed up the process.

A word of warning: I recently tried something similar (matrix multiplication) and did not get the results I had hoped for. With two cores and hyperthreading, the speedup over the serial implementation was very small, and it appeared only for very large matrices. With small matrices you will only slow the algorithm down because of thread overhead.

You can use the collapse(n) clause, which applies the threading to the nested loops. This should reduce your overhead. A quick overview of OpenMP directives (including collapse) can be found here: http://bisqwit.iki.fi/story/howto/openmp/

You can check the code I wrote here: http://pastebin.com/edi4DgrJ. You can define the size of the matrices at compile time; just change the define.

You can also use the combined ("condensed") OpenMP directives, such as parallel for, which save typing (and I also think they make the code more readable).
