简体   繁体   English

OpenMP花费的时间比预期的要多

[英]OpenMP takes more time than expected

So, I am facing some difficulties using openMp. 所以,我在使用openMp时遇到了一些困难。 I am beginner and I do not know what I am doing wrong. 我是初学者,我不知道我做错了什么。 This is a project for one of my courses at University, so I do not seek for the solution, rather for a hint or an explanation. 这是我在大学的一门课程的项目,所以我不寻求解决方案,而是寻求提示或解释。

The project is to calculate the hamming distance between 2 strings that belong in different sets (lets say setA and setB). 该项目是计算属于不同集合的2个字符串之间的汉明距离(比如setA和setB)。 Those two sets may contain 100,1000 or 10000 strings which each one of them is composed by the same length of chars. 这两组可以包含100,1000或10000个字符串,每个字符串由相同长度的字符组成。

My problem is that despite the fact that I have decreased the execution time of the parallel program it still takes more time than the serial algorithm. 我的问题是,尽管我减少了并行程序的执行时间,但它仍然需要比串行算法更多的时间。

So, I attach my codes for showing what I have done so far. 所以,我附上我的代码来展示我到目前为止所做的工作。

serial C code. 串行C代码。

void main(int argc,char **argv)
{

//initialize sets' number and string's length
int m=atoi(argv[1]),n=atoi(argv[2]),I=atoi(argv[3]);
int i=0,j=0,l=0,TotalHammingDistance=0,count;

//creation of 2-dimentional matrices for setA and setB
char **setA = malloc(m * sizeof(char *)); // Allocate row pointers
for(i = 0; i < m; i++)
    setA[i] = malloc((I+1) * sizeof(char));  // Allocate each row separatel

char **setB = malloc(n * sizeof(char *)); // Allocate row pointers
for(i = 0; i < n; i++)
    setB[i] = malloc((I+1) * sizeof(char));  // Allocate each row separatel

// initialize matrices with random string (0 and 1)
for (i=0;i<m;i++){
    for(j=0;j<I;j++){
        setA[i][j]="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"[rand() % 62];
    }
    setA[i][I]='\0';
}

for (i=0;i<n;i++){
    for(j=0;j<I;j++){
        setB[i][j]="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"[rand() % 62];
    }
    setB[i][I]='\0';
}

//creation of m*n matrix to store all hamming distances and initialize it
int **HamDist = malloc(m * sizeof(int *)); // Allocate row pointers
for(i = 0; i < m; i++)
  HamDist[i] = malloc(n * sizeof(int));

for(i=0;i<m;i++){
    for(j=0;j<n;j++){
        HamDist[i][j]=0;
    }
}

clock_t start=clock();
//Calculate hamming distance for all combinations of the strings
for (i=0;i<m;i++){
    for(j=0;j<n;j++){
        count=0;
        for(l=0;l<=I;l++) {
            if (setA[i][l] != setB[j][l])
                count++;
        }
        HamDist[i][j]=count;
        TotalHammingDistance+=HamDist[i][j];
    }
}
clock_t end =clock();
double hamm_time=(double)(end-start)/CLOCKS_PER_SEC;

printf("\n|Total Hamming execution time= %f",hamm_time);
printf("\n\n*|The Total Hamming Distance is: %d\n",TotalHammingDistance );
} 

OpenMp C code OpenMp C代码

void main(int argc,char **argv)
{
//initialize sets' number and string's length
    int m=atoi(argv[1]),n=atoi(argv[2]),I=atoi(argv[3]);
     int i=0,j=0,TotalHammingDistance=0, tid,nthreads,chunk;

    //creation of 2-dimentional matrices for setA and setB      
    char **setA = malloc(m * sizeof(char *)); // Allocate row pointers
    for(i = 0; i < m; i++)
      setA[i] = malloc((I+1) * sizeof(char));  // Allocate each row separatel

    char **setB = malloc(n * sizeof(char *)); // Allocate row pointers
    for(i = 0; i < n; i++)
      setB[i] = malloc((I+1) * sizeof(char));  // Allocate each row separatel

    // initialize matrices with random string (0 and 1)
    for (i=0;i<m;i++){
        for(j=0;j<I;j++){
            setA[i][j]="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"[rand() % 62];
        }
        setA[i][I]='\0';
    }

    for (i=0;i<n;i++){
        for(j=0;j<I;j++){
            setB[i][j]="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"[rand() % 62];
        }
        setB[i][I]='\0';
    }

    //creation of m*n matrix to store all hamming distances and initialize it
    uint16_t **HamDist = malloc(m * sizeof(uint16_t *)); // Allocate row pointers
    for(i = 0; i < m; i++)
      HamDist[i] = malloc(n * sizeof(uint16_t));

    for(i=0;i<m;i++){
        for(j=0;j<n;j++){
            HamDist[i][j]=0;
        }
    }

    printf("\n HamDist set \n" );
    int count=0;
    clock_t start=clock();

    omp_set_num_threads(2);
    #pragma omp parallel shared(setA, setB,HamDist ) 
    {
        int k,p,l,count=0;
        #pragma omp for schedule(dynamic, 10000)        
        for (k=0;k<m;k++){
             for(p=0;p<n;p++){
                count=0;
                for(l=0;l<=I;l++){
                    if (setA[k][l] != setB[p][l]){
                        count++;
                    }
                }
                HamDist[k][p]=count;
            }
        }
    }

    clock_t end =clock();
    double per_time=(double)(end-start)/CLOCKS_PER_SEC;
    printf("\n|Total time for two sets= %f",per_time);

    /**/
    for (i=0;i<m;i++){
          for(j=0;j<n;j++){
              TotalHammingDistance+=HamDist[i][j];
          }
    }

printf("\n|Total execution time= %f",per_time);
printf("\n\n*|The Total Hamming Distance is: %d\n",TotalHammingDistance );
}

The execution time that I receive is around 42.011104 for the openmp program and about 32.876482 for the serial algorithm (m=n=10000 and I= 100, where m,n describes the number of strings at each set and I is the string length) 我收到的openmp程序的执行时间约为42.011104,串行算法的执行时间约为32.876482(m = n = 10000,I = 100,其中m,n描述每组的字符串数,I是字符串长度)

I strongly believe that the parallel program should takes less execution time. 我坚信并行程序应该花费更少的执行时间。 Any idea?? 任何的想法??

Thanks in advance! 提前致谢!

Measuring multiprocessor performance is a bit more complicated but we can do a good approximation of "Does it work or not?" 测量多处理器性能有点复杂,但我们可以很好地近似“它是否工作?” with time(1) . 随着time(1) If I do it with your code as it is (with GCC gcc-4.8.real (Ubuntu 4.8.5-2ubuntu1~14.04.1) 4.8.5 invoked with gcc -W -Wall -Wextra -O3 -fopenmp openmptest.c -o openmptest ) I got 如果我按照你的代码那样做(使用GCC gcc-4.8.real(Ubuntu 4.8.5-2ubuntu1~14.04.1)4.8.5用gcc -W -Wall -Wextra -O3 -fopenmp openmptest.c -o openmptest )我得到了

$ time ./openmptest 10000 10000 100

 HamDist set 

|Total time for two sets= 9.620011
|Total execution time= 9.620011

*|The Total Hamming Distance is: 1248788142

real    0m9.815s
user    0m9.700s
sys 0m0.116s

Where both, real and user are roughly the same value and also roughly the same as the normal version. 其中,real和user的值大致相同,也与普通版本大致相同。 If I remove schedule(dynamic, 10000) completely and let Openmp decide for itself I get 如果我完全删除schedule(dynamic, 10000)并让Openmp自己决定

$ time ./openmptest 10000 10000 100
 HamDist set 

|Total time for two sets= 9.187761
|Total execution time= 9.187761

*|The Total Hamming Distance is: 1248788142

real    0m4.819s
user    0m9.265s
sys 0m0.112s

That is 5/9 instead of 9/9. 那是5/9而不是9/9。 If I set omp_set_num_threads(2) to 4 instead (I have four CPUs here.) I get 如果我将omp_set_num_threads(2)设置为4(我这里有4个CPU。)我得到了

$ time ./openmptest 10000 10000 100
 HamDist set 

|Total time for two sets= 11.438243
|Total execution time= 11.438243

*|The Total Hamming Distance is: 1248788142

real    0m3.080s
user    0m11.540s
sys 0m0.104s

That is 3/11 < 5/9 < 9/9. 那是3/11 <5/9 <9/9。 So it works as expected if you let OpenMP do it itself. 因此,如果您让OpenMP自行完成,它会按预期工作。 Removing omp_set_num_threads() gave no difference to the last try. 删除omp_set_num_threads()与上次尝试没有区别。

You have a very simple program where OpenMP's defaults work quite well. 你有一个非常简单的程序,其中OpenMP的默认设置工作得很好。 Fine-tuning OpenMP is a science in and of itself but for example @Davislor 's comment about using reduction seems to be a good one to start with. 微调OpenMP本身就是一门科学,但是例如@Davislor关于使用reduction的评论似乎是一个很好的开始。

BTW: You also have a lot of warnings, one of them is about shadowing count which you declared two times, one before the loop and one inside. 顺便说一句:你也有很多警告,其中一个是关于你声明两次的阴影count ,一个在循环之前,一个在里面。 You should get rid of all the warnings. 你应该摆脱所有的警告。 It happens more often than not that a very significant information is hidden in between those dozens of warnings. 在这些数十个警告之间隐藏着非常重要的信息,这种情况经常发生。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

相关问题 OpenMP 矩阵乘法花费的时间比预期的要长 - OpenMP matrix multiplication takes more time than expected 对于小于 450 的值,Parallel Openmp 需要更多时间 - Parallel Openmp takes more time for value less than 450 使用 OpenMP 执行并行代码比执行串行代码需要更多时间 - Parallel code with OpenMP takes more time to execute than serial code 使用“ w”比“ a”模式花费更多时间 - fopen takes more time with “w” than “a” mode 与单个语句相比,两个语句中的strncmp需要更多时间 - strncmp in two statements takes more time than in single statement 与通过bash脚本调用函数相比,直接调用函数要花费更多时间 - Calling a function directly takes more time than calling it by a bash script 使用OpenMP乘法矩阵比序列化方法花费更多的时间 - Multiplying matrices using OpenMP taking much more time than serialized way 使用 OpenMP 并行执行比 C 中的串行执行花费更长的时间? - Parallel exection using OpenMP takes longer than serial execution in C? aio_write比ext4上的普通写入花费更多时间? - aio_write takes more time than plain write on ext4? 使用2个线程对2个数组进行排序要比对2个数组进行逐一排序花费更多的时间 - Sorting 2 arrays using 2 threads takes more time than sorting the 2 arrays one by one
 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM