OpenMP takes more time than expected
So, I am facing some difficulties using OpenMP. I am a beginner and I do not know what I am doing wrong. This is a project for one of my courses at university, so I am not looking for the solution, rather for a hint or an explanation.
The project is to calculate the Hamming distance between every pair of strings drawn from two different sets (let's say setA and setB). Each set may contain 100, 1000, or 10000 strings, all of the same length.
My problem is that, although I have already reduced the execution time of the parallel program, it still takes more time than the serial algorithm.
So, I am attaching my code to show what I have done so far.
Serial C code:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char **argv)
{
    // initialize the sets' sizes and the string length
    int m = atoi(argv[1]), n = atoi(argv[2]), I = atoi(argv[3]);
    int i = 0, j = 0, l = 0, TotalHammingDistance = 0, count;

    // create the 2-dimensional matrices for setA and setB
    char **setA = malloc(m * sizeof(char *));     // allocate row pointers
    for (i = 0; i < m; i++)
        setA[i] = malloc((I + 1) * sizeof(char)); // allocate each row separately
    char **setB = malloc(n * sizeof(char *));     // allocate row pointers
    for (i = 0; i < n; i++)
        setB[i] = malloc((I + 1) * sizeof(char)); // allocate each row separately

    // initialize the matrices with random alphanumeric strings
    for (i = 0; i < m; i++) {
        for (j = 0; j < I; j++) {
            setA[i][j] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"[rand() % 62];
        }
        setA[i][I] = '\0';
    }
    for (i = 0; i < n; i++) {
        for (j = 0; j < I; j++) {
            setB[i][j] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"[rand() % 62];
        }
        setB[i][I] = '\0';
    }

    // create the m*n matrix that stores all Hamming distances and initialize it
    int **HamDist = malloc(m * sizeof(int *));    // allocate row pointers
    for (i = 0; i < m; i++)
        HamDist[i] = malloc(n * sizeof(int));
    for (i = 0; i < m; i++) {
        for (j = 0; j < n; j++) {
            HamDist[i][j] = 0;
        }
    }

    clock_t start = clock();
    // calculate the Hamming distance for every combination of strings
    for (i = 0; i < m; i++) {
        for (j = 0; j < n; j++) {
            count = 0;
            for (l = 0; l <= I; l++) {
                if (setA[i][l] != setB[j][l])
                    count++;
            }
            HamDist[i][j] = count;
            TotalHammingDistance += HamDist[i][j];
        }
    }
    clock_t end = clock();

    double hamm_time = (double)(end - start) / CLOCKS_PER_SEC;
    printf("\n|Total Hamming execution time= %f", hamm_time);
    printf("\n\n*|The Total Hamming Distance is: %d\n", TotalHammingDistance);
    return 0;
}
OpenMP C code:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>
#include <omp.h>

int main(int argc, char **argv)
{
    // initialize the sets' sizes and the string length
    int m = atoi(argv[1]), n = atoi(argv[2]), I = atoi(argv[3]);
    int i = 0, j = 0, TotalHammingDistance = 0, tid, nthreads, chunk;

    // create the 2-dimensional matrices for setA and setB
    char **setA = malloc(m * sizeof(char *));     // allocate row pointers
    for (i = 0; i < m; i++)
        setA[i] = malloc((I + 1) * sizeof(char)); // allocate each row separately
    char **setB = malloc(n * sizeof(char *));     // allocate row pointers
    for (i = 0; i < n; i++)
        setB[i] = malloc((I + 1) * sizeof(char)); // allocate each row separately

    // initialize the matrices with random alphanumeric strings
    for (i = 0; i < m; i++) {
        for (j = 0; j < I; j++) {
            setA[i][j] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"[rand() % 62];
        }
        setA[i][I] = '\0';
    }
    for (i = 0; i < n; i++) {
        for (j = 0; j < I; j++) {
            setB[i][j] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"[rand() % 62];
        }
        setB[i][I] = '\0';
    }

    // create the m*n matrix that stores all Hamming distances and initialize it
    uint16_t **HamDist = malloc(m * sizeof(uint16_t *)); // allocate row pointers
    for (i = 0; i < m; i++)
        HamDist[i] = malloc(n * sizeof(uint16_t));
    for (i = 0; i < m; i++) {
        for (j = 0; j < n; j++) {
            HamDist[i][j] = 0;
        }
    }
    printf("\n HamDist set \n");

    int count = 0;
    clock_t start = clock();
    omp_set_num_threads(2);
    #pragma omp parallel shared(setA, setB, HamDist)
    {
        int k, p, l, count = 0;
        #pragma omp for schedule(dynamic, 10000)
        for (k = 0; k < m; k++) {
            for (p = 0; p < n; p++) {
                count = 0;
                for (l = 0; l <= I; l++) {
                    if (setA[k][l] != setB[p][l]) {
                        count++;
                    }
                }
                HamDist[k][p] = count;
            }
        }
    }
    clock_t end = clock();

    double per_time = (double)(end - start) / CLOCKS_PER_SEC;
    printf("\n|Total time for two sets= %f", per_time);

    for (i = 0; i < m; i++) {
        for (j = 0; j < n; j++) {
            TotalHammingDistance += HamDist[i][j];
        }
    }
    printf("\n|Total execution time= %f", per_time);
    printf("\n\n*|The Total Hamming Distance is: %d\n", TotalHammingDistance);
    return 0;
}
The execution time I get is around 42.011104 seconds for the OpenMP program and about 32.876482 seconds for the serial algorithm (m = n = 10000 and I = 100, where m and n are the number of strings in each set and I is the string length).
I strongly believe that the parallel program should take less execution time. Any ideas?
Thanks in advance!
Measuring multiprocessor performance is a bit more complicated, but we can get a good approximation of "Does it work or not?" with time(1). If I run your code as it is (with GCC gcc-4.8.real (Ubuntu 4.8.5-2ubuntu1~14.04.1) 4.8.5, invoked as gcc -W -Wall -Wextra -O3 -fopenmp openmptest.c -o openmptest) I get:
$ time ./openmptest 10000 10000 100
HamDist set
|Total time for two sets= 9.620011
|Total execution time= 9.620011
*|The Total Hamming Distance is: 1248788142
real 0m9.815s
user 0m9.700s
sys 0m0.116s
Here real and user are roughly the same value, and also roughly the same as the serial version. If I remove schedule(dynamic, 10000) completely and let OpenMP decide for itself, I get:
$ time ./openmptest 10000 10000 100
HamDist set
|Total time for two sets= 9.187761
|Total execution time= 9.187761
*|The Total Hamming Distance is: 1248788142
real 0m4.819s
user 0m9.265s
sys 0m0.112s
That is 5/9 instead of 9/9. If I change omp_set_num_threads(2) to 4 instead (I have four CPUs here), I get:
$ time ./openmptest 10000 10000 100
HamDist set
|Total time for two sets= 11.438243
|Total execution time= 11.438243
*|The Total Hamming Distance is: 1248788142
real 0m3.080s
user 0m11.540s
sys 0m0.104s
That is 3/11 < 5/9 < 9/9. So it works as expected if you let OpenMP handle it itself. Removing omp_set_num_threads() entirely made no difference compared to the last run.
You have a very simple program for which OpenMP's defaults work quite well. Fine-tuning OpenMP is a science in and of itself, but, for example, @Davislor's comment about using reduction seems to be a good place to start.
BTW: You also get a lot of warnings, one of them about shadowing count, which you declared twice, once before the parallel region and once inside it. You should get rid of all the warnings. More often than not, a very significant piece of information is hiding among those dozens of warnings.