
OpenMP implementation increasingly slow as thread count increases

I have been trying to learn to use OpenMP. However, my code seemed to run more quickly in serial than in parallel.

Indeed, the more threads I used, the slower the computation became.

To illustrate this I ran an experiment. I am trying to do the following operation:

long int C[num], D[num];
for (i=0; i<num; i++) C[i] = i;
for (i=0; i<num; i++) {
    for (j=0; j<N; j++) {
        D[i] = pm(C[i]);
    }
}

where the function pm is simply

int pm(int val) {
    val++;
    val--;
    return val;
}

I implemented the inner loop in parallel and compared the run times as a function of the number of inner-loop iterations (N) and the number of threads used. The code for the experiment is below.

#include <stdio.h>
#include <iostream>
#include <time.h>
#include "omp.h"
#include <fstream>
#include <cstdlib>
#include <cmath>

static const long num = 1000;
using namespace std;

int pm(int val) {
    val++;
    val--;
    return val;
}

int main() {

    int i, j, k, l;
    const int iter = 8;
    const int iterT = 4;
    long inum[iter];
    for (i=0; i<iter; i++) inum[i] = pow(10, i); 

    double serial[iter][iterT], parallel[iter][iterT];

    ofstream outdata;
    outdata.open("output.dat");
    if (!outdata) {
        std::cerr << "Could not open file." << std::endl;
        exit(1);
    }

    """Experiment Start"""
    for (l=1; l<iterT+1; l++) {
        for (k=0; k<iter; k++) {
            clock_t start = clock();
            long int A[num], B[num];
            omp_set_num_threads(l);
            for (i=0; i<num; i++) A[i] = i;
            for (i=0; i<num; i++){
                #pragma omp parallel for schedule(static)
                for (j=0; j<inum[k]; j++) {
                    B[i] = pm(A[i]);
                }
            }  
            clock_t finish = clock();
            parallel[k][l-1] = (double)(finish - start) / CLOCKS_PER_SEC * 1000.0;

            start = clock();
            long int C[num], D[num];
            for (i=0; i<num; i++) C[i] = i;
            for (i=0; i<num; i++){
                for (j=0; j<inum[k]; j++) {
                    D[i] = pm(C[i]);
                }
            }
            finish = clock();
            serial[k][l-1] = (double)(finish - start) / CLOCKS_PER_SEC * 1000.0;
        }
    }
    """Experiment End"""


    for (j=0; j<iterT; j++) {
        for (i=0; i<iter; i++) {
            outdata << inum[i] << " " << j + 1 << " " << serial[i][j]
                    << " " << parallel[i][j] << std::endl;
        }
    }
    outdata.close();
    return 0;
}

The plot below shows log(T) against log(N) for each thread count.

[Figure: run times for varying numbers of threads and magnitudes of the computational task]

(I just noticed that the legend labels for serial and parallel are the wrong way around).

As you can see, using more than one thread greatly increases the run time, and the time taken grows roughly linearly with the number of threads.

Can anyone tell me what's going on?

Thanks!

Freakish above was correct: the pm() function does nothing, and the compiler was optimizing the loop away.
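
For anyone reproducing this, here is a minimal sketch of one way to keep the optimizer from deleting the loop (my own illustration, not Freakish's exact suggestion): write the result through a volatile variable that is private to each thread, so the store in the loop body counts as an observable side effect and cannot be hoisted out or removed. The value of N below is illustrative.

#include <omp.h>
#include <cstdio>

int pm(int val) {
    val++;
    val--;
    return val;
}

int main() {
    const long num = 1000;
    const long N = 100000;            // inner-loop iterations (illustrative value)
    long A[num];
    for (long i = 0; i < num; i++) A[i] = i;

    for (long i = 0; i < num; i++) {
        #pragma omp parallel
        {
            volatile long sink = 0;   // private to each thread: no data race
            #pragma omp for schedule(static)
            for (long j = 0; j < N; j++) {
                sink = pm(A[i]);      // volatile store: performed on every iteration
            }
            (void)sink;               // read the value back so it is "used"
        }
    }
    std::printf("done\n");
    return 0;
}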

It also turns out that the rand() function does not play well within OpenMP for loops, since it keeps hidden global state shared by all threads.
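
A common workaround, sketched below under the assumption that C++11 <random> is available, is to give each OpenMP thread its own generator instead of sharing rand()'s global state:

#include <omp.h>
#include <random>
#include <vector>
#include <cstdio>

int main() {
    const long num = 100000;
    std::vector<double> out(num);

    #pragma omp parallel
    {
        // One engine per thread; mixing the thread number into the seed
        // gives each thread a different stream.
        std::mt19937 gen(12345u + omp_get_thread_num());
        std::uniform_real_distribution<double> dist(0.0, 1.0);

        #pragma omp for schedule(static)
        for (long i = 0; i < num; i++) {
            out[i] = dist(gen);
        }
    }
    std::printf("out[0] = %f\n", out[0]);
    return 0;
}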

By adding a call to sqrt(i) (i being the loop index) as the loop's work, I achieved the expected speedup.
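
A sketch of the repaired experiment as I understand it: sqrt of the loop index gives the loop real work, and omp_get_wtime() measures wall-clock time. (On many platforms clock() returns CPU time summed over all threads, which by itself can make a parallel region look slower. The hard-coded N is illustrative.)

#include <omp.h>
#include <cmath>
#include <cstdio>

int main() {
    const long N = 100000000;         // illustrative problem size

    double sum = 0.0;
    double start = omp_get_wtime();   // wall-clock time, not summed CPU time
    #pragma omp parallel for schedule(static) reduction(+:sum)
    for (long j = 0; j < N; j++) {
        sum += std::sqrt((double)j);  // real work the optimizer has to keep
    }
    double elapsed = omp_get_wtime() - start;

    std::printf("sum = %e, time = %f s\n", sum, elapsed);
    return 0;
}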
