Increasing pthreads thread count has no effect on speed

In the following program, increasing the number of threads does not yield any speed-up at all (measured with the time command under Linux). I have run it on the following processors:

  • Intel i5 M520
  • Intel Xeon X5650

It seems to me that the logic of dividing the work among the threads is correct. I have even tried removing the lock, which obviously gives the wrong result, but still no increase in speed. Any ideas?

#include <pthread.h>
#include <stdio.h>
#include <math.h>
#include <stdlib.h>

double sum;
pthread_mutex_t mutex;
typedef struct {
    int start;
    int end;
}sumArg;

void *sumRoots(void *arg) {
    sumArg s = *((sumArg *) arg);
    int i = s.start;
    double tmp;
    while(i <= s.end) {
        tmp = sqrt(i);
        pthread_mutex_lock(&mutex);
        sum += tmp;
        pthread_mutex_unlock(&mutex);
        i++;
    }
    free(arg);
    return NULL;  /* a void * thread function must return a value */
}

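int main(int argc, char const *argv[]) {
    /* Check argc before touching argv, and stop on bad input. */
    if (argc < 3 || atoi(argv[1]) < 1 || atoi(argv[2]) < 1) {
        printf("Usage: ./sumOfRoots threads N\n");
        return 1;
    }
    int threadCount = atoi(argv[1]);
    int N = atoi(argv[2]);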

    pthread_t tid[threadCount];
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_mutex_init(&mutex, NULL);

    sumArg *s;
    int i = 0;
    while(i < threadCount) {
        s = (sumArg *) malloc(sizeof(sumArg));
        s->start = ((N/threadCount) * i) + 1;
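        /* The last thread also takes the remainder when N is not divisible by threadCount. */
        s->end = (i == threadCount - 1) ? N : (N/threadCount) * (i + 1);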
        pthread_create(&tid[i], &attr, sumRoots, s);
        i++;
    }

    i = 0;
    while(i < threadCount) pthread_join(tid[i++], NULL);
    printf("sum: %f\n", sum);
    return 0;
}

Edit: Here are some runs of the program, all running on an i5 M520:

time ./sumOfRoots 1 1000000000
sum: 21081851083600.558594

real    0m21.933s
user    0m21.268s
sys     0m0.000s

time ./sumOfRoots 2 1000000000
sum: 21081851083600.691406

real    0m21.207s
user    0m21.020s
sys     0m0.008s

time ./sumOfRoots 4 1000000000
sum: 21081851083600.863281

real    0m21.488s
user    0m21.116s
sys     0m0.016s

time ./sumOfRoots 8 1000000000
sum: 21081851083601.777344

real    0m21.432s
user    0m21.092s
sys     0m0.020s

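I believe the variation in the sum is caused by floating-point rounding: the threads add their terms in a different order from run to run, and floating-point addition is not associative.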

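The timing remains virtually unchanged because it is dominated by synchronization: every loop iteration locks and unlocks the mutex, so the threads spend most of their time waiting on the lock instead of computing in parallel. On my computer a single-thread solution was even faster!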

Changing the code as follows brings the timing in line with expectations:

void *sumRoots(void *arg) {
    sumArg s = *((sumArg *) arg);
    int i = s.start;
    double tmp = 0;
    while(i <= s.end) {
        tmp += sqrt(i++);
    }
    pthread_mutex_lock(&mutex);
    sum += tmp;
    pthread_mutex_unlock(&mutex);
    free(arg);
    return 0;
}

Now each thread accumulates into its local tmp without any synchronization, and takes the lock only once, at the end, to add its partial sum to the global sum.

The timing I see on my system is as follows:

> time ./a.out 1 1000000000
sum: 21081851083600.558594

real    0m13.220s
user    0m13.098s
sys     0m0.009s

> time ./a.out 2 1000000000
sum: 21081851083600.863281

real    0m6.613s
user    0m12.930s
sys     0m0.027s
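
The two runs above print sums that differ in the last few digits. That is consistent with the explanation given in the question: double addition is not associative, so splitting the range differently changes the order of the additions and therefore the rounding. A minimal sketch of the effect, with made-up function names and deliberately extreme values:

```c
/* Adds three doubles as (a + b) + c. */
double sumLeftToRight(double a, double b, double c) {
    return (a + b) + c;
}

/* Adds the same three doubles as a + (b + c). */
double sumRightToLeft(double a, double b, double c) {
    return a + (b + c);
}
```

With the arguments 1e20, -1e20 and 1.0, sumLeftToRight returns 1.0, because the two large values cancel first. sumRightToLeft returns 0.0, because 1.0 is too small to change -1e20 and is lost before the cancellation. The differences in the sums above are the same effect on a much smaller scale.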
