
Increasing pthreads thread count has no effect on speed

In the following program, increasing the number of threads yields no speed-up at all (measured with the time command under Linux). I have run it on the following processors:

  • Intel i5 M520
  • Intel Xeon X5650

It seems to me that the logic for dividing the work among the threads is correct. I have even tried removing the lock, which obviously gives the wrong result, but there is still no increase in speed. Any ideas?

#include <pthread.h>
#include <stdio.h>
#include <math.h>
#include <stdlib.h>

double sum;
pthread_mutex_t mutex;
typedef struct {
    int start;
    int end;
}sumArg;

void *sumRoots(void *arg) {
    sumArg s = *((sumArg *) arg);
    int i = s.start;
    double tmp;
    while(i <= s.end) {
        tmp = sqrt(i);
        pthread_mutex_lock(&mutex);
        sum += tmp;
        pthread_mutex_unlock(&mutex);
        i++;
    }
    free(arg);
    return NULL;
}

int main(int argc, char const *argv[]) {
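    /* Read the arguments only if both were supplied; 0 triggers the usage message. */
    int threadCount = argc > 2 ? atoi(argv[1]) : 0;
    int N = argc > 2 ? atoi(argv[2]) : 0;
    if (N < 1 || threadCount < 1) {
        printf("Usage: ./sumOfRoots threads N\n");
        return 1;
    }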

    pthread_t tid[threadCount];
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_mutex_init(&mutex, NULL);

    sumArg *s;
    int i = 0;
    while(i < threadCount) {
        s = (sumArg *) malloc(sizeof(sumArg));
        s->start = ((N/threadCount) * i) + 1;
        s->end = (N/threadCount) * (i + 1);
        pthread_create(&tid[i], &attr, sumRoots, s);
        i++;
    }

    i = 0;
    while(i < threadCount) pthread_join(tid[i++], NULL);
    printf("sum: %f\n", sum);
    return 0;
}

Edit: Here are some runs of the program, all on an i5 M520:

time ./sumOfRoots 1 1000000000
sum: 21081851083600.558594

real    0m21.933s
user    0m21.268s
sys     0m0.000s

time ./sumOfRoots 2 1000000000
sum: 21081851083600.691406

real    0m21.207s
user    0m21.020s
sys     0m0.008s

time ./sumOfRoots 4 1000000000
sum: 21081851083600.863281

real    0m21.488s
user    0m21.116s
sys     0m0.016s

time ./sumOfRoots 8 1000000000
sum: 21081851083601.777344

real    0m21.432s
user    0m21.092s
sys     0m0.020s

I believe the variation in the sum is caused by floating-point rounding: the order in which the threads add their terms differs from run to run.
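The effect of summation order on a floating-point total can be illustrated with a small sketch. The function names `sumForward` and `sumBackward` are made up for this example and are not part of the program above; the sketch assumes IEEE 754 doubles and no value-changing optimizations such as `-ffast-math`.

```c
#include <stddef.h>

/* Adds the elements from first to last. */
double sumForward(const double *v, size_t n) {
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Adds the same elements from last to first. */
double sumBackward(const double *v, size_t n) {
    double total = 0.0;
    for (size_t i = n; i > 0; i--)
        total += v[i - 1];
    return total;
}
```

For the array {0.1, 0.2, 0.3} the two functions should return totals that differ in the last digit, because each intermediate addition is rounded and the rounding depends on the order. With a billion terms added in an order decided by the scheduler, small differences in the fractional digits are to be expected.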

The reason the timing remains virtually unchanged is that it is dominated by synchronization: every loop iteration locks and unlocks the same mutex, so the threads spend their time waiting on each other rather than computing. On my computer a single-threaded run was even faster!

Changing the code as follows brings the timing in line with expectations:

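void *sumRoots(void *arg) {
    sumArg s = *((sumArg *) arg);
    int i = s.start;
    double tmp = 0;   /* thread-local partial sum, no lock needed */
    while(i <= s.end) {
        tmp += sqrt(i++);
    }
    /* Take the lock once per thread to publish the partial sum. */
    pthread_mutex_lock(&mutex);
    sum += tmp;
    pthread_mutex_unlock(&mutex);
    free(arg);
    return NULL;
}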

Now each thread does most of its work without synchronization, and synchronizes only once, for the final addition.

The timing I see on my system is as follows:

> time ./a.out 1 1000000000
sum: 21081851083600.558594

real    0m13.220s
user    0m13.098s
sys     0m0.009s

> time ./a.out 2 1000000000
sum: 21081851083600.863281

real    0m6.613s
user    0m12.930s
sys     0m0.027s
