
OS: Multiple Threads Slowing Down Program (C)

The idea is to write a program that accepts a count of random numbers to generate, divides that load among however many threads the user requests, and measures the speedup gained from using multiple threads. My issue, however, is that the more threads I add, the slower my program runs. I'm not sure what is wrong. Here is a snippet of my code so far:

...
    for (i=0; i<numThreads; i++){
        vals *values;
        values = (vals *)malloc(sizeof(vals));
        values->randoms = count;
        values->id = i;
        pthread_create(&tid[i], NULL, run, (void *) values);
    }

    for (i=0; i<numThreads; i++)
        pthread_join(tid[i], NULL);

    timeElapsed = getMilliSeconds() - timeStart;
    printf("Elapsed time:  %lf seconds\n",(double)(timeElapsed/1000.0));

    exit(EXIT_SUCCESS);
}

void *run(void *arg) {
    vals *values;
    long long int i;
    long long int randoms;

    values = (vals*)arg;
    randoms = values->randoms;
    srandom(values->id);

    for (i = 0; i < randoms; i++) {
        random();
    }

    pthread_exit(NULL);
}

vals is a struct which holds two int values (randoms and id). randoms contains the number of random numbers to generate divided by the number of threads (to divide the load), and id holds a unique ID for each thread, used as its seed. I needed the struct so that I could pass multiple values to the worker function called by the thread.

Any ideas why it would run slower with more threads?

A multi-threaded program may show improved performance in an environment where multiple CPUs are available. However, when CPU resources are scarce, each thread has to wait to be scheduled for CPU time. A 'context switch' is when one thread is switched out of a CPU and another thread is switched in. A context switch is not an insignificant task.

Hence, the more threads there are, the more of them are waiting for CPU resources, and the more time the kernel spends doing context switches (instead of real work).
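One way to act on this is to cap the requested thread count at the number of CPUs that are online. The sketch below is only an illustration: clamp_thread_count is a hypothetical helper name, and _SC_NPROCESSORS_ONLN is a widely available extension (Linux, macOS, the BSDs) rather than something strict POSIX guarantees.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper (not from the question): limit a user-requested
 * thread count to the number of CPUs currently online. */
long clamp_thread_count(long requested)
{
    /* _SC_NPROCESSORS_ONLN is a common extension, assumed to be available. */
    long cpus = sysconf(_SC_NPROCESSORS_ONLN);
    long n;

    if (cpus < 1)
        cpus = 1;                  /* query failed: fall back to one thread */

    if (requested < 1)
        n = 1;                     /* always run at least one worker */
    else
        n = requested < cpus ? requested : cpus;

    assert(n >= 1);                /* postcondition of this helper */
    return n;
}
```

In the question's program this would be applied to numThreads before the pthread_create loop, for example numThreads = (int)clamp_thread_count(numThreads).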

Quite possibly you're encountering false sharing (strictly speaking, contention on truly shared state, since the threads modify the same data rather than neighbouring data on one cache line). Generating a random number involves mutating some shared state, and multiple threads continually modifying the same values effectively eliminates any benefit you get from the CPU's memory cache. What happens is that every time Thread A wants to access that shared state, it has to wait for Thread B's CPU core to flush its cache. And any time Thread B wants to access it, it has to wait for Thread A's CPU core to flush its cache.

Looked at another way, a single-threaded program would do something like:

Load state into CPU cache
for (i = 0 to randoms ...)
    generate random number

With two threads, each one is doing this:

for (i = 0 to randoms ...)
    wait for other CPU core to flush its cache
    generate random number
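A common way around the shared state is to give every thread its own generator state. On glibc, for example, random() guards a single process-wide state with a lock, so concurrent callers serialize. The sketch below swaps in rand_r, which keeps its state in a caller-supplied unsigned int. This is an illustration under assumptions, not the asker's code: work_t, run_private and generate_parallel are hypothetical names, and rand_r is marked obsolescent in POSIX.1-2008 and has weak statistical quality, so it only serves to show the structure.

```c
#define _POSIX_C_SOURCE 200809L   /* expose rand_r under strict -std modes */
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread work description (not the asker's vals struct). */
typedef struct {
    unsigned int seed;            /* initial generator state for this thread */
    long long count;              /* how many numbers this thread generates */
    unsigned long long sum;       /* result, so the loop has a visible effect */
} work_t;

static void *run_private(void *arg)
{
    work_t *w = arg;
    unsigned int state = w->seed; /* private copy on this thread's stack */
    unsigned long long sum = 0;

    for (long long i = 0; i < w->count; i++)
        sum += (unsigned long long)rand_r(&state); /* touches only 'state' */

    w->sum = sum;                 /* single write to shared memory, at the end */
    return NULL;
}

/* Generate 'total' numbers across 'nthreads' threads.
 * Returns 0 on success, -1 on bad arguments or a failed allocation/creation. */
int generate_parallel(int nthreads, long long total, unsigned long long *out_sum)
{
    if (nthreads < 1 || total < 0 || out_sum == NULL)
        return -1;

    pthread_t *tid = malloc((size_t)nthreads * sizeof *tid);
    work_t *work = malloc((size_t)nthreads * sizeof *work);
    if (tid == NULL || work == NULL) {
        free(tid);
        free(work);
        return -1;
    }

    int started = 0;
    for (int i = 0; i < nthreads; i++) {
        work[i].seed = (unsigned int)i + 1u;
        /* Spread the remainder over the first (total % nthreads) threads. */
        work[i].count = total / nthreads + (i < total % nthreads ? 1 : 0);
        work[i].sum = 0;
        if (pthread_create(&tid[i], NULL, run_private, &work[i]) != 0)
            break;
        started++;
    }

    *out_sum = 0;
    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        *out_sum += work[i].sum;
    }

    free(tid);
    free(work);
    return started == nthreads ? 0 : -1;
}
```

The hot loop touches only a stack variable, so no cache line is passed back and forth between cores while the numbers are being generated. glibc's random_r is another option with the same shape, at the cost of being a GNU extension.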

My issue, however, is that the more threads I add, the slower my program goes.

If you have more processing threads than CPU cores, then your program is going to slow down. With two cores, the absolute best you can do with a compute-bound operation is to run twice as fast as the single-thread solution. If you have three threads, then at some point the thread scheduler will have to stop one of the running threads so that the third thread can get some time. These context switches take time, a relatively large amount of it in the context of a compute-bound operation. In general, you don't want to have more compute-bound threads than CPU cores.

(Absent hyperthreading, of course. With hyperthreading, you could potentially have four threads running concurrently, although you're unlikely to get even a 3x improvement.)
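To check these limits against a real machine, it helps to express each measurement as a speedup over the single-thread run and as an efficiency per thread. A minimal sketch follows; now_ms, speedup and efficiency are hypothetical helper names, and now_ms merely stands in for the question's getMilliSeconds(), whose implementation is not shown.

```c
#define _POSIX_C_SOURCE 200809L   /* expose clock_gettime under strict -std modes */
#include <time.h>

/* Hypothetical timer: monotonic wall-clock time in milliseconds. */
double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

/* How many times faster the parallel run is than the single-thread run. */
double speedup(double single_ms, double parallel_ms)
{
    return single_ms / parallel_ms;
}

/* Fraction of the ideal linear speedup achieved with 'threads' threads. */
double efficiency(double single_ms, double parallel_ms, int threads)
{
    return speedup(single_ms, parallel_ms) / threads;
}
```

For instance, if one thread took 1000 ms and two threads took 500 ms, speedup(1000.0, 500.0) is 2.0 and efficiency(1000.0, 500.0, 2) is 1.0, the ideal case on two cores. A speedup below 1.0, as described in the question, means the extra threads cost more than they contribute.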
