Equal execution time when running with 4 and 8 threads

I am testing some code with OpenMP. Here it is:

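#include <chrono>
#include <iostream>
#include <omp.h>

#define NUM_THREADS 8
#define ARR_SIZE 10000

class A {
private:
    int a[ARR_SIZE];
public:
    // Fill the array with 0, 1, ..., ARR_SIZE - 1
    A() {
        for (int i = 0; i < ARR_SIZE; i++)
            a[i] = i;
    }
// <<-----------MAIN CODE HERE--------------->
    // Sums o1.a[i] * o2.a[j] over all (i, j) pairs. The outer loop is split
    // across NUM_THREADS threads; each thread accumulates a private copy of
    // `some`, and the reduction clause adds the copies together at the end.
    void fn(A &o1, A &o2) {
        // Each product fits in an int (at most 9999 * 9999), but the total
        // (about 2.5e15) does not, so accumulate in a 64-bit integer.
        long long some = 0;
        #pragma omp parallel num_threads(NUM_THREADS)
        {
            #pragma omp for reduction(+:some)
            for (int i = 0; i < ARR_SIZE; i++) {
                for (int j = 0; j < ARR_SIZE; j++)
                    some += o1.a[i] * o2.a[j];
            }
        }
        std::cout << some << std::endl;
    }
};

int main() {
    A a, b, c;
    // Time one call of fn (this includes printing the result)
    auto start = std::chrono::high_resolution_clock::now();
    c.fn(a, b);
    auto end = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> elapsed = end - start;
    std::cout << elapsed.count() << std::endl;
}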

Execution time:

  • 1 thread: 0.233663 sec
  • 2 threads: 0.12449 sec
  • 4 threads: 0.0665889 sec
  • 8 threads: 0.0643735 sec

As you can see, there is almost no difference between running with 4 and 8 threads. What could be the reason for this behavior? It would also be nice if you could try this code on your machine ;).

PS: My processor:

Model:               Intel(R) Core(TM) i7-4710HQ CPU @ 2.50GHz 
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           1

You have 4 physical cores. The promise of hyperthreading is that each core can "think about" two tasks at once: it keeps the state of two hardware threads and interleaves their instructions, so when one thread stalls (for instance, while waiting for a memory operation to finish), the other can use execution units that would otherwise sit idle. In theory, this reduces the time wasted waiting for operations to complete. In practice, however, the actual performance gain is nowhere near the 2x improvement you would get by doubling the number of physical cores. The improvement is typically between 0 and 0.3x (0% to 30%), and sometimes hyperthreading even causes a slowdown.

Four threads is therefore effectively the useful upper bound on the machine you are using. A tight arithmetic loop like this one likely leaves little idle execution capacity for the second hardware thread on each core to fill. A computer with 8 physical cores might get the speedup that you expect.
