
OpenMP parallel programming based on number of threads relation with execution time

I have the following code in Visual Studio 2010:

#pragma omp parallel for num_threads(1)
for (int y = 0; y < col; y++)
    bands[parametersnumberPredictionBands+1][x][y] = hyperspectral[x][y][z];

The code takes less time to execute with num_threads(1) than with num_threads(3).

To my understanding, when more threads are used in parallel the time taken should be reduced. Can anyone explain why it is not?

I am a beginner, so any help is much appreciated.

Creating and destroying threads takes some time. When the work done by each thread is trivial, the time it takes to create and destroy the threads outweighs the time it takes to do the work itself.

Instead, try running something like:

const int N = 100000;
int A[N];
#pragma omp parallel for
for (int i = 0; i < N; i++) {
    A[i] = i;
}

Here the time taken should be lower with 3 threads, as the amount of work done outweighs the work needed to create and destroy each thread.
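To check this on your own machine rather than take it on faith, a minimal timing sketch might look like the following. The names timed_fill and compare_thread_counts are made up for illustration, and it assumes a C++11 compiler with OpenMP enabled (/openmp in Visual Studio, -fopenmp in GCC or Clang); std::chrono is not available in Visual Studio 2010 itself. Actual timings depend heavily on the machine, the compiler, and the array size, so treat the numbers as something to measure, not as a guarantee.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <vector>

// Fill A[i] = i with the requested number of OpenMP threads and return the
// elapsed wall-clock time in milliseconds. If OpenMP is not enabled, the
// pragma is ignored and the loop simply runs on one thread.
double timed_fill(std::vector<int>& A, int threads) {
    (void)threads;  // avoids an unused-parameter warning when OpenMP is off
    const int n = static_cast<int>(A.size());
    const auto start = std::chrono::steady_clock::now();
    #pragma omp parallel for num_threads(threads)
    for (int i = 0; i < n; i++) {
        A[i] = i;
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical helper: time the same loop with 1, 2 and 3 threads.
void compare_thread_counts(int n) {
    std::vector<int> A(n);
    for (int threads = 1; threads <= 3; threads++) {
        const double ms = timed_fill(A, threads);
        assert(n == 0 || A[n - 1] == n - 1);  // sanity check on the result
        std::printf("%d thread(s): %.3f ms\n", threads, ms);
    }
}
```

Calling compare_thread_counts(100000) from main, and then again with a much larger n, should show how the balance between overhead and useful work shifts as the loop gets bigger.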

Maybe another example will make this clearer. Say we want to parallelize a loop that iterates 10 times, where each iteration takes 1 second and each additional thread takes 0.5 seconds to create.

A single thread will finish the task in 10 seconds. If we create another thread, the work can be split in half, leaving only 5 seconds of work per thread. But it takes 0.5 seconds to create the extra thread, so the total run time is 5.5 seconds. That is still faster than before, but doubling the number of threads did not halve our run time.

Let's say we want to run our loop really fast, so we create 9 additional threads for a total of 10. The work can now be split among all 10 threads, leaving 1 second of work per thread. However, creating 9 threads costs 4.5 seconds. Our total run time is now 5.5 seconds, which is the same as running with only 2 threads! The overhead of creating threads now outweighs the work being done. Continuing to add threads at this point will only slow our program down.
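The arithmetic in this toy example can be tabulated with a few lines of code. This is only a sketch of the simplified model used above (work splits perfectly evenly, and each thread beyond the first costs a fixed amount to create); modeled_runtime and print_model_table are illustrative names, and real thread overhead does not behave this neatly.

```cpp
#include <cstdio>

// Toy model from the example: the work is divided evenly among the threads,
// and every thread beyond the first adds a fixed creation cost.
double modeled_runtime(double work_seconds, double cost_per_extra_thread,
                       int threads) {
    return work_seconds / threads + cost_per_extra_thread * (threads - 1);
}

// Print the modeled run time for 1 to 12 threads, using the numbers from
// the example (10 seconds of work, 0.5 seconds per extra thread).
void print_model_table() {
    for (int threads = 1; threads <= 12; threads++) {
        std::printf("%2d thread(s): %.2f s\n", threads,
                    modeled_runtime(10.0, 0.5, threads));
    }
}
```

In this model the run time falls from 10 seconds with 1 thread to 4 seconds with 4 or 5 threads, then climbs back to 5.5 seconds at 10 threads and keeps rising after that.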

Essentially, for any program with a fixed amount of work there is a point beyond which more threads will not speed up the run time. Large workloads outweigh the cost of thread creation even for a larger number of threads. Small workloads are almost immediately dwarfed by the thread creation overhead.
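Under the same simplifying assumptions as the example (total work W splits evenly, and each thread beyond the first costs a fixed time c to create), that break-even point can be written down directly. The symbols W, c, n and T are introduced here for illustration only.

```latex
% Modeled run time with n threads
T(n) = \frac{W}{n} + c\,(n - 1)

% Treat n as continuous and set the derivative to zero
\frac{dT}{dn} = -\frac{W}{n^{2}} + c = 0
\quad\Longrightarrow\quad
n^{*} = \sqrt{\frac{W}{c}}

% Numbers from the example: W = 10 s, c = 0.5 s
n^{*} = \sqrt{\frac{10}{0.5}} = \sqrt{20} \approx 4.47,
\qquad T(4) = T(5) = 4\ \text{s}
```

The best thread count grows only with the square root of the workload in this model, which is why a trivial loop gains nothing from extra threads while a large one can use many.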

TL;DR

Parallelism is best suited for large problems. Parallelizing trivial problems will only make them slower.
