Bug related to g++/OpenMP when using std::thread?

I've distilled the problem I have down to its bare essentials. Here is the first example piece of code:

#include <vector>
#include <math.h>
#include <thread>

std::vector<double> vec(10000);

void run(void) 
{
    for(int l = 0; l < 500000; l++) {

    #pragma omp parallel for
        for(int idx = 0; idx < vec.size(); idx++) {

            vec[idx] += cos(idx);
        }
    }
}

int main(void)
{
    // Seemingly innocuous empty parallel region - the directive in question
    #pragma omp parallel
    {
    }

    std::thread threaded_call(&run);
    threaded_call.join();

    return 0;
}

Compile this as (on Ubuntu 20.04): g++ -fopenmp main.cpp -o main

EDIT: Version: g++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0

Running on a Ryzen 3700x (8 cores, 16 threads): run time ~43 s, all 16 logical cores reported in System Monitor at ~80%.
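
For reference, here is a minimal sketch of one way such a timing could be measured with std::chrono; the original measurement method isn't stated in the question, so this harness is only an assumption, not part of the original test:

#include <chrono>
#include <cstdio>
#include <thread>

void run(void);   // the function defined in the example above

int main(void)
{
    #pragma omp parallel
    {
    }

    auto start = std::chrono::steady_clock::now();

    std::thread threaded_call(&run);
    threaded_call.join();

    auto stop = std::chrono::steady_clock::now();
    std::printf("elapsed: %.1f s\n",
                std::chrono::duration<double>(stop - start).count());

    return 0;
}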

Next take out the #pragma omp parallel directive, so the main function becomes:

int main(void)
{
    std::thread threaded_call(&run);
    threaded_call.join();

    return 0;
}

Now run time ~9 s, all 16 logical cores reported in System Monitor at 100%.

I've also compiled this using MSVC on Windows 10; CPU utilization is always ~100%, irrespective of whether the #pragma omp parallel directive is there or not. Yes, I am fully aware this line should do absolutely nothing, yet with g++ it causes the behaviour above; also, it only happens when the run function is called on a thread, not when it is called directly. I experimented with various compilation flags (-O levels) but the problem remains. I suppose looking at the assembly code is the next step, but I can't see how this is anything but a bug in g++. Can anyone shed some light on this, please? It would be much appreciated.

Furthermore, calling omp_set_num_threads(1); in the void run(void) function just before the loop, in order to check how long a single thread takes, gives a run time of ~70 s with only one thread at 100% (as expected).
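
For concreteness, that single-thread check looked roughly like this (a sketch of the change described above; it assumes #include <omp.h> is added for omp_set_num_threads):

#include <omp.h>   // for omp_set_num_threads

void run(void)
{
    // Restrict the team spawned by this thread's parallel regions to one
    // thread, to get the serial baseline (~70 s).
    omp_set_num_threads(1);

    for(int l = 0; l < 500000; l++) {

    #pragma omp parallel for
        for(int idx = 0; idx < vec.size(); idx++) {

            vec[idx] += cos(idx);
        }
    }
}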

A further, possibly related problem (although this might be a lack of understanding on my part): calling omp_set_num_threads(1); in the int main(void) function (before threaded_call is defined) does nothing when compiling with g++, i.e. all 16 threads still execute the for loop, irrespective of the bogus #pragma omp parallel directive. When compiling with MSVC this causes only one thread to run, as expected - according to the documentation for omp_set_num_threads I thought this should be the correct behaviour, but not so with g++. Why not? Is this a further bug?

EDIT: I now understand this last problem (Overriding OMP_NUM_THREADS from code - for real), but the original problem remains outstanding.
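
For anyone hitting the same thing, my understanding now is that the team size has to be requested in the thread that actually encounters the parallel region: either call omp_set_num_threads inside run() (as in the snippet above), or put a num_threads clause on the directive itself. A sketch of the latter:

void run(void)
{
    for(int l = 0; l < 500000; l++) {

        // num_threads is evaluated where the parallel region is encountered,
        // so it limits the team regardless of what the main thread has set.
        #pragma omp parallel for num_threads(1)
        for(int idx = 0; idx < vec.size(); idx++) {

            vec[idx] += cos(idx);
        }
    }
}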

Thank you to Hristo Iliev for the useful comments. I now understand this and would like to answer my own question, in case it's of use to anyone having similar issues.

The problem is that if any OpenMP code is executed in the main program thread, its state becomes "polluted": specifically, after the #pragma omp parallel directive, the OpenMP threads (all 16 of them) remain in a busy state, and this affects the performance of all other OpenMP code in any std::thread threads, which spawn their own teams of OpenMP threads. Since the main thread only goes out of scope when the program finishes, this performance issue persists for the entire program execution. Thus, if using OpenMP together with std::thread, make sure absolutely no OpenMP code exists in the main program thread.

To demonstrate this, consider the following modified example code:

#include <vector>
#include <math.h>
#include <thread>
#include <chrono>

std::vector<double> vec(10000);

void run(void) 
{
    for(int l = 0; l < 500000; l++) {

    #pragma omp parallel for
        for(int idx = 0; idx < vec.size(); idx++) {

            vec[idx] += cos(idx);
        }
    }
}

void state(void)
{
    // Empty parallel region: spawns this thread's own team of OpenMP threads
#pragma omp parallel
    {
    }

    // Keep this std::thread (and hence its OpenMP team) alive for 5 seconds
    std::this_thread::sleep_for(std::chrono::milliseconds(5000));
}

int main(void)
{
    std::thread state_thread(&state);
    state_thread.detach();

    std::thread threaded_call(&run);
    threaded_call.join();

    return 0;
}

This code runs at 80% CPU utilization for the first 5 seconds, then at 100% CPU utilization for the rest of the program. This is because in the first std::thread a team of 16 OpenMP threads is spawned and remains in a busy state, thus affecting the performance of the OpenMP code in the second std::thread. As soon as the first std::thread terminates, the performance of the second std::thread is no longer affected, since the second team of 16 OpenMP threads no longer has to compete for CPU access with the first. When the offending code was in the main thread, the issue persisted until the end of the program.
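
A practical consequence (my own takeaway, not part of the answer above): if some OpenMP warm-up or setup really has to run before the worker threads start, it can be placed in its own short-lived std::thread and joined, so that its OpenMP team goes away with that thread instead of lingering in the main thread for the rest of the program. A sketch:

int main(void)
{
    // Hypothetical OpenMP setup/warm-up, kept out of the main program thread.
    std::thread setup([] {
        #pragma omp parallel
        {
        }
    });
    // Once this thread finishes, its OpenMP thread team no longer competes
    // for CPU time with the teams of later std::threads.
    setup.join();

    std::thread threaded_call(&run);
    threaded_call.join();

    return 0;
}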
