Pass-by-reference in lambda function, multithreading C++

The code below is supposed to measure the runtime of the sin and cos functions with different numbers of threads. I am writing it for a project where runtime is critical, as a feasibility study of whether multithreading will reduce the runtime enough.

The idea is to pass it different values of SAMPLE_SIZE and NUM_THREADS and see how they affect the runtime.

Problem: The output is not what I expected.

  1. The ID inside the void function cos_sin_multiplication is always incremented by one, so I get (ID:1 ... ID:NUM_THREADS+1) instead of (ID:0 ... ID:NUM_THREADS).
  2. When I run the code with 2, 3, or 4 threads, I get a segmentation fault.
  3. When I run it with 7 or more threads, several IDs are changed to NUM_THREADS.
  4. The output of cos_out[0] is always 0.

Here is an example output for NUM_THREADS = 8 and SAMPLE_SIZE = 100'000.

Initiate Thread: 0 with 12500 datapoints.
Initiate Thread: 1 with 12500 datapoints.
Initiate Thread: 2 with 12500 datapoints.
Initiate Thread: 3 with 12500 datapoints.
Initiate Thread: 4 with 12500 datapoints.
Initiate Thread: 5 with 12500 datapoints.
Initiate Thread: 6 with 12500 datapoints.
Initiate Thread: 7 with 12500 datapoints.
ID: 4: sin: 0.861292 cos: -1.72477
ID: 8: sin: -56.1798 cos: 55.4332
ID: 8: sin: -68.1969 cos: 51.9351
ID: 3: sin: 0.861292 cos: -1.72477
ID: 2: sin: 0.861292 cos: -1.72477
ID: 1: sin: 0.861292 cos: -1.72477
ID: 8: sin: -61.1793 cos: 58.8878
ID: 8: sin: -64.8086 cos: 59.5946
The execution took: 0.004465 seconds. 
ID: 0: sin: 59.5946 cos: 0
ID: 1: sin: 0.861292 cos: -1.72477
ID: 2: sin: 0.861292 cos: -1.72477
ID: 3: sin: 0.861292 cos: -1.72477
ID: 4: sin: 0.861292 cos: -1.72477
ID: 5: sin: 0 cos: 0
ID: 6: sin: 0 cos: 0
ID: 7: sin: 0 cos: 0

Can anyone point me in the right direction?

// Multithreaded cosine and sine calculation benchmark
// Calculate a sample of cosine and sine values with different numbers of threads
// to determine the runtime gain for different numbers of threads

#include <math.h>

#include <iostream>
#include <fstream>
#include <thread>
#include <mutex>
#include <chrono>
#include <vector>

#define NUM_THREADS 3
#define SAMPLE_SIZE 2000000
#define PI 3.1415

float diff_time;

std::ofstream calc_speed;
std::mutex out_guard;

void cos_sin_multiplication(int id, int sample, float theta, float& value, float& sin_out, float& cos_out){
    for (int j = 0; j < sample; j++){
        sin_out += sin(PI*theta);
        cos_out += cos(PI*theta);
        theta += 0.1;
    }
    out_guard.lock();
    std::cout << "ID: " << id << ": sin: " << sin_out << " cos: " << cos_out << "\n";
    out_guard.unlock();
}

int main(){
    auto start_time = std::chrono::system_clock::now();

    std::vector<std::thread> Threads;

    int64_t sample_per_thread;
    int mod_sample_per_thread = SAMPLE_SIZE%NUM_THREADS;
    float value[SAMPLE_SIZE];

    float theta = 0.0;
    float cos_out[NUM_THREADS];
    float sin_out[NUM_THREADS];
    

    for(int i = 0; i <  NUM_THREADS; i++){
        cos_out[i] = 0.0;
        sin_out[i] = 0.0;
    }

    for(int i = 0; i < NUM_THREADS; i++){   
        
        if (i < mod_sample_per_thread){
            sample_per_thread = SAMPLE_SIZE/NUM_THREADS + 1;
        }
        else{
            sample_per_thread = SAMPLE_SIZE/NUM_THREADS;
        }
        out_guard.lock();
        std::cout << "Initiate Thread: " << i <<" with "<< sample_per_thread << " datapoints." << "\n";
        out_guard.unlock();

        Threads.emplace_back([&](){cos_sin_multiplication(i, sample_per_thread, theta, value[0], sin_out[i], cos_out[i]);});
    }

    for(auto& t: Threads){
        t.join();
    }

    auto end_time = std::chrono::system_clock::now();
    std::chrono::duration<double> diff_time = end_time - start_time;
    out_guard.lock();
    std::cout << "The execution took: " << diff_time.count() << " seconds. \n";
    out_guard.unlock();

    for(int i = 0; i < NUM_THREADS; i++){
        out_guard.lock();
        std::cout << "ID: " << i << ": sin: " << sin_out[i] << " cos: " << cos_out[i] << "\n";
        out_guard.unlock();
    }
    return 0;
}

Solution: Replace [&] with [&, i=i, sample_per_thread=sample_per_thread] so that only the things that need to be passed by reference are captured by reference.
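As a rough illustration of that capture list, here is a minimal sketch. The `Seen` struct and the `launch_workers` function are hypothetical names introduced only for this example (they are not part of the code above), and the worker just records what it saw instead of calling cos_sin_multiplication. It assumes a compiler with C++14 or later, since it uses init-captures.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical record of what each worker thread actually observed.
struct Seen {
    int id = -1;
    std::int64_t samples = -1;
};

// Hypothetical helper: starts num_threads workers and returns what each one saw.
std::vector<Seen> launch_workers(int num_threads, std::int64_t sample_size) {
    assert(num_threads > 0);
    std::vector<Seen> seen(num_threads);  // one slot per thread, so no two threads share data
    std::vector<std::thread> threads;
    threads.reserve(num_threads);

    std::int64_t sample_per_thread = 0;
    const std::int64_t remainder = sample_size % num_threads;

    for (int i = 0; i < num_threads; i++) {
        // The first `remainder` threads get one extra sample, as in the question.
        sample_per_thread = sample_size / num_threads + (i < remainder ? 1 : 0);

        // i and sample_per_thread are copied into the closure when it is created;
        // everything else (here only `seen`) is still captured by reference.
        threads.emplace_back([&, i = i, sample_per_thread = sample_per_thread]() {
            seen[i].id = i;
            seen[i].samples = sample_per_thread;
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    return seen;
}
```

Calling `launch_workers(3, 10)` should give each slot its own index as `id`, with the ten samples split 4/3/3, regardless of when the threads actually start running.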

Threads.emplace_back([&](){cos_sin_multiplication(i, sample_per_thread, theta, value[0], sin_out[i], cos_out[i]);});

C++ gives you no guarantee whatsoever about exactly when the new execution thread will actually start executing this closure. The only thing you can rely on is that it will happen at some point after the new std::thread object gets constructed (as part of the emplace_back), which is nowhere near enough for this to work correctly. The only situation in which everything works is if the new execution thread begins executing the closure, and evaluates all of the arguments to the function call, before the parent execution thread moves on to the next iteration of the for loop. The chances of that are not very good.
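One way to see this effect without depending on thread timing is to store the closures and run them only after the loop has ended, which imitates a thread that starts late. The sketch below does that; `deferred_reads` is a hypothetical name used only here, and unlike the question's `i`, the loop counter is declared outside the loop so that reading it afterwards is still well defined.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical demo: returns what by-reference and by-value closures see
// when they are executed only after the loop has finished.
std::pair<std::vector<int>, std::vector<int>> deferred_reads(int n) {
    assert(n >= 0);
    int counter = 0;  // outlives the closures, so the late reads are not dangling
    std::vector<std::function<int()>> by_ref;
    std::vector<std::function<int()>> by_val;

    for (counter = 0; counter < n; counter++) {
        by_ref.push_back([&counter]() { return counter; });  // reads counter when called
        by_val.push_back([counter]() { return counter; });   // copies counter right now
    }

    // Run every closure "late", after the loop has already finished.
    std::vector<int> ref_seen;
    std::vector<int> val_seen;
    for (auto& f : by_ref) ref_seen.push_back(f());
    for (auto& f : by_val) val_seen.push_back(f());
    return {ref_seen, val_seen};
}
```

With `n = 4`, every by-reference closure reports 4 (the final value of the counter), while the by-value closures report 0, 1, 2, 3. This matches the symptom in the question where several IDs show up as NUM_THREADS.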

So, in addition to everything else that goes wrong, sample_per_thread will also be whatever value was last calculated for it.

It is entirely possible that all of your execution threads will end up executing this closure, and evaluating all of the arguments that were captured by reference, only after the for loop has finished and i has been destroyed, which makes everything undefined behavior.

Even if some of the execution threads manage to start a little earlier, you still have no guarantee whatsoever that sample_per_thread will hold the value that was calculated for it just before its std::thread object was constructed. In fact, it is almost certain that at least some of the execution threads will read the captured-by-reference sample_per_thread after it has already been recalculated for the next execution thread.

In other words, nothing here works correctly because everything gets captured by reference.
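A different way to avoid the problem, not taken from the question or the answer above, is to skip the lambda and hand the arguments directly to the std::thread constructor. std::thread copies its arguments when the thread object is constructed, and std::ref marks the ones that are meant to be shared. The sketch below assumes this approach; `accumulate_sin_cos`, `Sums`, and `run_benchmark` are hypothetical names, and every thread starts at theta = 0 just as in the original code.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical worker: sums sin and cos over `samples` steps into its own outputs.
void accumulate_sin_cos(int samples, float theta, float& sin_out, float& cos_out) {
    const float pi = 3.1415f;  // same approximation as the PI macro in the question
    for (int j = 0; j < samples; j++) {
        sin_out += std::sin(pi * theta);
        cos_out += std::cos(pi * theta);
        theta += 0.1f;
    }
}

// Hypothetical result holder: one sin sum and one cos sum per thread.
struct Sums {
    std::vector<float> sin_out;
    std::vector<float> cos_out;
};

Sums run_benchmark(int num_threads, int samples_per_thread) {
    assert(num_threads > 0);
    Sums sums{std::vector<float>(num_threads, 0.0f),
              std::vector<float>(num_threads, 0.0f)};
    std::vector<std::thread> threads;
    threads.reserve(num_threads);

    for (int i = 0; i < num_threads; i++) {
        // samples_per_thread and theta are copied at construction time;
        // std::ref passes each thread a reference to its own output slot.
        threads.emplace_back(accumulate_sin_cos, samples_per_thread, 0.0f,
                             std::ref(sums.sin_out[i]), std::ref(sums.cos_out[i]));
    }
    for (auto& t : threads) {
        t.join();
    }
    return sums;
}
```

Because the index expressions `sums.sin_out[i]` and `sums.cos_out[i]` are evaluated in the parent thread before the new thread exists, each worker is bound to its own slot no matter when it starts running.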
