Passing references in a lambda function, multithreading in C++

Question

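The code below is meant to measure the runtime of the sin and cos functions with different numbers of threads. I am writing it for a project in which runtime matters a great deal; it is a feasibility study on whether multithreading would reduce the runtime enough.

The idea is to give it different values of SAMPLE_SIZE and NUM_THREADS and see how they affect the runtime.

Problem: the output is not what I expected.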

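The ID in the void function cos_sin_multiplication is always incremented by 1. So I get (ID: 1 ... ID: NUM_THREADS + 1) instead of (ID: 0 ... ID: NUM_THREADS).
When I run the code with 2/3/4 threads, I get a segmentation fault.
When I run it with 7 or more threads, several IDs change to NUM_THREADS.
The output for cos_out[0] is always 0.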

Here is sample output for NUM_THREADS = 8 and SAMPLE_SIZE = 100'000.

Initiate Thread: 0 with 12500 datapoints.
Initiate Thread: 1 with 12500 datapoints.
Initiate Thread: 2 with 12500 datapoints.
Initiate Thread: 3 with 12500 datapoints.
Initiate Thread: 4 with 12500 datapoints.
Initiate Thread: 5 with 12500 datapoints.
Initiate Thread: 6 with 12500 datapoints.
Initiate Thread: 7 with 12500 datapoints.
ID: 4: sin: 0.861292 cos: -1.72477
ID: 8: sin: -56.1798 cos: 55.4332
ID: 8: sin: -68.1969 cos: 51.9351
ID: 3: sin: 0.861292 cos: -1.72477
ID: 2: sin: 0.861292 cos: -1.72477
ID: 1: sin: 0.861292 cos: -1.72477
ID: 8: sin: -61.1793 cos: 58.8878
ID: 8: sin: -64.8086 cos: 59.5946
The execution took: 0.004465 seconds. 
ID: 0: sin: 59.5946 cos: 0
ID: 1: sin: 0.861292 cos: -1.72477
ID: 2: sin: 0.861292 cos: -1.72477
ID: 3: sin: 0.861292 cos: -1.72477
ID: 4: sin: 0.861292 cos: -1.72477
ID: 5: sin: 0 cos: 0
ID: 6: sin: 0 cos: 0
ID: 7: sin: 0 cos: 0

Can anyone point me in the right direction?

// Multithreaded Cosinus and Sinus Calculations Benchmark
// Calculate a sample of Cosinus and Sinus with different numbers of Threads
// to determine the runtime gain for different number of threads 

#include <math.h>

#include <iostream>
#include <fstream>
#include <thread>
#include <mutex>
#include <chrono>
#include <vector>

#define NUM_THREADS 3
#define SAMPLE_SIZE 2000000
#define PI 3.1415

float diff_time;

std::ofstream calc_speed;
std::mutex out_guard;

void cos_sin_multiplication(int id, int sample, float theta, float& value, float& sin_out, float& cos_out){
    for (int j = 0; j < sample; j++){
        sin_out += sin(PI*theta);
        cos_out += cos(PI*theta);
        theta += 0.1;
    }
    out_guard.lock();
    std::cout << "ID: " << id << ": sin: " << sin_out << " cos: " << cos_out << "\n";
    out_guard.unlock();
}

int main(){
    auto start_time = std::chrono::system_clock::now();

    std::vector<std::thread> Threads;

    int64_t sample_per_thread;
    int mod_sample_per_thread = SAMPLE_SIZE%NUM_THREADS;
    float value[SAMPLE_SIZE];

    float theta = 0.0;
    float cos_out[NUM_THREADS];
    float sin_out[NUM_THREADS];
    

    for(int i = 0; i <  NUM_THREADS; i++){
        cos_out[i] = 0.0;
        sin_out[i] = 0.0;
    }

    for(int i = 0; i < NUM_THREADS; i++){   
        
        if (i < mod_sample_per_thread){
            sample_per_thread = SAMPLE_SIZE/NUM_THREADS + 1;
        }
        else{
            sample_per_thread = SAMPLE_SIZE/NUM_THREADS;
        }
        out_guard.lock();
        std::cout << "Initiate Thread: " << i <<" with "<< sample_per_thread << " datapoints." << "\n";
        out_guard.unlock();

        Threads.emplace_back([&](){cos_sin_multiplication(i, sample_per_thread, theta, value[0], sin_out[i], cos_out[i]);});
    }

    for(auto& t: Threads){
        t.join();
    }

    auto end_time = std::chrono::system_clock::now();
    std::chrono::duration<double> diff_time = end_time - start_time;
    out_guard.lock();
    std::cout << "The execution took: " << diff_time.count() << " seconds. \n";
    out_guard.unlock();

    for(int i = 0; i < NUM_THREADS; i++){
        out_guard.lock();
        std::cout << "ID: " << i << ": sin: " << sin_out[i] << " cos: " << cos_out[i] << "\n";
        out_guard.unlock();
    }
    return 0;
}

Solution: replace [&] with [&, i=i, sample_per_thread=sample_per_thread] so that only the things that need to be passed by reference are captured by reference.
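To make that capture list concrete, here is a minimal sketch of the launch loop with i and sample_per_thread copied into each closure. The names launch_with_value_capture and Seen are hypothetical and exist only for this illustration; the sin/cos work is replaced by recording what each thread saw, and init-captures such as i = i assume C++14 or later.

```cpp
#include <thread>
#include <vector>

// Hypothetical record of what one worker observed.
struct Seen {
    int id;
    long samples;
};

// Launches num_threads workers and returns the id and sample count each one saw.
std::vector<Seen> launch_with_value_capture(int num_threads, long sample_size) {
    std::vector<Seen> seen(num_threads);
    std::vector<std::thread> threads;

    for (int i = 0; i < num_threads; ++i) {
        // The first (sample_size % num_threads) workers take one extra sample.
        long sample_per_thread =
            sample_size / num_threads + (i < sample_size % num_threads ? 1 : 0);

        // i and sample_per_thread are copied when the closure is created,
        // so later loop iterations cannot change what this thread sees.
        // seen is still captured by reference; each thread writes only
        // its own element.
        threads.emplace_back([&, i = i, sample_per_thread = sample_per_thread]() {
            seen[i] = Seen{i, sample_per_thread};
        });
    }

    for (auto& t : threads) {
        t.join();
    }
    return seen;
}
```

Calling launch_with_value_capture(3, 10) should give every worker its own id, with the one leftover sample going to worker 0.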

Answer 1

Threads.emplace_back([&](){cos_sin_multiplication(i, sample_per_thread, theta, value[0], sin_out[i], cos_out[i]);});
    }

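C++ gives no guarantee as to exactly when the new execution thread actually starts executing this closure. The only thing you can rely on is that it happens at some point after the new std::thread object is constructed (as part of the emplace_back call). What this code needs in order to work is far from guaranteed to happen: everything works only if each execution thread starts executing the closure and evaluates all the arguments of the function call before the parent execution thread moves on to the next iteration of the for loop. The chances of that are not good.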

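So, on top of all the other bugs, the sample_per_thread that a thread reads may well be the last value computed for it rather than the one intended for that thread.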
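The effect can be reproduced deterministically without any threads. The sketch below is a single-threaded stand-in for the worst-case timing, assuming every closure runs only after the loop has finished; observe_after_loop and the values 100, 101, 102 are made up for the illustration. It is well defined here only because sample_per_thread outlives the loop.

```cpp
#include <functional>
#include <vector>

// Builds three closures inside a loop, runs them after the loop has ended,
// and returns the value each closure observed.
std::vector<int> observe_after_loop(bool capture_by_value) {
    int sample_per_thread = 0;  // declared outside the loop, as in the question
    std::vector<std::function<int()>> closures;

    for (int i = 0; i < 3; ++i) {
        sample_per_thread = 100 + i;
        if (capture_by_value) {
            // The closure stores its own copy of the current value.
            closures.emplace_back([sample_per_thread]() { return sample_per_thread; });
        } else {
            // The closure stores only a reference to the shared variable.
            closures.emplace_back([&sample_per_thread]() { return sample_per_thread; });
        }
    }

    std::vector<int> seen;
    for (auto& c : closures) {
        seen.push_back(c());
    }
    return seen;
}
```

With capture_by_value set to false, all three closures report 102, the last value assigned; with true they report 100, 101 and 102.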

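It is entirely possible that all of your execution threads run this closure and evaluate all the arguments, which are captured by reference, only after the for loop has finished and i has been destroyed, which makes everything undefined behavior.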

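Even if some execution threads manage to wake up and smell the coffee a little earlier, you still have no guarantee that sample_per_thread holds the value that was computed for them before their std::thread object was constructed. In practice, it is all but guaranteed that at least some execution threads will read the reference-captured sample_per_thread after it has been recomputed for the next execution thread.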

In other words, nothing here can work correctly, because everything is captured by reference.
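One way to avoid the lambda entirely is to hand the arguments to the std::thread constructor, which copies them into the new thread, and to wrap only the outputs in std::ref. The following is a sketch under assumptions, not the asker's final code: accumulate_sin_cos, Sums and run_benchmark are hypothetical names, the unused id and value parameters are dropped, and std::vector replaces the fixed-size arrays.

```cpp
#include <cmath>
#include <functional>
#include <thread>
#include <vector>

// Same accumulation loop as in the question, without the unused parameters.
void accumulate_sin_cos(int samples, float theta, float& sin_out, float& cos_out) {
    for (int j = 0; j < samples; ++j) {
        sin_out += std::sin(3.1415f * theta);
        cos_out += std::cos(3.1415f * theta);
        theta += 0.1f;
    }
}

// Hypothetical container for the per-thread results.
struct Sums {
    std::vector<float> sin_out;
    std::vector<float> cos_out;
};

Sums run_benchmark(int num_threads, int sample_size) {
    Sums sums{std::vector<float>(num_threads, 0.0f),
              std::vector<float>(num_threads, 0.0f)};
    std::vector<std::thread> threads;

    for (int i = 0; i < num_threads; ++i) {
        int samples = sample_size / num_threads + (i < sample_size % num_threads ? 1 : 0);
        // samples and the starting theta are copied into the new thread;
        // only the two output slots are passed by reference via std::ref.
        threads.emplace_back(accumulate_sin_cos, samples, 0.0f,
                             std::ref(sums.sin_out[i]), std::ref(sums.cos_out[i]));
    }

    for (auto& t : threads) {
        t.join();
    }
    return sums;
}
```

Because every thread starts from theta = 0 and writes only to its own pair of slots, threads that receive the same number of samples should produce identical sums.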

Solution 1
2 | Accepted | 2022-04-27 11:11:23
