Passing references in a lambda function, multithreaded C++

Question

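The code below is supposed to measure the runtime of the sin and cos functions with different numbers of threads. I am writing it for a project where runtime matters a great deal; it is a feasibility study of whether multithreading reduces the runtime enough.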

The idea is to pass different values of SAMPLE_SIZE and NUM_THREADS to it and see how they affect the runtime.

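Problem: the output is not what I expected.

The ID in the void function cos_sin_multiplication is always incremented by 1, so I get (ID: 1 ... ID: NUM_THREADS + 1) instead of (ID: 0 ... ID: NUM_THREADS).
When I run the code with 2/3/4 threads, I get a segmentation fault.
When I run 7 or more threads, several IDs change to NUM_THREADS.
The output of cos_out[0] is always 0.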

Here is sample output for NUM_THREADS = 8 and SAMPLE_SIZE = 100'000:

Initiate Thread: 0 with 12500 datapoints.
Initiate Thread: 1 with 12500 datapoints.
Initiate Thread: 2 with 12500 datapoints.
Initiate Thread: 3 with 12500 datapoints.
Initiate Thread: 4 with 12500 datapoints.
Initiate Thread: 5 with 12500 datapoints.
Initiate Thread: 6 with 12500 datapoints.
Initiate Thread: 7 with 12500 datapoints.
ID: 4: sin: 0.861292 cos: -1.72477
ID: 8: sin: -56.1798 cos: 55.4332
ID: 8: sin: -68.1969 cos: 51.9351
ID: 3: sin: 0.861292 cos: -1.72477
ID: 2: sin: 0.861292 cos: -1.72477
ID: 1: sin: 0.861292 cos: -1.72477
ID: 8: sin: -61.1793 cos: 58.8878
ID: 8: sin: -64.8086 cos: 59.5946
The execution took: 0.004465 seconds. 
ID: 0: sin: 59.5946 cos: 0
ID: 1: sin: 0.861292 cos: -1.72477
ID: 2: sin: 0.861292 cos: -1.72477
ID: 3: sin: 0.861292 cos: -1.72477
ID: 4: sin: 0.861292 cos: -1.72477
ID: 5: sin: 0 cos: 0
ID: 6: sin: 0 cos: 0
ID: 7: sin: 0 cos: 0

Can anyone point me in the right direction?

// Multithreaded Cosine and Sine Calculations Benchmark
// Calculate a sample of cosines and sines with different numbers of threads
// to determine the runtime gain for different numbers of threads

#include <math.h>

#include <iostream>
#include <fstream>
#include <thread>
#include <mutex>
#include <chrono>
#include <vector>

#define NUM_THREADS 3
#define SAMPLE_SIZE 2000000
#define PI 3.1415

float diff_time;

std::ofstream calc_speed;
std::mutex out_guard;

void cos_sin_multiplication(int id, int sample, float theta, float& value, float& sin_out, float& cos_out){
    for (int j = 0; j < sample; j++){
        sin_out += sin(PI*theta);
        cos_out += cos(PI*theta);
        theta += 0.1;
    }
    out_guard.lock();
    std::cout << "ID: " << id << ": sin: " << sin_out << " cos: " << cos_out << "\n";
    out_guard.unlock();
}

int main(){
    auto start_time = std::chrono::system_clock::now();

    std::vector<std::thread> Threads;

    int64_t sample_per_thread;
    int mod_sample_per_thread = SAMPLE_SIZE%NUM_THREADS;
    float value[SAMPLE_SIZE];

    float theta = 0.0;
    float cos_out[NUM_THREADS];
    float sin_out[NUM_THREADS];
    

    for(int i = 0; i <  NUM_THREADS; i++){
        cos_out[i] = 0.0;
        sin_out[i] = 0.0;
    }

    for(int i = 0; i < NUM_THREADS; i++){   
        
        if (i < mod_sample_per_thread){
            sample_per_thread = SAMPLE_SIZE/NUM_THREADS + 1;
        }
        else{
            sample_per_thread = SAMPLE_SIZE/NUM_THREADS;
        }
        out_guard.lock();
        std::cout << "Initiate Thread: " << i <<" with "<< sample_per_thread << " datapoints." << "\n";
        out_guard.unlock();

        Threads.emplace_back([&](){cos_sin_multiplication(i, sample_per_thread, theta, value[0], sin_out[i], cos_out[i]);});
    }

    for(auto& t: Threads){
        t.join();
    }

    auto end_time = std::chrono::system_clock::now();
    std::chrono::duration<double> diff_time = end_time - start_time;
    out_guard.lock();
    std::cout << "The execution took: " << diff_time.count() << " seconds. \n";
    out_guard.unlock();

    for(int i = 0; i < NUM_THREADS; i++){
        out_guard.lock();
        std::cout << "ID: " << i << ": sin: " << sin_out[i] << " cos: " << cos_out[i] << "\n";
        out_guard.unlock();
    }
    return 0;
}

Solution: replace [&] with [&, i=i, sample_per_thread=sample_per_thread], so that only the things that need to be passed by reference are captured by reference.
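A minimal sketch of that fix, assuming C++14 (init-captures). The names launch_with_copies and Launch are hypothetical, and the worker only records what it received instead of computing sin and cos, so the effect of the capture list is easy to check: each thread reports its own index and share, no matter when it actually starts running.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical record of what each thread actually received.
struct Launch {
    int id;
    long long count;
};

std::vector<Launch> launch_with_copies(int num_threads, long long sample_size) {
    assert(num_threads > 0);
    std::vector<Launch> seen(num_threads, Launch{-1, -1});  // one slot per thread
    std::vector<std::thread> threads;
    long long sample_per_thread = 0;  // reassigned every iteration, as in the question
    for (int i = 0; i < num_threads; i++) {
        sample_per_thread = sample_size / num_threads + (i < sample_size % num_threads ? 1 : 0);
        // i and sample_per_thread are copied into the closure right now;
        // everything else (here only `seen`) is still captured by reference.
        threads.emplace_back([&, i = i, sample_per_thread = sample_per_thread]() {
            seen[i] = Launch{i, sample_per_thread};
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    return seen;
}
```

For example, launch_with_copies(3, 10) yields the pairs (0, 4), (1, 3), (2, 3), because the first 10 % 3 = 1 thread gets one extra sample.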

Answer 1

Threads.emplace_back([&](){cos_sin_multiplication(i, sample_per_thread, theta, value[0], sin_out[i], cos_out[i]);});
    }

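C++ gives you no guarantee about exactly when the new execution thread actually starts executing this closure. The only thing you can rely on is that it will happen at some point after the new std::thread object is constructed (as part of the emplace). That is nowhere near what has to happen for this to work correctly. The only way everything works is if the new execution thread starts executing the closure and evaluates all of the function call's arguments immediately, before the parent execution thread moves on to the next iteration of the for loop. The odds of that are not very good.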
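Another way to avoid depending on when the thread starts, sketched here as an alternative rather than as part of the answer: pass the arguments to std::thread itself instead of going through a by-reference lambda. The std::thread constructor copies its arguments in the launching thread, before emplace_back returns, so later loop iterations cannot change them; std::ref is needed for the output parameters the worker writes through. sum_sin_cos and launch_by_argument are hypothetical names; sum_sin_cos mirrors cos_sin_multiplication without the unused value parameter and the printing, and kPi is an assumed, more precise value than the question's PI.

```cpp
#include <cassert>
#include <cmath>
#include <functional>  // std::ref
#include <thread>
#include <vector>

constexpr float kPi = 3.14159265f;  // assumption: more precise than the question's 3.1415

// Mirrors cos_sin_multiplication, minus the unused `value` parameter and the output.
void sum_sin_cos(long long samples, float theta, float& sin_out, float& cos_out) {
    for (long long j = 0; j < samples; j++) {
        sin_out += std::sin(kPi * theta);
        cos_out += std::cos(kPi * theta);
        theta += 0.1f;
    }
}

// Hypothetical launcher: every argument is handed to std::thread directly.
void launch_by_argument(int num_threads, long long sample_size,
                        std::vector<float>& sin_out, std::vector<float>& cos_out) {
    assert(num_threads > 0);
    sin_out.assign(num_threads, 0.0f);
    cos_out.assign(num_threads, 0.0f);
    std::vector<std::thread> threads;
    for (int i = 0; i < num_threads; i++) {
        long long per_thread = sample_size / num_threads + (i < sample_size % num_threads ? 1 : 0);
        // std::thread copies per_thread, 0.0f and the two reference_wrappers here,
        // in the launching thread; std::ref(sin_out[i]) already refers to slot i.
        threads.emplace_back(sum_sin_cos, per_thread, 0.0f,
                             std::ref(sin_out[i]), std::ref(cos_out[i]));
    }
    for (auto& t : threads) {
        t.join();
    }
}
```

The vectors are sized before any thread starts and are not resized afterwards, so the references handed out by std::ref stay valid until every thread has been joined.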

So, on top of all the other bugs, sample_per_thread will also be whatever value was last computed for it.

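It is entirely possible that all of your execution threads end up executing this closure, and evaluating all of its parameters (which are captured by reference), after the for loop has finished and i has already been destroyed, which makes all of it undefined behavior.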

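Even if some execution threads manage to wake up a little earlier and smell the coffee, you still have no guarantee that sample_per_thread will hold the value that was computed for it before its std::thread object was constructed. In fact, it is practically guaranteed that at least some execution threads will read the reference-captured sample_per_thread after it has already been recomputed for the next execution thread.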
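One way to take sample_per_thread out of the race entirely is to compute each thread's share from its index alone, so there is no shared, reassigned variable to read at the wrong moment. The helper samples_for_thread is hypothetical; it reproduces the question's if/else (the first SAMPLE_SIZE % NUM_THREADS threads get one extra sample) as a pure function.

```cpp
#include <cassert>

// Hypothetical helper: the number of samples thread i should process.
// Depends only on its arguments, so any thread can call it at any time.
long long samples_for_thread(int i, int num_threads, long long sample_size) {
    assert(num_threads > 0 && i >= 0 && i < num_threads);
    const long long base = sample_size / num_threads;
    // The first (sample_size % num_threads) threads take one extra sample.
    return i < sample_size % num_threads ? base + 1 : base;
}
```

Inside a lambda that captures i by value, calling samples_for_thread(i, NUM_THREADS, SAMPLE_SIZE) gives the right share whenever the thread happens to start. With NUM_THREADS = 8 and SAMPLE_SIZE = 100'000 it returns 12500 for every thread, matching the "Initiate Thread" lines in the question.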

In other words, nothing here can work correctly, because everything is captured by reference.
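Putting these points together, a sketch of the benchmark in which nothing that changes between loop iterations is captured by reference. This is an illustration under assumptions, not the asker's final code: run_benchmark, ThreadResult and BenchmarkResult are hypothetical names, each thread writes only to its own result slot, the unused value array is dropped, timing uses std::chrono::steady_clock (a monotonic clock) instead of system_clock, and every thread still starts at theta = 0 as in the question.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <thread>
#include <vector>

constexpr float kPi = 3.14159265f;  // assumption: more precise than the question's 3.1415

struct ThreadResult {
    int id = -1;
    long long samples = 0;
    float sin_sum = 0.0f;
    float cos_sum = 0.0f;
};

struct BenchmarkResult {
    double seconds = 0.0;
    std::vector<ThreadResult> per_thread;
};

BenchmarkResult run_benchmark(int num_threads, long long sample_size) {
    assert(num_threads > 0 && sample_size >= 0);
    BenchmarkResult result;
    result.per_thread.resize(num_threads);  // sized once, never resized while threads run

    const auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> threads;
    threads.reserve(num_threads);
    for (int i = 0; i < num_threads; i++) {
        const long long samples =
            sample_size / num_threads + (i < sample_size % num_threads ? 1 : 0);
        ThreadResult& slot = result.per_thread[i];
        // i and samples are captured by value; only this thread's own slot is by reference.
        threads.emplace_back([&slot, i, samples]() {
            float theta = 0.0f;  // like the question, every thread starts at theta = 0
            float s = 0.0f;
            float c = 0.0f;
            for (long long j = 0; j < samples; j++) {
                s += std::sin(kPi * theta);
                c += std::cos(kPi * theta);
                theta += 0.1f;
            }
            slot.id = i;
            slot.samples = samples;
            slot.sin_sum = s;
            slot.cos_sum = c;
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    const auto end = std::chrono::steady_clock::now();
    result.seconds = std::chrono::duration<double>(end - start).count();
    return result;
}
```

For the comparison the question is after, one could call run_benchmark(n, 2000000) for n = 1 to 8 and print result.seconds for each run; the per-thread results are only read after every thread has been joined, so no mutex is needed around them.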

Solution 1
Score 2, accepted, 2022-04-27 11:11:23