How to synchronize a recursive function's thread with its child threads

Question

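I am new to C++ and to threading, and I have been stuck on this problem for several days. The code is meant to become the skeleton of an FFT (fast Fourier transform). It is only a skeleton, so some things are still missing, such as the twiddle factors, and the input is double (not yet complex).

I want to parallelize the function f_thread in C++. Here is a working, compilable version of the code: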

#include<iostream>
#include<thread>
#include <vector>
#include <mutex>

void get_odd_elements(std::vector<double> inpt, std::vector<double> &out) {
    for (int i = 0; i < inpt.size()-1; i = i + 2) {out[i/2] = inpt[i];}
}

void get_even_elements(std::vector<double> inpt, std::vector<double> &out) {
    for (int i = 1; i < inpt.size(); i = i + 2) {out[i/2] = inpt[i];}
}

void attach(std::vector<double> a, std::vector<double> b, std::vector<double> &out) {
    for (int i = 0; i < a.size(); i++) {out[i] = a[i];}
    for (int i = a.size(); i < a.size()+b.size(); i++) {out[i] = b[i];}
}

void add_vectors(std::vector<double> &x, std::vector<double> &y, std::vector<double> &z) {for (int i = 0; i < x.size(); i++) {z[i] = x[i] + y[i];}}

void sub_vectors(std::vector<double> &x, std::vector<double> &y, std::vector<double> &z) {for (int i = 0; i < x.size(); i++) {z[i] = y[i] - x[i];}}

//the f_thread function

void f_thread(std::vector<double> in, std::vector<double> &out) {

    if (in.size() == 1) {out = in;}
    else {

        std::vector<double> f0(in.size()/2);
        std::vector<double> f1(in.size()/2);

        get_odd_elements(in,std::ref(f0)); //get_odd_elements is a function that gets all odd-indexed elements of f
        get_even_elements(in,std::ref(f1)); //get_even_elements is a function that gets all even-indexed elements of in

        std::vector<double> a(f0.size());
        std::vector<double> b(f1.size());

        std::mutex mtx1; std::mutex mtx2;

        std::thread t0(f_thread,std::ref(f0),std::ref(a)); //create thread for f_thread on a
        std::thread t1(f_thread,std::ref(f1),std::ref(b)); //create thread for f_thread on b

        t0.join(); t1.join(); // join 2 threads

        std::vector<double> a_out(f0.size());
        std::vector<double> b_out(f1.size());

        add_vectors(std::ref(a),std::ref(b),std::ref(a_out)); //call add_vectors function : a + b
        sub_vectors(std::ref(a),std::ref(b),std::ref(b_out)); //call sub_vectors function : b - a

        std::vector<double> f_out(in.size());
        attach(a_out,b_out,std::ref(f_out)); //attach is a function that appends b to the end of a
        out = f_out; 
    }
}

int main() {
    int n_elements = 16;
    std::vector<double> sample_input(n_elements);
    for (int i = 0; i < n_elements; i++) {sample_input[i] = i;}
    std::vector<double> output(n_elements);
    std::thread start(f_thread,std::ref(sample_input),std::ref(output));
    start.join();
    for (int i = 0; i < n_elements; i++) {std::cout << "output element "; std::cout << i; std::cout << ": "; std::cout << output[i]; std::cout<< "\n";}
    }

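So f_thread is started as a thread, which then creates two child threads that call f_thread recursively. I tried several tricks with mutexes, but none of them seemed to work, because the two child threads are not synchronized properly (this is a hot spot for race conditions). Here is one version I tried that did not work. I also tried a global recursive mutex, with no improvement.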

#include<iostream>
#include<thread>
#include <vector>
#include <mutex>

void get_odd_elements(std::vector<double> inpt, std::vector<double> &out) {
    for (int i = 0; i < inpt.size()-1; i = i + 2) {out[i/2] = inpt[i];}
}

void get_even_elements(std::vector<double> inpt, std::vector<double> &out) {
    for (int i = 1; i < inpt.size(); i = i + 2) {out[i/2] = inpt[i];}
}

void attach(std::vector<double> a, std::vector<double> b, std::vector<double> &out) {
    for (int i = 0; i < a.size(); i++) {out[i] = a[i];}
    for (int i = a.size(); i < a.size()+b.size(); i++) {out[i] = b[i];}
}

void add_vectors(std::vector<double> &x, std::vector<double> &y, std::vector<double> &z) {for (int i = 0; i < x.size(); i++) {z[i] = x[i] + y[i];}}

void sub_vectors(std::vector<double> &x, std::vector<double> &y, std::vector<double> &z) {for (int i = 0; i < x.size(); i++) {z[i] = y[i] - x[i];}}

//the f_thread function

void f_thread(std::vector<double> in, std::vector<double> &out) {

    if (in.size() == 1) {out = in;}
    else {

        std::vector<double> f0(in.size()/2);
        std::vector<double> f1(in.size()/2);

        get_odd_elements(in,std::ref(f0)); //get_odd_elements is a function that gets all odd-indexed elements of f
        get_even_elements(in,std::ref(f1)); //get_even_elements is a function that gets all even-indexed elements of in

        std::vector<double> a(f0.size());
        std::vector<double> b(f1.size());

        std::mutex mtx1; std::mutex mtx2;

        mtx1.lock(); std::thread t0(f_thread,std::ref(f0),std::ref(a)); mtx1.unlock(); //create thread for f_thread on a
        mtx2.lock(); std::thread t1(f_thread,std::ref(f1),std::ref(b)); mtx2.unlock(); //create thread for f_thread on b

        t0.join(); t1.join(); // join 2 threads

        std::vector<double> a_out(f0.size());
        std::vector<double> b_out(f1.size());

        add_vectors(std::ref(a),std::ref(b),std::ref(a_out)); //call add_vectors function : a + b
        sub_vectors(std::ref(a),std::ref(b),std::ref(b_out)); //call sub_vectors function : b - a

        std::vector<double> f_out(in.size());
        attach(a_out,b_out,std::ref(f_out)); //attach is a function that appends b to the end of a
        out = f_out; 
    }
}

int main() {
    int n_elements = 16;
    std::vector<double> sample_input(n_elements);
    for (int i = 0; i < n_elements; i++) {sample_input[i] = i;}
    std::vector<double> output(n_elements);
    std::thread start(f_thread,std::ref(sample_input),std::ref(output));
    start.join();
    for (int i = 0; i < n_elements; i++) {std::cout << "output element "; std::cout << i; std::cout << ": "; std::cout << output[i]; std::cout<< "\n";}
    }

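I have verified that this code compiles on Linux (Ubuntu 18.04) with the standard C++ library, using g++ f_thread.cpp -pthread.

The code now runs (no more "aborted, core dumped" errors), but the output of the threaded version changes on every run, which suggests that the synchronization is not working.

For reference, here is the sequential version, which does not use child threads and works fine (that is, the output does not change between runs):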

// WORKING sequential version

#include<iostream>
#include<thread>
#include <vector>
#include <mutex>

void get_odd_elements(std::vector<double> inpt, std::vector<double> &out) {
    for (int i = 0; i < inpt.size()-1; i = i + 2) {out[i/2] = inpt[i];}
}

void get_even_elements(std::vector<double> inpt, std::vector<double> &out) {
    for (int i = 1; i < inpt.size(); i = i + 2) {out[i/2] = inpt[i];}
}

void attach(std::vector<double> a, std::vector<double> b, std::vector<double> &out) {
    for (int i = 0; i < a.size(); i++) {out[i] = a[i];}
    for (int i = a.size(); i < a.size()+b.size(); i++) {out[i] = b[i];}
}

void add_vectors(std::vector<double> &x, std::vector<double> &y, std::vector<double> &z) {for (int i = 0; i < x.size(); i++) {z[i] = x[i] + y[i];}}

void sub_vectors(std::vector<double> &x, std::vector<double> &y, std::vector<double> &z) {for (int i = 0; i < x.size(); i++) {z[i] = y[i] - x[i];}}

//the f_thread function

void f_thread(std::vector<double> in, std::vector<double> &out) {

    if (in.size() == 1) {out = in;}
    else {

        std::vector<double> f0(in.size()/2);
        std::vector<double> f1(in.size()/2);

        get_odd_elements(in,std::ref(f0)); //get_odd_elements is a function that gets all odd-indexed elements of f
        get_even_elements(in,std::ref(f1)); //get_even_elements is a function that gets all even-indexed elements of in

        std::vector<double> a(f0.size());
        std::vector<double> b(f1.size());

        f_thread(std::ref(f0),std::ref(a)); // no thread, just call recursion 

        f_thread(std::ref(f1),std::ref(b)); // no thread, just call recursion 

        std::vector<double> a_out(f0.size());
        std::vector<double> b_out(f1.size());

        add_vectors(std::ref(a),std::ref(b),std::ref(a_out)); //call add_vectors function : a + b
        sub_vectors(std::ref(a),std::ref(b),std::ref(b_out)); //call sub_vectors function : b - a

        std::vector<double> f_out(in.size());
        attach(a_out,b_out,std::ref(f_out)); //attach is a function that appends b to the end of a
        out = f_out; 
    }
}

int main() {
    int n_elements = 16;
    std::vector<double> sample_input(n_elements);
    for (int i = 0; i < n_elements; i++) {sample_input[i] = i;}
    std::vector<double> output(n_elements);
    std::thread start(f_thread,std::ref(sample_input),std::ref(output));
    start.join();
    for (int i = 0; i < n_elements; i++) {std::cout << "output element "; std::cout << i; std::cout << ": "; std::cout << output[i]; std::cout<< "\n";}
    }

Every time the code runs, the result should be fixed to this output:

output element 0: 120
output element 1: 0
output element 2: 0
output element 3: 7.31217e-322
output element 4: 0
output element 5: 6.46188e-319
output element 6: 56
output element 7: 0
output element 8: 0
output element 9: 4.19956e-322
output element 10: 120
output element 11: 0
output element 12: 0
output element 13: 7.31217e-322
output element 14: 0
output element 15: 6.46188e-319

Answer 1

This is not a threading error. It is an out-of-bounds access to array elements in the function attach:

void attach(std::vector<double> a, std::vector<double> b, std::vector<double> &out) {
    for (int i = 0; i < a.size(); i++) {out[i] = a[i];}
    for (int i = a.size(); i < a.size()+b.size(); i++) {out[i] = b[i];}
}

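In the second loop, the index starts at a.size() rather than at 0, but you use it to access the elements of b as if it started at 0.

Instead of writing loops, you can also use std::copy from <algorithm>: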
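One way to make this kind of bug show itself is bounds-checked access with std::vector::at, which throws std::out_of_range instead of silently reading past the end. The sketch below is only an illustration: attach_checked, attach_throws and attach_fixed are hypothetical names that do not appear in the question. It keeps the faulty indexing in the checked version on purpose, and shows a loop with the offset corrected next to it.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Same indexing as the question's attach, but with bounds-checked access.
void attach_checked(const std::vector<double>& a, const std::vector<double>& b,
                    std::vector<double>& out) {
    for (std::size_t i = 0; i < a.size(); ++i) { out.at(i) = a.at(i); }
    // Bug kept on purpose: i starts at a.size(), so b.at(i) is out of range
    // whenever a and b have the same non-zero size.
    for (std::size_t i = a.size(); i < a.size() + b.size(); ++i) { out.at(i) = b.at(i); }
}

// Reports whether the checked version throws for the given inputs.
bool attach_throws(const std::vector<double>& a, const std::vector<double>& b) {
    std::vector<double> out(a.size() + b.size());
    try {
        attach_checked(a, b, out);
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}

// Corrected loop: index b from 0 and shift only the position in out.
void attach_fixed(const std::vector<double>& a, const std::vector<double>& b,
                  std::vector<double>& out) {
    assert(out.size() >= a.size() + b.size());
    for (std::size_t i = 0; i < a.size(); ++i) { out[i] = a[i]; }
    for (std::size_t i = 0; i < b.size(); ++i) { out[a.size() + i] = b[i]; }
}
```

With plain operator[] the same mistake is undefined behavior, which would explain the garbage values in the question's output.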

void attach(std::vector<double> a, std::vector<double> b, std::vector<double> &out) {
    std::copy(a.begin(), a.end(), out.begin());
    std::copy(b.begin(), b.end(), out.begin()+a.size());
}

After that, this is all you need for the recursive threads:

std::thread t0(f_thread,std::ref(f0),std::ref(a)); //create thread for f_thread on a
std::thread t1(f_thread,std::ref(f1),std::ref(b)); //create thread for f_thread on b
t0.join(); t1.join(); // join 2 threads

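Because each thread uses its own input and output arrays (which you create on the stack of the "parent" thread), there is no race. The results are deterministic, and they are the same for the sequential and the threaded versions: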

output element 0: 120
output element 1: 64
output element 2: 32
output element 3: 0
output element 4: 16
output element 5: 0
output element 6: 0
output element 7: 0
output element 8: 8
output element 9: 0
output element 10: 0
output element 11: 0
output element 12: 0
output element 13: 0
output element 14: 0
output element 15: 0
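Putting the pieces together, a minimal sketch of the threaded version might look like the following. It assumes the input size is a power of two, as in the question. take_stride2 is a hypothetical helper standing in for get_odd_elements and get_even_elements, and the add, subtract and attach steps are folded into a single loop; vectors are passed by const reference to avoid copies.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: every second element of in, beginning at index start.
std::vector<double> take_stride2(const std::vector<double>& in, std::size_t start) {
    std::vector<double> out;
    out.reserve(in.size() / 2);
    for (std::size_t i = start; i < in.size(); i += 2) { out.push_back(in[i]); }
    return out;
}

void f_thread(const std::vector<double>& in, std::vector<double>& out) {
    if (in.size() <= 1) { out = in; return; }

    const std::vector<double> f0 = take_stride2(in, 0); // indices 0, 2, 4, ...
    const std::vector<double> f1 = take_stride2(in, 1); // indices 1, 3, 5, ...
    std::vector<double> a;
    std::vector<double> b;

    // Each child reads its own input and writes its own output,
    // so join() is the only synchronization needed.
    std::thread t0(f_thread, std::cref(f0), std::ref(a));
    std::thread t1(f_thread, std::cref(f1), std::ref(b));
    t0.join();
    t1.join();

    assert(a.size() == b.size()); // holds for power-of-two input sizes
    out.resize(in.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = a[i] + b[i];            // first half: a + b
        out[a.size() + i] = b[i] - a[i]; // second half: b - a
    }
}
```

Calling f_thread on the values 0 to 15 should reproduce the output listed above. This still creates one thread per recursive call, so it is only suitable for small inputs.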

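By the way, you could have guessed that even your sequential version was incorrect: the input data are all integers, and you only copy, add, and subtract them. So there is no reason for numbers like 7.31217e-322 to appear in the output.

Also note Davis Herring's comment: you copy a lot of data between vectors. At the very least, I would pass the vectors to the functions by const reference rather than by value (unless those copies are known to be elided).

Finally, you should stop creating new threads well before the input array reaches size 1. For realistic problem sizes, you may not be able to create thousands of threads. Even if you succeed, the overhead of creating and running that many threads will make your code very slow. Ideally, you should create no more threads than the machine running the code has hardware cores.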
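One possible way to cap the number of threads is to carry a budget of recursion levels that are still allowed to spawn, and to fall back to plain recursion once it is used up. The sketch below assumes a power-of-two input size; f_limited and spawn_levels_for_hardware are hypothetical names, and std::thread::hardware_concurrency() is only a hint that may return 0.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Same recursion as f_thread, but only the top spawn_levels levels create threads.
void f_limited(const std::vector<double>& in, std::vector<double>& out,
               unsigned spawn_levels) {
    if (in.size() <= 1) { out = in; return; }

    std::vector<double> f0, f1;
    for (std::size_t i = 0; i < in.size(); ++i) {
        (i % 2 == 0 ? f0 : f1).push_back(in[i]);
    }

    std::vector<double> a, b;
    if (spawn_levels > 0) {
        // Hand one half to a new thread and compute the other half here,
        // so each level doubles the number of busy threads.
        std::thread t(f_limited, std::cref(f0), std::ref(a), spawn_levels - 1);
        f_limited(f1, b, spawn_levels - 1);
        t.join();
    } else {
        f_limited(f0, a, 0);
        f_limited(f1, b, 0);
    }

    assert(a.size() == b.size()); // holds for power-of-two input sizes
    out.resize(in.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = a[i] + b[i];
        out[a.size() + i] = b[i] - a[i];
    }
}

// Largest number of levels such that 2^levels does not exceed the core count.
unsigned spawn_levels_for_hardware() {
    unsigned cores = std::thread::hardware_concurrency(); // 0 means "unknown"
    if (cores == 0) { cores = 1; }
    unsigned levels = 0;
    for (unsigned c = cores; c > 1; c /= 2) { ++levels; }
    return levels;
}
```

A call such as f_limited(input, output, spawn_levels_for_hardware()) would then keep at most as many threads busy as the reported core count, rounded down to a power of two.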

Answer 2

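You should approach this by asking how many CPUs there are, then splitting up the work and using a queue to join it back together.

I do not know the FFT algorithm, but from a cursory look at your code, you are basically splitting the data with a finer and finer toothed comb. The catch is that you start at the finest level and work your way up, which is not a good way to split things up.

You do not want a different CPU working on every other value, because even a single-chip multi-core CPU has multiple L1 caches. Each L1 cache is shared with at most one other core. So you want all the values that a single CPU works on to be close to each other, to maximize the chance that the value you are looking for is already in the cache.

So you should start by splitting off the largest contiguous chunks. Since the FFT algorithm works on powers of 2, you should count the number of cores you have. Use std::thread::hardware_concurrency() to count them. Then round up to the next power of 2 and split the problem into that number of sub-FFTs. Finally, merge their results in the main thread.

I have a program I wrote that does something like what you want. It splits a list into a number of chunks in order to sort them. It then has a queue of merges that need to be done. Each chunk is handled by a separate thread, and each merge is also spun off into its own thread.

I divided the number of cores by two because of a feature of modern CPUs that I do not like, hyperthreading. I could have ignored it and the program would have run fine, though probably a little slower, since the main contention would have been over the integer ALU. (Hyperthreading shares resources within a single core.)

From the other answer, it looks like your FFT code has some bugs. I suggest getting it to work with just one thread first, and then figuring out how to split it up.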
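The counting and splitting steps can be sketched roughly as follows. All helper names here are made up for illustration, and the sketch only computes the worker count and the contiguous index ranges; how each range maps onto a sub-FFT depends on the FFT formulation and is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <thread>
#include <utility>
#include <vector>

// Smallest power of two that is >= n (returns 1 for n == 0).
unsigned round_up_to_power_of_two(unsigned n) {
    unsigned p = 1;
    // The second condition stops before p * 2 could overflow.
    while (p < n && p <= std::numeric_limits<unsigned>::max() / 2) { p *= 2; }
    return p;
}

// Number of workers: core count rounded up to a power of two.
unsigned worker_count() {
    const unsigned cores = std::thread::hardware_concurrency(); // 0 means "unknown"
    return round_up_to_power_of_two(cores == 0 ? 1 : cores);
}

// Splits the index range [0, n) into parts contiguous [begin, end) ranges
// whose lengths differ by at most one.
std::vector<std::pair<std::size_t, std::size_t>>
contiguous_chunks(std::size_t n, std::size_t parts) {
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    if (parts == 0) { return ranges; }
    const std::size_t base = n / parts;
    const std::size_t extra = n % parts;
    std::size_t begin = 0;
    for (std::size_t p = 0; p < parts; ++p) {
        const std::size_t len = base + (p < extra ? 1 : 0);
        ranges.emplace_back(begin, begin + len);
        begin += len;
    }
    assert(begin == n); // the ranges cover the whole input
    return ranges;
}
```

When both the input size and the worker count are powers of two, every chunk has the same length, which keeps the per-thread work balanced.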
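That program is not shown here, so the following is only a rough sketch of the same idea, not the actual code. It sorts contiguous chunks in separate threads and then merges neighbouring runs, one thread per merge. For brevity it uses rounds of pairwise merges instead of a merge queue, and chunked_sort is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Sorts data by sorting contiguous chunks in parallel, then merging neighbours.
void chunked_sort(std::vector<int>& data, std::size_t chunks) {
    if (chunks == 0) { chunks = 1; }
    const std::size_t n = data.size();

    // bounds[k] is the start of chunk k; the last entry equals n.
    std::vector<std::size_t> bounds;
    for (std::size_t k = 0; k <= chunks; ++k) { bounds.push_back(n * k / chunks); }
    assert(bounds.back() == n);

    // Phase 1: one thread per chunk, each sorting a disjoint range.
    std::vector<std::thread> workers;
    for (std::size_t k = 0; k < chunks; ++k) {
        const std::size_t lo = bounds[k], hi = bounds[k + 1];
        workers.emplace_back([&data, lo, hi] {
            std::sort(data.begin() + lo, data.begin() + hi);
        });
    }
    for (auto& w : workers) { w.join(); }

    // Phase 2: merge adjacent sorted runs until a single run is left.
    // Merges within one round touch disjoint ranges, so they can run concurrently.
    while (bounds.size() > 2) {
        workers.clear();
        std::vector<std::size_t> next;
        std::size_t k = 0;
        for (; k + 2 < bounds.size(); k += 2) {
            const std::size_t lo = bounds[k], mid = bounds[k + 1], hi = bounds[k + 2];
            workers.emplace_back([&data, lo, mid, hi] {
                std::inplace_merge(data.begin() + lo, data.begin() + mid,
                                   data.begin() + hi);
            });
            next.push_back(lo);
        }
        for (; k < bounds.size(); ++k) { next.push_back(bounds[k]); } // unpaired tail
        for (auto& w : workers) { w.join(); }
        bounds = next;
    }
}
```

The same shape, independent work on contiguous chunks followed by a combining step, is what the FFT split described above would need.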
