Why doesn't async in a for loop improve execution time?

Question

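I am trying to understand concurrency, so I tried to write a more flexible version (my_comp()) of Stroustrup's example code (comp4()) from A Tour of C++ (Second Edition), section 15.7.3, page 205. It gives the correct answer, but it does not use concurrency to improve the execution time. My question is: why doesn't my_comp() behave as expected, and how can I fix it?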

#include <iostream>
#include <chrono>
#include <cmath>
#include <vector>
#include <numeric>
#include <future>
#include <fstream>

using namespace std;
using namespace std::chrono;

constexpr auto sz = 500'000'000;
constexpr int conc_num{ 4 };

double accum(double* beg, double* end, double init)
{
    return accumulate(beg, end, init);
}

double comp4(vector<double>& v)
//From Stroustrup, A Tour of C++ (Second edition)
//15.7.3 page 205
{
    auto v0 = &v[0];
    auto sz = v.size();

    auto f0 = async(accum, v0, v0 + sz / 4, 0.0);
    auto f1 = async(accum, v0 + sz / 4, v0 + sz / 2, 0.0);
    auto f2 = async(accum, v0 + sz / 2, v0 + sz * 3 / 4, 0.0);
    auto f3 = async(accum, v0 + sz * 3 / 4, v0 + sz, 0.0);

    return f0.get() + f1.get() + f2.get() + f3.get();
}

double my_comp(vector<double>& v, int conc = 1)
//My idea of a more flexible version of comp4
{
    if (conc < 1)
        conc = 1;
    auto v0 = &v[0];
    auto sz = v.size();

    vector<future<double>> fv(conc);
    for (int i = 0; i != conc; ++i) {
        auto f = async(accum, v0 + sz * (i / conc), v0 + sz * ((i + 1) / (conc)), 0.0);
        fv[i] = move(f);
    }
    double ret{ 0.0 };
    for (int i = 0; i != fv.size(); ++i) {
        ret += fv[i].get();
    }
    return ret;
}

int main()
{
    cout << "Calculating ..." << "\n\n";
    auto tv0 = high_resolution_clock::now();
    vector<double> vc;
    vc.reserve(sz);
    for (int i = 0; i != sz; ++i) {
        vc.push_back(sin(i));   //Arbitrary test function
    }
    auto tv1 = high_resolution_clock::now();
    auto durtv = duration_cast<milliseconds>(tv1 - tv0).count();
    cout << "vector of size " << vc.size() << ":  " << durtv << " msec\n\n";

    ////////////////////////////////////////////
    auto vc_test = vc;
    auto t0 = high_resolution_clock::now();
    auto s1 = accumulate(vc_test.begin(), vc_test.end(), 0.0);
    auto t1 = high_resolution_clock::now();
    auto dur1 = duration_cast<milliseconds>(t1 - t0).count();
    ///////////////////////////////////////////
    vc_test = vc;
    auto tt0 = high_resolution_clock::now();
    auto s2 = my_comp(vc_test, conc_num);       //Should be faster
    auto tt1 = high_resolution_clock::now();
    auto dur2 = duration_cast<milliseconds>(tt1 - tt0).count();
    ////////////////////////////////////////////
    vc_test = vc;
    auto ttt0 = high_resolution_clock::now();
    auto s3 = comp4(vc_test);       //Really is faster
    auto ttt1 = high_resolution_clock::now();
    auto dur3 = duration_cast<milliseconds>(ttt1 - ttt0).count();
    ///////////////////////////////////////////

    cout << dur1 << " msec\n";
    cout << "Output = " << s1 << " (accumulate)" << "\n\n";
    cout << dur2 << " msec" << "  Ratio:  " << double(dur2) / double(dur1) << "\n";
    cout << "Output = " << s2 << " (my_comp)" << "\n\n";
    cout << dur3 << " msec" << "  Ratio:  " << double(dur3) / double(dur1) << "\n";
    cout << "Output = " << s3 << " (comp4)" << "\n\n";
}

Compiled with Visual C++ 2019 (ISO C++17 Standard (/std:c++17)), x64 Release build. A typical output is:

424 msec
Output = 1.93496 (accumulate)

431 msec  Ratio:  1.01651
Output = 1.93496 (my_comp)

117 msec  Ratio:  0.275943
Output = 1.93496 (comp4)

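I know about the parallel algorithms and std::reduce. My question is not how to optimize this particular calculation, but how to write concurrent code that behaves as expected.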

Answer 1

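Your problem is here: (i / conc). Since 0 <= i < conc and both i and conc are integers, this expression is always zero. The same is true of ((i + 1) / conc), except on the last iteration, where it equals 1. So the first conc - 1 tasks receive the empty range [v0, v0), and the last task sums the whole vector [v0, v0 + sz) on its own, which is why my_comp() takes about as long as the sequential accumulate.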
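To see the effect directly, the sketch below reproduces the element offsets that my_comp() passes to each async call, using the original parenthesized expression. The helper name buggy_bounds is hypothetical and exists only for illustration; it is not part of the question's code.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns the (begin, end) element offsets that
// my_comp() computes for each of its conc tasks, using the original
// expression with the parenthesized integer division.
std::vector<std::pair<std::size_t, std::size_t>> buggy_bounds(std::size_t sz, int conc)
{
    assert(conc >= 1);
    std::vector<std::pair<std::size_t, std::size_t>> out;
    for (int i = 0; i != conc; ++i) {
        // i / conc is 0 for every i in [0, conc);
        // (i + 1) / conc is 0 except on the last iteration, where it is 1.
        std::size_t b = sz * (i / conc);
        std::size_t e = sz * ((i + 1) / conc);
        out.emplace_back(b, e);
    }
    return out;
}
```

For sz = 1000 and conc = 4 it yields the offset pairs (0, 0), (0, 0), (0, 0) and (0, 1000): three tasks with nothing to do and one task that does all the work.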

To fix it, remove the parentheses so that the multiplication happens before the integer division:

auto f = async(accum, v0 + sz * i / conc, v0 + sz * (i + 1) / conc, 0.0);
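A fuller sketch of the corrected function follows. The names chunk_begin and par_sum are hypothetical, and two details go beyond the one-line fix above: it passes std::launch::async explicitly, because the default policy (std::launch::async | std::launch::deferred) allows the implementation to defer a task until get() is called, and it sums each chunk in a lambda instead of going through accum. Treat it as an illustration under those assumptions, not as the answerer's code.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Offset where chunk i starts when [0, sz) is split into conc nearly equal
// parts. Multiplying before dividing keeps the fraction, and
// chunk_begin(sz, conc, conc) == sz, so the last chunk ends exactly at sz.
std::size_t chunk_begin(std::size_t sz, int i, int conc)
{
    assert(conc >= 1 && i >= 0);
    return sz * static_cast<std::size_t>(i) / static_cast<std::size_t>(conc);
}

// Hypothetical corrected version of my_comp(): one task per chunk.
double par_sum(const std::vector<double>& v, int conc = 1)
{
    if (conc < 1)
        conc = 1;
    const double* v0 = v.data();
    const std::size_t sz = v.size();

    std::vector<std::future<double>> fv;
    fv.reserve(static_cast<std::size_t>(conc));
    for (int i = 0; i != conc; ++i) {
        const double* b = v0 + chunk_begin(sz, i, conc);
        const double* e = v0 + chunk_begin(sz, i + 1, conc);
        // std::launch::async runs the call as if on a new thread; the
        // default policy would also permit deferred (lazy) execution.
        fv.push_back(std::async(std::launch::async,
                                [b, e] { return std::accumulate(b, e, 0.0); }));
    }

    double ret = 0.0;
    for (auto& f : fv)
        ret += f.get();   // waits for that chunk's partial sum
    return ret;
}
```

With v holding 1.0 through 10.0 and conc = 4, the chunk boundaries are 0, 2, 5, 7 and 10, so the split stays correct even when sz is not divisible by conc. Because the partial sums are added in a different order than a single accumulate, the floating-point result can differ from the sequential one in the last few bits.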

Answer 1: score 2, accepted, 2019-11-04 22:14:06

為什么 for 循環中的異步不能提高執行時間？

問題描述

1 個解決方案

解決方案1 2 已采納 2019-11-04 22:14:06

解決方案1
2 已采納 2019-11-04 22:14:06