
Using BinaryOp within std::reduce() from <numeric> with parallel execution policy

I cannot spot the problem with my use of the std::reduce() function from the <numeric> STL header.

Since I have found a workaround, I will first show the expected behavior:

uint64_t f(uint64_t n)
{
   return 1ull; // maps every element to 1, so the sum counts elements
}

uint64_t solution(uint64_t N) // here N == 10000000
{
    uint64_t r(0);

    // persistent array of primes
    const auto& app = YTL::AccumulativePrimes::global().items(); 

    auto citEnd = std::upper_bound(app.cbegin(), app.cend(), 2*N);
    auto citBegin = std::lower_bound(app.cbegin(), citEnd, N);

    std::vector<uint64_t> v(citBegin, citEnd);

    std::for_each(std::execution::par,
                    v.begin(), v.end(),
                    [](auto& p)->void {p = f(p); });

    // note: the init value determines the accumulation type, so use 0ull,
    // not 0, to avoid accumulating uint64_t elements in an int
    r = std::reduce(std::execution::par, v.cbegin(), v.cend(), 0ull);
    return r; // here is correct answer: 606028
}

However, if I avoid the intermediate vector and instead apply the binary operator on the spot inside reduce() itself, also in parallel, it gives me a different answer each time:

uint64_t f(uint64_t n)
{
   return 1ull;
}

uint64_t solution(uint64_t N) // here N == 10000000
{
    uint64_t r(0);

    // persistent array of primes
    const auto& app = YTL::AccumulativePrimes::global().items(); 

    auto citEnd = std::upper_bound(app.cbegin(), app.cend(), 2*N);
    auto citBegin = std::lower_bound(app.cbegin(), citEnd, N);

    // bug in parallel reduce?! 
    r = std::reduce(std::execution::par,
                    citBegin, citEnd, 0ull,
                    [](const uint64_t& r, const uint64_t& v)->uint64_t { return r + f(v); });
    return r; // the value of r is different every time I run this!
}

Could anyone explain why the latter usage is wrong?


I am using MS C++ compiler cl.exe: Version 19.28.29333.0;
Windows SDK version: 10.0.18362.0;
Platform Toolset: Visual Studio 2019 (v142)
C++ language standard: Preview - Features from the Latest C++ Working Draft (/std:c++latest)
Computer: Dell XPS 9570 i7-8750H CPU @ 2.20GHz, 16GB RAM OS: Windows 10 64bit

From cppreference: "The behavior is non-deterministic if binary_op is not associative or not commutative." That is exactly what you observe; your operation is not commutative (and not associative either).

Your binary operation assumes that the first parameter is always the accumulator and the second parameter is always an element value. That is not generally the case. E.g. the simplest form of parallel reduce splits the range into two halves, reduces each half, then combines the two partial results using the same operation, which in your case applies f to a partial sum and loses track of half the values.


What you really want is std::transform_reduce, which takes a separate unary op for the per-element transformation and a binary op (here, ordinary addition, which is associative and commutative) for combining. As in:

r = std::transform_reduce(
        std::execution::par, citBegin, citEnd, 0ull,
        std::plus<uint64_t>{}, f);
