Threads do not improve code performance

Question

I am trying to convert a simple long-running loop to use threads in order to improve its performance.

Here is the threaded version:

#include <iostream>
#include <thread>
#include <chrono>
using namespace std;
using namespace std::chrono;

void funcSum(long long int start, long long int end, long long int *sum)
{
    for(auto i = start; i <= end; ++i)
    {
        *sum += i;
    }
}

int main()
{
    long long int start = 10, end = 1900000000;
    long long int sum = 0;
    auto startTime = high_resolution_clock::now();
    thread t1(funcSum, start, end / 2, &sum);
    thread t2(funcSum, end / 2 + 1 , end, &sum);
    t1.join();
    t2.join();
    auto stopTime = high_resolution_clock::now();
    auto duration = duration_cast<seconds>(stopTime - startTime);
    cout << "Sum: " << sum << endl;
    cout << duration.count() << " Seconds";
    return 0;
}

And here is the normal code (without threads):

#include <iostream>
#include <thread>
#include <chrono>
using namespace std;
using namespace std::chrono;

void funcSum(long long int start, long long int end, long long int *sum)
{
    for(auto i = start; i <= end; ++i)
    {
        *sum += i;
    }
}

int main()
{
    long long int start = 10, end = 1900000000;
    long long int sum = 0;
    auto startTime = high_resolution_clock::now();
    funcSum(start, end, &sum);
    auto stopTime = high_resolution_clock::now();
    auto duration = duration_cast<seconds>(stopTime - startTime);
    cout << "Sum: " << sum << endl;
    cout << duration.count() << " Seconds";
    return 0;
}

Sum: 1805000000949999955
5 Seconds
Process finished with exit code 0

In both cases, the time spent is 5 seconds.

Why does the threaded version not improve performance? How can I use threads to reduce the time needed to sum this range?

Answer 1

Fixed version of the threaded code:

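#include <algorithm>
#include <chrono>
#include <iostream>
#include <thread>
using namespace std;
using namespace std::chrono;

// Computes the sum of start ... end (inclusive)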
class Summer {
public:
    long long int start;
    long long int end;
    long long int sum = 0;

    Summer(long long int aStart, long long int aEnd)
        : start(aStart),
        end(aEnd)
    {
    }

    void funcSum()
    {
        sum = 0;
        for (auto i = start; i <= end; ++i)
        {
            sum += i;
        }
    }
};

class SummerFunctor {
    Summer& mSummer;
public:
    SummerFunctor(Summer& aSummer)
    : mSummer(aSummer)
    {
    }

    void operator()()
    {
        mSummer.funcSum();
    }
};

// Version with n thread objects reports 
// 1 threads, sum = 1805000000949999955, 1587 ms
// 2 threads, sum = 1805000000949999955, 2547 ms
// 4 threads, sum = 1805000000949999955, 1251 ms
// 6 threads, sum = 1805000000949999955, 916 ms
int main()
{
    long long int start = 10, end = 1900000000;
    long long int sum = 0;
    auto startTime = high_resolution_clock::now();
    const size_t threadCount = 6;

    if (threadCount < 2) {
        // Single-threaded path: sum the whole range directly
        Summer summer(start, end);
        summer.funcSum();
        sum = summer.sum;
    } else {
        Summer* summers[threadCount];
        std::thread* threads[threadCount];

        // Start threads
        // Cast keeps partitionSize a long long so std::min below sees matching types
        auto partitionSize = (end - start) / static_cast<long long>(threadCount);
        for (size_t i = 0; i < threadCount; ++i) {
            auto partitionEnd = std::min(start + partitionSize, end);
            summers[i] = new Summer(start, partitionEnd);
            start = partitionEnd + 1;
            SummerFunctor functor (*summers[i]);
            threads[i] = new std::thread(functor);
        }

        // Join threads
        for (size_t i = 0; i < threadCount; ++i) {
            threads[i]->join();
            sum += summers[i]->sum;
            delete threads[i];
            delete summers[i];
        }
    }

    auto stopTime = high_resolution_clock::now();
    auto duration = duration_cast<milliseconds>(stopTime - startTime);
    cout << threadCount << " threads, sum = " << sum << ", " << duration.count() << " ms" << std::endl;
    return 0;
}

I had to wrap the Summer object in a functor because std::thread copies the callable handed to it, and that copy cannot be accessed afterwards. Execution gets faster as more threads are used (see the running times in the comments), although two threads are slower than one and the speedup stays well below the thread count. Possible reasons for this:

The CPU has to synchronize access to memory even though the threads use separate variables here, because the variables likely lie close together (in the same page, or even the same cache line).
If only one thread is running on a CPU, that thread may run at a higher CPU frequency, whereas several threads may run only at the normal CPU frequency.
CPU cores often share arithmetic units.
Without threads, the compiler can make optimizations that are not possible with threads. In theory, the compiler could unroll the loop, compute the sum at compile time, and directly print the result.
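
The points above can be illustrated with a more compact variant. This is only a sketch under stated assumptions, not the code that produced the timings in the comments: the names sumRange, parallelRangeSum and closedFormRangeSum are hypothetical. It passes each result slot with std::ref, so no wrapper functor is needed even though std::thread copies its arguments, and each thread accumulates into a local variable and writes its slot only once, which keeps the threads from repeatedly writing to neighbouring memory. The closed-form function is the arithmetic-series formula a compiler could in principle reduce the loop to.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Sums first..last (inclusive) into a local variable and writes the
// caller's slot exactly once at the end.
void sumRange(long long first, long long last, long long& out)
{
    long long local = 0;
    for (long long i = first; i <= last; ++i) {
        local += i;
    }
    out = local;
}

// Hypothetical helper: splits start..end into threadCount consecutive
// chunks, sums each chunk in its own thread, and adds up the partial sums.
long long parallelRangeSum(long long start, long long end, std::size_t threadCount)
{
    if (threadCount == 0) {
        threadCount = 1;
    }
    std::vector<long long> partials(threadCount, 0);
    std::vector<std::thread> threads;
    threads.reserve(threadCount);

    const long long chunk = (end - start) / static_cast<long long>(threadCount);
    long long first = start;
    for (std::size_t i = 0; i < threadCount; ++i) {
        // The last thread always runs up to end; earlier ones are clamped to it.
        const long long last =
            (i + 1 == threadCount) ? end : std::min(first + chunk, end);
        // std::ref hands the thread a reference to the slot instead of a copy.
        threads.emplace_back(sumRange, first, last, std::ref(partials[i]));
        first = last + 1;
    }

    long long total = 0;
    for (std::size_t i = 0; i < threadCount; ++i) {
        threads[i].join();  // the slot is only read after its thread finished
        total += partials[i];
    }
    return total;
}

// Arithmetic-series formula: count * (start + end) / 2, dividing the even
// factor first so the intermediate product stays as small as possible.
long long closedFormRangeSum(long long start, long long end)
{
    if (end < start) {
        return 0;
    }
    const long long count = end - start + 1;
    if (count % 2 == 0) {
        return (count / 2) * (start + end);
    }
    return count * ((start + end) / 2);
}
```

For the range in the question, closedFormRangeSum(10, 1900000000) evaluates to 1805000000949999955, the same sum the loops report, and parallelRangeSum(10, 1900000000, 6) should agree with it. How much faster the threaded call is depends on the machine and compiler flags.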
