
Thread not improving the code performance

I am trying to convert a basic long loop into threads to improve the loop's performance.

Here is the threaded version:

#include <iostream>
#include <thread>
#include <chrono>
using namespace std;
using namespace std::chrono;

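// Adds start..end to *sum. Note: t1 and t2 below are both given &sum, so they
// update the same variable concurrently without synchronization. That is a
// data race (undefined behavior), and the shared cache line bounces between cores.
void funcSum(long long int start, long long int end, long long int *sum)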
{
    for(auto i = start; i <= end; ++i)
    {
        *sum += i;
    }
}

int main()
{
    long long int start = 10, end = 1900000000;
    long long int sum = 0;
    auto startTime = high_resolution_clock::now();
    thread t1(funcSum, start, end / 2, &sum);
    thread t2(funcSum, end / 2 + 1 , end, &sum);
    t1.join();
    t2.join();
    auto stopTime = high_resolution_clock::now();
    auto duration = duration_cast<seconds>(stopTime - startTime);
    cout << "Sum: " << sum << endl;
    cout << duration.count() << " Seconds";
    return 0;
}

Here is the normal code (without threads):

#include <iostream>
#include <thread>
#include <chrono>
using namespace std;
using namespace std::chrono;

void funcSum(long long int start, long long int end, long long int *sum)
{
    for(auto i = start; i <= end; ++i)
    {
        *sum += i;
    }
}

int main()
{
    long long int start = 10, end = 1900000000;
    long long int sum = 0;
    auto startTime = high_resolution_clock::now();
    funcSum(start, end, &sum);
    auto stopTime = high_resolution_clock::now();
    auto duration = duration_cast<seconds>(stopTime - startTime);
    cout << "Sum: " << sum << endl;
    cout << duration.count() << " Seconds";
    return 0;
}

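Sum: 1805000000949999955
5 Seconds
Process finished with exit code 0

In both cases the run takes 5 seconds.

Why does the threaded version not improve performance? How can I use threads to reduce the time needed to sum this range?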

A fixed version of the threaded code:

#include <algorithm>
#include <chrono>
#include <iostream>
#include <thread>
using namespace std;
using namespace std::chrono;

// Single-threaded version from the question, used when threadCount < 2
void funcSum(long long int start, long long int end, long long int *sum)
{
    for (auto i = start; i <= end; ++i)
    {
        *sum += i;
    }
}

// Compute the sum of start ... end
class Summer {
public:
    long long int start;
    long long int end;
    long long int sum = 0;

    Summer(long long int aStart, long long int aEnd)
        : start(aStart),
        end(aEnd)
    {
    }

    void funcSum()
    {
        sum = 0;
        for (auto i = start; i <= end; ++i)
        {
            sum += i;
        }
    }
};

// Holds a reference to a Summer, so the copy that std::thread makes of this
// functor still writes its result into the original Summer object
class SummerFunctor {
    Summer& mSummer;
public:
    SummerFunctor(Summer& aSummer)
    : mSummer(aSummer)
    {
    }

    void operator()()
    {
        mSummer.funcSum();
    }
};

// Reported results for different thread counts:
// 1 threads, sum = 1805000000949999955, 1587 ms
// 2 threads, sum = 1805000000949999955, 2547 ms
// 4 threads, sum = 1805000000949999955, 1251 ms
// 6 threads, sum = 1805000000949999955, 916 ms
int main()
{
    long long int start = 10, end = 1900000000;
    long long int sum = 0;
    auto startTime = high_resolution_clock::now();
    const size_t threadCount = 6;

    if (threadCount < 2) {
        funcSum(start, end, &sum);
    } else {
        Summer* summers[threadCount];
        std::thread* threads[threadCount];

        // Start threads, each summing its own partition of start..end.
        // The cast keeps partitionSize signed, so std::min gets two long longs.
        auto partitionSize = (end - start) / static_cast<long long int>(threadCount);
        for (size_t i = 0; i < threadCount; ++i) {
            auto partitionEnd = std::min(start + partitionSize, end);
            summers[i] = new Summer(start, partitionEnd);
            start = partitionEnd + 1;
            SummerFunctor functor (*summers[i]);
            threads[i] = new std::thread(functor);
        }

        // Join threads and add up the partial sums
        for (size_t i = 0; i < threadCount; ++i) {
            threads[i]->join();
            sum += summers[i]->sum;
            delete threads[i];
            delete summers[i];
        }
    }

    auto stopTime = high_resolution_clock::now();
    auto duration = duration_cast<milliseconds>(stopTime - startTime);
    cout << threadCount << " threads, sum = " << sum << ", " << duration.count() << " ms" << std::endl;
    return 0;
}

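I had to wrap the Summer object in a functor because std::thread insists on copying the functor handed to it, and that copy cannot be accessed afterwards. Execution gets faster as more threads are used (see the run times in the comments), although in these measurements 2 threads were slower than 1. Possible reasons why the speedup is not proportional to the number of threads:

  1. The CPU has to keep the cores' caches coherent even though the threads use separate variables here, because those variables may lie in the same cache line (false sharing).
  2. If only one thread is running, its core may run at a higher (boost) clock frequency, while with several busy threads the cores may only run at the normal frequency.
  3. Logical cores (hyper-threads) on the same physical core share arithmetic units.
  4. Without threads, the compiler can make optimizations that are not possible with threads. In theory, the compiler could unroll the loop entirely and print the result directly.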


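Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM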