

Why is my parallel task management so slow?

For reasons explained below, I have started to investigate the time it takes to create and run a thread. The way I do it, the process takes about 26 ms for 10 threads, which is much longer than it should be, at least as far as I understand.

A short background:

I'm working on a game that uses pathfinding. After adding more entities it became necessary to parallelize the process.

I want this to be as readable as possible, so I've created a ParallelTask class that holds a thread, a std::function (to be executed by the thread), a mutex to protect some write operations, and a bool completed that is set to true once the thread has finished executing.

I'm new to multithreading, so I have no idea if this is a good approach to begin with, but nevertheless I'm confused about why it takes so long to execute.

I have written the code below to isolate the problem.

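#include <chrono>
#include <functional>
#include <iostream>
#include <map>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

class ParallelTask
{
public:
    ParallelTask();
    // Join the thread
    ~ParallelTask();

    inline std::vector<int> GetPath() const { return path; }
    void Execute();

private:
    std::thread thread;
    mutable std::mutex mutex;

    std::function<void()> threadFunction;
    bool completed = false;

    std::vector<int> path;
};

ParallelTask::ParallelTask()
{
    // The work the thread performs: only mark the task as completed
    threadFunction = [this]() {
        std::lock_guard<std::mutex> lock(mutex);
        this->completed = true;
    };
}

ParallelTask::~ParallelTask()
{
    if (thread.joinable())
        thread.join();
}

void ParallelTask::Execute()
{
    this->completed = false;

    // Launch the thread
    this->thread = std::thread(threadFunction);
}

int main()
{
    std::map<int, std::unique_ptr<ParallelTask>> parallelTaskDictionary;

    auto start = std::chrono::system_clock::now();

    // Create and launch 10 tasks; the threads are joined when the map is destroyed
    for (size_t i = 0; i < 10; i++)
    {
        parallelTaskDictionary.emplace(i, std::make_unique<ParallelTask>());
        parallelTaskDictionary[i]->Execute();
    }

    auto end = std::chrono::system_clock::now();
    auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
    std::cout << elapsed.count() << std::endl;

    return 0;
}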

Running this code gives me between 25 and 26 milliseconds of execution time. Since this is meant to be used in a game, that is of course unacceptable.

As previously mentioned, I do not understand why, especially since the threadFunction itself does literally nothing. In case you wonder, I have even removed the mutex lock and it gave me literally the same result, so there must be something else going on here. (From my research, creating a thread shouldn't take more than a couple of microseconds, but maybe I'm just wrong about that ^^)

PS: Oh yeah, and while we are at it, I still don't really understand who should own the mutex. (Is there one global mutex, or one per object?)
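On the mutex question in the PS, one common arrangement is to let the mutex live next to the data it protects, which here means one mutex per task object rather than one global mutex. The following is only a sketch under that assumption: PathTask, Wait and IsCompleted are hypothetical names that are not part of the original ParallelTask, and the path values are placeholders for a real pathfinding result.

```cpp
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for ParallelTask: each object owns the mutex
// that guards its own 'completed' flag and 'path'.
class PathTask {
public:
    ~PathTask() { Wait(); }

    void Execute() {
        Wait();  // assigning to a still-joinable std::thread would call std::terminate
        {
            std::lock_guard<std::mutex> lock(mutex);
            completed = false;
        }
        worker = std::thread([this] {
            std::vector<int> result{1, 2, 3};  // placeholder for the real pathfinding work
            // Lock only while publishing the result, not while computing it.
            std::lock_guard<std::mutex> lock(mutex);
            path = std::move(result);
            completed = true;
        });
    }

    // Block until the worker thread (if any) has finished.
    void Wait() {
        if (worker.joinable())
            worker.join();
    }

    bool IsCompleted() const {
        std::lock_guard<std::mutex> lock(mutex);
        return completed;
    }

    std::vector<int> GetPath() const {
        std::lock_guard<std::mutex> lock(mutex);  // readers take the same per-object lock
        return path;
    }

private:
    mutable std::mutex mutex;  // one mutex per object, guarding the two members below
    bool completed = false;
    std::vector<int> path;
    std::thread worker;
};
```

With this layout two tasks never contend for the same lock, because each one only touches its own members. A single global mutex would mainly make sense for a resource that really is shared by all tasks.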

If you want to measure only the execution time, I think you should put the start and end timestamps inside the threadFunction, around the place where the work is done, as shown in the code below.

#include <map>
#include <iostream>
#include <memory>
#include <chrono>
#include <vector>
#include <thread>
#include <mutex>
#include <functional>

class ParallelTask
{
public:
    ParallelTask();
    // Join the thread
    ~ParallelTask();

    inline std::vector<int> GetPath() const { return path; }
    void Execute();

private:
    std::thread thread;
    mutable std::mutex mutex;

    std::function<void()> threadFunction;
    bool completed = false;

    std::vector<int> path;
};

ParallelTask::ParallelTask()
{
    threadFunction = [this]() {
        // Time only the work done inside the thread, not the thread start-up
        auto start = std::chrono::system_clock::now();
        std::lock_guard<std::mutex> lock(mutex);
        this->completed = true;
        auto end = std::chrono::system_clock::now();
        auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
        std::cout << "elapsed time" << elapsed.count() << std::endl;
    };
}

ParallelTask::~ParallelTask()
{
    if (thread.joinable())
        thread.join();
}

void ParallelTask::Execute()
{
    this->completed = false;

    // Launch the thread
    this->thread = std::thread(threadFunction);
}

int main()
{
    std::map<int, std::unique_ptr<ParallelTask>> parallelTaskDictionary;

    for (size_t i = 0; i < 10; i++)
    {
        parallelTaskDictionary.emplace(i, std::make_unique<ParallelTask>());
        parallelTaskDictionary[i]->Execute();
    }

    return 0;
}

which gives the output:

elapsed time1
elapsed time0
elapsed time0
elapsed time0
elapsed time0
elapsed time0elapsed time
elapsed time0
elapsed time0
elapsed time0

This is because we exclude the time it takes to spin up the thread.
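To see how much of the original 25 to 26 ms is thread start-up rather than work, the launch cost can be timed on its own. The sketch below is an illustration, not part of the original answer: LaunchTiming and TimeEmptyThreads are hypothetical names, the threads run an empty lambda, and std::chrono::steady_clock is swapped in for system_clock because it is monotonic and therefore better suited to measuring intervals.

```cpp
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical result type: both durations are measured from the same start point.
struct LaunchTiming {
    std::chrono::microseconds launch;  // time spent constructing the std::thread objects
    std::chrono::microseconds total;   // launch time plus waiting for every thread to finish
};

// Start 'count' threads that do nothing, then join them, timing both phases.
LaunchTiming TimeEmptyThreads(int count) {
    using Clock = std::chrono::steady_clock;

    std::vector<std::thread> threads;
    threads.reserve(static_cast<std::size_t>(count));

    const auto start = Clock::now();
    for (int i = 0; i < count; ++i)
        threads.emplace_back([] {});  // empty body: only start-up cost remains
    const auto launched = Clock::now();

    for (auto& t : threads)
        t.join();
    const auto joined = Clock::now();

    return {
        std::chrono::duration_cast<std::chrono::microseconds>(launched - start),
        std::chrono::duration_cast<std::chrono::microseconds>(joined - start)
    };
}
```

Calling TimeEmptyThreads(10) and printing launch.count() and total.count() gives a baseline for the platform and build configuration in use. The numbers will vary between machines and between debug and release builds, so no expected values are given here.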

And just as a sanity check, if you really want to see the effect of real work, you could add

        using namespace std::chrono_literals;
        std::this_thread::sleep_for(2s);

to your threadFunction, to make it look like this:

    threadFunction = [this]() {
        auto start = std::chrono::system_clock::now();
        std::lock_guard<std::mutex> lock(mutex);
        this->completed = true;
        // Simulate two seconds of real work
        using namespace std::chrono_literals;
        std::this_thread::sleep_for(2s);
        auto end = std::chrono::system_clock::now();
        auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
        std::cout << "elapsed time" << elapsed.count() << std::endl;
    };

and the output will be:

elapsed time2000061
elapsed timeelapsed time2000103
elapsed timeelapsed time20000222000061
elapsed time2000050
elapsed time2000061
elapsed time200012

