Why is parallel task management so slow?

Question

For reasons explained below, I have started to investigate the time it takes to create and run a thread. The way I do it, this process takes about 26 ms for 10 threads, which is much longer than it should be, at least from my understanding.

A short background:

I'm working on a game that uses pathfinding. After adding more entities, it became necessary to parallelize the process.

I want this to be as readable as possible, so I've created a ParallelTask class that holds a thread, a std::function (which should be executed by the thread), a mutex to protect some write operations, and a bool completed that is set to true once the thread has finished executing.

I'm new to multithreading, so I have no idea whether this is a good approach to begin with, but nevertheless I'm confused about why it takes so long to execute.

I have written the code below to isolate the problem.

#include <chrono>
#include <functional>
#include <iostream>
#include <map>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

class ParallelTask
{
public:

    ParallelTask();
    // Join the threads
    ~ParallelTask();

public:
    inline std::vector<int> GetPath() const { return path; }
    void Execute();

private:
    std::thread thread;
    mutable std::mutex mutex;

    std::function<void()> threadFunction;
    bool completed = false;

    std::vector<int> path;
};


ParallelTask::ParallelTask()
{
    threadFunction = [this]() {
        {
            std::lock_guard<std::mutex> lock(mutex);
            this->completed = true;
        }
    };
}

ParallelTask::~ParallelTask()
{
    if (thread.joinable())
        thread.join();
}

void ParallelTask::Execute()
{
    this->completed = false;

    // Launch the thread
    this->thread = std::thread(threadFunction);
}


int main()
{
    std::map<int, std::unique_ptr<ParallelTask>> parallelTaskDictionary;

    auto start = std::chrono::system_clock::now();

    for (int i = 0; i < 10; i++)
    {
         parallelTaskDictionary.emplace(i, std::make_unique<ParallelTask>());
         parallelTaskDictionary[i]->Execute();
    }

    auto end = std::chrono::system_clock::now();
    auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
    std::cout << elapsed.count() << std::endl;

    // Destroying the tasks joins their threads
    parallelTaskDictionary.clear();

    return 0;
}

Running this code gives me between 25 and 26 milliseconds of execution time. Since this is meant to be used in a game, that is of course unacceptable.

As previously mentioned, I do not understand why, especially since the threadFunction itself does literally nothing. In case you wonder, I have even removed the mutex lock and got exactly the same result, so there must be something else going on here. (From my research, creating a thread shouldn't take more than a couple of microseconds, but maybe I'm just wrong about that ^^)

PS: Oh yeah, and while we are at it, I still don't really understand who should own the mutex. (Is there one global mutex, or one per object...?)

Answer 1

If you want to measure only the execution time, I think you should put the start and end now() calls inside threadFunction, around just the part where the work is done, as shown in the code below.

#include <map>
#include <iostream>
#include <memory>
#include <chrono>
#include <vector>
#include <thread>
#include <mutex>
#include <functional>

class ParallelTask
{
public:

    ParallelTask();
    // Join the threads
    ~ParallelTask();

public:
    inline std::vector<int> GetPath() const { return path; }
    void Execute();

private:
    std::thread thread;
    mutable std::mutex mutex;

    std::function<void()> threadFunction;
    bool completed;

    std::vector<int> path;
};


ParallelTask::ParallelTask()
{
    threadFunction = [this]() {
        {
            auto start = std::chrono::system_clock::now();
            std::lock_guard<std::mutex> lock(mutex);
            this->completed = true;
            auto end = std::chrono::system_clock::now();
            auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
            std::cout << "elapsed time" << elapsed.count() << std::endl;
        }
    };
}

ParallelTask::~ParallelTask()
{
    if (thread.joinable())
        thread.join();
}

void ParallelTask::Execute()
{
    this->completed = false;

    // Launch the thread
    this->thread = std::thread(threadFunction);
}


int main()
{

    std::map<int, std::unique_ptr<ParallelTask>> parallelTaskDictionary;


    for (size_t i = 0; i < 10; i++)
    {
         parallelTaskDictionary.emplace(i, std::make_unique<ParallelTask>());
         parallelTaskDictionary[i]->Execute();
    }

    parallelTaskDictionary.clear();

    return 0;
}

which gives the output:

elapsed time1
elapsed time0
elapsed time0
elapsed time0
elapsed time0
elapsed time0elapsed time
0
elapsed time0
elapsed time0
elapsed time0

The reported times are only 0 or 1 microseconds because this measurement excludes the time it takes to spin up each thread.

And just as a sanity check, if you really want to see the effect of real work, you could add

        using namespace std::chrono_literals;
        std::this_thread::sleep_for(2s);

to your threadFunction, so that it looks like this:

ParallelTask::ParallelTask()
{
    threadFunction = [this]() {
        {
            auto start = std::chrono::system_clock::now();
            std::lock_guard<std::mutex> lock(mutex);
            this->completed = true;
            using namespace std::chrono_literals;
            std::this_thread::sleep_for(2s);
            auto end = std::chrono::system_clock::now();
            auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
            std::cout << "elapsed time" << elapsed.count() << std::endl;
        }
    };
}

and the output will be:

elapsed time2000061
elapsed timeelapsed time2000103
elapsed timeelapsed time20000222000061
elapsed time2000050
2000072
elapsed time2000061
elapsed time200012

Answer 1
1 2019-02-10 20:57:04
