C++11 async is only using one core

Question

I'm trying to parallelise a long-running function in C++, but with std::async it only uses one core.

It's not that the function's running time is too short: I'm currently using test data that takes about 10 minutes to run.

By my logic, I create NThreads futures (each taking a share of the loop rather than an individual cell, so each is a nicely long-running task), each of which dispatches an async task. After they have been created, the program blocks waiting for them to complete. However, it always uses one core?!

This isn't me glancing at top and saying it looks roughly like one CPU, either: my ZSH config prints the CPU % of the last command, and it is always exactly 100%, never above.

auto NThreads = 12;
auto BlockSize = (int)std::ceil((int)(NThreads / PathCountLength));

std::vector<std::future<std::vector<unsigned __int128>>> Futures;

for (auto I = 0; I < NThreads; ++I) {
    std::cout << "HERE" << std::endl;
    unsigned __int128 Min = I * BlockSize;
    unsigned __int128 Max = I * BlockSize + BlockSize;

    if (I == NThreads - 1)
        Max = PathCountLength;

    Futures.push_back(std::async(
        [](unsigned __int128 WMin, unsigned __int128 Min, unsigned __int128 Max,
           std::vector<unsigned __int128> ZeroChildren,
           std::vector<unsigned __int128> OneChildren,
           unsigned __int128 PathCountLength)
           -> std::vector<unsigned __int128> {
            std::vector<unsigned __int128> LocalCount;
            for (unsigned __int128 I = Min; I < Max; ++I)
                LocalCount.push_back(KneeParallel::pathCountOrStatic(
                    WMin, I, ZeroChildren, OneChildren, PathCountLength));
            return LocalCount;
        },
        WMin, Min, Max, ZeroChildInit, OneChildInit, PathCountLength));
}

for (auto &Future : Futures) {
    Future.get();
}
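
Independently of how the tasks are launched, the chunk arithmetic is worth checking. If PathCountLength is an integer larger than NThreads (the lambda's parameter type suggests it is), the integer division NThreads / PathCountLength truncates to 0 before std::ceil ever sees it, so every task except the last would get an empty range and the last one would do all the work. The following is a minimal sketch of ceiling division for splitting a range. Note that splitRange is a hypothetical helper, not part of the original post, and std::uint64_t stands in for unsigned __int128 for portability.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (not from the original post): split [0, total) into
// at most `parts` contiguous half-open ranges [min, max).
std::vector<std::pair<std::uint64_t, std::uint64_t>>
splitRange(std::uint64_t total, std::uint64_t parts) {
    assert(parts > 0);
    // Integer ceiling division. Calling std::ceil on a quotient that has
    // already been truncated by integer division cannot round it up.
    // (Assumes total + parts - 1 does not overflow.)
    const std::uint64_t blockSize = (total + parts - 1) / parts;
    std::vector<std::pair<std::uint64_t, std::uint64_t>> ranges;
    for (std::uint64_t min = 0; min < total; min += blockSize) {
        // Clamp the final block so it never runs past the end of the range.
        const std::uint64_t max = std::min(min + blockSize, total);
        ranges.emplace_back(min, max);
    }
    return ranges;
}
```

Each returned pair can then be handed to one task as its Min and Max.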

Does anyone have any insight?

I'm compiling with clang and LLVM on Arch Linux. Are there any compile flags I need? From what I can tell, C++11 standardised the thread library.

Edit: In case it gives anyone further clues: when I comment out the local vector it runs on all cores as it should; when I put it back in, it drops back to one core.

Edit 2: So I pinned down a solution, but it seems very bizarre. Returning the vector from the lambda function pinned it to one core, so now I get round this by passing in a shared_ptr to the output vector and manipulating that. And hey presto, it fires up on all the cores!

I figured it was now pointless to use futures, as I don't have a return value, so I'd use threads instead. Nope! Using threads with no return values also uses one core. Weird, eh?

Fine, go back to using futures and just return an int to throw away or something. Yep, you guessed it: even returning an int from the thread sticks the program to one core. Except futures can't have void lambda functions. So my solution is to pass a pointer in to store the output, to an int lambda function that never returns anything. Yeah, it feels like duct tape, but I can't see a better solution.

It seems so... bizarre? Like the compiler is somehow interpreting the lambda incorrectly. Could it be because I use the dev release of LLVM and not a stable branch...?

Anyway, here is my solution, because I hate nothing more than finding my problem on here and having no answer:

auto NThreads = 4;
auto BlockSize = (int)std::ceil((int)(NThreads / PathCountLength));

auto Futures = std::vector<std::future<int>>(NThreads);
auto OutputVectors =
    std::vector<std::shared_ptr<std::vector<unsigned __int128>>>(
        NThreads, std::make_shared<std::vector<unsigned __int128>>());

for (auto I = 0; I < NThreads; ++I) {
  unsigned __int128 Min = I * BlockSize;
  unsigned __int128 Max = I * BlockSize + BlockSize;

  if (I == NThreads - 1)
    Max = PathCountLength;

  Futures[I] = std::async(
      std::launch::async,
      [](unsigned __int128 WMin, unsigned __int128 Min, unsigned __int128 Max,
         std::vector<unsigned __int128> ZeroChildren,
         std::vector<unsigned __int128> OneChildren,
         unsigned __int128 PathCountLength,
         std::shared_ptr<std::vector<unsigned __int128>> OutputVector)
          -> int {
        for (unsigned __int128 I = Min; I < Max; ++I) {
          OutputVector->push_back(KneeParallel::pathCountOrStatic(
              WMin, I, ZeroChildren, OneChildren, PathCountLength));
        }
      },
      WMin, Min, Max, ZeroChildInit, OneChildInit, PathCountLength,
      OutputVectors[I]);
}

for (auto &Future : Futures) {
  Future.get();
}
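
A note on this workaround: std::async also accepts a callable that returns void, which yields a std::future<void>, so a dummy int return type should not be required (and a lambda declared -> int that flows off its end without a return statement has undefined behaviour). Separately, the fill constructor std::vector(n, value) copies the same shared_ptr n times, so as written all entries of OutputVectors would refer to one shared vector. Below is a minimal sketch of an alternative, assuming each task may write to its own element of a pre-sized vector. The names work and runChunks are hypothetical; work merely stands in for KneeParallel::pathCountOrStatic, which the post does not show.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical stand-in for the per-index computation in the post.
unsigned long long work(unsigned long long i) { return i * i; }

// Launches `tasks` asynchronous tasks; task t handles the indices
// [t * perTask, (t + 1) * perTask) and fills its own output vector.
std::vector<std::vector<unsigned long long>>
runChunks(std::size_t tasks, unsigned long long perTask) {
    assert(tasks > 0);
    // One distinct vector per task, so no two threads write to the same
    // object. The outer vector is never resized while tasks are running.
    std::vector<std::vector<unsigned long long>> outputs(tasks);
    std::vector<std::future<void>> futures;
    futures.reserve(tasks);
    for (std::size_t t = 0; t < tasks; ++t) {
        futures.push_back(std::async(
            std::launch::async,
            [&outputs, t, perTask] {  // returns void, giving std::future<void>
                const unsigned long long min = t * perTask;
                for (unsigned long long i = min; i < min + perTask; ++i)
                    outputs[t].push_back(work(i));
            }));
    }
    for (auto &f : futures)
        f.get();  // blocks until the task is done; rethrows its exception, if any
    return outputs;
}
```

Capturing outputs by reference is safe here only because every future is waited on before the function returns.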

Answer 1

By providing a launch policy as the first argument to std::async, you can configure it to run deferred (std::launch::deferred), to run in its own thread (std::launch::async), or to let the system decide between the two (std::launch::async | std::launch::deferred). The latter is the default behavior.

So, to force it to run in another thread, change your call of std::async to std::async(std::launch::async, /*...*/).
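
The difference between the policies can be observed directly. The sketch below is illustrative only: ranOnCallerThread is a hypothetical helper, not part of the original answer. It launches one task with a given policy and reports whether that task executed on the calling thread.

```cpp
#include <cassert>
#include <future>
#include <thread>

// Hypothetical helper: returns true if a task started with `policy`
// ran on the same thread that called this function.
bool ranOnCallerThread(std::launch policy) {
    const std::thread::id caller = std::this_thread::get_id();
    std::future<std::thread::id> f =
        std::async(policy, [] { return std::this_thread::get_id(); });
    assert(f.valid());  // std::async always returns a valid future
    // For a deferred task, get() runs the callable right here, on this thread.
    const std::thread::id worker = f.get();
    return worker == caller;
}
```

With std::launch::deferred the task runs lazily on the thread that calls get(), so the helper returns true; with std::launch::async it runs on a separate thread and the helper returns false. With the default policy either outcome is permitted. On Linux with GCC or Clang, building threaded code like this typically also requires the -pthread flag.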

14 2015-02-23 16:48:52
