
Are C++17 Parallel Algorithms implemented already?

I was trying to play around with the new parallel library features proposed in the C++17 standard, but I couldn't get them to work. I tried compiling with up-to-date versions of g++ 8.1.1 and clang++-6.0 with -std=c++17, but neither seemed to support #include <execution>, std::execution::par or anything similar.

When looking at the cppreference page for parallel algorithms, there is a long list of algorithms, claiming

Technical specification provides parallelized versions of the following 69 algorithms from algorithm, numeric and memory: ( ... long list ...)

which sounds like the algorithms are ready 'on paper', but not ready to use yet?

In this SO question from over a year ago, the answers claim these features hadn't been implemented yet. But by now I would have expected to see some kind of implementation. Is there anything we can use already?

GCC 9 has them but you have to install TBB separately

In Ubuntu 19.10, all components have finally aligned:

  • GCC 9 is the default version, and the minimum required for the TBB-backed parallel algorithms
  • TBB (Intel Threading Building Blocks) is at 2019~U8-1, so it meets the minimum 2018 requirement

so you can simply do:

sudo apt install gcc libtbb-dev
g++ -ggdb3 -O3 -std=c++17 -Wall -Wextra -pedantic -o main.out main.cpp -ltbb
./main.out

and use it as:

#include <execution>
#include <algorithm>

std::sort(std::execution::par_unseq, input.begin(), input.end());

See also the full runnable benchmark below.

GCC 9 and TBB 2018 are the first versions to work, as mentioned in the release notes: https://gcc.gnu.org/gcc-9/changes.html

Parallel algorithms and <execution> (requires Thread Building Blocks 2018 or newer).


Ubuntu 18.04 installation

Ubuntu 18.04 is a bit more involved. Here are fully automated, tested commands for Ubuntu 18.04:

# Install GCC 9
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt-get update
sudo apt-get install gcc-9 g++-9

# Compile libtbb from source.
sudo apt-get build-dep libtbb-dev
git clone https://github.com/intel/tbb
cd tbb
git checkout 2019_U9
make -j `nproc`
TBB="$(pwd)"
TBB_RELEASE="${TBB}/build/linux_intel64_gcc_cc7.4.0_libc2.27_kernel4.15.0_release"

# Use them to compile our test program.
g++-9 -ggdb3 -O3 -std=c++17 -Wall -Wextra -pedantic -I "${TBB}/include" -L "${TBB_RELEASE}" -Wl,-rpath,"${TBB_RELEASE}" -o main.out main.cpp -ltbb
./main.out

Test program analysis

I have tested with this program, which compares parallel and serial sorting speed.

main.cpp

#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>   // uint64_t
#include <cstdlib>   // std::strtoll
#include <execution>
#include <iostream>
#include <random>
#include <string>    // std::stoi
#include <vector>

int main(int argc, char **argv) {
    using clk = std::chrono::high_resolution_clock;
    decltype(clk::now()) start, end;
    std::vector<unsigned long long> input_parallel, input_serial;
    unsigned int seed;
    unsigned long long n;

    // CLI arguments: element count n, then an optional PRNG seed.
    std::uniform_int_distribution<uint64_t> zero_ull_max(0);
    if (argc > 1) {
        n = std::strtoll(argv[1], NULL, 0);
    } else {
        n = 10;
    }
    if (argc > 2) {
        seed = std::stoi(argv[2]);
    } else {
        seed = std::random_device()();
    }

    std::mt19937 prng(seed);
    for (unsigned long long i = 0; i < n; ++i) {
        input_parallel.push_back(zero_ull_max(prng));
    }
    input_serial = input_parallel;

    // Sort and time parallel.
    start = clk::now();
    std::sort(std::execution::par_unseq, input_parallel.begin(), input_parallel.end());
    end = clk::now();
    std::cout << "parallel " << std::chrono::duration<float>(end - start).count() << " s" << std::endl;

    // Sort and time serial.
    start = clk::now();
    std::sort(std::execution::seq, input_serial.begin(), input_serial.end());
    end = clk::now();
    std::cout << "serial " << std::chrono::duration<float>(end - start).count() << " s" << std::endl;

    assert(input_parallel == input_serial);
}

On Ubuntu 19.10, on a Lenovo ThinkPad P51 laptop with CPU: Intel Core i7-7820HQ (4 cores / 8 threads, 2.90 GHz base, 8 MB cache), RAM: 2x Samsung M471A2K43BB1-CRC (2x 16GiB, 2400 Mbps), a typical output for an input of 100 million numbers to be sorted:

./main.out 100000000

was:

parallel 2.00886 s
serial 9.37583 s

so the parallel version was about 4.7 times faster! See also: What do the terms "CPU bound" and "I/O bound" mean?

We can confirm that the process is spawning threads with strace:

strace -f -s999 -v ./main.out 100000000 |& grep -E 'clone'

which shows several lines of the form:

[pid 25774] clone(strace: Process 25788 attached
[pid 25774] <... clone resumed> child_stack=0x7fd8c57f4fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fd8c57f59d0, tls=0x7fd8c57f5700, child_tidptr=0x7fd8c57f59d0) = 25788

Also, if I comment out the serial version and run with:

time ./main.out 100000000

I get:

real    0m5.135s
user    0m17.824s
sys     0m0.902s

which confirms again that the algorithm was parallelized, since real < user, and gives an idea of how effectively it can be parallelized on my system (user / real is about 3.5x on 8 hardware threads).

Error messages

Google, index this please.

If you don't have TBB installed, the error is:

In file included from /usr/include/c++/9/pstl/parallel_backend.h:14,
                 from /usr/include/c++/9/pstl/algorithm_impl.h:25,
                 from /usr/include/c++/9/pstl/glue_execution_defs.h:52,
                 from /usr/include/c++/9/execution:32,
                 from parallel_sort.cpp:4:
/usr/include/c++/9/pstl/parallel_backend_tbb.h:19:10: fatal error: tbb/blocked_range.h: No such file or directory
   19 | #include <tbb/blocked_range.h>
      |          ^~~~~~~~~~~~~~~~~~~~~
compilation terminated.

so we see that <execution> depends on a TBB component that is not installed.

If TBB is too old, e.g. the default Ubuntu 18.04 one, it fails with:

#error Intel(R) Threading Building Blocks 2018 is required; older versions are not supported.

You can refer to https://en.cppreference.com/w/cpp/compiler_support to check the implementation status of all C++ features. For your case, just search for "Standardization of Parallelism TS", and you will find that only the MSVC and Intel C++ compilers support this feature now.

Intel has released a Parallel STL library which follows the C++17 standard. It is being merged into GCC.

GCC does not yet implement the Parallelism TS (see https://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html#status.iso.2017).

However, libstdc++ (with GCC) has an experimental mode for some equivalent parallel algorithms. See https://gcc.gnu.org/onlinedocs/libstdc++/manual/parallel_mode.html

Getting it to work:

Any use of parallel functionality requires additional compiler and runtime support, in particular support for OpenMP. Adding this support is not difficult: just compile your application with the compiler flag -fopenmp. This will link in libgomp, the GNU Offloading and Multi Processing Runtime Library, whose presence is mandatory.

Code example

#include <vector>
#include <parallel/algorithm>

int main()
{
  std::vector<int> v(100);

  // ...

  // Explicitly force a call to parallel sort.
  __gnu_parallel::sort(v.begin(), v.end());
  return 0;
}

GCC now supports the <execution> header, but the standard clang build from https://apt.llvm.org does not.
