How to eliminate intermediate container for parallel std::transform_reduce()?

Frequently, I have to find Sum( f(i), 1, N ) or Product( f(i), 1, N ), where f(i) is CPU-intensive to compute and the integer i comes from a sequential but huge range.

Using a C++20 compiler, I can write this function:

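#include <cstdint>
#include <execution>
#include <functional>
#include <numeric>
#include <vector>

uint64_t solution(uint64_t N)
{
    // Materialize 1..N so that the algorithm has a range to iterate over.
    std::vector<uint64_t> v(N);
    std::iota(v.begin(), v.end(), 1ULL);

    return std::transform_reduce(
                std::execution::par,
                v.cbegin(), v.cend(),
                0ull,
                std::plus<>(),
                [](const uint64_t& i) -> uint64_t {
                    uint64_t result(0);
                    // expensive computation of result=f(i) goes here
                    // ...
                    return result;
                });
}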

But that will be RAM constrained.
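
To put a rough number on that constraint: the vector stores one 64-bit value per index, so its elements alone take about 8 * N bytes before any useful work starts. A minimal sketch of that arithmetic follows; the helper name iota_buffer_bytes is hypothetical, and allocator overhead is ignored.

```cpp
#include <cstdint>

// Bytes occupied by the elements of std::vector<std::uint64_t>(N).
// Hypothetical helper, for illustration only; assumes N * 8 does not overflow.
constexpr std::uint64_t iota_buffer_bytes(std::uint64_t N) {
    return N * sizeof(std::uint64_t);
}

// One billion indices already cost 8 * 10^9 bytes of input that holds
// nothing but the values 1, 2, 3, ...
static_assert(iota_buffer_bytes(1'000'000'000ULL) == 8'000'000'000ULL,
              "8 bytes per index");
```

The cost grows linearly with N, which is why a range that can be generated on the fly is attractive here.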

How can I completely eliminate the intermediate input vector and the memory operations on it at run time, using only the C++20 standard library (i.e. no vendor-specific or third-party libraries), and still get efficient parallel execution?

Disclaimer: I have no prior experience in implementing iterators or in C++20.

This seems to work for me with gcc 10.1 and -std=c++2a. I put this together in a very short time without giving it much thought, so the implementation can certainly be improved, if only by templatizing it. If operator<=> is replaced by the old two-way comparison operators, this should also work with C++17, but I haven't tested that. If you find any errors or easily correctable design flaws, you are welcome to comment on them below so that this answer can be improved.

#include <cstddef>

#if __cplusplus > 201703L
#include <compare>
#endif

#include <execution>
#include <iostream>
#include <iterator>
#include <numeric>

class counting_iterator {
public:
  typedef std::ptrdiff_t difference_type;
  typedef std::ptrdiff_t value_type;
  typedef void pointer;
  typedef void reference;
  typedef std::random_access_iterator_tag iterator_category;

private:
  value_type val_{0};

public:
  counting_iterator() = default;
  explicit counting_iterator(value_type init) noexcept : val_{init} {}
  value_type operator*() const noexcept { return val_; }
  value_type operator[](difference_type index) const noexcept {
    return val_ + index;
  }
  counting_iterator &operator++() noexcept {
    ++val_;
    return *this;
  }
  counting_iterator operator++(int) noexcept {
    counting_iterator res{*this};
    ++(*this);
    return res;
  }
  counting_iterator &operator--() noexcept {
    --val_;
    return *this;
  }
  counting_iterator operator--(int) noexcept {
    counting_iterator res{*this};
    --(*this);
    return res;
  }
  friend counting_iterator operator+(counting_iterator const &it,
                                     difference_type const &offset) noexcept;
  friend counting_iterator operator+(difference_type const &offset,
                                     counting_iterator const &it) noexcept;
  friend counting_iterator operator-(counting_iterator const &it,
                                     difference_type const &offset) noexcept;
  friend difference_type operator-(counting_iterator const &a,
                                   counting_iterator const &b) noexcept;
  counting_iterator &operator+=(difference_type offset) noexcept {
    val_ += offset;
    return *this;
  }
  counting_iterator &operator-=(difference_type offset) noexcept {
    val_ -= offset;
    return *this;
  }
  friend bool operator==(counting_iterator const &a,
                         counting_iterator const &b) noexcept;
#if __cplusplus > 201703L
  friend std::strong_ordering operator<=>(counting_iterator const &a,
                                          counting_iterator const &b);
#else
  friend bool operator!=(counting_iterator const &a,
                         counting_iterator const &b) noexcept;
  friend bool operator<=(counting_iterator const &a,
                         counting_iterator const &b) noexcept;
  friend bool operator>=(counting_iterator const &a,
                         counting_iterator const &b) noexcept;
  friend bool operator<(counting_iterator const &a,
                        counting_iterator const &b) noexcept;
  friend bool operator>(counting_iterator const &a,
                        counting_iterator const &b) noexcept;
#endif
};

counting_iterator
operator+(counting_iterator const &it,
          counting_iterator::difference_type const &offset) noexcept {
  return counting_iterator{it.val_ + offset};
}
counting_iterator operator+(counting_iterator::difference_type const &offset,
                            counting_iterator const &it) noexcept {
  return counting_iterator{it.val_ + offset};
}
counting_iterator
operator-(counting_iterator const &it,
          counting_iterator::difference_type const &offset) noexcept {
  return counting_iterator{it.val_ - offset};
}
counting_iterator::difference_type
operator-(counting_iterator const &a, counting_iterator const &b) noexcept {
  return a.val_ - b.val_;
}
bool operator==(counting_iterator const &a,
                counting_iterator const &b) noexcept {
  return a.val_ == b.val_;
}
#if __cplusplus > 201703L
std::strong_ordering operator<=>(counting_iterator const &a,
                                 counting_iterator const &b) {
  return a.val_ <=> b.val_;
}
#else
bool operator!=(counting_iterator const &a,
                counting_iterator const &b) noexcept {
  return a.val_ != b.val_;
}
bool operator<=(counting_iterator const &a,
                counting_iterator const &b) noexcept {
  return a.val_ <= b.val_;
}
bool operator>=(counting_iterator const &a,
                counting_iterator const &b) noexcept {
  return a.val_ >= b.val_;
}
bool operator<(counting_iterator const &a,
               counting_iterator const &b) noexcept {
  return a.val_ < b.val_;
}
bool operator>(counting_iterator const &a,
               counting_iterator const &b) noexcept {
  return a.val_ > b.val_;
}
#endif

int main() {
    auto res = std::transform_reduce(
                std::execution::par, 
                counting_iterator(0), counting_iterator(10), 
                0L, 
                std::plus<>(), 
                [](const std::ptrdiff_t& i) { return i * i; });

    std::cout << res << std::endl;
}
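
The answer mentions templatizing as a possible improvement. The following is only a sketch of what that might look like, not a tested drop-in replacement: the name basic_counting_iterator and the helper sum_of_squares are hypothetical, and i * i stands in for the expensive f(i). The helper calls the sequential overload of std::transform_reduce so that the sketch builds without any parallel backend; passing std::execution::par as the first argument would select the parallel overload instead.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <iterator>
#include <numeric>

// Hypothetical generalization of counting_iterator over an integer type T.
template <typename T>
class basic_counting_iterator {
public:
  using difference_type = std::ptrdiff_t;
  using value_type = T;
  using pointer = const T*;
  using reference = T;  // operator* returns by value, there is no stored element
  using iterator_category = std::random_access_iterator_tag;
  using self = basic_counting_iterator;

  basic_counting_iterator() = default;
  explicit basic_counting_iterator(T init) noexcept : val_{init} {}

  T operator*() const noexcept { return val_; }
  T operator[](difference_type n) const noexcept { return val_ + static_cast<T>(n); }

  self& operator++() noexcept { ++val_; return *this; }
  self operator++(int) noexcept { self old = *this; ++val_; return old; }
  self& operator--() noexcept { --val_; return *this; }
  self operator--(int) noexcept { self old = *this; --val_; return old; }
  self& operator+=(difference_type n) noexcept { val_ += static_cast<T>(n); return *this; }
  self& operator-=(difference_type n) noexcept { val_ -= static_cast<T>(n); return *this; }

  // The binary operators take their iterator argument by value, so the
  // caller's iterator is never modified.
  friend self operator+(self it, difference_type n) noexcept { return it += n; }
  friend self operator+(difference_type n, self it) noexcept { return it += n; }
  friend self operator-(self it, difference_type n) noexcept { return it -= n; }
  friend difference_type operator-(self a, self b) noexcept {
    return static_cast<difference_type>(a.val_ - b.val_);
  }

  friend bool operator==(self a, self b) noexcept { return a.val_ == b.val_; }
  friend bool operator!=(self a, self b) noexcept { return a.val_ != b.val_; }
  friend bool operator<(self a, self b) noexcept { return a.val_ < b.val_; }
  friend bool operator>(self a, self b) noexcept { return a.val_ > b.val_; }
  friend bool operator<=(self a, self b) noexcept { return a.val_ <= b.val_; }
  friend bool operator>=(self a, self b) noexcept { return a.val_ >= b.val_; }

private:
  T val_{};
};

// Sum of f(i) for i in the inclusive range [1, N], with f(i) = i * i as a
// placeholder. The half-open iterator range is therefore [1, N + 1).
inline std::uint64_t sum_of_squares(std::uint64_t N) {
  using it = basic_counting_iterator<std::uint64_t>;
  return std::transform_reduce(it{1}, it{N + 1}, std::uint64_t{0},
                               std::plus<>(),
                               [](std::uint64_t i) { return i * i; });
}
```

Note that the question's range starts at 1 and includes N, so the end iterator has to be constructed from N + 1.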

EDIT: I reworked the class to make it usable with C++17 as well. It now also explicitly typedefs std::random_access_iterator_tag as its iterator_category. I still don't get any parallel execution with that execution policy, neither with the iterator nor with the vector, so I can't tell whether anything about the class itself inhibits parallel execution.

After some massaging and experimenting, I can confirm that a bidirectional iterator, based on Paul's sample above, works:
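
For toolchains where the execution policy does not lead to parallel execution, one fallback that stays within the standard library is to partition the index range by hand and run each part with std::async. This is a different technique from the iterator approach above, sketched here under assumptions: chunked_sum and f are hypothetical names, f(i) = i * i is a placeholder for the expensive computation, and N is assumed to be smaller than the largest uint64_t value so that N + 1 does not overflow.

```cpp
#include <cstdint>
#include <future>
#include <thread>
#include <vector>

// Placeholder for the expensive computation (assumption for this sketch).
inline std::uint64_t f(std::uint64_t i) { return i * i; }

// Sums f(i) for i in [1, N]. Each task walks its own contiguous sub-range,
// so no container of N elements is ever created.
inline std::uint64_t chunked_sum(
    std::uint64_t N, unsigned tasks = std::thread::hardware_concurrency()) {
  if (tasks == 0) tasks = 1;  // hardware_concurrency() may return 0

  const std::uint64_t chunk = N / tasks;  // base length of each sub-range
  const std::uint64_t rem = N % tasks;    // the first `rem` tasks get one extra

  // Holds one future per task, i.e. O(tasks) elements rather than O(N).
  std::vector<std::future<std::uint64_t>> parts;
  parts.reserve(tasks);

  std::uint64_t begin = 1;
  for (unsigned t = 0; t < tasks; ++t) {
    const std::uint64_t end = begin + chunk + (t < rem ? 1 : 0);  // one past last
    parts.push_back(std::async(std::launch::async, [begin, end] {
      std::uint64_t partial = 0;
      for (std::uint64_t i = begin; i < end; ++i) partial += f(i);
      return partial;
    }));
    begin = end;
  }

  std::uint64_t total = 0;
  for (auto& p : parts) total += p.get();  // wait for and combine the partial sums
  return total;
}
```

The static partitioning is only a reasonable fit when f(i) costs roughly the same for every i; with uneven costs, smaller chunks handed out dynamically would balance the load better.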

class counting_iterator {
public:
    using iterator_category = std::bidirectional_iterator_tag;
    using difference_type = std::ptrdiff_t;
    using value_type = std::ptrdiff_t;
private:
    value_type val_;
public:
    counting_iterator() : val_(0) {}
    explicit counting_iterator(value_type init) : val_(init) {}

    value_type operator*() noexcept { return val_; }
    const value_type& operator*() const noexcept { return val_; }

    counting_iterator& operator++() noexcept { ++val_; return *this; }
    counting_iterator operator++(int) noexcept { counting_iterator res{ *this }; ++(*this); return res; }

    counting_iterator& operator--() noexcept { --val_; return *this; }
    counting_iterator operator--(int) noexcept { counting_iterator res{ *this }; --(*this); return res; }

    value_type operator[](difference_type index) noexcept { return val_ + index; }

    counting_iterator& operator+=(difference_type offset) noexcept { val_ += offset; return *this; }
    counting_iterator& operator-=(difference_type offset) noexcept { val_ -= offset; return *this; }

    counting_iterator operator+(difference_type offset) const noexcept { return counting_iterator{ *this } += offset; };
    /*counting_iterator& operator+(difference_type offset) noexcept { return operator+=(offset); }*/

    counting_iterator operator-(difference_type offset) const noexcept { return counting_iterator{ *this } -= offset; };

    /*counting_iterator& operator-(difference_type offset) noexcept { return operator-=(offset); }*/

    difference_type operator-(counting_iterator const& other) const noexcept { return val_ - other.val_; }

    bool operator<(counting_iterator const& b) const noexcept { return val_ < b.val_; }
    bool operator==(counting_iterator const& b) const noexcept { return val_ == b.val_; }
    bool operator!=(counting_iterator const& b) const noexcept { return !operator==(b); }

    std::strong_ordering operator<=>(counting_iterator const& b) const noexcept { return val_ <=> b.val_; }
};

I could not make it work in parallel std::transform_reduce with iterator_category = std::random_access_iterator_tag, though, and I believe that is the reason for the performance drop.

UPD: The commented-out lines in the code above were the culprit. When they were active, the MS compiler chose them over the copying versions, and that wreaked havoc during parallel execution if the iterator was marked with std::random_access_iterator_tag.
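
A minimal sketch of why such an overload pair is hazardous, using a hypothetical stand-in type pos rather than the real iterator: when both a const, copying operator+ and a non-const operator+ that forwards to operator+= exist, overload resolution prefers the non-const one for any non-const object, so an expression like first + 5 silently moves first itself.

```cpp
#include <cstddef>

// Hypothetical stand-in for counting_iterator, reduced to the two overloads.
struct pos {
  std::ptrdiff_t val{0};

  pos& operator+=(std::ptrdiff_t n) noexcept { val += n; return *this; }

  // Copying version: leaves *this untouched.
  pos operator+(std::ptrdiff_t n) const noexcept {
    pos copy = *this;
    copy += n;
    return copy;
  }

  // Mutating version, analogous to the commented-out lines: it is the better
  // match whenever the left operand is a non-const object.
  pos& operator+(std::ptrdiff_t n) noexcept { return operator+=(n); }
};

// Returns the value of `first` after evaluating `first + 5` on a non-const
// object: the mutating overload runs, so `first` has moved.
inline std::ptrdiff_t start_after_offset_mutable() {
  pos first{0};
  pos mid = first + 5;
  (void)mid;
  return first.val;
}

// Same expression on a const object: only the copying overload is viable,
// so `first` keeps its value.
inline std::ptrdiff_t start_after_offset_const() {
  const pos first{0};
  pos mid = first + 5;
  (void)mid;
  return first.val;
}
```

An algorithm that computes sub-range boundaries as first + k from a shared, non-const first would corrupt its own starting point with the mutating overload, which is consistent with the havoc described above.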
