How to optimize a radix sort algorithm in C++?

Question

I am working on an assignment to optimize a radix sort in C++, and I need to reduce its execution time. My code works and looks like this:

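#include <queue>
#include <vector>

class RadixSort {
public:
  // Stores the number of keys and the radix, and creates one queue per digit value.
  void setConfig(int key_length, int radix) {
    key_length_ = key_length;
    radix_ = radix;
    for (int i = 0; i < radix_; i++) {
      Q.push_back(std::queue<long>());
    }
  }

  // Returns the largest key. Arr is taken by value, so the input is copied.
  long findMax(std::vector<long> Arr) const {
    long Max = 0;
    for (int i = 0; i < key_length_; i++) {
      if (Max < Arr[i])
        Max = Arr[i];
    }
    return Max;
  }

  void RadixSortNaive(std::vector<long> &Arr);
  void RadixSortStudent(std::vector<long> &Arr);

private:
  int key_length_;                  // number of keys to sort
  int radix_;
  std::vector<std::queue<long>> Q;  // one FIFO bucket per digit value
};

void RadixSort::RadixSortNaive(std::vector<long> &Arr) {
  long Max_Value = findMax(Arr);

  // Smallest power of radix_ strictly greater than Max_Value,
  // so that the most significant digit also gets a pass.
  int Max_Radix = 1;
  while (1) {
    if (Max_Radix > Max_Value) break;
    Max_Radix = Max_Radix * radix_;
  }

  // One pass per digit, least significant digit first.
  for (int i = 1; i < Max_Radix; i = i * radix_) {
    // Distribute the keys into the queue for their current digit.
    for (int j = 0; j < key_length_; j++) {
      int K;
      if (Arr[j] < i) K = 0;
      else K = (Arr[j] / i) % radix_;
      Q[K].push(Arr[j]);
    }

    // Collect the keys back in queue order.
    int idx = 0;
    for (int j = 0; j < radix_; j++) {
      while (Q[j].empty() == 0) {
        Arr[idx] = Q[j].front();
        Q[j].pop();
        idx++;
      }
    }
  }
}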

However, I am sure there is still room for improvement. I have been trying to parallelize it with OpenMP, but nothing seems to work. Is there any way to improve the code above, perhaps by improving the loops or with some other optimization technique?

Answer 1

As suggested in the comments, the first thing is to get the API right.

findMax can be replaced by std::max_element(), which works on iterators and doesn't make a copy of the input.
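A minimal sketch of that replacement, assuming the keys live in a std::vector<long>. The name findMaxNoCopy is hypothetical and not part of the original class. Unlike the original findMax, which starts from 0, this returns the true maximum even when all keys are negative:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper (not in the original post): returns the largest key
// without copying the input. Returns 0 for an empty vector, because
// dereferencing the end iterator would be undefined behaviour.
long findMaxNoCopy(const std::vector<long>& arr) {
    if (arr.empty()) return 0;
    return *std::max_element(arr.begin(), arr.end());
}
```

Taking the vector by const reference is what avoids the copy; std::max_element only reads through the iterators.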

Another suspicious thing is Q[K].push(Arr[j]);. If memory permits, at least reserve the maximum number of elements in each queue; otherwise the queues have to allocate, and possibly copy old data, as they grow. (std::queue has no reserve() of its own, so this implies a bucket type that does, such as std::vector.)
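One way to act on the reservation idea is sketched below. It swaps the std::queue buckets for std::vector buckets, since std::queue exposes no reserve(). The helper names makeBuckets and radixPass are hypothetical, and the sketch assumes non-negative keys:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: creates `radix` buckets, each with room for all n keys,
// so push_back never has to reallocate during a pass.
std::vector<std::vector<long>> makeBuckets(int radix, std::size_t n) {
    std::vector<std::vector<long>> buckets(radix);
    for (auto& b : buckets) b.reserve(n);
    return buckets;
}

// One distribute/collect pass on the digit (x / place) % radix.
// Assumes non-negative keys and place > 0.
void radixPass(std::vector<long>& arr,
               std::vector<std::vector<long>>& buckets, long place) {
    const long radix = static_cast<long>(buckets.size());
    for (long x : arr) buckets[(x / place) % radix].push_back(x);

    std::size_t idx = 0;
    for (auto& b : buckets) {
        for (long x : b) arr[idx++] = x;  // reading in insertion order keeps the pass stable
        b.clear();                        // clear() keeps the capacity for the next pass
    }
}
```

Reserving n slots in every bucket costs radix * n elements of memory, which is the trade-off the "if memory permits" caveat refers to.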

Then, if possible, use raw pointers with no out-of-range checks: pop() and push() become auto popped = *tail++ and *head++ = new_element;. My observation is that while the STL is correctly implemented and fast to develop with, its support for dynamic memory allocation on insertion practically always degrades performance compared to known static allocations.
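The pointer-based push and pop could look roughly like this. RawQueue is a hypothetical type, not something from the original code. It follows the answer's naming (head is the write cursor, tail is the read cursor) and deliberately performs no bounds checks, so the caller has to guarantee the capacity:

```cpp
#include <cstddef>
#include <memory>

// Hypothetical fixed-capacity FIFO over one preallocated buffer.
// No bounds checks: the caller must not push more than `capacity` elements
// between resets, and must not pop more elements than were pushed.
struct RawQueue {
    std::unique_ptr<long[]> storage;  // allocated once, up front
    long* head;                       // next write position (push)
    long* tail;                       // next read position (pop)

    explicit RawQueue(std::size_t capacity)
        : storage(new long[capacity]),
          head(storage.get()),
          tail(storage.get()) {}

    void push(long v) { *head++ = v; }
    long pop() { return *tail++; }
    bool empty() const { return tail == head; }
    void reset() { head = tail = storage.get(); }  // reuse the buffer for the next pass
};
```

In a radix sort each bucket would be sized for the worst case (all keys landing in one bucket) and reset() after every collection step, so no allocation happens inside the sorting loop.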

The third thing is to specialise the radix for powers of two, since the division is then strength-reduced to a shift and the modulus to a bitwise AND (with constants that need to be calculated from the radix).
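As an illustration of the power-of-two case, here is a sketch with an assumed digit width of 8 bits (radix 256). The names kBits, kMask, digitAt and radixSortPow2 are hypothetical. Note that it replaces the queues with a counting pass per digit, which is a different bucketing technique from the one in the question, and it assumes non-negative keys:

```cpp
#include <array>
#include <cstddef>
#include <vector>

constexpr unsigned kBits = 8;                        // assumed digit width: radix = 2^8
constexpr unsigned long kMask = (1UL << kBits) - 1;  // 0xFF

// Digit extraction with a shift and a bitwise AND instead of / and %.
inline unsigned digitAt(unsigned long key, unsigned shift) {
    return static_cast<unsigned>((key >> shift) & kMask);
}

// LSD radix sort for non-negative keys, one stable counting pass per digit.
void radixSortPow2(std::vector<long>& arr) {
    unsigned long maxKey = 0;
    for (long x : arr) {
        const unsigned long u = static_cast<unsigned long>(x);
        if (u > maxKey) maxKey = u;
    }

    std::vector<long> tmp(arr.size());
    // Stop once no key has bits left at or above `shift`.
    for (unsigned shift = 0;
         shift < sizeof(long) * 8 && (maxKey >> shift) != 0;
         shift += kBits) {
        std::array<std::size_t, (1u << kBits) + 1> count{};
        for (long x : arr) ++count[digitAt(static_cast<unsigned long>(x), shift) + 1];
        // Prefix sums turn counts into the start offset of each digit.
        for (std::size_t d = 0; d < (1u << kBits); ++d) count[d + 1] += count[d];
        for (long x : arr) tmp[count[digitAt(static_cast<unsigned long>(x), shift)]++] = x;
        arr.swap(tmp);
    }
}
```

With the radix fixed at compile time, the shift amounts and the mask are constants, which is what lets the compiler avoid a division per key.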

Especially when the radix is a power of two, and possibly otherwise too, I guess it's not useful to compute K == 0 conditionally with if (Arr[j] < i) K = 0;.
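The reason the branch can go is that, for non-negative keys, integer division already yields 0 whenever the key is smaller than the current place value. The two hypothetical helpers below (digitBranch mirrors the question's code, digitNoBranch drops the test) return the same digit under that assumption:

```cpp
// Digit as computed in the question, with the explicit branch.
inline int digitBranch(long x, long place, int radix) {
    if (x < place) return 0;
    return static_cast<int>((x / place) % radix);
}

// Same digit without the branch. For 0 <= x < place, x / place is 0,
// so the result is 0 anyway. Assumes x >= 0, place > 0 and radix > 0.
inline int digitNoBranch(long x, long place, int radix) {
    return static_cast<int>((x / place) % radix);
}
```

For negative keys the two versions differ, so this simplification only applies if the input is known to be non-negative.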

Solution 1
0 2021-12-15 08:38:22
