
What makes bucket sort good?

Original post 2020-05-11 07:04:13 · tags: algorithm, sorting, bucket-sort

So I stumbled upon non-comparison-based sorting algorithms, bucket sort to be exact, and I couldn't quite see why it is good.

I have a thought, but I need somebody to confirm it.

Let's assume I want to sort a 1000-element array. Suppose the values are uniformly distributed and get split into 10 buckets of 100 elements each.

Sorting 100 elements 10 times with an n log(n) algorithm costs 10 * 100 log(100) = 1000 log(100) = 2000 (taking log base 10),

while sorting all 1000 elements at once with an n log(n) algorithm costs 1000 log(1000) = 3000.

So the algorithm exploits the fact that if n = m + l (with m, l > 0) then (m+l)^2 > m^2 + l^2, and the same holds for n log(n) algorithms.
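For anyone who wants the inequality spelled out, here is a short derivation. It assumes m, l > 0, a logarithm of any fixed base greater than 1, and (for the last line) k perfectly even buckets; constants hidden by the big-O are ignored.

```latex
\begin{align*}
(m+l)^2 &= m^2 + 2ml + l^2 > m^2 + l^2, \\
m\log m + l\log l &< m\log(m+l) + l\log(m+l) = (m+l)\log(m+l), \\
k\cdot\frac{n}{k}\log\frac{n}{k} &= n\log n - n\log k .
\end{align*}
```

The last line says that, under these assumptions, splitting into k even buckets saves n log k over sorting everything at once. With n = 1000, k = 10 and log base 10 that is 1000, which matches 3000 - 2000.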

So the more uniformly the data is spread across the buckets, the better bucket sort performs.
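To make the idea concrete, here is a minimal sketch of the scheme described above. The function name `bucket_sort` and the `num_buckets` parameter are made up for illustration; it assumes numeric input and equal-width buckets spanning the min-to-max range, and it simply uses Python's built-in `sorted` as the per-bucket comparison sort.

```python
def bucket_sort(values, num_buckets=10):
    """Scatter numbers into equal-width buckets, sort each bucket, concatenate."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if lo == hi:
        return list(values)  # all elements are equal, nothing to sort

    buckets = [[] for _ in range(num_buckets)]
    width = (hi - lo) / num_buckets
    for v in values:
        # Map v to a bucket index; clamp so that v == hi lands in the last bucket.
        idx = min(int((v - lo) / width), num_buckets - 1)
        buckets[idx].append(v)

    result = []
    for bucket in buckets:
        # Buckets cover increasing value ranges, so sorted buckets
        # can be concatenated in order.
        result.extend(sorted(bucket))
    return result
```

If the input is skewed, most elements end up in a few buckets and the per-bucket sorts do almost all the work, which is why uniformity matters.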

Is this right?

And what would the optimal number of buckets be? (I feel it's a space-time trade-off, but it also depends on how uniform the data being sorted is.)

1 answer

But you have to take into account that the bucketing step itself costs 1000 operations. This gives you:

bucket sort: 1000 + 10 * 100 log(100) = 3000
comparison sort: 1000 * log(1000) = 3000
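The same arithmetic as a small script, in case you want to try other bucket counts. This is only a toy cost model: the function names are invented for this example, it uses log base 10 as in the numbers above, ignores constant factors, and assumes the elements split perfectly evenly across the buckets.

```python
import math

def comparison_sort_cost(n):
    """Rough n * log10(n) cost of one comparison sort (constants ignored)."""
    return n * math.log10(n)

def bucket_sort_cost(n, k):
    """One scatter pass over n elements plus k sorts of n/k elements each."""
    per_bucket = n / k  # assumes perfectly even buckets
    return n + k * per_bucket * math.log10(per_bucket)

print(comparison_sort_cost(1000))    # about 3000
print(bucket_sort_cost(1000, 10))    # about 3000
print(bucket_sort_cost(1000, 100))   # about 2000
print(bucket_sort_cost(1000, 1000))  # about 1000
```

In this model the cost keeps falling as the bucket count grows, at the price of more memory for the buckets.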

But you can apply the bucketing strategy again to sort the smaller arrays. This is radix sort: https://en.wikipedia.org/wiki/Radix_sort .

The advertised complexity is O(nw), where w is the number of bits needed to represent an element. Linear? Better than merge sort? Wait a minute, how big is w usually? Right: to tell n distinct elements apart you need at least about log(n) bits per element, so you are back to n log(n).
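Here is a minimal sketch of the repeated-bucketing idea as a least-significant-digit radix sort. It is one possible variant, not the only one: it assumes non-negative integers, and the `bits_per_pass` parameter (how many bits each bucketing pass looks at) is a name chosen for this example.

```python
def radix_sort(values, bits_per_pass=8):
    """LSD radix sort for non-negative integers (sketch)."""
    if not values:
        return []
    radix = 1 << bits_per_pass      # number of buckets per pass
    mask = radix - 1
    out = list(values)
    largest = max(out)
    shift = 0
    # One pass per group of bits, so roughly w / bits_per_pass passes in total.
    while (largest >> shift) > 0:
        buckets = [[] for _ in range(radix)]
        for v in out:
            # Appending keeps arrival order, which makes each pass stable.
            buckets[(v >> shift) & mask].append(v)
        out = [v for bucket in buckets for v in bucket]
        shift += bits_per_pass
    return out
```

Each pass is linear in n, and the number of passes grows with the bit width w of the largest element, which is where the O(nw) comes from.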

As you said, this is a time/memory trade-off, though, and radix sort is what you use when you have a fixed memory budget (but who doesn't?). If you can grow your memory linearly with the input size, take n buckets and you have an O(n) sort, at least on average for uniformly distributed input.

An example reference (there are many): https://www.radford.edu/nokie/classes/360/Linear.Sorts.html .