
What makes Bucket Sort good?

So I stumbled upon non-comparison-based sorting algorithms, bucket sort to be exact, and I couldn't quite see why it is good.

I have a thought, but I need somebody to confirm it.

Let's assume I want to sort a 1000-element array. Suppose it is uniformly distributed and bucketed into 10 buckets, so that each bucket holds 100 elements.

Sorting 100 elements 10 times using an n log(n) algorithm (taking log base 10) = 10 * 100 log(100) = 1000 log(100) = 2000

while sorting 1000 elements using an n log(n) algorithm = 1000 log(1000) = 3000

So the algorithm exploits the fact that if n = m + l, then (m+l)^2 > m^2 + l^2, and the same applies to n log(n) algorithms.
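The arithmetic above can be checked with a few lines of Python. This is only a sketch: `nlogn_cost` is a hypothetical helper, it uses log base 10 to match the numbers in the question, and it counts abstract steps rather than real running time.

```python
import math

def nlogn_cost(n: int) -> float:
    """Abstract n * log10(n) step count (0 for n <= 1)."""
    return n * math.log10(n) if n > 1 else 0.0

whole = nlogn_cost(1000)       # one sort over all 1000 elements
split = 10 * nlogn_cost(100)   # ten independent sorts of 100 elements

print(round(whole), round(split))  # 3000 2000

# The same effect for an uneven two-way split n = m + l:
m, l = 300, 700
print(nlogn_cost(m) + nlogn_cost(l) < nlogn_cost(m + l))  # True
```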

So the more uniformly the data is bucketed, the better the bucket sort performs.
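To see how much uniformity matters, here is a rough comparison under the same abstract cost model (n log10(n) steps per bucket). The bucket sizes and the helper `total_sort_cost` are made up for illustration; they are not measurements.

```python
import math

def total_sort_cost(bucket_sizes):
    """Sum of s * log10(s) over all buckets, ignoring buckets of size <= 1."""
    return sum(s * math.log10(s) for s in bucket_sizes if s > 1)

uniform = [100] * 10          # 1000 elements spread evenly
skewed = [910] + [10] * 9     # 1000 elements, almost all in one bucket
single = [1000]               # no useful bucketing at all

for name, sizes in (("uniform", uniform), ("skewed", skewed), ("single", single)):
    print(name, round(total_sort_cost(sizes)))
```

Under these assumptions the skewed split lands between the uniform case and the unbucketed case, which matches the intuition in the question.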

Is this right?

And what would the optimum number of buckets be? (I feel it's a space-time trade-off, but it also depends on how uniform the data being sorted is.)

But you have to take into account that the bucketing step itself costs 1000 operations. This gives you:

  • bucket sort: 1000 + 10 * 100 log(100) = 3000
  • comparison sort: 1000 * log(1000) = 3000
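The same cost model can be evaluated for other bucket counts. The sketch below assumes perfectly uniform buckets, log base 10, and a bucketing pass that costs exactly n steps; `bucket_sort_cost` is a hypothetical helper, not a measurement of any real implementation.

```python
import math

def bucket_sort_cost(n: int, k: int) -> float:
    """n steps to distribute, plus k sorts of n/k elements at (n/k) * log10(n/k) each."""
    per_bucket = n / k
    sort_steps = k * per_bucket * math.log10(per_bucket) if per_bucket > 1 else 0.0
    return n + sort_steps

n = 1000
for k in (10, 100, 1000):
    print(k, round(bucket_sort_cost(n, k)))
# 10 3000
# 100 2000
# 1000 1000
```

With 10 buckets the model ties with the plain comparison sort, and the cost only drops as the number of buckets (and therefore the memory used) grows.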

But you can apply the bucketing strategy again to sort the smaller arrays. This is radix sort: https://en.wikipedia.org/wiki/Radix_sort
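A minimal sketch of that idea, assuming non-negative integers and base-10 digits: bucket by the most significant digit, then re-bucket each smaller array by the next digit. The function names are invented for this example, and this is one common variant (MSD radix sort) rather than a tuned implementation.

```python
def msd_radix_sort(values, base=10):
    """Sort non-negative integers by recursive bucketing on digits, most significant first."""
    values = list(values)
    if len(values) <= 1:
        return values
    # Find the place value of the most significant digit of the largest key.
    place = 1
    largest = max(values)
    while place * base <= largest:
        place *= base
    return _bucket_by_digit(values, place, base)

def _bucket_by_digit(values, place, base):
    # place == 0 means every digit has been used, so the remaining keys are equal.
    if len(values) <= 1 or place == 0:
        return values
    buckets = [[] for _ in range(base)]
    for v in values:
        buckets[(v // place) % base].append(v)
    out = []
    for bucket in buckets:
        # Re-apply the bucketing strategy to each smaller array.
        out.extend(_bucket_by_digit(bucket, place // base, base))
    return out

print(msd_radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]
```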

The advertised complexity is O(nw), where w is the number of bits needed to represent an element. Linear? Better than merge sort? Wait a minute, how big is w usually? Right: for typical inputs you need about log(n) bits to represent the elements, so you are back to n log(n).
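One way to make the "w is about log(n)" point concrete is to assume the n keys are all distinct, in which case w must be at least the smallest integer with 2**w >= n. The helper name below is hypothetical and the distinct-keys assumption is mine; inputs with many duplicates can get away with a smaller w.

```python
def min_bits_for_distinct(n: int) -> int:
    """Smallest w with 2**w >= n, i.e. enough bits to tell n distinct keys apart."""
    return max(1, (n - 1).bit_length())

for n in (1_000, 1_000_000):
    w = min_bits_for_distinct(n)
    print(n, w, n * w)   # n * w grows like n * log2(n)
```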

As you said, though, this is a time/memory trade-off, and radix sort is what you use when you have a fixed memory budget (but who doesn't?). If you can grow your memory linearly with the input size, take n buckets and you have an O(n) sort.
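Here is a sketch of the "take n buckets" variant for plain numbers. It assumes the values are roughly uniformly spread between their minimum and maximum; only then does each bucket hold a handful of elements and the total work stay close to linear. On heavily skewed input it degrades to the cost of the per-bucket sort. The function name and the scaling rule are illustrative choices, not a reference implementation.

```python
def bucket_sort(values):
    """Bucket sort using as many buckets as there are elements."""
    values = list(values)
    n = len(values)
    if n <= 1:
        return values
    lo, hi = min(values), max(values)
    if lo == hi:               # all keys equal, nothing to do
        return values
    buckets = [[] for _ in range(n)]
    scale = (n - 1) / (hi - lo)
    for v in values:
        # Map [lo, hi] onto bucket indices 0 .. n-1; the mapping is monotone.
        index = min(int((v - lo) * scale), n - 1)
        buckets[index].append(v)
    out = []
    for bucket in buckets:
        bucket.sort()          # buckets are tiny when the data is uniform
        out.extend(bucket)
    return out

print(bucket_sort([0.42, 0.07, 0.91, 0.33, 0.58]))
# [0.07, 0.33, 0.42, 0.58, 0.91]
```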

An example reference (there are many): https://www.radford.edu/nokie/classes/360/Linear.Sorts.html
