

Array with specific values

Original post: 2016-04-06 06:57:14 | Tags: arrays, algorithm, data-structures

Given an array of size n where:
1/2 of the array holds a single (unknown) value,
1/4 of the array holds a single (unknown) different value,
and so on for 1/8, 1/16, 1/32, ...
Give an algorithm to sort the array. You cannot use the find-median algorithm.

So what I figured is:
There are only log n different values.
There is a simple solution using a binary heap in O(n log log n).
It looks like a question that needs to be solved in O(n).
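For comparison, here is a minimal Python sketch of an O(n log log n) baseline. It swaps in a sorted list searched with `bisect` for the binary heap mentioned above (a plain binary heap has no efficient membership lookup), and the function name `sort_few_distinct` is hypothetical. With d ≈ log n distinct values, each lookup costs O(log log n), and the d insertions into the list cost O(d^2) = O(log^2 n) in total.

```python
from bisect import bisect_left


def sort_few_distinct(arr):
    """Sort arr in O(n log d) time, where d is the number of distinct values.

    The distinct values are kept in a sorted list, and each element's slot is
    found by binary search (O(log d) per element). With d ~ log n this is
    O(n log log n). Inserting a new distinct value shifts the list, O(d) each,
    so O(d^2) = O(log^2 n) over the whole run.
    """
    keys = []    # sorted distinct values seen so far
    counts = []  # counts[i] is the multiplicity of keys[i]
    for x in arr:
        i = bisect_left(keys, x)
        if i < len(keys) and keys[i] == x:
            counts[i] += 1
        else:
            keys.insert(i, x)
            counts.insert(i, 1)
    # Expand each distinct value by its count, already in sorted order.
    out = []
    for k, c in zip(keys, counts):
        out.extend([k] * c)
    return out
```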

3 answers

Here is one possible approach:

1. Scan the array and store the element frequencies (there are log n distinct elements) in a hash table in expected O(n) time; this is doable because each insertion or update takes expected amortized O(1) time.
2. Run a classic sorting algorithm on these log n distinct elements: this takes deterministic O(log n log log n) time using, say, heap sort or merge sort.
3. Expand the sorted array, or create a new one and fill it from the sorted array, using the frequencies from the hash table; this takes O(n) time.
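A minimal Python sketch of these three steps, assuming the values are hashable and mutually comparable; `collections.Counter` plays the role of the hash table, and the function name `sort_by_counting` is hypothetical:

```python
from collections import Counter


def sort_by_counting(arr):
    # Step 1: count every value in one pass (expected O(n) with a hash table).
    freq = Counter(arr)
    # Step 2: sort only the distinct values: O(d log d) for d distinct values,
    # i.e. O(log n log log n) when d ~ log n.
    distinct = sorted(freq)
    # Step 3: write each value out as many times as it was counted: O(n).
    out = []
    for v in distinct:
        out.extend([v] * freq[v])
    return out
```

For example, `sort_by_counting([7, 2, 7, 7, 5, 2, 7, 9])` returns `[2, 2, 5, 7, 7, 7, 7, 9]`. Only step 2 compares elements, and it only ever sees the distinct values.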

The whole algorithm thus runs in expected amortized O(n) time; it is dominated by counting the duplicates and expanding the sorted array. The space complexity is O(n).

This is essentially optimal, because you need to "touch" every element to output the sorted array, which gives a matching lower bound of Ω(n) on the running time.

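The idea is to use the majority algorithm, which takes O(n): find the "half" value, remove it from the array, then do the same on the new array. The total work is n + n/2 + n/4 + n/8 + ... < 2n, i.e. O(n). (The values are peeled off in order of frequency rather than by value, so the log n of them still have to be sorted, which costs only O(log n log log n).)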
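A Python sketch of this idea, assuming that in every round some value fills at least half of what remains (true for the distribution in the question). One caveat: the Boyer–Moore majority vote is only guaranteed to be correct for a strict majority (more than half), while here the value fills exactly half. The hypothetical helper `half_value` works around this by first counting `a[0]`; if `a[0]` is not the half-value, the half-value occurs at least len(a)/2 times among the len(a) - 1 elements of `a[1:]`, which is a strict majority there. The names `half_value` and `sort_by_majority_peeling` are illustrative, not from the answer.

```python
def half_value(a):
    """Return a value occurring at least len(a)/2 times (assumed to exist)."""
    first = a[0]
    if 2 * a.count(first) >= len(a):
        return first
    # The half-value is not a[0], so it is a strict majority of a[1:];
    # Boyer-Moore majority vote finds it in one linear pass.
    cand, votes = None, 0
    for x in a[1:]:
        if votes == 0:
            cand, votes = x, 1
        elif x == cand:
            votes += 1
        else:
            votes -= 1
    return cand


def sort_by_majority_peeling(arr):
    rest = list(arr)
    groups = []  # (value, count) pairs, peeled in decreasing frequency
    while rest:
        v = half_value(rest)                 # O(len(rest))
        kept = [x for x in rest if x != v]   # remove every copy of v
        groups.append((v, len(rest) - len(kept)))
        rest = kept                          # roughly halves each round
    # Peeled values come out by frequency, not by value: sort the few groups.
    groups.sort()
    out = []
    for v, c in groups:
        out.extend([v] * c)
    return out
```

Each round removes at least one element, so the loop always terminates; the O(n) bound relies on the "at least half" assumption making `rest` shrink geometrically.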

Go over the array once, keeping a hash map of the values seen, with their counts. Like you said, there are only log(n) different values.

Now you have a list of all the different values; sorting them takes O(log(n)*log(log(n))).

Once you have the sorted unique values, it's easy to construct the sorted array: write each value as many times as its count in the hash map (one value fills n/2 cells, another n/4, and so on; the n/2 value is not necessarily the maximum).

The total running time is O(n + log(n)*log(log(n)) + n), which is O(n).