
Merging sorted arrays, what is the optimum time complexity?

I have m arrays, and every array is of length n. Each array is sorted. I want to create a single sorted array of length m*n containing all the values of the previous arrays (including repeated values). I have to merge these arrays.

I think the optimum time complexity is m*n*log(m).

Here's a sketch of the algorithm:

I create a support array H of length m, containing the first element of each array.

I then sort this array (m log m) and move the minimum value to the output array.

I then replace the moved value with the next one from the array it was taken from. Actually I don't replace it, but insert it in the right (sorted) position. This takes log m, I think.

And I repeat this for all m*n values... therefore m*n*log m.

My question: can you think of a more efficient algorithm? If m*n*log m is actually optimal, can you at least think of a simpler, more elegant algorithm?

The complexity is right! However, there's a small flaw in your algorithm idea: you cannot insert an item into a sorted array in log m. You can find its position using binary search in that complexity, but you might have to move elements around to actually place it there. To fix this problem, you can use a heap data structure instead!
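A minimal Python sketch of this heap-based approach (using the standard library's `heapq`; function and variable names here are illustrative, not from the original post):

```python
import heapq

def k_way_merge(arrays):
    """Merge m sorted arrays using a min-heap of size m.

    Each pop/push costs O(log m), and every one of the m*n
    elements passes through the heap exactly once, giving
    O(m*n*log m) total.
    """
    heap = []  # entries: (value, array_index, element_index)
    for i, arr in enumerate(arrays):
        if arr:
            heapq.heappush(heap, (arr[0], i, 0))

    out = []
    while heap:
        value, i, j = heapq.heappop(heap)
        out.append(value)
        # Refill from the array the minimum came from.
        if j + 1 < len(arrays[i]):
            heapq.heappush(heap, (arrays[i][j + 1], i, j + 1))
    return out
```

The heap replaces the "insert into a sorted support array" step: removing the minimum and inserting the successor are both genuinely O(log m), with no element shifting.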

Multi-way merge (which is the common name of your algorithm) is usually implemented with yet another "merging" data structure: the tournament tree. You can find a description in Knuth's "The Art of Computer Programming" (the chapter on sorting, IIRC). Compared to heaps, it has a lower constant factor both in theory and in practice in this specific case.

If you want to look at implementations, I'm pretty sure the parallel multi-way merge in the GNU C++ Standard Library's parallel extensions is implemented this way.
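For a ready-made reference point, Python's standard library also ships a lazy k-way merge, `heapq.merge`, which is built on the same heap idea:

```python
import heapq

a = [1, 4, 7]
b = [2, 5, 8]
c = [3, 6, 9]

# heapq.merge returns a lazy iterator over the merged sequence,
# so it can merge streams without materializing all m*n values.
merged = list(heapq.merge(a, b, c))
print(merged)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```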

Edit: I referenced the wrong book; that is fixed now.

The best you can do is O(m*n + d), similar to counting sort: http://en.wikipedia.org/wiki/Counting_sort. If you know the range of possible values (d, say), you can initialize an array of length d, then scan through each of the m arrays, adding 1 to the "bin" corresponding to each value. Then, to build your output array of length m*n, you append each value in d as many times as its bin was counted.
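A minimal sketch of this counting approach, assuming the value range [lo, hi] (so d = hi - lo + 1) is known in advance; the names are illustrative:

```python
def counting_merge(arrays, lo, hi):
    """Merge arrays of integers in [lo, hi] in O(m*n + d) time,
    where d = hi - lo + 1. Note the inputs don't even need to be
    sorted for this to work -- it's a full counting sort.
    """
    counts = [0] * (hi - lo + 1)
    for arr in arrays:
        for v in arr:
            counts[v - lo] += 1  # tally each value into its bin

    out = []
    for offset, c in enumerate(counts):
        out.extend([lo + offset] * c)  # emit each value c times
    return out
```

The trade-off is that this only applies to values from a known, reasonably small discrete range, whereas the heap/tournament-tree merge works for any comparable values.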
