What is the relation between the number of merges and the number of items in a k-way merge?
The question is: in a k-way merge, how many merge operations will we perform? For example, with a 2-way merge: 2 nodes need 1 merge; 3 nodes need 2 merges; 4 nodes need 3 merges. So we get M(n) = n - 1.
What is M(n) when k is arbitrary?
2-way merges are most efficient when merging equal-sized blocks, so the most efficient k-way merge based on 2-way merges is to first merge block 1 with block 2, block 3 with block 4, and so on, then merge the first two resulting blocks, and so on. This is basically how mergesort works, and it results in O(kn log k) time, assuming each of the k blocks contains n items. But it's only perfectly efficient if all blocks have exactly n items and k is a power of 2, so...
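The pairwise scheme described above can be sketched in a few lines. This is only an illustration: the function name `merge_pairwise` is made up for this example, and it leans on the standard-library `heapq.merge` to do each individual 2-way merge.

```python
from heapq import merge  # standard-library merge of already-sorted inputs


def merge_pairwise(blocks):
    """Merge sorted blocks in rounds: (1,2), (3,4), ..., then repeat on the results.

    Returns the merged list and the number of 2-way merges performed.
    """
    blocks = [list(b) for b in blocks]
    merges = 0
    while len(blocks) > 1:
        next_round = []
        for i in range(0, len(blocks) - 1, 2):
            next_round.append(list(merge(blocks[i], blocks[i + 1])))
            merges += 1
        if len(blocks) % 2 == 1:
            # An odd block out is carried into the next round unmerged.
            next_round.append(blocks[-1])
        blocks = next_round
    return (blocks[0] if blocks else []), merges


merged, merges = merge_pairwise([[1, 5, 9], [2, 6], [3, 7, 8], [0, 4]])
print(merged, merges)
```

With 4 blocks this does two merges in the first round and one in the second, so 3 merges in total.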
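Instead of performing multiple separate merge passes, you can use a single pass that uses a heap containing the first item of each block (i.e. k items in total): repeatedly remove the smallest item from the heap, write it to the output, and add the next item from the same block to the heap if that block is not yet exhausted.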
If there are a total of kn items, this always takes O(kn log k) time regardless of how they are distributed amongst blocks, and regardless of whether k is a power of 2. Your heap needs to contain (item, block_index) pairs so that you can identify which block each item comes from.
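Here is a minimal sketch of the heap approach, assuming each block is an already-sorted Python list of mutually comparable items. The name `kway_merge` and the `positions` bookkeeping are just choices made for this example; in practice the standard library's `heapq.merge` already does the same job.

```python
import heapq


def kway_merge(blocks):
    """Merge sorted lists in one pass using a heap of (item, block_index) pairs."""
    # Seed the heap with the first item of every non-empty block.
    heap = [(block[0], i) for i, block in enumerate(blocks) if block]
    heapq.heapify(heap)
    positions = [1] * len(blocks)  # index of the next unread item in each block
    out = []
    while heap:
        item, i = heapq.heappop(heap)  # smallest remaining item overall
        out.append(item)
        if positions[i] < len(blocks[i]):
            # Refill the heap from the block the popped item came from.
            heapq.heappush(heap, (blocks[i][positions[i]], i))
            positions[i] += 1
    return out


print(kway_merge([[1, 4, 7], [2, 5], [], [0, 3, 6, 8]]))
```

The heap never holds more than k entries, so each push or pop costs O(log k), which is where the O(kn log k) total comes from.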
OK, to answer the original question as stated:
To merge k blocks using a sequence of 2-way merges always requires exactly k - 1 merges, since regardless of what pair of blocks you choose to merge at any point in time, merging them reduces the total number of blocks by 1.
As I said in my original answer, which pairs of blocks you choose to merge does impact the overall time complexity (it's better to merge similar-sized blocks), but it doesn't affect the number of 2-way merge operations.
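To make that concrete, here is a small sketch that compares two merge orders by block sizes alone. It assumes a simple cost model in which a 2-way merge of blocks with a and b items moves a + b items. The function names are hypothetical, and both expect a non-empty list of sizes.

```python
def merge_sequentially(sizes):
    """Fold blocks left to right: ((b1 + b2) + b3) + ... Returns (merges, items_moved)."""
    merges, moved = 0, 0
    current = sizes[0]
    for size in sizes[1:]:
        current += size
        moved += current  # assumed cost: every item in the merged block is moved once
        merges += 1
    return merges, moved


def merge_balanced(sizes):
    """Merge neighbouring blocks in rounds, like mergesort. Returns (merges, items_moved)."""
    merges, moved = 0, 0
    sizes = list(sizes)
    while len(sizes) > 1:
        next_round = []
        for i in range(0, len(sizes) - 1, 2):
            next_round.append(sizes[i] + sizes[i + 1])
            moved += next_round[-1]
            merges += 1
        if len(sizes) % 2 == 1:
            next_round.append(sizes[-1])  # odd block out waits for the next round
        sizes = next_round
    return merges, moved


sizes = [10] * 8  # k = 8 blocks of n = 10 items each
print(merge_sequentially(sizes))  # (7, 350)
print(merge_balanced(sizes))      # (7, 240)
```

Both orders perform k - 1 = 7 merges, but the balanced order moves 240 items (kn log2 k = 80 * 3) while the left-to-right order moves 350.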
Yes, the heap approach may be more efficient. But what's the answer to the original question? I found there may be no answer, since the merge tree may not be a full k-way tree, so a 4-way merge could degrade to a 3-way or 2-way merge.
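One possible reading of the question, which the thread itself does not settle, is that each merge may combine up to k blocks at once. Under that assumption a merge removes at most k - 1 blocks, and n - 1 blocks must be removed in total, so at least ceil((n - 1) / (k - 1)) merges are needed. The sketch below (with hypothetical function names) checks that bound against a simple greedy simulation in which the last merge may be smaller than k-way.

```python
def min_kway_merges(n, k):
    """Lower bound on merges to reduce n blocks to 1, each merge combining at most k blocks."""
    if k < 2:
        raise ValueError("k must be at least 2")
    if n <= 1:
        return 0
    # Integer ceiling of (n - 1) / (k - 1).
    return (n - 1 + k - 2) // (k - 1)


def simulate_kway_merges(n, k):
    """Greedily merge up to k blocks at a time and count the merges performed."""
    merges = 0
    while n > 1:
        taken = min(k, n)  # the final merge may combine fewer than k blocks
        n -= taken - 1     # 'taken' blocks are replaced by one merged block
        merges += 1
    return merges


for n, k in [(4, 2), (10, 4), (5, 4)]:
    print(n, k, min_kway_merges(n, k), simulate_kway_merges(n, k))
```

For k = 2 this reduces to M(n) = n - 1. For n = 5 and k = 4 it gives 2 merges, where the second merge is only 2-way, which is the kind of degradation described in the comment above.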