
What is the relation between merges and number of items in a k-way merge?

The question is: in a k-way merge, how many merge operations will we perform? For example, with 2-way merges: 2 nodes need 1 merge; 3 nodes need 2 merges; 4 nodes need 3 merges. So we get M(n) = n - 1.

What is M(n) when k is arbitrary?

2-way merges are most efficient when merging equal-sized blocks, so the most efficient k-way merge based on 2-way merges is to first merge block 1 with block 2, block 3 with block 4, and so on, then merge the first two resulting blocks, and so on. This is basically how mergesort works, and it results in O(kn log k) time, assuming each of the k blocks contains n items. But it's only perfectly efficient if all blocks have exactly n items and k is a power of 2.
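The round-by-round pairing described above can be sketched as follows. This is only an illustration under the assumption that the blocks are in-memory sorted lists; `merge_two` and `merge_pairwise` are hypothetical helper names, not part of any library. The function also counts how many 2-way merges it performs.

```python
def merge_two(a, b):
    """Standard 2-way merge of two sorted lists into a new sorted list."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])  # at most one of these two tails is non-empty
    out.extend(b[j:])
    return out


def merge_pairwise(blocks):
    """Merge sorted blocks in rounds: (1, 2), (3, 4), ..., then repeat.

    Returns (merged_list, number_of_2_way_merges_performed).
    """
    blocks = [list(b) for b in blocks]
    merges = 0
    while len(blocks) > 1:
        next_round = []
        for i in range(0, len(blocks) - 1, 2):
            next_round.append(merge_two(blocks[i], blocks[i + 1]))
            merges += 1
        if len(blocks) % 2 == 1:
            # An odd block out is carried into the next round unmerged.
            next_round.append(blocks[-1])
        blocks = next_round
    return (blocks[0] if blocks else []), merges


merged, merges = merge_pairwise([[1, 4], [2, 5], [3, 6], [0, 7]])
print(merged, merges)
```

When k is not a power of 2, some round carries an unpaired block forward, which is where the unequal block sizes come from.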

Instead of performing several separate 2-way merge passes, you can use a single pass that uses a heap containing the first item of each block (i.e. k items in total):

  1. Read the lowest item from the heap (O(1) time, since it sits at the root).
  2. Write it out.
  3. Remove it from the heap (O(log k) time).
  4. If the block that item came from is not yet exhausted, place the next item from it into the heap (O(log k) time again).
  5. Repeat until the heap is empty.
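A minimal sketch of these steps, using Python's `heapq` module and assuming the blocks are in-memory sorted lists. The function name `merge_with_heap` and the `(item, block_index, position)` entry layout are choices made for this illustration. The block index records which block each item came from, and the position is bookkeeping for finding that block's next item.

```python
import heapq


def merge_with_heap(blocks):
    """Single-pass k-way merge of sorted lists using a min-heap."""
    # Seed the heap with the first item of every non-empty block.
    heap = [(block[0], block_index, 0)
            for block_index, block in enumerate(blocks) if block]
    heapq.heapify(heap)

    out = []
    while heap:                                   # step 5: until heap is empty
        item, block_index, pos = heap[0]          # step 1: read the lowest item
        out.append(item)                          # step 2: write it out
        nxt = pos + 1
        if nxt < len(blocks[block_index]):
            # Steps 3 and 4 combined: pop the root and push the next item
            # from the same block in one O(log k) operation.
            heapq.heapreplace(
                heap, (blocks[block_index][nxt], block_index, nxt))
        else:
            heapq.heappop(heap)                   # step 3 only: block exhausted
    return out


print(merge_with_heap([[1, 4, 9], [2], [], [0, 3, 10, 11]]))
```

Because each block has at most one entry in the heap at a time, ties between equal items are broken by `block_index`, so the items themselves only need to support `<`.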

If there are a total of kn items, this always takes O(kn log k) time regardless of how they are distributed amongst blocks, and regardless of whether k is a power of 2. Your heap needs to contain (item, block_index) pairs so that you can identify which block each item comes from.
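For what it's worth, Python's standard library already provides `heapq.merge`, which takes any number of sorted iterables and lazily yields their items in sorted order. To my knowledge CPython implements it with a heap along these lines, but treat that as an assumption rather than a guarantee. The example below uses deliberately uneven blocks (k = 5, not a power of 2, with one empty block) to show that the distribution of items does not matter.

```python
import heapq

# Five sorted blocks of very different sizes, including an empty one.
blocks = [[1, 8, 9, 12], [2], [3, 5], [], [4, 6, 7, 10, 11]]

# heapq.merge returns an iterator; list() drains it.
merged = list(heapq.merge(*blocks))
print(merged)
```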

OK, to answer the original question as stated:

To merge k blocks using a sequence of 2-way merges always requires exactly k - 1 merges, since regardless of what pair of blocks you choose to merge at any point in time, merging them reduces the total number of blocks by 1.

As I said in my original answer, which pairs of blocks you choose to merge does impact the overall time complexity -- it's better to merge similar-sized blocks -- but it doesn't affect the number of 2-way merge operations.
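The counting argument can be checked with a small simulation that merges randomly chosen pairs and counts the merges. The second function is an extension that the answer does not state: if a single merge may combine up to k blocks, it removes at most k - 1 blocks from the total, so at least ceil((n - 1) / (k - 1)) merges are needed for n blocks, and that bound is reached by merging k blocks at a time until fewer remain. Both function names are hypothetical.

```python
import random


def count_two_way_merges(num_blocks, seed=0):
    """Merge randomly chosen pairs until one block remains; count the merges."""
    rng = random.Random(seed)
    sizes = [1] * num_blocks          # only block sizes matter for counting
    merges = 0
    while len(sizes) > 1:
        i, j = rng.sample(range(len(sizes)), 2)
        merged_size = sizes[i] + sizes[j]
        sizes = [s for idx, s in enumerate(sizes) if idx not in (i, j)]
        sizes.append(merged_size)     # two blocks out, one in: net change -1
        merges += 1
    return merges


def min_k_way_merges(num_blocks, k):
    """Fewest merges for num_blocks blocks when each merge takes at most k.

    Assumes k >= 2. Computes ceil((num_blocks - 1) / (k - 1)) in integers.
    """
    if num_blocks <= 1:
        return 0
    return (num_blocks - 1 + k - 2) // (k - 1)


for n in (2, 3, 4, 10):
    print(n, count_two_way_merges(n), min_k_way_merges(n, 2))
```

With k = 2 the second function reduces to n - 1, matching M(n) = n - 1 from the question. For larger k the final merge may combine fewer than k blocks.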

Yes, the heap approach may be more efficient. But what's the answer to the original question? I suspect there may be no single answer, since the merge tree may not be a full k-way tree, so a 4-way merge could degrade to a 3-way or 2-way merge.
