
Dynamic Programming: Optimal Binary Search Tree

All right, I'm hoping someone can explain this to me. I'm studying for finals and I can't quite figure something out.

The problem is a dynamic programming one: constructing an optimal binary search tree (OBST). I understand dynamic programming in general and the concepts of this problem in particular, but I don't understand the recursive form of this problem.

I get that we're constructing optimal binary search trees for an increasing subset of these nodes and keeping the answers in a table as we go along to avoid recalculation. I also get that when you root the tree at a_{k}, all of the successful nodes from a_{1} through a_{k-1}, along with their corresponding fictitious unsuccessful nodes (i.e. the leaves of the tree), are in the left subtree, and the ones in the right subtree are a_{k+1} through a_{n}.

Here's the recursive form of the equation that I don't understand:

c(i, j) = min (i < k <= j) {c(i, k-1) + c(k, j) + p(k) + w(i, k-1) + w(k, j)}

where w(i, j) = q(i) + sum from l = i+1 to j of (q(l) + p(l)).

So in c(i, j), from left to right, we have the cost of the left subtree + the cost of the right subtree + the probability of a successful search for the root + w(i, k-1) + w(k, j).

My confusion is how c(i, k-1) differs from w(i, k-1).

The text is Computer Algorithms by Horowitz, Sahni, and Rajasekaran, but I've also read CLRS on OBSTs and searched online, and nothing I've come across does a good job of explaining the difference between those parts of the equation.

c(i,j) represents the expected cost of searching an optimal binary search tree containing the keys a_{i+1}, ..., a_{j}. w(i,j) represents the sum of the probabilities in that subtree: the successful-search probabilities p(i+1), ..., p(j) plus the unsuccessful-search probabilities q(i), ..., q(j). For the formula:

c(i, j) = min (i < k <= j) {c(i, k-1) + c(k, j) + p(k) + w(i, k-1) + w(k,j)}

c(i,k-1)+w(i,k-1) represents the cost of the left subtree if we choose key a_{k} as the root. c(k,j)+w(k,j) represents the cost of the right subtree. p(k) represents the cost of the root a_{k}.
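One way to see how the three extra terms fit together is to expand them with the definition w(i, j) = q(i) + sum from i+1 to j of (q(l) + p(l)) from the question. The derivation below is a sketch under that definition only; it suggests that the extra terms always add up to w(i, j), whichever root a_{k} is chosen:

```latex
\begin{aligned}
w(i,k-1) + p(k) + w(k,j)
  &= q(i) + \sum_{l=i+1}^{k-1}\bigl(q(l)+p(l)\bigr)
     + p(k) + q(k) + \sum_{l=k+1}^{j}\bigl(q(l)+p(l)\bigr) \\
  &= q(i) + \sum_{l=i+1}^{j}\bigl(q(l)+p(l)\bigr) \;=\; w(i,j), \\[4pt]
c(i,j) &= \min_{i<k\le j}\bigl\{\,c(i,k-1) + c(k,j)\,\bigr\} + w(i,j).
\end{aligned}
```

Read this way, c carries the depth-weighted cost already accumulated inside each subtree, while w is the flat probability mass that gets charged once more for every extra level.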

Notice that if we choose key a_{k} as the root, then the left subtree contains the keys a_{i+1}, ..., a_{k-1} and the right subtree contains the keys a_{k+1}, ..., a_{j}. But we cannot simply say that:

c(i,j)=min (i < k <= j) {c(i, k-1) + c(k, j) + p(k)}

Because when we choose key a_{k} as the root, every node in the two subtrees ends up one level deeper. Each node contributes its probability times its depth, so pushing a whole subtree down one level adds the sum of its probabilities, which is exactly w. So c(i,k-1)+w(i,k-1) is the right cost for the left subtree!
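To make the recurrence concrete, here is a minimal Python sketch of the table-filling version. It assumes the indexing used in the question (p(1..n) for successful searches, q(0..n) for unsuccessful ones, c(i, i) = 0 and w(i, i) = q(i)); the function name optimal_bst_cost, the root table and the sample weights are illustrative choices of mine, not taken from the textbook.

```python
def optimal_bst_cost(p, q):
    """Fill the c, w and root tables for the OBST recurrence.

    p[1..n]: weights of successful searches (p[0] is unused padding).
    q[0..n]: weights of unsuccessful searches.
    Assumed base cases: c(i, i) = 0 and w(i, i) = q(i).
    """
    n = len(q) - 1
    w = [[0] * (n + 1) for _ in range(n + 1)]
    c = [[0] * (n + 1) for _ in range(n + 1)]
    root = [[0] * (n + 1) for _ in range(n + 1)]

    # w(i, j) = q(i) + sum over l = i+1..j of (q(l) + p(l)), built incrementally.
    for i in range(n + 1):
        w[i][i] = q[i]
        for j in range(i + 1, n + 1):
            w[i][j] = w[i][j - 1] + p[j] + q[j]

    # Solve subproblems in order of increasing number of keys (j - i).
    for size in range(1, n + 1):
        for i in range(n - size + 1):
            j = i + size
            best_cost, best_k = None, None
            for k in range(i + 1, j + 1):
                # c(...) is the cost inside each subtree; w(...) is the charge
                # for pushing that subtree one level down; p(k) is the root.
                cost = (c[i][k - 1] + w[i][k - 1]
                        + c[k][j] + w[k][j]
                        + p[k])
                if best_cost is None or cost < best_cost:
                    best_cost, best_k = cost, k
            c[i][j] = best_cost
            root[i][j] = best_k
    return c, w, root


# Illustrative weights (frequencies, not normalized probabilities).
c, w, root = optimal_bst_cost(p=[0, 3, 3, 1, 1], q=[2, 3, 1, 1, 1])
print(c[0][4], w[0][4], root[0][4])
```

The three nested loops make this O(n^3) in time and O(n^2) in space, which is fine for seeing the recurrence at work.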

This is a subtle way of calculating frequency*depth for each node, whatever its depth.

Each time a node is evaluated as a root, while summing up its left (or right) subtree, you add the sum of that subtree's frequencies, which accounts for every descendant moving one level deeper.

For example, assume nodes 'A', 'B' and 'C', where 'A' is the root, 'B' is the left child of 'A' and 'C' is the left child of 'B'. (There are no right children, to keep things simple.)

Working bottom-up, with leaf 'C' as root:

cost is Pr(C) = freqC*1  (no children)

With 'B' as root:

cost = Pr(B) + Cost[C,C] + sum of children freq 
     = freqB*1 + freqC*1 + freqC*1
     = freqB*1 + freqC*2 

where Pr(B) = freqB*1
     Cost[C,C] = freqC*1
     sum of children freq = freqC*1

And finally, with 'A' as root:

cost = Pr(A) + Cost[C,B] + sum of children freq 
     = freqA*1 + freqB*1 + freqC*2 + freqB*1 + freqC*1
     = freqA*1 + freqB*2 + freqC*3

where Pr(A) = freqA*1
     Cost[C,B] = freqB*1 + freqC*2
     sum of children freq = freqB*1 + freqC*1
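The same bookkeeping can be checked mechanically. The sketch below is only an illustration of the A, B, C chain above: it ignores unsuccessful searches, as the example does, and the function names and sample frequencies are made up for the check.

```python
def chain_cost_bottom_up(freqs_root_to_leaf):
    """Cost of a left-leaning chain, built from the leaf upward.

    freqs_root_to_leaf lists frequencies root first, e.g. [freqA, freqB, freqC].
    """
    cost = 0          # cost of the subtree built so far
    subtree_freq = 0  # total frequency of the nodes in that subtree
    for freq in reversed(freqs_root_to_leaf):
        # The new root contributes freq*1; every node below it moves one
        # level deeper, which adds the subtree's total frequency once.
        cost = freq + cost + subtree_freq
        subtree_freq += freq
    return cost


def chain_cost_direct(freqs_root_to_leaf):
    """Same cost computed directly as the sum of frequency * depth."""
    return sum(freq * depth
               for depth, freq in enumerate(freqs_root_to_leaf, start=1))


freqs = [5, 3, 2]  # hypothetical freqA, freqB, freqC
print(chain_cost_bottom_up(freqs), chain_cost_direct(freqs))
```

Both functions should agree for any list of frequencies, since adding the subtree total once per level is just another way of multiplying each frequency by its depth.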
