
Adding sum of frequencies while solving Optimal Binary Search Tree

I am referring to THIS problem and solution.

Firstly, I did not get why the sum of frequencies is added in the recursive equation. [image: the recursive formula] Can someone please help me understand that, maybe with an example?

In the author's words:

We add sum of frequencies from i to j (see first term in the above formula), this is added because every search will go through root and one comparison will be done for every search.

In the code, the sum of frequencies (whose purpose I do not understand) ... corresponds to fsum.

int optCost(int freq[], int i, int j)
{
   // Base cases
   if (j < i)      // If there are no elements in this subarray
     return 0;
   if (j == i)     // If there is one element in this subarray
     return freq[i];

   // Get sum of freq[i], freq[i+1], ... freq[j]
   int fsum = sum(freq, i, j);

   // Initialize minimum value
   int min = INT_MAX;

   // One by one consider all elements as root and recursively find cost
   // of the BST, compare the cost with min and update min if needed
   for (int r = i; r <= j; ++r)
   {
       int cost = optCost(freq, i, r-1) + optCost(freq, r+1, j);
       if (cost < min)
          min = cost;
   }

   // Return minimum value
   return min + fsum;
}

Secondly, this solution only returns the optimal cost. Any suggestions on how to get the actual BST?

Why we need the sum of frequencies

The sum of frequencies is what makes the recursion compute the cost of a tree correctly. It acts as an accumulator for the tree's weight.

Imagine that at the first level of recursion all keys sit on the first level of the tree (we haven't picked a root yet). Recall the weight function: it sums each node's weight multiplied by the node's level. At this point the weight of the tree equals the sum of the weights of all keys, because every key ends up on level one or deeper, so each key's weight is counted at least once in the result.

1) Suppose we have found the optimal root key, say key r. We then move every key except r one level down, because each remaining key can be on the second level at best (the first level is already occupied). So we add the weight of each remaining key to the sum once more, since each of them will be counted at least twice. The remaining keys are split into two subarrays around r: those to the left of r and those to the right.

2) The next step is to select the optimal keys for the second level, one from each of the two subarrays from the first step. After that we again move the remaining keys one level down and add their weights to the sum, because they will be on the third level or deeper and so are counted at least three times.

3) And so on.

I hope this explanation will give you some understanding of why we need this sum of frequencies.
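
To make this concrete, here is a small sketch (not from the article) that costs one fixed tree in two ways. The names sumFreq, costByLevels and costByRecursion are made up for this illustration, and the tree is described simply by the level of each key. The first function applies the definition (frequency times level); the second uses "sum of the range plus the cost of both subtrees", the same shape as the recursive formula. They should agree, because adding the range sum at every recursion step charges each key once per level it sits on.

```cpp
#include <vector>

// Sum of freq[i..j], both ends inclusive; 0 for an empty range.
int sumFreq(const std::vector<int>& freq, int i, int j)
{
    int s = 0;
    for (int k = i; k <= j; ++k)
        s += freq[k];
    return s;
}

// Cost straight from the definition: each key pays its frequency
// multiplied by its level (the root is on level 1).
int costByLevels(const std::vector<int>& freq, const std::vector<int>& level)
{
    int cost = 0;
    for (int k = 0; k < (int)freq.size(); ++k)
        cost += freq[k] * level[k];
    return cost;
}

// The same tree, costed recursively: range sum + left subtree + right subtree.
// The root of keys i..j is the key with the smallest level in that range.
int costByRecursion(const std::vector<int>& freq,
                    const std::vector<int>& level, int i, int j)
{
    if (j < i)
        return 0;
    int r = i;
    for (int k = i + 1; k <= j; ++k)
        if (level[k] < level[r])
            r = k;
    return sumFreq(freq, i, j)
         + costByRecursion(freq, level, i, r - 1)
         + costByRecursion(freq, level, r + 1, j);
}
```

With freq = {34, 8, 50} and level = {2, 3, 1} (key 20 at the root, 10 below it, 12 below 10), both functions return 142: 34*2 + 8*3 + 50*1 by the definition, and 92 + (42 + 8) by the recursion.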

Finding optimal bst

As the author mentions at the end of the article:

2) In the above solutions, we have computed optimal cost only. The solutions can be easily modified to store the structure of BSTs also. We can create another auxiliary array of size n to store the structure of tree. All we need to do is, store the chosen 'r' in the innermost loop.

We can do just that. Below you will find my implementation.

Some notes about it:

1) I had to replace int[n][n] with a utility class Matrix because I used Visual C++, which does not support array sizes that are not compile-time constants.

2) I used the second implementation of the algorithm from the article you linked (the one that stores subproblem results in a table), because it is much easier to extend it to store the optimal BST.

3) The author's code has a mistake: the second loop, for (int i=0; i<=n-L+1; i++), should have n-L as its upper bound, not n-L+1.

4) The optimal BST is stored as follows: for each pair i, j we store the index of the optimal root key. This mirrors the cost table, except that we store the optimal key index instead of the optimal cost. For example, the cell for 0, n-1 holds the index r of the root key of the whole result tree. Next we split the array in two around r and look up the optimal key indexes of the two halves by accessing matrix elements 0, r-1 and r+1, n-1. And so forth. The utility function 'PrintResultTree' uses this approach and prints the result tree in-order (left subtree, node, right subtree). Because it is a binary search tree, what you get is basically the sorted list of keys.

5) Please don't flame me for my code - I'm not really a c++ programmer. :)

int optimalSearchTree(int keys[], int freq[], int n, Matrix& optimalKeyIndexes)
{
    /* Create an auxiliary 2D matrix to store results of subproblems */
    Matrix cost(n,n);
    optimalKeyIndexes = Matrix(n, n);
    /* cost[i][j] = Optimal cost of binary search tree that can be
    formed from keys[i] to keys[j].
    cost[0][n-1] will store the resultant cost */

    // For a single key, cost is equal to frequency of the key
    for (int i = 0; i < n; i++)
        cost.SetCell(i, i, freq[i]);

    // Now we need to consider chains of length 2, 3, ... .
    // L is chain length.
    for (int L = 2; L <= n; L++)
    {
        // i is row number in cost[][]
        for (int i = 0; i <= n - L; i++)
        {
            // Get column number j from row number i and chain length L
            int j = i + L - 1;
            cost.SetCell(i, j, INT_MAX);

            // Try making all keys in interval keys[i..j] as root
            for (int r = i; r <= j; r++)
            {
                // c = cost when keys[r] becomes root of this subtree
                int c = ((r > i) ? cost.GetCell(i, r - 1) : 0) +
                    ((r < j) ? cost.GetCell(r + 1, j) : 0) +
                    sum(freq, i, j);
                if (c < cost.GetCell(i, j))
                {
                    cost.SetCell(i, j, c);
                    optimalKeyIndexes.SetCell(i, j, r);
                }
            }
        }
    }
    return cost.GetCell(0, n - 1);
}
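
The function above calls sum(freq, i, j), which this answer does not show; it comes from the linked article. A minimal version consistent with how it is called (an inclusive range, as the comment in optCost says) might look like the sketch below. This is a reconstruction, so the body in the article may differ.

```cpp
// Sum of freq[i] + freq[i+1] + ... + freq[j], both ends inclusive.
// Assumed helper: reconstructed from how optCost and optimalSearchTree use it.
int sum(int freq[], int i, int j)
{
    int s = 0;
    for (int k = i; k <= j; k++)
        s += freq[k];
    return s;
}
```

The result does not depend on r, so it could also be computed once per (i, j) pair before the inner loop rather than on every iteration.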

Below is the utility class Matrix. It has to be declared before optimalSearchTree and needs #include <vector>:

class Matrix
{
private:
    int rowCount;
    int columnCount;
    std::vector<int> cells;
public:
    // Default-constructed matrix is empty; dimensions start at zero
    Matrix() : rowCount(0), columnCount(0)
    {
    }
    Matrix(int rows, int columns)
    {
        rowCount = rows;
        columnCount = columns;
        cells = std::vector<int>(rows * columns);
    }

    int GetCell(int rowNum, int columnNum)
    {
        return cells[columnNum + rowNum * columnCount];
    }

    void SetCell(int rowNum, int columnNum, int value)
    {
        cells[columnNum + rowNum * columnCount] = value;
    }
};

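And here is the main function together with a utility function that prints the result tree in-order (printf needs #include <cstdio>, and INT_MAX used earlier needs #include <climits>):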

//Print result tree in in-order
void PrintResultTree(
    Matrix& optimalKeyIndexes,
    int startIndex,
    int endIndex,
    int* keys)
{
    if (startIndex == endIndex)
    {
        printf("%d\n", keys[startIndex]);
        return;
    }
    else if (startIndex > endIndex)
    {
        return;
    }

    int currentOptimalKeyIndex = optimalKeyIndexes.GetCell(startIndex, endIndex);
    PrintResultTree(optimalKeyIndexes, startIndex, currentOptimalKeyIndex - 1, keys);
    printf("%d\n", keys[currentOptimalKeyIndex]);
    PrintResultTree(optimalKeyIndexes, currentOptimalKeyIndex + 1, endIndex, keys);

}
int main(int argc, char* argv[])
{
    int keys[] = { 10, 12, 20 };
    int freq[] = { 34, 8, 50 };

    int n = sizeof(keys) / sizeof(keys[0]);
    Matrix optimalKeyIndexes;
    printf("Cost of Optimal BST is %d \n", optimalSearchTree(keys, freq, n, optimalKeyIndexes));
    PrintResultTree(optimalKeyIndexes, 0, n - 1, keys);

    return 0;
}
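
An in-order print always yields the sorted keys, so it does not reveal the shape of the tree. A pre-order walk with indentation does. The sketch below is a self-contained variant that uses std::vector instead of the Matrix class so it compiles on its own; Table, buildRootTable and describeTree are names introduced here for illustration and are not part of the code above.

```cpp
#include <climits>
#include <string>
#include <vector>

typedef std::vector<std::vector<int> > Table;

// Fills root[i][j] with the index of the optimal root for keys i..j.
// Same recurrence as optimalSearchTree, only on std::vector.
Table buildRootTable(const std::vector<int>& freq)
{
    int n = (int)freq.size();
    Table cost(n, std::vector<int>(n, 0));
    Table root(n, std::vector<int>(n, 0));
    for (int i = 0; i < n; i++)
    {
        cost[i][i] = freq[i];
        root[i][i] = i;
    }
    for (int L = 2; L <= n; L++)
    {
        for (int i = 0; i + L - 1 < n; i++)
        {
            int j = i + L - 1;
            int fsum = 0;                    // freq[i] + ... + freq[j]
            for (int k = i; k <= j; k++)
                fsum += freq[k];
            cost[i][j] = INT_MAX;
            for (int r = i; r <= j; r++)
            {
                int c = (r > i ? cost[i][r - 1] : 0)
                      + (r < j ? cost[r + 1][j] : 0)
                      + fsum;
                if (c < cost[i][j])
                {
                    cost[i][j] = c;
                    root[i][j] = r;
                }
            }
        }
    }
    return root;
}

// Pre-order walk: one line per node, indented two spaces per level.
void describeTree(const Table& root, const std::vector<int>& keys,
                  int i, int j, int depth, std::string& out)
{
    if (i > j)
        return;
    int r = root[i][j];
    out += std::string(2 * depth, ' ') + std::to_string(keys[r]) + "\n";
    describeTree(root, keys, i, r - 1, depth + 1, out);
    describeTree(root, keys, r + 1, j, depth + 1, out);
}
```

For keys {10, 12, 20} with frequencies {34, 8, 50}, calling describeTree(root, keys, 0, 2, 0, out) leaves three lines in out: 20 with no indent, 10 indented one level, and 12 indented two levels. The output does not mark left versus right children, but the key order tells you: 12 is greater than 10, so it is the right child of 10.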

EDIT:

Below you can find code to create a simple tree-like structure.

Here is the utility TreeNode class:

struct TreeNode
{
public:
    int Key;
    TreeNode* Left;
    TreeNode* Right;
};

Updated main function with the BuildResultTree function:

void BuildResultTree(Matrix& optimalKeyIndexes,
    int startIndex,
    int endIndex,
    int* keys,
    TreeNode*& tree)
{

    if (startIndex > endIndex)
    {
        return;
    }

    tree = new TreeNode();
    tree->Left = NULL;
    tree->Right = NULL;
    if (startIndex == endIndex)
    {
        tree->Key = keys[startIndex];
        return;
    }

    int currentOptimalKeyIndex = optimalKeyIndexes.GetCell(startIndex, endIndex);
    tree->Key = keys[currentOptimalKeyIndex];
    BuildResultTree(optimalKeyIndexes, startIndex, currentOptimalKeyIndex - 1, keys, tree->Left);
    BuildResultTree(optimalKeyIndexes, currentOptimalKeyIndex + 1, endIndex, keys, tree->Right);
}

int main(int argc, char* argv[])
{
    int keys[] = { 10, 12, 20 };
    int freq[] = { 34, 8, 50 };

    int n = sizeof(keys) / sizeof(keys[0]);
    Matrix optimalKeyIndexes;
    printf("Cost of Optimal BST is %d \n", optimalSearchTree(keys, freq, n, optimalKeyIndexes));
    PrintResultTree(optimalKeyIndexes, 0, n - 1, keys);
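    // Start with a null pointer: BuildResultTree allocates every node itself,
    // so allocating one here would leak it.
    TreeNode* tree = NULL;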
    BuildResultTree(optimalKeyIndexes, 0, n - 1, keys, tree);
    return 0;
}
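
The nodes created by BuildResultTree are allocated with new and never released. For a short program that exits right away this does no harm, but in longer-running code a cleanup helper would be needed. One possible sketch follows; FreeTree is a name introduced here, not part of the original answer, and the TreeNode struct is repeated only so the block compiles on its own.

```cpp
#include <cstddef>

struct TreeNode
{
    int Key;
    TreeNode* Left;
    TreeNode* Right;
};

// Post-order delete: release both subtrees before the node itself.
// Returns the number of nodes freed, which is handy for a quick sanity check.
int FreeTree(TreeNode* node)
{
    if (node == NULL)
        return 0;
    int freed = FreeTree(node->Left) + FreeTree(node->Right) + 1;
    delete node;
    return freed;
}
```

Calling FreeTree(tree) just before return 0 in main would release the whole tree; for the three-key example it should report three nodes.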
