简体   繁体   中英

Binary search tree character implementation C++

I'm trying to implement a character BST. I can't wrap my head around the logic of inserting characters. So let's say this is in main insert("a"); insert("b"); insert("c"); insert("d"); When will the a letter ever be less than a? So would my tree basically be all on the right side?

      a
        \
          b
            \
              c
                \
                  d

Like that? Since there wouldn't ever be a letter that is less than a. But this feels wrong, but I'm not sure what I'm missing.

My insert function:

void insert(char letter, node * curr)
{
   int count{0};                                                                                                                                                                                                                                                                                             
   if ( strcmp(curr->getLetter(), letter) == 0)                                                                                                                 
       return 0;           
                                                                                                                                                                                                                                                                                  
   if (!curr->getRight() && strcmp(curr->getLetter(), letter) > 0)                                                                                          
   {                                                                                                                                                        
       curr->getRight() = new node(letter);                                                                                                                   
       return 1;                                                                                                                                        
   }                                                                                                                                                                                                                                                                                                         
   if (!curr->getLeft() && strcmp(curr->getLetter(), letter) < 0)                                                                                           
   {                                                                                                                                                        
      curr->getLeft() = new node(letter);                                                                                                                    
      return 1;                                                                                                                                        
   }                                                                                                                                                                                                                                                                                                         
   if (strcmp(letter, curr->getLetter() ) < 0)                                                                                                                  
      count += insert(letter, curr->getLeft() );                                                                                                                                                                                                                                                              
   else                                                                                                                                                     
      count += insert(letter, curr->getRight() );                                                                                                                                                                                                                                                             
      
   return count;
}

The odd looking generated "tree" may feel wrong, but it isn't. What you're seeing is a degenerate binary tree, specifically one that has devolved to a linked list. That is the penalty of insertion into a ordered binary tree without rebalancing. It is also the main reason why balancing algorithms and self-balancing tree's exist. Without them, degenerate conditions can surface which eliminate all splendor of having a binary tree in the first place: fast O(logN) search, insert, and delete asymptotic complexity.

For example, your original insertion order generated this:


a
  \
    b
      \
        c
          \
            d

Obviously this is devolved to a linked list.

But now, let's try this again, only this time, after each insertion, you checked to ensure no node had children more misbalanced than a level of +1 (or -1) level. This is part of keeping an AVL tree balanced (but by no means all parts; it is much more complicated than you're about to see). I chose AVL to demonstrate this simply because it is so easy to understand. With that...

Insert a

a

The first node a is obviously trivial. It has no children, so a L/R balance is 0/0 and we move on.

With b insertion, we would have:

a
  \
    b

b is 0/0, and a node is 0/1, which is ok. Now insert c

a
  \
    b
      \
        c

At this point b has a L/R balance of 0/1, but a has a balance of 0/2. It is at this point a rebalance would happen. For this trivial condition a left-rotation about a would transpire. It is done by

  1. Detaching a 's right child b .
  2. Setting a 's right child to it's former right child b 's left child (of which there is none, so... simple enough.
  3. Setting b 's left child to be a
  4. Set whomever was pointing to a to now point to b . In this case that would be whatever was the 'root' pointer of the tree. Ie now b is the root.

The result looks like this:

   b
  / \
 a   c

Terrific. This is awesome. a is 0/0, b is 1/1, and c is 0/0

Insert d ,

   b
  / \
 a   c
      \
       d

The tree is still "balanced", in that no node has a child depth heavier than +1 over its other child. Specifically,

  • a is 0/0
  • b is 1/2
  • c is 0/1
  • d is 0/0

Whoopee.

Continuing this example, let us see what happens when we insert e . Before rebalancing, it would look like this:

   b
  / \
 a   c
      \
       d
        \
         e

Now, working up from where we just hung that e node:

  • e is 0/0 (obviously)
  • d is 0/1
  • c is 0/2 <=== imbalance detected

Stopping there for a moment, we see we should do left rotation about c like we did earlier about a :

   b
  / \
 a   d
    / \
   c   e

Now our node balances are:

  • a is 0/0
  • b is 1/2
  • c is 0/0
  • d is 1/1
  • e is 0/0

No node has a child-balance off by more than 1 in either direction of the other child. the tree is once-again balanced.


As a final example, we take the tree we had before, and this time hang one more node: f .

   b
  / \
 a   d
    / \
   c   e
        \
         f

Working our way up from f we see that:

  • f is 0/0
  • e is 0/1
  • d is 1/2
  • b is 1/3 <=== imbalance detected

Whoa. Following the rotation mechanics we'v done several times before we do it again:

  1. Detaching b 's right child d .
  2. Setting b 's right child to its former right child d 's left child c .
  3. Setting d 's left child to be b
  4. Set whomever was pointing to b to now point to d . In this case that would be whatever was the 'root' pointer of the tree. Ie now d is the root.

The result looks like this:

    d
   / \
  b   e
 / \   \  
a   c   f

The final balance factors are:

  • a is 0/0
  • b is 1/1
  • c is 0/0
  • d is 2/2
  • e is 0/1
  • f is 0/0

There are books written on different tree balancing mechanics. The example I've shown above somewhat trivializes the matter, but it will be important as you continuing learning about trees, and the manners available to keep them efficient. A common self-balancing technique used by ordered standard containers ( std::set , std::map , etc.) are red-black trees . The management is considerably different than what I've shown above, but the premise remains the same. Keep any single tree depth (be it the whole tree, or any sub-tree within the whole tree) to be as close to logN order, where N is the number of nodes contained in that tree. Maintaining that is what provides the efficiencies trees can offer.

Understand that the same data can be stored in valid binary search trees in multiple ways.

The order you insert the data matters when constructing a binary search tree naively like this. What you have discovered is for your insertion algorithm ordered data gives you worst case behavior. Try a random order of the same letters and see what you get.

This is why actual implementations of binary search trees use more complex algorithms when you insert into them. Typical implementations of std::map<K,V> for example use " red-black trees " which are still just binary search trees but insertion and deletion is done in a manner such that the tree is guaranteed to not end up being too lopsided regardless of the insertion order. These type of data structures are called "self-balancing binary search trees".

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM