Why is my Binary Search Tree void remove function not working properly?

Question

I am having trouble getting my binary search tree remove function to work properly. No matter what I do, it never works correctly; it always seems to do the strangest things when I call it.

The node struct is declared in another header file, as is root_ (they are set up in the usual left, right, and stored-data configuration).

void remove(int value){
   node* n = root_;
   node* nClone = nullptr; 

    while (n != nullptr) {//constant checker to ensure that a 
        if (value > n->value_) { // if value is larger than value stored within node n it will descend further down right
        
            nClone = n; //stores n before continuing
            n = n->rhs_;
        
        } else if (value < n->value_) { // if value is less than value stored within node n it will descend further down left
        
            nClone = n; //stores n before continuing
            n = n->lhs_;
        
        } else { //if it is equal to the value (there are no other possible outcomes so an else would work) check if there are any subsequent leaves attached

            if (n->lhs_ == nullptr && n->rhs_ == nullptr) { //if both left and right are empty (I.E. no leaves attached to node) set n to nullptr and then delete using free();
            
                nClone->lhs_ = nullptr; //stores both left
                nClone->rhs_ = nullptr; // and right leaves as nullptr
                
                free(n); //frees n

                n = nullptr;

                count_--;//decreases count_/size counter by 1
                return; //exits from function as there is nothing more to do
            
            } else if (n->lhs_ == nullptr || n->rhs_ == nullptr) { //if n has one connection whether it be on the left or right it stores itself in nClone and then deletes n

                if (n->lhs_ != nullptr) { //if statement to check if left leaf of n exists
                
                    nClone->lhs_ = n->lhs_; //if it does it stores n's left leaf in nClone 
                                    
                } else { //if it doesn't have anything stored in the left then there is guaranteed to be one on the right

                    nClone->rhs_ = n->rhs_; //stores n's right leaf in nClone

                }
                
                free(n);
                count_--; //decreases count_/size counter by 1
                return; //exits from function as there is nothing more to do

            } else {
                //for preorder succession
                node* nSuc = n->rhs_; //stores right leaf of n in nSuc

                while (nSuc->lhs_ != nullptr) { //look for successor
                    nSuc = nSuc->lhs_;
                
                }

                n->value_ = nSuc->value_;
                free(n);
                count_--;
                return;

            
            }
                    
        }    

    }

}

Answer 1

Your code is written as if it was C. C++ is a different language, and you can and should leverage it to your advantage.
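That said, the posted function seems to misbehave for reasons that have nothing to do with style. Reading it as written: the leaf case clears both children of the parent (so a sibling subtree is cut off), nClone is still null when the value sits in the root, the one-child case reattaches the child on the side the child was on rather than the side n was on, and the two-child case frees n (which is still linked into the tree) instead of the successor. Below is a minimal raw-pointer sketch that avoids those problems by tracking the link that points at the current node. The node and bst types here are hypothetical stand-ins for your headers, and I'm assuming nodes are allocated with new (hence delete rather than free).

```cpp
#include <cassert>
#include <initializer_list>
#include <vector>

// Hypothetical stand-in for the question's node type (the real one lives in
// another header). Assumed to be allocated with new.
struct node {
  int value_;
  node *lhs_ = nullptr;
  node *rhs_ = nullptr;
  explicit node(int value) : value_(value) {}
};

// Hypothetical container holding root_ and count_, as in the question.
struct bst {
  node *root_ = nullptr;
  int count_ = 0;

  bst() = default;
  bst(const bst &) = delete;
  bst &operator=(const bst &) = delete;
  ~bst() { while (root_) remove(root_->value_); }

  void insert(int value) {
    node **link = &root_;
    while (*link != nullptr) {
      if (value == (*link)->value_) return; // ignore duplicates
      link = value < (*link)->value_ ? &(*link)->lhs_ : &(*link)->rhs_;
    }
    *link = new node(value);
    ++count_;
  }

  void remove(int value) {
    // link is the address of whichever pointer refers to the current node:
    // root_ itself, or some parent's lhs_ or rhs_. No parent special cases.
    node **link = &root_;
    while (*link != nullptr && (*link)->value_ != value)
      link = value < (*link)->value_ ? &(*link)->lhs_ : &(*link)->rhs_;
    if (*link == nullptr) return; // value not present

    node *n = *link;
    if (n->lhs_ != nullptr && n->rhs_ != nullptr) {
      // Two children: copy the in-order successor's value into n, then
      // unlink the successor node instead of n.
      node **suc = &n->rhs_;
      while ((*suc)->lhs_ != nullptr) suc = &(*suc)->lhs_;
      n->value_ = (*suc)->value_;
      link = suc;
      n = *suc;
    }
    // n now has at most one child: splice that child (or null) into the link.
    *link = n->lhs_ != nullptr ? n->lhs_ : n->rhs_;
    delete n;
    --count_;
  }
};

// Appends the values of the subtree at n in ascending order.
void collect(const node *n, std::vector<int> &out) {
  if (n == nullptr) return;
  collect(n->lhs_, out);
  out.push_back(n->value_);
  collect(n->rhs_, out);
}

std::vector<int> demoRawRemove() {
  bst tree;
  for (int v : {50, 30, 70, 20, 40, 60, 80}) tree.insert(v);
  tree.remove(50); // root with two children
  tree.remove(20); // leaf
  tree.remove(30); // node with one child
  tree.remove(99); // absent value, no effect
  assert(tree.count_ == 4);
  std::vector<int> out;
  collect(tree.root_, out);
  return out;
}
```

It works, but every line that touches a pointer is a chance to leak or double-free, which is what the rest of this answer tries to get away from.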

The following text is interspersed with a complete, compilable example that you can try online :) It is of course not the only way to implement a tree, and I had to gloss over various details that a full-featured solution would need. A "real life" implementation may be much more complex, since the "textbook" approach usually doesn't mix well with cache hierarchies on modern CPUs, and a tree like this would be rather slow compared to state-of-the-art implementations. But it does help, I think, to bridge the gap between the pervasive "C-like" way of thinking about trees and what modern C++ brings.

Again: This example is minimal, it doesn't do many of the things that would be needed in common practice, but it at least points in the direction away from C, and towards C++, and that's what I intended.

First, let's have a Node type that uses owning pointers for the child nodes. These pointers automatically manage memory and essentially prevent you from making mistakes that would leak memory or allow use of dangling pointers. The term "owning pointer" means that there's always a well defined owner: it is the pointer itself. Those pointers cannot be, for example, copied - since then you'd have two owners for the same object, and for that you need shared ownership. But shared ownership is hard to get right, since there must be some protocols in place to ensure that you don't get cyclic references. When building a tree, the "parent" node is the natural owner of a "child" node, and thus the unique ownership is precisely what's needed.

// complete example begins
#include <cassert>
#include <memory>

using Value = int;

struct Node {
  std::unique_ptr<Node> lhs_;
  std::unique_ptr<Node> rhs_;
  Value value_;
  explicit Node(int value) : value_(value) {}
};
// cont'd.
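As a side note, here is a small stand-alone sketch (not part of the complete example) of what "unique ownership" means in practice. Item and ownershipTransfers are made-up names for illustration only. Copying the owning pointer is rejected by the compiler, while moving it hands the object over and leaves the source empty:

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical payload type, used only for this illustration.
struct Item {
  int value;
};

// An owning pointer cannot be copied, only moved.
static_assert(!std::is_copy_constructible<std::unique_ptr<Item>>::value,
              "unique_ptr must not be copyable");
static_assert(std::is_move_constructible<std::unique_ptr<Item>>::value,
              "unique_ptr must be movable");

// Returns true if moving transferred the object and emptied the source.
bool ownershipTransfers() {
  auto first = std::make_unique<Item>(Item{7});
  Item *raw = first.get();                         // remember the address
  std::unique_ptr<Item> second = std::move(first); // ownership moves over
  return first == nullptr && second.get() == raw && second->value == 7;
}
```

This is the same mechanism that lets a parent Node own its children through lhs_ and rhs_.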

We also should have a Tree that owns the root node, and keeps the node count:

// cont'd.
struct Tree {
  std::unique_ptr<Node> root_;
  int count_ = 0;
};
// cont'd.

When operating on such data structures, you frequently want access not only to the value of the node pointer, but also to the pointer itself, so that you can modify it. So we need some sort of "node reference" that mostly behaves like Node* would, but which internally also carries the address of the pointer to the node, so that eg the node could be replaced:

// cont'd.
class NodeRef {
  std::unique_ptr<Node> *owner;
public:
  NodeRef() = delete;
  NodeRef(std::unique_ptr<Node> &o) : owner(&o) {}
  Node *get() const { return owner->get(); }
  // Use -> or * to access the underlying node
  Node *operator->() const { return get(); }
  Node &operator*() const { return *get(); }
  // In boolean contexts, it's true if the Node exists
  explicit operator bool() const { return bool(*owner); }
  // Replace the Node (if any) with some other one
  void replace(std::unique_ptr<Node> &&newNode) {
    *owner = std::move(newNode);
  }
  NodeRef &operator=(std::unique_ptr<Node> &val) {
    owner = &val;
    return *this;
  }
};
// cont'd.

NodeRef holds a pointer to the owner of the node (the owner is the owning pointer type std::unique_ptr ).

The following are the ways you can use NodeRef as if it were a Node* :

NodeRef node = ...;
node->value_       // access to pointed-to Node using ->
(*node).value_     // access to pointed-to Node using *
if (node) ...      // null check
node = otherNode;  // assignment from another node (whether owner or NodeRef)

And the following would be the way that NodeRef behaves similar to std::unique_ptr<Node> & , ie like a reference to the node owner, allowing you to alter the ownership:

Tree tree;
NodeRef root = tree.root_; // reference the root of the tree
root.replace(std::make_unique<Node>(2)); // replace the root with a new node

Note that this code performs all the necessary memory allocation and deallocation thanks to the power of std::unique_ptr and move semantics. There are no new , delete , malloc nor free statements anywhere. And, also, the performance is on par with manual allocations - this code does not use any sort of garbage collection or reference counting. std::unique_ptr is a tool that lets you leverage the compiler to write memory allocation and deallocation code for you, in a way that's guaranteed to be correct.
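If you'd like to convince yourself that nothing leaks, one way is to count live nodes. The sketch below is separate from the complete example and uses a hypothetical CountedNode type that has the same shape as Node plus an instance counter. Replacing the root destroys the whole old subtree without a single explicit delete:

```cpp
#include <cassert>
#include <memory>

// Hypothetical instrumented node: same layout idea as Node, but it counts
// how many instances are currently alive.
struct CountedNode {
  static int live;
  std::unique_ptr<CountedNode> lhs_;
  std::unique_ptr<CountedNode> rhs_;
  int value_;
  explicit CountedNode(int value) : value_(value) { ++live; }
  ~CountedNode() { --live; }
};
int CountedNode::live = 0;

// Builds a three-node tree, replaces its root, and reports how many nodes
// are alive right after the replacement.
int liveAfterReplacingRoot() {
  auto root = std::make_unique<CountedNode>(2);
  root->lhs_ = std::make_unique<CountedNode>(1);
  root->rhs_ = std::make_unique<CountedNode>(3);
  assert(CountedNode::live == 3);
  // Assigning a new node destroys the old root, and with it both children.
  root = std::make_unique<CountedNode>(42);
  return CountedNode::live; // only the new root remains
}
```

Once root goes out of scope at the end of the function, the counter is back at zero.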

But NodeRef is not an "observing" pointer, ie if the owner of the node it points to suddenly disappears, then NodeRef becomes dangling. To do otherwise would have more overhead, and would require the use of some tracking pointers, eg shared_ptr and weak_ptr , or a bespoke solution - certainly out of scope here.

And thus NodeRef fulfills the typical requirements that make the actual tree management code much easier to write, understand, and maintain with reduced potential for errors. This approach facilitates code that is correct by design, ie where mistakes that would cause undefined behavior are mostly caught by the compiler, or impossible to write.

Let's see how a binary search for a node would look, using the types we introduced above:

// cont'd
// Finds the owner of a node that contains a given value,
// or the insertion point where the value would be
NodeRef find(Tree &tree, const Value &value)
{
  NodeRef node = tree.root_;
  while (node) {
    if (value < node->value_)
      node = node->lhs_;
    else if (node->value_ < value)
      node = node->rhs_;
    else
      break; // we found the value we need
  }
  return node;
}
// cont'd

First, let's note that while the returned node reference can be null, it doesn't mean that it's "useless". A NodeRef is never "completely" null, and must always refer to some node owner - that's why the default constructor is deleted, so you can't create an invalid NodeRef by mistake. It is the node that can be null, not the underlying reference to the owning pointer to the node.

Notice how similar the code is to a version that would use Node * , yet it is more powerful. Since this version of find returns a NodeRef , we can use this reference to replace the node (or set it for the first time if it was null), whereas the signature Node *find(Node *root, const Value &value) would only give us access to the node itself, but not to its owner. And, in case the node wasn't found, it would return a null pointer - not bringing us any closer to knowing where to insert the new node, and discarding the work done to find such an insertion point (!).
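To make the "insertion point" idea concrete without the NodeRef wrapper, here is a stripped-down, stand-alone sketch that returns a plain reference to the owning pointer instead. findSlot and demoFindSlot are illustrative names, not part of the complete example, and Node is redeclared locally so the snippet compiles on its own:

```cpp
#include <cassert>
#include <initializer_list>
#include <memory>

// Local copy of the node type so this snippet is self-contained.
struct Node {
  std::unique_ptr<Node> lhs_;
  std::unique_ptr<Node> rhs_;
  int value_;
  explicit Node(int value) : value_(value) {}
};

// Returns the owning pointer that holds the node with the given value, or
// the empty owning pointer where such a node would have to be attached.
std::unique_ptr<Node> &findSlot(std::unique_ptr<Node> &root, int value) {
  std::unique_ptr<Node> *slot = &root;
  while (*slot) {
    if (value < (*slot)->value_)
      slot = &(*slot)->lhs_;
    else if ((*slot)->value_ < value)
      slot = &(*slot)->rhs_;
    else
      break; // found the value
  }
  return *slot;
}

bool demoFindSlot() {
  std::unique_ptr<Node> root;
  // The same function serves for insertion: fill whatever slot it returns.
  for (int v : {5, 3, 8}) findSlot(root, v) = std::make_unique<Node>(v);

  std::unique_ptr<Node> &missing = findSlot(root, 4);
  bool wasEmpty = !missing;            // 4 is absent, so the slot is empty
  missing = std::make_unique<Node>(4); // but it is exactly where 4 belongs
  return wasEmpty && root->lhs_->rhs_ && root->lhs_->rhs_->value_ == 4;
}
```

A Node* returning find could tell us that 4 is missing, but not that it belongs under the right edge of 3.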

NodeRef gives us a circumscribed access to the parent node: it doesn't expose the entire parent node, but just the owning pointer that owns the given node - and it's also more general than a "parent" node would be, since the owning pointer does not even need to be held by a Node type. And indeed, NodeRef works just fine when a node's owner is in the Tree class, or it could refer to a stand-alone pointer as well:

std::unique_ptr<Node> myNode;
NodeRef node = myNode;

// The two lines below are equivalent - both change the `myNode` owning pointer
node.replace(std::make_unique<Node>(42));
myNode = std::make_unique<Node>(42);

In principle, there could be a NodeRef &NodeRef::operator=(std::unique_ptr<Node> &&) , ie a way to move-assign the node itself, but this would hide the important fact that NodeRef doesn't really own the node, but only refers to some owner, and the replace method makes this more explicit: we are replacing the node held by the owner.

Now we can implement the function you sought: node removal. This function takes a NodeRef , and modifies the subtree rooted at that node so that the original node is removed:

// cont'd
// Removes the given node. Returns true if the node was removed, or false if
// there was nothing to remove
bool remove(NodeRef node)
{
  for (;;) {
    if (!node) return false; // the node is empty, nothing to do
  
    if (!node->lhs_) {
      // replace the node with its sole right child, if any
      node.replace(std::move(node->rhs_));
      return true;
    }
    else if (!node->rhs_) {
      // replace the node with its sole left child, if any
      node.replace(std::move(node->lhs_));
      return true;
    }
    else {
      // node has two children
      // 1. take on the largest value in the left subtree
      // oldValue is a *reference* to the value of the node being replaced
      Value &oldValue = node->value_;
      node = node->lhs_;
      while (node->rhs_) node = node->rhs_;
      // we found the node with a replacement value - substitute it for
      // the old value
      oldValue = std::move(node->value_);

      // 2. remove that child - continue the removal loop
      continue;
      // instead of continue, we could also do
      // return remove(node);
      // but by continuing we don't have recursion, and we leverage
      // the fact that `node` references the correct node to remove
    }
  }
}
// cont'd
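To see the three cases in action, here is a stand-alone sketch of the same removal strategy (predecessor replacement), written against plain pointers to owning pointers so that it compiles without the rest of the example. Slot, findSlot, removeAt, collect and demoRemove are illustrative names of mine, and the splice goes through a temporary, which is a slightly different spelling of the replace call above:

```cpp
#include <cassert>
#include <initializer_list>
#include <memory>
#include <utility>
#include <vector>

// Local copy of the node type so this snippet is self-contained.
struct Node {
  std::unique_ptr<Node> lhs_;
  std::unique_ptr<Node> rhs_;
  int value_;
  explicit Node(int value) : value_(value) {}
};
using Slot = std::unique_ptr<Node>;

// Finds the slot holding the value, or the empty slot where it would go.
Slot *findSlot(Slot &root, int value) {
  Slot *slot = &root;
  while (*slot && (*slot)->value_ != value)
    slot = value < (*slot)->value_ ? &(*slot)->lhs_ : &(*slot)->rhs_;
  return slot;
}

// Removes the node held by the slot. Returns false if the slot is empty.
bool removeAt(Slot *slot) {
  if (!*slot) return false;
  if ((*slot)->lhs_ && (*slot)->rhs_) {
    // Two children: take the largest value of the left subtree, then go on
    // to remove the node that value came from (it has no right child).
    int &oldValue = (*slot)->value_;
    slot = &(*slot)->lhs_;
    while ((*slot)->rhs_) slot = &(*slot)->rhs_;
    oldValue = (*slot)->value_;
  }
  // At most one child left: detach it first, then let it take the slot.
  Slot child = std::move((*slot)->lhs_ ? (*slot)->lhs_ : (*slot)->rhs_);
  *slot = std::move(child); // the old node is destroyed here
  return true;
}

// Appends the values of the subtree in ascending order.
void collect(const Slot &n, std::vector<int> &out) {
  if (!n) return;
  collect(n->lhs_, out);
  out.push_back(n->value_);
  collect(n->rhs_, out);
}

std::vector<int> demoRemove() {
  Slot root;
  for (int v : {50, 30, 70, 20, 40, 60, 80})
    *findSlot(root, v) = std::make_unique<Node>(v);
  removeAt(findSlot(root, 50)); // two children: root takes the value 40
  removeAt(findSlot(root, 20)); // leaf
  removeAt(findSlot(root, 99)); // absent: nothing happens
  std::vector<int> out;
  collect(root, out);
  return out;
}
```

The remaining values still come out sorted, which is the property a removal must preserve.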

We std::move the values - this is not important at all when dealing with "simple" value types like integers, but would be important if, for example, the Value was a type that can only be moved but not copied, eg using Value = std::unique_ptr<SomeType>; .

And now the helper that manages node removal in the Tree :

// cont'd
void remove(Tree &tree, const Value& value)
{
  auto node = find(tree, value);
  if (remove(node))
    -- tree.count_;
}
// cont'd

Instead of const Value &value we could have had int value , but this way it's a more generic approach that would work with other Value types.

Node insertion is also fairly easy, since find already provides the insertion point where the value would be, were it to exist:

// cont'd
bool insert(Tree &tree, const Value& value)
{
  auto node = find(tree, value);
  if (node) {
    // Such a value already exists
    assert(node->value_ == value);
    return false;
  } else {
    // Insert new value
    node.replace(std::make_unique<Node>(value));
    ++ tree.count_;
    return true;
  }
}
// cont'd

If Value was a non-copyable type, then we'd need an insert signature that takes an rvalue reference, ie bool insert(Tree &tree, Value &&value) .

Now you may ask: how would we "walk" the tree? In C++, the idiomatic way to deal with collections of items is via iterators, and then one can use so-called range-for . The following example prints out the elements of a vector:

std::vector<int> values{1,2,3,4,5};
for (int val : values)
  std::cout << val << "\n";

When iterating, or "walking" the tree, we need some "breadcrumbs" to leave behind us, so that we can find our way back up the tree. Those need to reference the node, as well as whether the node was visited or traversed:

// cont'd
#include <functional>
#include <stack>
#include <vector>

// An entry in the node stack used to iterate ("walk") the tree
struct BreadCrumb {
  NodeRef node;
  bool visited = false; // was this node visited?
  bool traversedLeft = false; // was the left child descended into?
  bool traversedRight = false; // was the right child descended into?
  BreadCrumb(std::unique_ptr<Node> &owner) : node(owner) {}
  BreadCrumb(NodeRef node) : node(node) {}
  Node *operator->() const { return node.get(); }
  explicit operator bool() const { return bool(node); }
};
// cont'd

The "path" that we walk down the tree is kept on a stack dedicated for this purpose:

// cont'd
// A stack holds the path to the current node
class NodeStack {
  // Top of stack is the current node
  std::stack<BreadCrumb, std::vector<BreadCrumb>> m_stack;
public:
  NodeStack() = default;
  NodeStack(NodeRef n) { if (n) m_stack.push(n); }
  
  bool empty() const { return m_stack.empty(); }
  // The breadcrumb that represents the top of stack, and thus the current node
  BreadCrumb &crumb() { return m_stack.top(); }
  const BreadCrumb &crumb() const { return m_stack.top(); }
  NodeRef node() { return crumb().node; }
  Node *node() const { return empty() ? nullptr : crumb().node.get(); }
  
  void push(NodeRef n) { m_stack.push(n); }

  // Visit and mark the node if not visited yet
  bool visit() {
    if (crumb().visited) return false;
    crumb().visited = true;
    return true;
  }
  // Descend one level via the left edge if not traversed left yet
  bool descendLeft() {
    if (crumb().traversedLeft) return false;
    crumb().traversedLeft = true;
    auto &n = crumb()->lhs_;
    if (n) m_stack.push(n);
    return bool(n);
  }
  // Descends one level via right edge if not traversed right yet
  bool descendRight() {
    if (crumb().traversedRight) return false;
    crumb().traversedRight = true;
    auto &n = crumb()->rhs_;
    if (n) m_stack.push(n);
    return bool(n);
  }
  // Ascends one level
  bool ascend() {
    m_stack.pop();
    return !empty();
  }
};
// cont'd

The tree traversal operations are abstracted away in the stack, so that the remaining code is higher level and devoid of such details.

Now we can implement a node iterator that uses the stack to keep its trail of breadcrumbs:

// cont'd
// Traversal order used by the iterator below
enum class Order { In, Pre, Post };

// Node Forward Iterator - iterates the nodes in given order
class NodeIterator {
  using Advancer = void (NodeIterator::*)();
  
  NodeStack m_stack; // Breadcrumb path to the current node
  Advancer m_advancer; // Method that advances to next node in chosen order
  Order m_order = Order::In;

public:
  NodeIterator() = default;
  // Dereferencing operators
  Node& operator*() { return *m_stack.node(); }
  Node* operator->() { return m_stack.node().get(); }
  // Do the iterators both point to the same node (or no node)?
  bool operator==(const NodeIterator &other) const {
    return m_stack.node() == other.m_stack.node();
  }
  bool operator==(decltype(nullptr)) const { return !bool(m_stack.node()); }
  bool operator!=(const NodeIterator &other) const { return !(*this == other); }
  bool operator!=(decltype(nullptr)) const { return bool(m_stack.node()); }
  
  NodeIterator(NodeRef n, Order order = Order::In) : m_stack(n) {
    setOrder(order);
    if (n) operator++(); // Start the traversal
  }
  
  void setOrder(Order order) {
    if (order == Order::In)
      m_advancer = &NodeIterator::advanceInorder;
    else if (order == Order::Pre)
      m_advancer = &NodeIterator::advancePreorder;
    else if (order == Order::Post)
      m_advancer = &NodeIterator::advancePostorder;
    m_order = order;
  }

  NodeIterator &operator++() { // Preincrement operator
    assert(!m_stack.empty());
    std::invoke(m_advancer, this);
    return *this;
  }
  // No postincrement operator since it'd need to copy the stack and thus
  // be way too expensive to casually expose via postincrement.

  void advanceInorder();
  void advancePreorder();
  void advancePostorder();
  
  bool goLeft() { return m_stack.descendLeft(); }
  bool goRight() { return m_stack.descendRight(); }
};
// cont'd

Remember the stack? It lets us describe the in-, pre- and post-order traversal rather succinctly:

// cont'd
void NodeIterator::advanceInorder() {
  for (;;) {
    if (m_stack.descendLeft())
      continue;
    if (m_stack.visit())
      break;
    if (m_stack.descendRight())
      continue;
    if (m_stack.ascend())
      continue;
    assert(m_stack.empty());
    break;
  }
}

void NodeIterator::advancePreorder() {
  for (;;) {
    if (m_stack.visit())
      break;
    if (m_stack.descendLeft())
      continue;
    if (m_stack.descendRight())
      continue;
    if (m_stack.ascend())
      continue;
    assert(m_stack.empty());
    break;
  }
}

void NodeIterator::advancePostorder() {
  for (;;) {
    if (m_stack.descendLeft())
      continue;
    if (m_stack.descendRight())
      continue;
    if (m_stack.visit())
      break;
    if (m_stack.ascend())
      continue;
    assert(m_stack.empty());
    break;
  }
}
// cont'd
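If the interplay of the flags is hard to picture, the following stand-alone sketch may help. It is not the iterator above: it walks the whole tree in one go and collects the values, but it uses the same idea of an explicit stack of breadcrumbs with visited/left/right flags. Crumb, walk, insert and demoWalk are illustrative names, and Order is declared locally with the enumerators used in this answer:

```cpp
#include <cassert>
#include <initializer_list>
#include <memory>
#include <vector>

// Local copy of the node type so this snippet is self-contained.
struct Node {
  std::unique_ptr<Node> lhs_;
  std::unique_ptr<Node> rhs_;
  int value_;
  explicit Node(int value) : value_(value) {}
};

enum class Order { In, Pre, Post };

// One breadcrumb per node on the path from the root to the current node.
struct Crumb {
  const Node *node;
  bool visited = false;
  bool wentLeft = false;
  bool wentRight = false;
};

// Collects all values of the tree in the requested order, without recursion.
std::vector<int> walk(const Node *root, Order order) {
  std::vector<int> out;
  std::vector<Crumb> stack;
  if (root) stack.push_back(Crumb{root});
  while (!stack.empty()) {
    Crumb &top = stack.back(); // only valid until the next push or pop
    bool visitNow = !top.visited &&
        (order == Order::Pre ||
         (order == Order::In && top.wentLeft) ||
         (order == Order::Post && top.wentLeft && top.wentRight));
    if (visitNow) {
      top.visited = true;
      out.push_back(top.node->value_);
    } else if (!top.wentLeft) {
      top.wentLeft = true;
      if (top.node->lhs_) stack.push_back(Crumb{top.node->lhs_.get()});
    } else if (!top.wentRight) {
      top.wentRight = true;
      if (top.node->rhs_) stack.push_back(Crumb{top.node->rhs_.get()});
    } else {
      stack.pop_back(); // both edges done and node visited: go back up
    }
  }
  return out;
}

// Plain recursive insertion, just to build a test tree.
void insert(std::unique_ptr<Node> &slot, int value) {
  if (!slot)
    slot = std::make_unique<Node>(value);
  else
    insert(value < slot->value_ ? slot->lhs_ : slot->rhs_, value);
}

std::vector<int> demoWalk(Order order) {
  std::unique_ptr<Node> root;
  for (int v : {4, 2, 6, 1, 3, 5, 7}) insert(root, v);
  return walk(root.get(), order);
}
```

The only thing that differs between the three orders is when the visit happens relative to the two descents, which is also all that differs between the three advance functions above.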

And now we want an easy way to use this iterator to walk a tree rooted at some node:

// cont'd
class TreeRangeAdapter {
  NodeRef m_root;
  Order m_order;
public:
  TreeRangeAdapter(NodeRef root, Order order) :
    m_root(root), m_order(order) {}
  NodeIterator begin() const { return {m_root, m_order}; }
  constexpr auto end() const { return nullptr; }
};

auto inOrder(NodeRef node) { return TreeRangeAdapter(node, Order::In); }
auto preOrder(NodeRef node) { return TreeRangeAdapter(node, Order::Pre); }
auto postOrder(NodeRef node) { return TreeRangeAdapter(node, Order::Post); }
// cont'd
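One detail worth spelling out: end() above returns nullptr rather than another iterator. Since C++17, range-for allows begin() and end() to have different types, as long as the iterator can be compared against whatever end() returns. The toy range below is a hypothetical illustration of that rule and has nothing to do with trees. All range-for needs from the iterator is operator*, pre-increment, and operator!= against the end value:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical iterator that counts down from a starting number to 1.
class CountdownIterator {
  int m_current;
public:
  explicit CountdownIterator(int start) : m_current(start) {}
  int operator*() const { return m_current; }
  CountdownIterator &operator++() {
    --m_current;
    return *this;
  }
  // "Not at the end yet" is decided by the iterator itself; the nullptr
  // argument is only a tag that marks the end of the range.
  bool operator!=(std::nullptr_t) const { return m_current > 0; }
};

// Hypothetical range: begin() is an iterator, end() is a nullptr sentinel.
struct Countdown {
  int start;
  CountdownIterator begin() const { return CountdownIterator(start); }
  std::nullptr_t end() const { return nullptr; }
};

std::vector<int> demoCountdown() {
  std::vector<int> seen;
  for (int v : Countdown{3}) seen.push_back(v); // requires C++17
  return seen;
}
```

TreeRangeAdapter relies on the same rule: the loop stops when the iterator's comparison against nullptr reports that no current node is left.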

And how would all that work? This is but a simple example of filling up a tree, and in-order traversal:

// cont'd
#include <iostream>
#include <cstdlib>

int main() {
  Tree tree;
  for (int i = 0; i < 10; ++i) insert(tree, rand() / (RAND_MAX/100));
  
  for (auto &node : inOrder(tree.root_)) {
    std::cout << node.value_ << " ";
  }
  std::cout << "\n";
}
// complete example ends

Output:

19 27 33 39 55 76 78 79 84 91

solution1
2 ACCEPTED 2020-11-20 23:49:54