
Inserting into a binary search tree in C

I'm currently learning C and some data structures such as binary search trees. I have trouble understanding how exactly changing pointer values within a function works in some cases and doesn't in others. Below is some code I wrote: an insert function that inserts values at the correct places in the BST (it works as it should). I used pointers to pointers so that I could change values within a function. Even though it works, I'm still really confused about why it does. I don't understand why my insert function actually changes the BST even though I only work with local variables (tmp, parent_ptr) in it, and I don't really dereference any pointers apart from "tmp = *p2r".

Thanks for helping out.

#include <stdio.h>
#include <stdlib.h>


struct TreeNode{
    int val;
    struct TreeNode *left;
    struct TreeNode *right;
};

struct TreeNode** createTree(){
    struct TreeNode** p2r;
    p2r = malloc(sizeof(struct TreeNode*));
    *p2r = NULL;
    return p2r;
}

void insert(struct TreeNode** p2r, int val){
    // create TreeNode which we will insert
    struct TreeNode* new_node = malloc(sizeof(struct TreeNode));
    new_node -> val = val;
    new_node -> left = NULL;
    new_node -> right = NULL;
    // parent_ptr trails tmp by one step: it ends at the node we attach to
    struct TreeNode* parent_ptr = NULL;
    struct TreeNode* tmp = NULL;
    tmp = *p2r;
    // find right place to insert node
    while (tmp != NULL){
        parent_ptr = tmp;
        if (tmp -> val < val) tmp = tmp->right;
        else tmp = tmp->left;
    }
    if (parent_ptr == NULL){
        *p2r = new_node;
    }
    else if (parent_ptr->val < val){ //then insert on the right
        parent_ptr -> right = new_node;
    }else{
        parent_ptr -> left = new_node;
    }
}

int main(){
    struct TreeNode **p2r = createTree();
    insert(p2r, 4);
    insert(p2r, 2);
    insert(p2r, 3);
    return 0;
}

While the pointers themselves are indeed local variables, they hold the addresses of specific locations in memory. When you dereference a pointer with the -> operator, you access the memory where the object it points to is stored. That is why your changes are visible outside the function as well.

You basically told a local variable where your tree is stored; it helped with the insertion and then went out of scope. The tree itself is not a local variable (its nodes are allocated with malloc), so the changes made to it persist after insert returns.

I suggest reading up on how pointers work.

First of all, always remember one thing about pointers: they store a memory address rather than a value. For example:

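#include <stdio.h>

int main(void)
{
    int val = 5;
    int copy = val;        // copy gets its own storage holding 5
    int *original = &val;  // original holds the address of val

    printf("%d\n", val);
    printf("%d\n", copy);
    printf("%d\n", *original);

    val = 8;

    printf("%d\n", val);
    printf("%d\n", copy);
    printf("%d\n", *original);

    return 0;
}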

On executing this piece of code, the output will be

5
5
5
8
5
8

Notice how, when the value of val changes, the value of copy stays the same while the value pointed to by original changes. This happens because the pointer original points to the memory location of val.

Now, coming to the insert function: although you are only working with local variables (tmp, parent_ptr), remember that they are pointer variables, so they hold memory addresses. Whenever you move to tmp->right or tmp->left in the loop, you are jumping from one node's location in memory to another, following the links stored in the tree. When you finally assign to parent_ptr->left or parent_ptr->right, you are modifying a field of a node that lives on the heap, not a local copy, and that is why it works. The following example makes this clearer.

     56 (A)
     /    \
    /      \
  45 (B)  60 (C)

Consider the BST above, with each node's memory address in brackets. Let's insert 40 into it. Initially, tmp points to A, the address of 56. Since 40 is less than 56, tmp goes left and now points to B, the address of 45. Since 40 is also less than 45, it goes left once again and is now NULL. By then, parent_ptr points to B, so the new node for 40 is attached to the left of B.

      56 (A)
     /    \
    /      \
  45 (B)  60 (C)
  /
 /
40 (D)

Let's analyze the approach step by step.

First, consider the following simple program.

#include <stdio.h>
#include <stdlib.h>

struct TreeNode{
    int val;
    struct TreeNode *left;
    struct TreeNode *right;
};

void create( struct TreeNode *head, int val )
{
    head = malloc( sizeof( struct TreeNode ) );
    
    head->val   = val;
    head->left  = NULL;
    head->right = NULL;
}

int main(void) 
{
    struct TreeNode *head = NULL;
    
    printf( "Before calling the function create head == NULL is %s\n",
            head == NULL ? "true" : "false" );
            
    create( head, 10 );
    
    printf( "After  calling the function create head == NULL is %s\n",
            head == NULL ? "true" : "false" );
            
    return 0;
}

The program output is

Before calling the function create head == NULL is true
After  calling the function create head == NULL is true

As you can see, the pointer head in main was not changed. The reason is that the function works with a copy of the value of the original pointer head, so changing the copy does not affect the original pointer. (The node allocated inside create is also leaked, because its address is lost when the function returns.)

If you rename the function parameter to head_parm (to distinguish it from the original pointer named head), you can picture the function definition and its call as follows

create( head, 10 );

//...

void create( /*struct TreeNode *head_parm, int val */ )
{
    struct TreeNode *head_parm = head;
    int val = 10;
    head_parm = malloc( sizeof( struct TreeNode ) );
    //...

That is, within the function a local variable head_parm is created and initialized with the value of the argument head, and it is this local variable head_parm that gets changed inside the function.

In other words, function arguments in C are passed by value.

To change the original pointer head declared in main you need to pass it by reference.

In C, passing by reference is implemented by passing an object indirectly, through a pointer to it. By dereferencing that pointer inside the function, you get direct access to the original object.

So let's rewrite the above program the following way.

#include <stdio.h>
#include <stdlib.h>

struct TreeNode{
    int val;
    struct TreeNode *left;
    struct TreeNode *right;
};

void create( struct TreeNode **head, int val )
{
    *head = malloc( sizeof( struct TreeNode ) );
    
    ( *head )->val   = val;
    ( *head )->left  = NULL;
    ( *head )->right = NULL;
}

int main(void) 
{
    struct TreeNode *head = NULL;
    
    printf( "Before calling the function create head == NULL is %s\n",
            head == NULL ? "true" : "false" );
            
    create( &head, 10 );
    
    printf( "After  calling the function create head == NULL is %s\n",
            head == NULL ? "true" : "false" );
            
    return 0;
}

Now the program output is

Before calling the function create head == NULL is true
After  calling the function create head == NULL is false            

In the program in your question, you did not declare the pointer to the head node the way the program above does:

struct TreeNode *head = NULL;

Instead, you allocated this pointer dynamically. In effect, what your program does is the following

#include <stdio.h>
#include <stdlib.h>

struct TreeNode{
    int val;
    struct TreeNode *left;
    struct TreeNode *right;
};

void create( struct TreeNode **head, int val )
{
    *head = malloc( sizeof( struct TreeNode ) );
    
    ( *head )->val   = val;
    ( *head )->left  = NULL;
    ( *head )->right = NULL;
}

int main(void) 
{
    struct TreeNode **p2r = malloc( sizeof( struct TreeNode * ) );
    *p2r = NULL;
    
    printf( "Before calling the function create *p2r == NULL is %s\n",
            *p2r == NULL ? "true" : "false" );
            
    create( p2r, 10 );
    
    printf( "After  calling the function create *p2r == NULL is %s\n",
            *p2r == NULL ? "true" : "false" );
            
    return 0;
}

The program output is

Before calling the function create *p2r == NULL is true
After  calling the function create *p2r == NULL is false

Compared with the previous program, where you passed the expression &head of type struct TreeNode ** to the function create, you have now introduced an intermediate variable p2r that plays the role of &head, set up by this code snippet

struct TreeNode **p2r = malloc( sizeof( struct TreeNode * ) );
*p2r = NULL;

That is, earlier you called the function create like this

create( &head, 10 );

Now, in effect, you are calling the function like this

struct TreeNode **p2r = &head; // where head was allocated dynamically
create( p2r, 10 );

The same thing happens in your program. Inside the function insert, dereferencing the pointer p2r gives you direct access to the pointer to the head node

if (parent_ptr == NULL){
    *p2r = new_node;
    ^^^^ 
}

As a result, the function changes the pointer to the head node, which is passed by reference through the pointer p2r.

The data members left and right of the other nodes are also changed, through the pointer parent_ptr, which points to an actual node in memory:

else if (parent_ptr->val < val){ //then insert on the right
    parent_ptr -> right = new_node;
    ^^^^^^^^^^^^^^^^^^^  
}else{
    parent_ptr -> left = new_node;
    ^^^^^^^^^^^^^^^^^^
}
