
Insertion into AVL tree only replaces root node

I'm currently working on an assignment where the N most frequent words in a book (.txt) must be printed. The issue I'm facing is that when I add a node to one of my trees, it simply replaces the root node, so the tree remains a single node.

Code snippet that adds words from the file "stopwords.txt" to a tree named stopwords:

Dict stopwords = newDict();

if (!readFile("stopwords.txt"))
   {
      fprintf(stderr, "Can't open stopwords\n");
      exit(EXIT_FAILURE);
   }

   FILE *fp = fopen("stopwords.txt", "r");

   while (fgets(buf, MAXLINE, fp) != NULL)
   {
      token = strtok(buf, "\n");
      DictInsert(stopwords, buf); //the root is replaced here
   }
   fclose(fp);

The data structures are defined as follows:

typedef struct _DictNode *Link;

typedef struct _DictNode
{
   WFreq data;
   Link left;
   Link right;
   int height;
} DictNode;

typedef struct _DictRep *Dict;

struct _DictRep
{
   Link root;
};

typedef struct _WFreq {
   char  *word;  // word buffer (dynamically allocated)
   int    freq;  // count of number of occurrences
} WFreq;

Code to insert and rebalance tree:

// create new empty Dictionary
Dict newDict(void)
{
   Dict d = malloc(sizeof(*d));
   if (d == NULL)
   {
      fprintf(stderr, "Insufficient memory!\n");
      exit(EXIT_FAILURE);
   }
   d->root = NULL;
   return d;
}

// insert new word into Dictionary
// return pointer to the (word,freq) pair for that word
WFreq *DictInsert(Dict d, char *w)
{
   d->root = doInsert(d->root, w); //the root is replaced here before doInsert runs
   return DictFind(d, w);
}

static int depth(Link n)
{
   if (n == NULL)
      return 0;
   int ldepth = depth(n->left);
   int rdepth = depth(n->right);
   return 1 + ((ldepth > rdepth) ? ldepth : rdepth);
}

static Link doInsert(Link n, char *w)
{
   if (n == NULL)
   {
      return newNode(w);
   }

   // insert recursively
   int cmp = strcmp(w, n->data.word);
   if (cmp < 0)
   {
      n->left = doInsert(n->left, w);
   }
   else if (cmp > 0)
   {
      n->right = doInsert(n->right, w);
   }
   else
   { // (cmp == 0)
      // if the word is already in the tree,
      // we can return straight away
      return n;
   }

   // insertion done
   // correct the height of the current subtree
   n->height = 1 + max(height(n->left), height(n->right));

   // rebalance the tree
   int dL = depth(n->left);
   int dR = depth(n->right);

   if ((dL - dR) > 1)
   {
      dL = depth(n->left->left);
      dR = depth(n->left->right);
      if ((dL - dR) > 0)
      {
         n = rotateRight(n);
      }
      else
      {
         n->left = rotateLeft(n->left);
         n = rotateRight(n);
      }
   }
   else if ((dR - dL) > 1)
   {
      dL = depth(n->right->left);
      dR = depth(n->right->right);
      if ((dR - dL) > 0)
      {
         n = rotateLeft(n);
      }
      else
      {
         n->right = rotateRight(n->right);
         n = rotateLeft(n);
      }
   }

   return n;
}

static Link newNode(char *w)
{
   Link n = malloc(sizeof(*n));
   if (n == NULL)
   {
      fprintf(stderr, "Insufficient memory!\n");
      exit(EXIT_FAILURE);
   }

   n->data.word = w;
   n->data.freq = 1;
   n->height = 1;
   n->left = NULL;
   n->right = NULL;
   return n;
}

// Rotates  the  given  subtree left and returns the root of the updated
// subtree.
static Link rotateLeft(Link n)
{
   if (n == NULL)
      return n;
   if (n->right == NULL)
      return n;
   Link rightNode = n->right;
   n->right = rightNode->left;
   rightNode->left = n;

   n->height = max(height(n->left), height(n->right)) + 1;
   rightNode->height = max(height(rightNode->right), n->height) + 1;

   return rightNode;
}

// Rotates the given subtree right and returns the root of  the  updated
// subtree.
static Link rotateRight(Link n)
{
   if (n == NULL)
      return n;
   if (n->left == NULL)
      return n;
   Link leftNode = n->left;
   n->left = leftNode->right;
   leftNode->right = n;

   n->height = max(height(n->left), height(n->right)) + 1;
   leftNode->height = max(height(leftNode->left), n->height) + 1; // leftNode->right is n after the rotation

   return leftNode;
}

I believe that most of the code is functional and that only the insertion fails. When I debugged this with gdb, I found that the root node (d->root) was replaced before the recursive insert function (doInsert) ran, so doInsert always returns the node n, which by then already exists in the tree. For example, if the text file contained the following:
a
b
c
then the program would first insert "a" as stopwords->root, then "b" would replace "a" and become the new stopwords->root, and finally "c" would replace "b" as stopwords->root, resulting in a tree with a single node, "c".

There are several inconsistencies in your code.

One mistake is here:

d->root = doInsert(d->root, w);

You unconditionally reassign the root every time you insert a new node.

You are supposed to return the new node from doInsert and reassign the root only if the new node has become the new root.

Another mistake is that doInsert returns the local variable n, which was not newly allocated but was initialized to point to the previous root.

Inside doInsert you need to allocate a new node NEW and use a variable x to walk down from the root until you find the place to insert NEW. If x stops at the root, set d->root = NEW.
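The walk described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the asker's code: the names Node, Tree, makeNode and insertIter are hypothetical, AVL rebalancing is left out entirely, and the node is allocated only once the insertion point is known rather than up front. The recursive form used in the question, where each call returns the root of its subtree, is also a common pattern; this version just makes the "root changes only when the tree is empty" behaviour explicit.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal types for the sketch (no height field, no balancing). */
typedef struct Node {
    char *word;
    int freq;
    struct Node *left, *right;
} Node;

typedef struct {
    Node *root;
} Tree;

/* Allocate a node that owns its own copy of w. */
static Node *makeNode(const char *w)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL) {
        fprintf(stderr, "Insufficient memory!\n");
        exit(EXIT_FAILURE);
    }
    n->word = malloc(strlen(w) + 1);
    if (n->word == NULL) {
        fprintf(stderr, "Insufficient memory!\n");
        exit(EXIT_FAILURE);
    }
    strcpy(n->word, w);
    n->freq = 1;
    n->left = n->right = NULL;
    return n;
}

/* Iterative insert: x walks down from the root, link remembers which
 * pointer (the root or a child field) must receive the new node.
 * Returns the node holding w, whether new or already present. */
static Node *insertIter(Tree *t, const char *w)
{
    assert(t != NULL && w != NULL);
    Node **link = &t->root;
    Node *x = t->root;
    while (x != NULL) {
        int cmp = strcmp(w, x->word);
        if (cmp == 0)
            return x;                 /* already in the tree */
        link = (cmp < 0) ? &x->left : &x->right;
        x = *link;
    }
    *link = makeNode(w);              /* writes t->root only if the tree was empty */
    return *link;
}
```

Inserting "b", "a", "c" in that order into an empty Tree leaves "b" at the root with "a" and "c" as its children.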

Your function newNode just stores the string pointer it is passed, so the text the node points at changes whenever you modify the original string.

To prevent that, you should copy the input string when inserting a node.

To achieve that,

    n->data.word = w;

should be

    n->data.word = malloc(strlen(w) + 1);
    if (n->data.word == NULL)
    {
        fprintf(stderr, "Insufficient memory!\n");
        exit(EXIT_FAILURE);
    }
    strcpy(n->data.word, w);

Add #include <string.h> to use strlen() and strcpy() if it is not already included.
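This also appears to explain the symptom in the question: the loop passes the same buf to every DictInsert call, so the first node's word is buf itself. On the next call, strcmp(w, n->data.word) compares buf with buf, gets 0, and doInsert returns n without adding anything, while the single node now "contains" the latest line. A small sketch of the difference between aliasing the buffer and copying it, where copyWord and firstWordSurvives are hypothetical helpers made up for this illustration:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: heap-allocated copy of s, or NULL on failure. */
static char *copyWord(const char *s)
{
    assert(s != NULL);
    char *p = malloc(strlen(s) + 1);
    if (p != NULL)
        strcpy(p, s);
    return p;
}

/* Returns 1 if the stored word still reads "a" after the buffer is
 * reused for the next line, 0 otherwise. */
static int firstWordSurvives(int copy)
{
    char buf[16];
    strcpy(buf, "a");                           /* first line read into buf  */
    char *stored = copy ? copyWord(buf) : buf;  /* private copy vs. alias    */
    strcpy(buf, "b");                           /* next fgets() overwrites buf */
    int ok = (stored != NULL) && strcmp(stored, "a") == 0;
    if (copy)
        free(stored);
    return ok;
}
```

With copy set to 0 the stored pointer is buf, so the word silently becomes "b"; with copy set to 1 the node keeps its own "a". A node that owns a copied string also needs that string freed when the tree is destroyed.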
