C++ Binary search tree compare data of nodes and remove duplicates

I have created a binary search tree in C++ whose nodes hold two kinds of data: a string and an int. I am reading a text file and inserting each word into the tree in alphabetical order, together with the number of the line the word was found on. I am able to print the words and the numbers just fine. What I want to do now is check whether a word has already been printed and, if it has, print only the number of the line the word was found on. My idea is to compare each node's data with the previous node's data as the tree is traversed and printed. This is my print function:

void inOrderPrint(Node *rootPtr ) {
    if ( rootPtr != NULL ) {
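        // Strip punctuation. The index only advances when nothing was erased,
        // so it never runs past the end of the string.
        for (size_t i = 0; i < rootPtr->data.size(); ) {
            if (ispunct((unsigned char)rootPtr->data[i]))
                rootPtr->data.erase(i,1);
            else
                ++i;
        }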
        rootPtr->data = rootPtr->data.substr(0,10);
        inOrderPrint( rootPtr->left );
        cout << (rootPtr->data)<<rootPtr->lineNum <<endl;
        inOrderPrint( rootPtr->right );
    }
}

This is what I was thinking:

if (rootPtr->data == previous rootPtr->data)
    cout<<setw(10)<<theCurrentNode lineNum;
else
    do normal printing

I think that if this function ran on the first node and compared it to the nonexistent previous node, it would automatically try to compare it to NULL, the if statement would evaluate to false, and it would move on to the else.

Any suggestions on how to go about doing this with actual C++ syntax? Or does anyone see a flaw in my logic?

Thanks in advance!

This answer describes how to make the program print each unique word together with the line number of its first occurrence in the file. For every duplicate occurrence, it prints only the line number of the first occurrence again. The approach is to make sure that the tree contains no duplicate nodes and to count the repeated occurrences instead.

To do this we might modify the node structure as follows:

struct Node{
    string data;
    int lineNum;
    int count =1;
    Node* left;
    Node* right;
};

The Insert function might be edited to count duplicates like this:

Node* Insert(Node* rootPtr, string data, int lineNum) {
    if (rootPtr == NULL) {
        rootPtr = GetNewNode(data, lineNum);
        // Strip punctuation from the stored word. The index only advances
        // when nothing was erased, so it never runs past the end.
        for (size_t i = 0; i < rootPtr->data.size(); ) {
            if (ispunct((unsigned char)rootPtr->data[i]))
                rootPtr->data.erase(i, 1);
            else
                ++i;
        }
        // Keep at most the first 10 characters.
        rootPtr->data = rootPtr->data.substr(0, 10);
    }
    // Existing nodes were already cleaned when they were created, so the
    // branches below only need to compare and recurse.
    else if (data < rootPtr->data) {
        rootPtr->left = Insert(rootPtr->left, data, lineNum);
    }
    else if (data > rootPtr->data) {
        rootPtr->right = Insert(rootPtr->right, data, lineNum);
    }
    else {
        // Same word as this node: count the repeat instead of adding a node.
        ++rootPtr->count;
    }

    return rootPtr;
}

Finally, the print function can be modified:

void inOrderPrint(Node *rootPtr ) {
    //ofstream outputFile;
    //outputFile.open("Output.txt");

    if ( rootPtr != NULL ) {
        inOrderPrint( rootPtr->left );
        cout << (rootPtr->data) << " " << rootPtr->lineNum << endl;
        // Print the line number once more for each repeated occurrence.
        int j = rootPtr->count;
        while ( --j )
            cout << rootPtr->lineNum << endl;

        //outputFile << (rootPtr->data)<<rootPtr->lineNum <<endl;
        inOrderPrint( rootPtr->right );
    }
}
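If the line number of each repeat is wanted, rather than the first line number printed again, one possible variation is to store a list of line numbers in the node instead of a count. This is only a sketch and not part of the code above: `LineNode`, `insertLine`, `printLines`, and `freeTree` are hypothetical names, and the word is assumed to be cleaned before it is inserted.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical variant of Node: keeps every line number instead of a count.
struct LineNode {
    std::string data;
    std::vector<int> lines;   // lines[0] is the first occurrence
    LineNode* left = nullptr;
    LineNode* right = nullptr;
};

// Assumes `data` was already cleaned (punctuation stripped, truncated).
LineNode* insertLine(LineNode* root, const std::string& data, int lineNum) {
    if (root == nullptr) {
        root = new LineNode;
        root->data = data;
        root->lines.push_back(lineNum);
    } else if (data < root->data) {
        root->left = insertLine(root->left, data, lineNum);
    } else if (data > root->data) {
        root->right = insertLine(root->right, data, lineNum);
    } else {
        root->lines.push_back(lineNum);  // repeat: remember this line too
    }
    return root;
}

// In-order print: the word with its first line, then one line number
// per later occurrence.
void printLines(const LineNode* root, std::ostream& out) {
    if (root == nullptr) return;
    printLines(root->left, out);
    out << root->data << " " << root->lines[0] << "\n";
    for (std::size_t i = 1; i < root->lines.size(); ++i)
        out << root->lines[i] << "\n";
    printLines(root->right, out);
}

// Releases every node of the tree.
void freeTree(LineNode* root) {
    if (root == nullptr) return;
    freeTree(root->left);
    freeTree(root->right);
    delete root;
}
```

Calling `printLines(root, std::cout)` writes to the console. Passing the stream as a parameter also makes it easy to write to a file.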

Now this should be much closer to what you want. It would also be a good idea to separate the text processing from the node processing. (This answer more or less assumes that you will take care of that.) Otherwise, duplicate nodes will be created whenever the raw text passed to Insert does not match the processed text stored in the node.

Good luck!
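One way to keep the text processing out of the tree code is a small helper that cleans a word before it reaches Insert. The sketch below is an assumption and not part of the original code: `normalizeWord` is a hypothetical name, and the 10-character limit is taken from the `substr(0,10)` call above.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: strips punctuation and truncates to maxLen characters,
// so the key compared during insertion is the same key stored in the node.
std::string normalizeWord(std::string word, std::size_t maxLen = 10) {
    // Erase-remove idiom: drop every punctuation character in one pass.
    word.erase(std::remove_if(word.begin(), word.end(),
                              [](unsigned char c) { return std::ispunct(c) != 0; }),
               word.end());
    return word.substr(0, maxLen);
}
```

It could then be called as `root = Insert(root, normalizeWord(word), lineNum);`, where `root` and `word` stand for whatever the reading loop uses. With that in place, "word," and "word" compare equal, and the cleanup loop inside Insert is no longer needed.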
