
Trie tree struct declaration

I have this piece of code (which is not mine), and I can't for the life of me understand what those structures look like. Can someone explain, please?

typedef struct trie_node trie_node_t;
struct trie_node
{
    int value;
    trie_node_t *children[ALPHABET_SIZE];
};

// trie ADT
typedef struct trie trie_t;
struct trie
{
    trie_node_t *root;
    int count;
};

The int count in the second struct counts all the words put into the tree, but I would like to know how many times each individual word was put in there. Apart from modifying the rest of the code, how should I modify the structure to achieve that?

Rest of the code: http://pastebin.com/9zQuCBjb

I suppose you are familiar with the concept of a trie, where you find words and prefixes of words by walking (or crawling, to use the words of the linked code) down the tree with the letters of the word, branching at each node according to the letter you find. Each node has many children: 26 if you use the case-insensitive Latin alphabet.

The word is encoded in the path by which you get there:

root->[f]->[i]->[s]->[h]  --> "fish"

Now you need to know whether the current node represents a word. "fish" is a word, but "fis" isn't. You can't rely on the node being a leaf without children, because "fishbone" might be in the dictionary too. That's what the value entry is for: zero means the current node does not represent a word; otherwise the value is a one-based index of the current word.
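To make the role of value concrete, here is a minimal lookup sketch. It assumes ALPHABET_SIZE is 26 and that keys contain only lowercase 'a' to 'z'; the function name trie_lookup is made up for illustration and is not taken from the linked code.

```c
#include <assert.h>
#include <stddef.h>

#define ALPHABET_SIZE 26   /* assumption: lowercase 'a'..'z' only */

typedef struct trie_node trie_node_t;
struct trie_node
{
    int value;   /* 0: not a word; otherwise one-based index of the word */
    trie_node_t *children[ALPHABET_SIZE];
};

/* Hypothetical helper: returns the value stored for key,
 * or 0 if key is not a word in the trie. */
int trie_lookup(const trie_node_t *root, const char *key)
{
    assert(key != NULL);
    const trie_node_t *node = root;

    /* Follow one child pointer per letter until the key or the path ends. */
    for (; node != NULL && *key != '\0'; key++) {
        if (*key < 'a' || *key > 'z')
            return 0;   /* outside the assumed alphabet */
        node = node->children[*key - 'a'];
    }

    /* The path may exist (as for "fis") without the node being a word. */
    return node != NULL ? node->value : 0;
}
```

With only "fish" stored, looking up "fis" reaches an existing node whose value is zero, and looking up "fishbone" runs off the end of the path; both report "not a word".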

When you create a new entry, you just crawl down the trie, possibly creating new nodes as you go, and mark the last node with the current count of words as its value. If "fishbone" is already in the trie and you add "fish", you don't create new nodes and only mark the "h" node with a new value.
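The insertion described above could look roughly like this. It is a sketch under the same assumptions (26 lowercase letters), not the linked code itself; trie_insert and its return convention are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define ALPHABET_SIZE 26   /* assumption: lowercase 'a'..'z' only */

typedef struct trie_node trie_node_t;
struct trie_node
{
    int value;
    trie_node_t *children[ALPHABET_SIZE];
};

typedef struct trie trie_t;
struct trie
{
    trie_node_t *root;
    int count;
};

/* Hypothetical helper: returns 0 on success, -1 on an invalid character
 * or allocation failure. calloc zero-fills the node, which is assumed to
 * yield null child pointers (true on mainstream platforms). */
int trie_insert(trie_t *t, const char *key)
{
    assert(t != NULL && key != NULL);
    if (t->root == NULL && (t->root = calloc(1, sizeof *t->root)) == NULL)
        return -1;

    trie_node_t *node = t->root;
    for (; *key != '\0'; key++) {
        if (*key < 'a' || *key > 'z')
            return -1;
        trie_node_t **slot = &node->children[*key - 'a'];
        /* Create the child only if the path does not exist yet. */
        if (*slot == NULL && (*slot = calloc(1, sizeof **slot)) == NULL)
            return -1;
        node = *slot;
    }

    /* Mark the last node with the running word count. This overwrites
     * any value the node already had. */
    node->value = ++t->count;
    return 0;
}
```

Inserting "fishbone" and then "fish" allocates nodes only for the first call; the second call merely sets value on the existing "h" node.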

The trie struct is just a helper that holds the trie's root node and a count.

If you want to keep track of occurrences, add a count field to the nodes and increment it whenever you set value. (The original code doesn't check whether a word is already in the trie and adds words unconditionally, thereby overwriting any old values.)
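One possible shape for that change is sketched below. The per-node count field is the addition; trie_add is a made-up name. As a deliberate variation, this sketch assigns value only the first time a word is seen, so the trie-level count ends up holding the number of distinct words rather than the number of insertions.

```c
#include <assert.h>
#include <stdlib.h>

#define ALPHABET_SIZE 26   /* assumption: lowercase 'a'..'z' only */

typedef struct trie_node trie_node_t;
struct trie_node
{
    int value;   /* 0: not a word; otherwise one-based index of the word */
    int count;   /* new field: how many times this word was inserted */
    trie_node_t *children[ALPHABET_SIZE];
};

typedef struct trie trie_t;
struct trie
{
    trie_node_t *root;
    int count;   /* in this sketch: number of distinct words */
};

/* Hypothetical helper: returns the word's occurrence count after the
 * insertion, or -1 on an invalid character or allocation failure. */
int trie_add(trie_t *t, const char *key)
{
    assert(t != NULL && key != NULL);
    if (t->root == NULL && (t->root = calloc(1, sizeof *t->root)) == NULL)
        return -1;

    trie_node_t *node = t->root;
    for (; *key != '\0'; key++) {
        if (*key < 'a' || *key > 'z')
            return -1;
        trie_node_t **slot = &node->children[*key - 'a'];
        if (*slot == NULL && (*slot = calloc(1, sizeof **slot)) == NULL)
            return -1;
        node = *slot;
    }

    if (node->value == 0)
        node->value = ++t->count;   /* first time this word is seen */
    return ++node->count;           /* one more occurrence */
}
```

If you prefer the original behaviour, where value is reassigned on every insertion, drop the value == 0 check and keep only the count increment.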

You can also keep a count of all words starting with the prefix at the current node by adding a prefix_count field and incrementing it whenever you pass a node while inserting a key.
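A sketch of the prefix_count idea follows, again assuming 26 lowercase letters. The helper names and the explicit value parameter are illustrative only. Note that the counter counts insertions, so a word inserted twice contributes two to every prefix on its path.

```c
#include <assert.h>
#include <stdlib.h>

#define ALPHABET_SIZE 26   /* assumption: lowercase 'a'..'z' only */

typedef struct trie_node trie_node_t;
struct trie_node
{
    int value;
    int prefix_count;   /* new field: insertions that passed through this node */
    trie_node_t *children[ALPHABET_SIZE];
};

/* Hypothetical helper: inserts key under root, bumping prefix_count on
 * every node along the path. Returns 0 on success, -1 on error. */
int trie_insert(trie_node_t *root, const char *key, int value)
{
    assert(root != NULL && key != NULL);

    /* Validate first so a rejected key leaves no counters changed. */
    for (const char *p = key; *p != '\0'; p++)
        if (*p < 'a' || *p > 'z')
            return -1;

    trie_node_t *node = root;
    node->prefix_count++;
    for (; *key != '\0'; key++) {
        trie_node_t **slot = &node->children[*key - 'a'];
        if (*slot == NULL && (*slot = calloc(1, sizeof **slot)) == NULL)
            return -1;   /* counters on the partial path stay incremented */
        node = *slot;
        node->prefix_count++;
    }
    node->value = value;
    return 0;
}

/* Hypothetical helper: number of insertions whose key starts with prefix. */
int trie_prefix_count(const trie_node_t *root, const char *prefix)
{
    assert(prefix != NULL);
    const trie_node_t *node = root;
    for (; node != NULL && *prefix != '\0'; prefix++) {
        if (*prefix < 'a' || *prefix > 'z')
            return 0;
        node = node->children[*prefix - 'a'];
    }
    return node != NULL ? node->prefix_count : 0;
}
```

The benefit of storing the counter in the node is that a prefix query costs one walk down the path instead of a traversal of the whole subtree.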

When you want to retrieve the occurrences of all words, you'll have to walk all subtrees.
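Walking all subtrees can be done with a short recursive function that rebuilds each word in a buffer on the way down. The sketch below assumes nodes carry the per-word count field suggested above; trie_dump and MAX_WORD_LEN are hypothetical names, and words longer than the buffer are simply not reported.

```c
#include <assert.h>
#include <stdio.h>

#define ALPHABET_SIZE 26   /* assumption: lowercase 'a'..'z' only */
#define MAX_WORD_LEN 64    /* assumption: buffer size, including the '\0' */

typedef struct trie_node trie_node_t;
struct trie_node
{
    int value;
    int count;   /* assumed per-word occurrence counter */
    trie_node_t *children[ALPHABET_SIZE];
};

/* Hypothetical helper: depth-first walk that prints "word: count" for
 * every word below node and returns how many words it printed.
 * buf must have room for MAX_WORD_LEN chars; start with depth 0. */
int trie_dump(const trie_node_t *node, char *buf, size_t depth, FILE *out)
{
    assert(buf != NULL && out != NULL);
    if (node == NULL)
        return 0;

    int words = 0;
    if (node->value != 0) {
        buf[depth] = '\0';   /* the path so far spells a word */
        fprintf(out, "%s: %d\n", buf, node->count);
        words++;
    }

    if (depth + 1 >= MAX_WORD_LEN)
        return words;        /* no room for another letter and the '\0' */

    for (int i = 0; i < ALPHABET_SIZE; i++) {
        buf[depth] = (char)('a' + i);
        words += trie_dump(node->children[i], buf, depth + 1, out);
    }
    return words;
}
```

A typical call would be trie_dump(root, buf, 0, stdout) with a char buf[MAX_WORD_LEN]. Because a node is reported before its children and the children are visited in letter order, the words come out in alphabetical order.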

Tries are useful for auto-expansion of words from the first letters of user input or for T9-style typing systems, but they are rather memory-hungry. If you just want to count the occurrences of words (without making use of the benefits of a trie), it might be easier to achieve this with a single hash map from word to count.


 