
Creating a tree data structure

I have some data:

A
AXNHJNEHWXNOECMEJK
DNFJNXYEEQWhsdbchjsxs
XMJQWsdsEOJdfsKMDJE

....

Each row is an array and each letter is an object. I have a comparer function that can tell me that the letter A is equivalent to the letter a. (Actually, they are not letters: they are Russian words, and the comparer function uses morphology to tell me that words are equal, for example матрешка == матрешки == матрешкины. The arrays are Russian sentences, for example "Мама мыла раму".) I want to create a tree data structure that looks like this:

1) A
2.1) BA
2.2) DHBAFH
3.1) BEDMEWA
etc...

In other words, child nodes must contain the letters of their parent nodes. If you know how Google AdWords works, I think you will understand what I mean. My question is how to do this FAST. I need to build a tree from thousands of arrays. The compare function is very slow (it uses a big dictionary), which is why speed is a real problem.

Some simple data (sorry for the Russian):

Here is a set of sentences:

сайты        
сайты недорого
сайты дешево
сайты дешево и быстро
красивый сайт по доступным ценам 
хочу купить хороший стул 
стул по доступным ценам

We must create the following tree data structure:

1) сайты
1->2.1) сайты недорого
1->2.2) сайты дешево
1->2.3) красивый сайт по доступным ценам 
1->2.2->3) сайты дешево и быстро

Other parent nodes:

1) хочу купить хороший стул 
1) стул по доступным ценам

Child nodes must contain more words than their parent.

Well, it seems that these links could be helpful for your problem:

Fast String Searching With Suffix Trees: http://marknelson.us/1996/08/01/suffix-trees/

and

Suffix tree

http://en.wikipedia.org/wiki/Suffix_tree

Start with the sentences that have one word. They are all going to be parent nodes, so this is simple.

Then continue with the two-word sentences. You have to match each of them against every one-word parent node, which is going to be quite slow because of your slow comparison function. You can make two optimizations, though. First, check whether the words are exactly the same before calling the comparer. You can do this yourself and it is going to be fast. Second, remember the result of the comparison function for every pair of words you have compared. You are going to spend some memory, but you are going to gain some speed.
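The two optimizations can be sketched in Python roughly as follows. This is only an illustration: slow_equal is a hypothetical stand-in for the real morphological comparer (here it just ignores case), slow_calls exists only to show how often the expensive function runs, and the cache key assumes the comparer is symmetric.

```python
from typing import Dict, List, Tuple

# Records every expensive call; present only to illustrate the saving.
slow_calls: List[Tuple[str, str]] = []

def slow_equal(a: str, b: str) -> bool:
    """Hypothetical stand-in for the slow dictionary-based comparer."""
    slow_calls.append((a, b))
    return a.lower() == b.lower()

# Remembered results, keyed by the word pair in a fixed order.
_cache: Dict[Tuple[str, str], bool] = {}

def words_equal(a: str, b: str) -> bool:
    # Optimization 1: identical strings never need the slow comparer.
    if a == b:
        return True
    # Optimization 2: look the pair up before comparing it again.
    # Assumption: slow_equal(a, b) == slow_equal(b, a), so one key serves both orders.
    key = (a, b) if a <= b else (b, a)
    if key not in _cache:
        _cache[key] = slow_equal(a, b)
    return _cache[key]

words_equal("A", "A")   # exact match, slow comparer not called
words_equal("A", "a")   # first time for this pair, slow comparer called
words_equal("a", "A")   # same pair in the other order, served from the cache
print(len(slow_calls))  # 1
```

If the comparer is not symmetric, the key would have to stay as the ordered pair (a, b), at the cost of up to twice as many cache entries.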

When a node matches, add the sentence to it. When the sentence doesn't match any node, make it a parent node.

For longer sentences, taken in order of increasing length, you do the same, except that once a node matches you also have to try matching its children, to find the correct place to add the sentence.
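The whole procedure could look something like the Python sketch below. It is a rough illustration under several assumptions, not a tuned implementation: words_equal is a hypothetical placeholder for the real comparer (it treats two words as equal when their first four letters match, ignoring case), Node and build_tree are names made up for this example, a sentence is attached under the first node that matches, and a node only counts as a match if it has strictly fewer words than the sentence.

```python
from dataclasses import dataclass, field
from typing import List

def words_equal(a: str, b: str) -> bool:
    # Hypothetical placeholder for the morphological comparer:
    # two words match when their first four letters agree, ignoring case.
    return a.lower()[:4] == b.lower()[:4]

@dataclass
class Node:
    words: List[str]
    children: List["Node"] = field(default_factory=list)

def contains(sentence: List[str], node_words: List[str]) -> bool:
    # True when every word of the node has an equivalent word in the sentence.
    return all(any(words_equal(w, s) for s in sentence) for w in node_words)

def build_tree(sentences: List[str]) -> List[Node]:
    roots: List[Node] = []
    # Shortest sentences first, so possible parents exist before their children.
    for text in sorted(sentences, key=lambda s: len(s.split())):
        words = text.split()
        siblings = roots
        while True:
            # Take the first node on this level that the sentence contains.
            match = next((n for n in siblings
                          if len(n.words) < len(words) and contains(words, n.words)),
                         None)
            if match is None:
                break
            siblings = match.children  # descend and try the children
        siblings.append(Node(words))   # no deeper match: attach here
    return roots

SENTENCES = [
    "сайты",
    "сайты недорого",
    "сайты дешево",
    "сайты дешево и быстро",
    "красивый сайт по доступным ценам",
    "хочу купить хороший стул",
    "стул по доступным ценам",
]

def show(nodes: List[Node], depth: int = 0) -> None:
    for node in nodes:
        print("  " * depth + " ".join(node.words))
        show(node.children, depth + 1)

show(build_tree(SENTENCES))
```

On the sample sentences from the question this yields three parent nodes, with "сайты" holding "сайты недорого", "сайты дешево" (which in turn holds "сайты дешево и быстро") and "красивый сайт по доступным ценам". Combining it with a cached comparer keeps the number of slow comparisons down, since the same word pairs come up repeatedly while descending.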
