
Putting a large sample into a Binary Search Tree (not balanced)

I need to construct a Binary Search Tree from a file that has more than 2 million lines (each line gives me a key/value pair). Since the data is ordered, if I just read each line, get the key and value, and add them to my tree, the height will be huge and the tree will be inefficient to search. So I was wondering whether there is a good way to construct this search tree so that it doesn't end up with a huge height. My attempt was to take the first 100,000 keys, shuffle them, put them in the tree, and so on, but it doesn't seem very efficient. Any suggestions?

PS: I have to use a search tree that is not balanced.

Thanks!

If you can read the file multiple times, you can make a first pass that reads, say, 1000 entries (i.e. one every 2000 rows) into a list, and then insert them in a balanced order: first the element at position 500, then the two at positions 250 and 750, then the four at positions 125, 375, 625, and 875, etc. After this first pass you can read the whole file (handling the duplicates) and get a more balanced tree.
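The two-pass idea above can be sketched roughly as follows. This is only an illustration under some assumptions: `pairs` is an in-memory sorted list of `(key, value)` tuples standing in for reading the file twice, and `Node`, `insert`, `median_first`, `build_two_pass`, and `height` are hypothetical names, not anything from the question.

```python
from collections import deque


class Node:
    def __init__(self, key, val):
        self.key, self.val = key, val
        self.left = self.right = None


def insert(root, key, val):
    """Plain (non-rebalancing) iterative BST insert; a duplicate key overwrites its value."""
    if root is None:
        return Node(key, val)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key, val)
                return root
            node = node.left
        elif key > node.key:
            if node.right is None:
                node.right = Node(key, val)
                return root
            node = node.right
        else:
            node.val = val
            return root


def median_first(items):
    """Yield the items of a sorted list middle-first, level by level."""
    queue = deque([(0, len(items))])
    while queue:
        lo, hi = queue.popleft()
        if lo >= hi:
            continue
        mid = (lo + hi) // 2
        yield items[mid]
        queue.append((lo, mid))
        queue.append((mid + 1, hi))


def build_two_pass(pairs, step):
    root = None
    # Pass 1: insert every `step`-th entry in a balanced order.
    for key, val in median_first(pairs[::step]):
        root = insert(root, key, val)
    # Pass 2: insert everything; sampled keys are duplicates and are overwritten.
    for key, val in pairs:
        root = insert(root, key, val)
    return root


def height(root):
    """Number of nodes on the longest root-to-leaf path (iterative)."""
    best, stack = 0, [(root, 1)]
    while stack:
        node, depth = stack.pop()
        if node is not None:
            best = max(best, depth)
            stack.append((node.left, depth + 1))
            stack.append((node.right, depth + 1))
    return best
```

One thing to keep in mind with this sketch: the entries between two consecutive samples are still inserted in sorted order, so each gap becomes a chain of up to `step - 1` nodes. The height is therefore roughly the height of the sampled skeleton plus `step`, which is far better than one chain of 2 million nodes but not optimal.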

An alternative is not to use a BinarySearchTree at all, but an array: since the data are ordered, you can use binary search. You check the value at the middle of the array; if that value is bigger than the one you are looking for, you repeat the operation with the first half of the array, and if it is lower you use the second half. But I don't know whether using an array meets your requirements.
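As a minimal sketch of that binary search, assuming the keys have been loaded into a sorted Python list (`keys` and `binary_search` are illustrative names, and the values would live in a parallel list at the same index):

```python
def binary_search(keys, target):
    """Return the index of `target` in the sorted list `keys`, or None if absent."""
    lo, hi = 0, len(keys)          # search the half-open range [lo, hi)
    while lo < hi:
        mid = (lo + hi) // 2
        if keys[mid] == target:
            return mid
        if keys[mid] > target:
            hi = mid               # middle value is bigger: keep the first half
        else:
            lo = mid + 1           # middle value is lower: keep the second half
    return None
```

In practice the standard library's `bisect.bisect_left` does the same halving and could replace the hand-written loop.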

As a side note, creating a BST when you're already handed a sorted array is kind of a crazy thing to do, but with that aside...

If you're given a sorted array already, it's practically giving you the answer for how to construct a balanced BST with minimum height. For simplicity, let's imagine the array is:

[0,1,2,3,4,5,6,7,8,9,10]

In such a case, what would be the optimal element to store at the root for a balanced tree? The natural answer is the middle of the list, 5.

So then we're left with two sub-ranges of the array:

i<5: [0,1,2,3,4]
i>5: [6,7,8,9,10]

So what is the ideal element to store at the left child? Again we take the center of the left sub-range ( i<5 ), which is 2, and we have two sub-ranges of that range:

i<2: [0,1]
i>2: [3,4]

And we can repeat this logic recursively until a range holds a single element or none, at which point we've made a leaf node.

Applied to both sides of every branch recursively, drilling down to the leaves, this will give you that optimal balanced tree.
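The recursion described above might look something like this. It is a sketch, assuming the whole file fits in memory as a sorted list of `(key, value)` pairs; `Node`, `build_balanced`, and `height` are hypothetical names chosen for the example.

```python
class Node:
    def __init__(self, key, val):
        self.key, self.val = key, val
        self.left = self.right = None


def build_balanced(pairs, lo=0, hi=None):
    """Build a BST from the sorted slice pairs[lo:hi] by picking the middle as root."""
    if hi is None:
        hi = len(pairs)
    if lo >= hi:                   # empty range: no node here
        return None
    mid = (lo + hi) // 2
    node = Node(*pairs[mid])
    node.left = build_balanced(pairs, lo, mid)        # elements before the middle
    node.right = build_balanced(pairs, mid + 1, hi)   # elements after the middle
    return node


def height(node):
    """Number of nodes on the longest root-to-leaf path."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))
```

The nodes are ordinary BST nodes with no rebalancing logic; the tree only comes out balanced because of the order in which the ranges are split. The recursion depth equals the tree height, which would be about 21 levels for roughly 2 million keys, so Python's default recursion limit should not be an issue.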
