
Min/max number of records on a B+Tree?

I was looking at the best- and worst-case heights for a B+Tree ( http://en.wikipedia.org/wiki/B-tree#Best_case_and_worst_case_heights ), but I don't know how to apply the formula to the information I have. Let's say I have a tree B with 1,000 records: what is the minimum (and maximum) number of levels B can have? I can have as many or as few keys on each page as I like, and as many or as few pages. Any ideas? (In case you are wondering, this is not a homework question, but it will surely help me understand some stuff for homework.)

I don't have the math handy, but...

Basically, the primary factor in tree depth is the "fan-out" of each node in the tree.

In a simple binary tree, the fan-out is 2: each node has at most 2 child nodes.

A B+Tree, by contrast, typically has a much larger fan-out.

One factor that comes into play is the size of the node on disk.

For example, say you have a 4K page size with 4000 bytes of free space (after subtracting any other pointers or metadata related to the node), and let's say that a pointer to any other node in the tree is a 4-byte integer. If your B+Tree is in fact storing 4-byte integer keys, then the combined size of one entry (4 bytes of pointer information + 4 bytes of key information) is 8 bytes. 4000 free bytes / 8 bytes == 500 possible children.

That gives you a fan-out of 500 for this contrived case.

So, with one page of index (i.e. the root node, a tree of height 1), you can reference 500 records. Add another level and you're at 500*500, so with 501 4K pages you can reference 250,000 rows.
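The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the contrived example: the function names (`fan_out`, `reachable_records`, `index_pages`) are made up here, and the sketch assumes fixed-size keys and pointers and completely full nodes.

```python
def fan_out(free_bytes: int, key_bytes: int, pointer_bytes: int) -> int:
    """Number of (key, pointer) pairs that fit in one node's free space."""
    return free_bytes // (key_bytes + pointer_bytes)


def reachable_records(fan: int, levels: int) -> int:
    """Records reachable through `levels` levels of completely full index pages."""
    return fan ** levels


def index_pages(fan: int, levels: int) -> int:
    """Index pages needed for `levels` full levels: 1 + fan + fan^2 + ..."""
    return sum(fan ** level for level in range(levels))


# The contrived case from the text: 4000 free bytes, 4-byte keys, 4-byte pointers.
fan = fan_out(free_bytes=4000, key_bytes=4, pointer_bytes=4)
for levels in (1, 2):
    print(levels, index_pages(fan, levels), reachable_records(fan, levels))
```

With these numbers, one level uses 1 page for 500 records and two levels use 501 pages for 250,000 records, matching the figures in the text.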

Obviously, the larger the key size, or the smaller the page size of your node, the lower the fan-out the tree is capable of. If you allow variable-length keys in each node, then the fan-out can easily vary from node to node.

But hopefully you can see the gist of how this all works.

The best and worst cases depend on the number of children each node can have. For the best case, we consider a tree in which every node has the maximum number of children (i.e. m for an m-ary tree) and therefore m-1 keys. So,

Level 1 (the root) has m-1 entries
Level 2 has m*(m-1) entries (since the root has m children with m-1 keys each)
Level 3 has m^2*(m-1) entries
...
Level h has m^(h-1)*(m-1) entries

Thus, if H is the height of the tree, the total number of entries is n = m^H - 1, which is equivalent to H = log_m(n+1).

Hence, in your case, if you have n=1000 records with each node having m children (m should be odd), then the best-case height will be log_m(1000+1), rounded up to a whole number of levels.
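As a quick check of the best-case formula, here is a small Python sketch. The function name `best_case_height` is hypothetical, and it assumes height is counted in levels starting at 1 for the root, as in the derivation above. It finds the smallest H with m^H - 1 >= n using integer arithmetic, which avoids floating-point rounding in the logarithm.

```python
def best_case_height(n: int, m: int) -> int:
    """Smallest number of levels H such that a full m-ary tree holds n keys.

    A full tree of H levels holds m**H - 1 keys (m - 1 keys per node).
    """
    if n < 1 or m < 2:
        raise ValueError("need n >= 1 and m >= 2")
    height = 0
    power = 1  # invariant: power == m ** height
    while power - 1 < n:
        power *= m
        height += 1
    return height


for m in (4, 5, 11, 1001):
    print(m, best_case_height(1000, m))
```

For n=1000 this gives 5 levels for m=4 or m=5, 3 levels for m=11, and a single level once m reaches 1001, since one node can then hold all 1000 keys.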

Similarly, for the worst-case scenario:

Level 1 (the root) has at least 1 entry (and a minimum of 2 children)
Level 2 has at least 2*(d-1) entries (where d=ceil(m/2) is the minimum number of children each internal node, except the root, can have)
Level 3 has at least 2d*(d-1) entries
...
Level h has at least 2*d^(h-2)*(d-1) entries

Thus, if H is the height of the tree, the total number of entries is n = 1 + 2*(d-1)*(1 + d + ... + d^(H-2)) = 2*d^(H-1) - 1, which is equivalent to H = log_d((n+1)/2) + 1.

Hence, in your case, if you have n=1000 records with each node having at most m children (m should be odd), then the worst-case height will be log_d((1000+1)/2) + 1 with d=ceil(m/2), rounded down to a whole number of levels.
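The worst case can be checked the same way. Summing the per-level minimums listed above gives 1 + 2*(d-1)*(1 + d + ... + d^(H-2)) = 2*d^(H-1) - 1 keys for a tree of H levels, so the tallest possible tree is the largest H for which that minimum still fits in n. The sketch below is an illustration under those assumptions; `worst_case_height` is a made-up name, and height is again counted in levels starting at 1.

```python
def worst_case_height(n: int, m: int) -> int:
    """Largest number of levels H a tree with n keys and arity m can have.

    A tree of H levels holds at least 2 * d**(H - 1) - 1 keys,
    where d = ceil(m / 2) is the minimum number of children of a non-root node.
    """
    d = (m + 1) // 2  # ceil(m / 2) for positive integers
    if n < 1 or d < 2:
        raise ValueError("need n >= 1 and m >= 3")
    height = 1
    power = 1  # invariant: power == d ** (height - 1)
    # Grow the tree while the minimum key count of the next height still fits in n.
    while 2 * power * d - 1 <= n:
        power *= d
        height += 1
    return height


for m in (4, 5):
    print(m, worst_case_height(1000, m))
```

For n=1000 this prints 9 levels for m=4 (d=2) and 6 levels for m=5 (d=3), which agrees with floor(log_d((n+1)/2)) + 1.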

It depends on the arity of the tree. You have to define this value. If you say that each node can have 4 children and you have 1000 records, then the height is roughly

Best case: log_4(1000) ≈ 5

Worst case: log_{4/2}(1000) ≈ 10

In general, with arity m and n records, the height is roughly log_m(n) in the best case and log_{m/2}(n) in the worst case.
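These rough estimates are easy to reproduce. The snippet below is only a sketch: `rough_height` is a hypothetical helper that rounds the logarithm up to a whole number of levels, and it deliberately ignores how many keys each node holds, so an exact count can differ by a level.

```python
import math


def rough_height(n: int, branching: float) -> int:
    """Estimate tree height as ceil(log_branching(n))."""
    return math.ceil(math.log(n) / math.log(branching))


m, n = 4, 1000
print("best case: ", rough_height(n, m))      # every node has m children
print("worst case:", rough_height(n, m / 2))  # every node has m/2 children
```

For m=4 and n=1000 the two estimates come out as 5 and 10 levels.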
