
How a database stores data internally in a B-Tree/B+Tree

My question is how a database stores data and how it executes a query internally.

Suppose we have the following fields in our table:

  1. ID
  2. Name
  3. Age
  4. Weight
  5. Manager

and we run the query select * from Table1 where age>50 and weight<100

I am just curious how it performs the query internally.

What will a node of the B-Tree/B+Tree contain in this example?

The example you have chosen is one of the few cases where a single tree can't do the job (two independent ranges).

However, the first chapter of my work-in-progress e-Book explains the inner workings of B-Tree indexes: http://use-the-index-luke.com/anatomy/

EDIT: more details on why two indexes might be useful for the above example.

The above query can be supported by three possible index configurations. The first two are concatenated indexes; the third, two independent indexes, is discussed further down.

  1. A concatenated index on AGE and then WEIGHT (in this order).
    In this case, the query would read all records WHERE AGE > 50 and then filter by WEIGHT.

  2. A concatenated index on WEIGHT and then AGE (the other order).
    That works the other way around: read all records WHERE WEIGHT < 100 and then filter by AGE.

Which of the two is more efficient depends on the data you have. If fewer records match AGE > 50 than WEIGHT < 100, the first will be more efficient; otherwise the second. However, if you query with different values, the picture might change.
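To make the comparison concrete, here is a minimal sketch that counts how many entries each leading column would force the scan to read. The sample rows and the helper name count_matching are invented for illustration; a real database would rely on its optimizer statistics rather than counting rows like this.

```python
# Hypothetical sample rows: (id, age, weight). The data is invented for illustration.
rows = [
    (1, 34, 82), (2, 51, 95), (3, 67, 120), (4, 45, 70),
    (5, 58, 88), (6, 72, 101), (7, 29, 64), (8, 55, 99),
]

def count_matching(rows, predicate):
    """Count the rows a range scan on one leading column would have to read."""
    return sum(1 for row in rows if predicate(row))

# Entries read when the index starts with AGE, i.e. (age, weight).
age_matches = count_matching(rows, lambda r: r[1] > 50)
# Entries read when the index starts with WEIGHT, i.e. (weight, age).
weight_matches = count_matching(rows, lambda r: r[2] < 100)

# The order whose leading column narrows the scan more reads fewer entries.
better = "(age, weight)" if age_matches < weight_matches else "(weight, age)"
print(age_matches, weight_matches, better)
```

With this particular sample the AGE condition matches fewer rows, so the (age, weight) order wins; a different data set or different constants could flip the result.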

The reason a concatenated index can't support the query well is that each index is ordered along one axis only. Each index entry is before or after another one, but never next to it. All index entries form one chain.

A query with two independent range conditions would require two axes: not like a chain, but more like a chess board, with one axis for AGE and the other for WEIGHT. If that were possible, your query would need to scan only one corner of the chess board.

However, a B-tree has only one axis, hence you must choose which criterion to use first. If you choose AGE, it means that starting with AGE 50, the entire chain will be scanned until the end. Only some of the records along the chain will also qualify for WEIGHT < 100; the other records must be read but will be discarded.
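The chain scan described above can be sketched with a sorted list standing in for the leaf level of an index on (AGE, WEIGHT). This is only a model under that assumption: the entries and the function name scan_age_then_filter are hypothetical, and a real B-tree walks linked leaf pages rather than slicing a Python list.

```python
import bisect

# Hypothetical index entries (age, weight), kept sorted like the leaf chain
# of a concatenated index on (age, weight).
index_chain = sorted([
    (34, 82), (45, 70), (51, 95), (55, 99),
    (58, 88), (67, 120), (72, 101), (29, 64),
])

def scan_age_then_filter(chain, min_age, max_weight):
    """Seek to the first entry with age > min_age, then walk to the end of the chain."""
    # (min_age, +inf) sorts after every entry whose age equals min_age.
    start = bisect.bisect_right(chain, (min_age, float("inf")))
    read = chain[start:]                               # every entry from here on is read
    kept = [entry for entry in read if entry[1] < max_weight]  # weight only filters
    return len(read), kept

entries_read, matches = scan_age_then_filter(index_chain, 50, 100)
print(entries_read, matches)
```

In this sample the scan reads five entries and keeps three; the two discarded entries are the cost of having only one axis.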

So, that was a long story to explain why one tree cannot support a query with two independent range clauses. On the other hand, one concatenated index can do the following quite well:

WHERE age = 50 AND weight < 100
WHERE weight = 100 AND age > 50
WHERE age > 50 AND age < 70;

However, the problem arises when two inequality operators are used on two different columns.
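The reason an equality on the leading column combines well with a range on the second is that the matching entries sit next to each other in the chain. A minimal sketch for the age = 50 AND weight < 100 case, again modelling an (AGE, WEIGHT) index as a sorted list with invented entries and a hypothetical helper name:

```python
import bisect

# Hypothetical (age, weight) entries, sorted like a concatenated index.
index_chain = sorted([
    (50, 80), (50, 95), (50, 110), (49, 60), (51, 90), (50, 99),
])

def seek_equal_then_range(chain, age, max_weight):
    """Entries with age == `age` and weight < max_weight form one contiguous slice."""
    lo = bisect.bisect_left(chain, (age, float("-inf")))  # first entry for this age
    hi = bisect.bisect_left(chain, (age, max_weight))     # first entry with weight >= max
    return chain[lo:hi]

hits = seek_equal_then_range(index_chain, 50, 100)
print(hits)
```

Both ends of the slice are found by a tree-style seek, so every entry that is read also qualifies and nothing has to be discarded.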

So, what to do?

The third possible approach is to have two independent indexes on the two columns. That allows you to have as many axes as you like (just create more indexes). However, there are a few huge problems with that. First of all, not all DB products support it. Where it is supported, it is a rather expensive operation. It typically works like this: each index is scanned and a bitmap index is built for each result. Those bitmap indexes are then joined to apply the AND operator. That's a lot of data munging; it is only worth the effort if each condition is not very selective on its own, but both together are very selective.
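The bitmap step can be illustrated with plain integers used as bit sets, one bit per row id. This is a simplified sketch: the table contents and the name build_bitmap are made up, and the predicate is evaluated against the rows directly as a stand-in for scanning each single-column index.

```python
# Hypothetical table: row id -> (age, weight).
table = {
    0: (34, 82), 1: (51, 95), 2: (67, 120),
    3: (45, 70), 4: (58, 88), 5: (72, 101),
}

def build_bitmap(table, predicate):
    """Set bit `row_id` for every row that satisfies the predicate."""
    bitmap = 0
    for row_id, row in table.items():
        if predicate(row):
            bitmap |= 1 << row_id
    return bitmap

age_bitmap = build_bitmap(table, lambda r: r[0] > 50)       # stand-in for the AGE index scan
weight_bitmap = build_bitmap(table, lambda r: r[1] < 100)   # stand-in for the WEIGHT index scan

combined = age_bitmap & weight_bitmap                        # the AND operator
matching_ids = [row_id for row_id in table if (combined >> row_id) & 1]
print(matching_ids)
```

Each bitmap here has four bits set, while the combined one has only two, which is the situation where this approach starts to pay off: neither condition is very selective alone, but together they are.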

Want my advice?

If your query runs in an OLTP environment: use one concatenated index. Two independent indexes are an option of last resort only. However, if you are working in an OLAP environment, you might need bitmap indexes anyway.

PS: Indexing AGE was an exercise in my book (with solution), especially because storing AGE is a bad practice; you should store the date of birth instead.
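If the date of birth is stored, an age condition becomes a range on that date. The following sketch computes the cutoff; the function name birth_date_cutoff is hypothetical, and it assumes age is counted in whole years, so that age > 50 means "at least 51 years old".

```python
from datetime import date

def birth_date_cutoff(today, years):
    """Latest birth date of someone who is at least `years` old on `today`."""
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        # today is Feb 29 and the target year is not a leap year
        return today.replace(year=today.year - years, day=28)

# age > 50 in whole years is the same as birth_date <= cutoff for 51 years.
cutoff = birth_date_cutoff(date(2024, 6, 15), 51)
print(cutoff)
```

The resulting condition on the birth date is a single range on one column, so it fits the one-axis chain of an ordinary index.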
