
What is a Clustered Index table?

Posted 2018-03-14 21:04:39. Tags: mysql, sql-server, database, postgresql, db2

I may be wrong, but there seem to be differing opinions on the interwebs about what these are. SQL Server, MySQL, DB2, and PostgreSQL give different definitions for these tables.

After reading a ton from different vendors (database manuals, user posts, etc.) I was able to distinguish three types of tables of interest (there are many, many more types that are of no interest for this question). Please bear with me:

1. Heap Table:
- All rows are stored (probably unordered) in the heap table.
- Each row has an internal ROWID that identifies it.
- Indexes are optional. If added, they include the indexed columns as the index key, plus the ROWID (to eventually access the real rows in the heap).
- Note: this case is of no interest for this question, but I added it here to contrast it with the third case below.
2. Pure Index Table: <-- Is this a Clustered Index Table?
- There's one main index that includes the key columns, as well as the non-key columns. All the data is stored in the index.
- The data follows the main index order, so it's by definition sorted by the main index.
- There's no need for a heap table to store the rows. All data is already in the index. There's no ROWID whatsoever, since there's no heap table.
- SQL Server tables (typically) fall by default in this category.
- MySQL InnoDB tables seem to also fall in this category, since they don't seem to have a heap table at all.
3. Index + Sorted Heap Table: <-- Is this a Clustered Index Table?
- There's one main "clustered index".
- There's a heap table where the rows are stored in the order defined by the clustered index.
- Each row in the heap table has a ROWID.
- The clustered index does not include the non-key columns; instead it stores a ROWID used to access the real row in the heap table.
- DB2 seems to be able to "cluster" tables.
- PostgreSQL seems to also call these tables "Clustering Index" tables.
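
To make the difference between the first two layouts concrete, here is a minimal Python sketch. It is only a toy model under stated assumptions: plain sorted lists stand in for B-tree indexes, a list position stands in for the ROWID, and all names (heap_insert, clustered_lookup, and so on) are hypothetical and not taken from any database.

```python
from bisect import bisect_left, insort

# Layout 1: heap table plus a separate index of (key, ROWID) pairs.
heap = []        # rows in arrival order; the list position stands in for the ROWID
heap_index = []  # kept sorted by key

def heap_insert(key, payload):
    rowid = len(heap)              # next free slot in the heap
    heap.append((key, payload))
    insort(heap_index, (key, rowid))

def heap_lookup(key):
    i = bisect_left(heap_index, (key,))   # (key,) sorts before any (key, rowid)
    if i < len(heap_index) and heap_index[i][0] == key:
        rowid = heap_index[i][1]
        return heap[rowid][1]      # second hop: follow the ROWID into the heap
    return None

# Layout 2: the index entries are the rows; no heap and no ROWID.
clustered = []   # kept sorted by key, payload stored alongside the key

def clustered_insert(key, payload):
    insort(clustered, (key, payload))

def clustered_lookup(key):
    i = bisect_left(clustered, (key,))
    if i < len(clustered) and clustered[i][0] == key:
        return clustered[i][1]     # the row is right here in the index
    return None

for key, payload in [(3, "carol"), (1, "alice"), (2, "bob")]:
    heap_insert(key, payload)
    clustered_insert(key, payload)

print([k for k, _ in heap])        # rows in arrival order
print([k for k, _ in clustered])   # rows in key order
print(heap_lookup(2), clustered_lookup(2))
```

In this model the third layout would be the first one with the heap itself kept physically sorted by key, while the index still stores only keys and ROWIDs.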

Now, which one of these, #2 or #3, is a "Clustered Index Table"? Who's telling the truth and who's lying? :D

In other words, is "Clustered Index Table" a commercial term that each vendor freely defines as it pleases, or is there a single official definition according to some database theory?

1 Answer

As far as I know, a "clustered index" is an index where the leaf nodes of the index are the data pages. This is different from a non-clustered index, where the leaf nodes hold references to rows stored on the data pages.

A table can have at most one clustered index. In a table with a clustered index, the data is sorted by the index keys.

Postgres does not support clustered indexes. It does have a table operation called "cluster" (the CLUSTER command) that physically reorders a table's rows once, based on an index. However, this ordering is not maintained as rows are later inserted or updated, so it is not the same as a clustered index.
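
The "not maintained" point can be illustrated with a small Python sketch. This is a toy model, not how Postgres is implemented: the table is a plain list, the hypothetical cluster() helper does a one-time sort, and new rows are assumed to be appended at the end (in a real system, placement depends on where free space happens to be).

```python
def cluster(heap_rows):
    """One-time physical reorder of the rows by key."""
    return sorted(heap_rows, key=lambda row: row[0])

def is_physically_sorted(heap_rows):
    """True if the rows currently sit in key order."""
    keys = [key for key, _ in heap_rows]
    return keys == sorted(keys)

rows = [(30, "c"), (10, "a"), (20, "b")]

rows = cluster(rows)                      # one-off reorder
sorted_after_cluster = is_physically_sorted(rows)

rows.append((15, "d"))                    # later insert is not placed in key order
sorted_after_insert = is_physically_sorted(rows)

print(sorted_after_cluster, sorted_after_insert)
```

With a true clustered index, the insert of key 15 would land between 10 and 20 because the index structure itself dictates where the row lives; with a one-time cluster, the order holds only until the next write.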