
What is a Clustered Index table?

I may be wrong, but there seem to be different opinions on the web about what these are. SQL Server, MySQL, DB2, and PostgreSQL give different definitions for these tables.

After reading a lot of material from different vendors (database manuals, user posts, etc.) I was able to distinguish three types of tables of interest (there are many more types that are not relevant to this question). Please bear with me:

  1. Heap Table :

    • All rows are stored (probably unordered) in the heap table.
    • Each row has an internal ROWID that identifies it.
    • Indexes are optional. If added, they include the indexed columns as the index key, plus the ROWID (to eventually access the real rows in the heap).
    • Note: this case is of no interest for this question, but I added it here to contrast it with the third case below.
  2. Pure Index Table : <-- Is this a Clustered Index Table?

    • There's one main index that includes the key columns, as well as the non-key columns in it. All the data is stored in the index.
    • The data follows the main index order, so it's by definition sorted by the main index.
    • There's no need for a heap table to store the rows. All data is already in the index. There's no ROWID whatsoever, since there's no heap table.
    • SQL Server tables (typically) fall by default in this category.
    • MySQL InnoDB tables seem to also fall in this category since they don't seem to have a heap table at all.
  3. Index + Sorted Heap Table : <-- Is this a Clustered Index Table?

    • There's one main "clustered index".
    • There's a heap table where the rows are stored in the order defined by the clustered index.
    • Each row in the heap table has a ROWID.
    • The clustered index does not include non-key columns, but a ROWID to access the real row in the heap table.
    • DB2 seems to be able to "Cluster" tables.
    • PostgreSQL also seems to call these "Clustering Index" tables.

Now, which one of these, #2 or #3, is a "Clustered Index Table"? Who's telling the truth and who's lying? :D

In other words, is the term "Clustered Index Table" a commercial term that each vendor freely defines as it pleases, or is there a single official definition grounded in database theory?

As far as I know, a "clustered index" is an index where the leaf nodes of the index are the data pages. This is different from a non-clustered index where the leaf nodes are references to rows stored on the data pages.

A table can have at most one clustered index. In a table with a clustered index, the data is sorted by the index keys.
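
To make the difference concrete, here is a toy sketch in Python. It is only a model of the two layouts described above, not how any real engine is implemented; the class names HeapTable and ClusteredTable are made up for illustration, and sorted Python lists stand in for B-tree leaf levels.

```python
from bisect import bisect_left, insort


class HeapTable:
    """Non-clustered layout: rows sit in an unordered heap and the
    index leaf entries only hold (key, ROWID) references."""

    def __init__(self):
        self.heap = []   # list position plays the role of the ROWID
        self.index = []  # sorted (key, rowid) pairs, i.e. the index leaves

    def insert(self, key, row):
        rowid = len(self.heap)
        self.heap.append(row)             # row goes wherever there is space
        insort(self.index, (key, rowid))  # only the index is kept sorted

    def lookup(self, key):
        i = bisect_left(self.index, (key, -1))
        if i < len(self.index) and self.index[i][0] == key:
            rowid = self.index[i][1]
            return self.heap[rowid]       # second hop: follow ROWID into the heap
        return None


class ClusteredTable:
    """Clustered layout: the index leaf entries are the rows themselves,
    so there is no separate heap and no ROWID."""

    def __init__(self):
        self.keys = []  # sorted keys
        self.rows = []  # rows stored in the same order as the keys

    def insert(self, key, row):
        i = bisect_left(self.keys, key)
        self.keys.insert(i, key)
        self.rows.insert(i, row)          # the data itself is kept in key order

    def lookup(self, key):
        i = bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.rows[i]           # single hop: the leaf is the row
        return None


if __name__ == "__main__":
    heap, clustered = HeapTable(), ClusteredTable()
    for key, name in [(3, "c"), (1, "a"), (2, "b")]:
        heap.insert(key, name)
        clustered.insert(key, name)
    print(heap.heap)       # rows in insertion order
    print(clustered.rows)  # rows in key order
```

Both lookups return the same row; the difference is that the heap version needs an extra step through the ROWID, while the clustered version finds the row in the index itself. It also shows why there can be only one clustered index per table: the rows can be physically ordered in only one way.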

Postgres does not support clustered indexes. It does have a table operation called CLUSTER that physically reorders the table's rows based on an index. However, this is a one-time operation: the ordering is not maintained as rows are later inserted or updated, so it is not the same as a clustered index.
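
A small Python sketch of that difference, under simplifying assumptions: rows are (key, value) tuples, a plain list stands in for the table's storage, and new rows in the heap are simply appended at the end. The function names are hypothetical and are not part of any database API.

```python
from bisect import insort


def cluster_once(rows):
    """One-time physical reorder by key, loosely analogous to a
    cluster operation that sorts the table based on an index."""
    rows.sort(key=lambda row: row[0])


def is_in_key_order(rows):
    """True if every row's key is >= the key of the row before it."""
    return all(a[0] <= b[0] for a, b in zip(rows, rows[1:]))


def insert_clustered(rows, row):
    """What a true clustered index does on every insert:
    place the new row at its position in key order."""
    insort(rows, row)


if __name__ == "__main__":
    # Sorted once, then a later insert breaks the order.
    heap = [(3, "c"), (1, "a"), (2, "b")]
    cluster_once(heap)
    heap.append((0, "z"))         # assumption: new row lands at the end
    print(is_in_key_order(heap))  # False

    # Clustered index: order is maintained on every insert.
    clustered = [(1, "a"), (2, "b"), (3, "c")]
    insert_clustered(clustered, (0, "z"))
    print(is_in_key_order(clustered))  # True
```

In the first case the table has to be re-sorted periodically to get the ordering back; in the second the ordering is a permanent property of the storage structure.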


 