
When we create a clustered index, does it take extra space?

I am asking this question with respect to the MySQL database. I read that a clustered index orders the table based on the primary key, or on the columns we provide when creating the clustered index, whereas a non-clustered index takes separate space for the key and a record pointer.

I also read that, because there is no separate index table, a clustered index is faster than a non-clustered index. With a non-clustered index, the database must first look in the index table, find the corresponding record pointer, and then fetch the record data.

Does that mean a clustered index takes no extra space?

PS: I know there are already some similar questions with answers, but I couldn't understand them.

There is no extra space taken because every InnoDB table is stored as the clustered index. In fact, there is only the clustered index plus any secondary indexes. There's no separate storage for data, because all the non-primary-key columns are simply stored in the leaf (terminal) nodes of the clustered index. You might like to read more about it here: https://dev.mysql.com/doc/refman/8.0/en/innodb-index-types.html
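To make "the table is the clustered index" concrete, here is a toy Python model, not InnoDB code. It assumes a single sorted list can stand in for the leaf level of the B+tree (real InnoDB uses 16 KB pages and a multi-level tree). The class name ClusteredIndex is made up for illustration. The point is that each full row sits right next to its PK, so there is no second "data" area to pay for.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class ClusteredIndex:
    """Toy model of an InnoDB table: each row is stored inside the index itself."""

    # Sorted PK values; stands in for the leaf level of the B+tree.
    pks: list = field(default_factory=list)
    # The full row sits next to its PK, so there is no separate data area.
    rows: list = field(default_factory=list)

    def insert(self, pk, row):
        i = bisect.bisect_left(self.pks, pk)
        if i < len(self.pks) and self.pks[i] == pk:
            raise KeyError(f"duplicate PRIMARY KEY {pk!r}")
        # Inserting at position i keeps the rows physically ordered by PK.
        self.pks.insert(i, pk)
        self.rows.insert(i, row)

    def lookup(self, pk):
        # One binary search (a B+tree descent in real InnoDB) finds the whole row.
        i = bisect.bisect_left(self.pks, pk)
        if i < len(self.pks) and self.pks[i] == pk:
            return self.rows[i]
        return None


table = ClusteredIndex()
table.insert(3, {"id": 3, "name": "carol"})
table.insert(1, {"id": 1, "name": "alice"})
print(table.pks)        # [1, 3]
print(table.lookup(1))  # {'id': 1, 'name': 'alice'}
```

Rows inserted out of order still end up sorted by PK, which is what "clustered" means here.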

It is true that if you do a lookup using a secondary index and then select columns besides those in the secondary index, InnoDB does a sort of double lookup. First it searches the secondary index, which yields the primary key value(s) of the rows matching the value you are searching for. Then it uses those primary keys to search the clustered index and fetch the other columns.
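A rough Python sketch of that double lookup. The table (PRIMARY KEY (id), INDEX (email)), its rows, and the function find_by_email are hypothetical, and dicts and sorted lists stand in for the B+trees. The secondary index stores the PK value rather than a physical record pointer. If every requested column is already in the secondary entry, one search is enough. Otherwise a second search of the clustered index is needed.

```python
import bisect

# Hypothetical table: PRIMARY KEY (id), INDEX (email).
# Clustered index: PK -> full row (a dict stands in for the B+tree).
clustered = {
    1: {"id": 1, "email": "a@example.com", "name": "alice", "city": "Oslo"},
    2: {"id": 2, "email": "b@example.com", "name": "bob", "city": "Lima"},
}

# Secondary index: (email, PK) pairs sorted by email. It stores the PK value,
# not a physical record pointer, so other columns require the clustered index.
secondary = sorted((row["email"], pk) for pk, row in clustered.items())
secondary_keys = [email for email, _ in secondary]
COLUMNS_IN_SECONDARY = {"email", "id"}


def find_by_email(email, columns):
    """Return (result, index_searches) for SELECT <columns> ... WHERE email = ?."""
    i = bisect.bisect_left(secondary_keys, email)
    if i == len(secondary_keys) or secondary_keys[i] != email:
        return None, 1
    _, pk = secondary[i]
    if set(columns) <= COLUMNS_IN_SECONDARY:
        # Every requested column is in the secondary entry: one search is enough.
        entry = {"email": email, "id": pk}
        return {c: entry[c] for c in columns}, 1
    # Otherwise, search the clustered index by PK for the remaining columns.
    row = clustered[pk]
    return {c: row[c] for c in columns}, 2


print(find_by_email("b@example.com", ["id"]))            # ({'id': 2}, 1)
print(find_by_email("b@example.com", ["name", "city"]))  # ({'name': 'bob', 'city': 'Lima'}, 2)
```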

This double lookup is partially mitigated by the Adaptive Hash Index, which is a cache of frequently searched values. This cache is populated automatically as you run queries. So over time, if you keep querying the same values, the double lookup isn't so costly.
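The idea of an automatically built cache for hot values can be illustrated with a loose analogy, not InnoDB's actual algorithm. The real Adaptive Hash Index is built from B-tree index pages based on observed search patterns and is maintained by InnoDB itself. The class TwoStepLookup and its threshold parameter are hypothetical, and this toy ignores invalidation when rows change. Keys that are searched often enough get a hash entry, and later searches skip both tree descents.

```python
from collections import Counter


class TwoStepLookup:
    """Toy analogy: secondary -> PK -> row, with a hash shortcut for hot keys."""

    def __init__(self, secondary, clustered, threshold=3):
        self.secondary = secondary  # secondary key -> PK (stands in for a B-tree)
        self.clustered = clustered  # PK -> row (stands in for the clustered index)
        self.threshold = threshold  # hypothetical: searches before a key is hashed
        self.hits = Counter()
        self.hash_cache = {}        # built automatically; no invalidation here
        self.tree_searches = 0      # number of B-tree descents performed

    def get(self, key):
        if key in self.hash_cache:  # fast path: one hash probe, no tree walk
            return self.hash_cache[key]
        self.tree_searches += 1     # descend the secondary index
        pk = self.secondary.get(key)
        if pk is None:
            return None
        self.tree_searches += 1     # descend the clustered index by PK
        row = self.clustered[pk]
        self.hits[key] += 1
        if self.hits[key] >= self.threshold:
            self.hash_cache[key] = row  # "adaptive": only hot keys are hashed
        return row


lk = TwoStepLookup({"bob": 2}, {2: {"id": 2, "name": "bob"}}, threshold=3)
for _ in range(10):
    lk.get("bob")
print(lk.tree_searches)  # 6: only the first 3 calls walked the trees
```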

The situation is more complex than your question suggests.

First, let's talk only about ENGINE=InnoDB; other engines work differently.

  • There is about 1% overhead for the non-leaf BTree nodes to "cluster" the PRIMARY KEY with the data.

  • If you do not explicitly specify a PRIMARY KEY, InnoDB uses the first UNIQUE key whose columns are all NOT NULL as the PK. If there is no such key, a hidden 6-byte row ID is used as the PK. That takes more space than, say, a 4-byte INT PK would. That is, an InnoDB table always has a PRIMARY KEY, even if you do not declare one.

  • The above two items are TMI; think of the PK as taking no extra space.

  • Yes, lookup by the PK is faster than lookup by a secondary key. But if you need a secondary key, create it. Playing a game of first fetching ids and then fetching the rows is slower than doing all the work in a single query.

  • A secondary key also uses a BTree, but it is sorted by the key's column(s) and does not include all the other columns. Instead, it includes the PK's columns. (Hence the "double lookup" that Bill mentioned.)

  • A "covering index" is one that contains all the columns needed for a particular SELECT. In that case, all the work can be done in the index's BTree, avoiding the double lookup. That is, a covering index is about as fast as a primary key lookup. (I would guess that 20% of indexes are "covering" or could be made covering by adding a column or two.)

  • BTrees have a fair amount of overhead. A rule of thumb: add up the size of each column (4 bytes for INT, etc.), then multiply by 2 or 3. The result is often a good estimate of the disk space needed for the data or index BTree.

  • This discussion does not cover FULLTEXT or SPATIAL indexes.
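The rule of thumb above can be turned into a quick calculator that also shows why the hidden 6-byte PK costs more. The table (customer_id INT, order_date DATE, total_cents BIGINT, with INDEX(customer_id)) and the 1,000,000-row count are hypothetical. The byte sizes are MySQL's fixed storage sizes for those types. Because each secondary index entry carries the PK columns too, the PK width is counted in both the data BTree and the secondary index.

```python
def estimate_btree_bytes(column_sizes, rows):
    """Rule of thumb: add up the column sizes, then multiply by 2 to 3.

    Returns a (low, high) estimate in bytes for one BTree (data or one index).
    """
    raw = sum(column_sizes) * rows
    return raw * 2, raw * 3


ROWS = 1_000_000
# Hypothetical table: customer_id INT, order_date DATE, total_cents BIGINT,
# with a secondary INDEX(customer_id).
NON_PK_COLUMNS = [4, 3, 8]  # INT = 4, DATE = 3, BIGINT = 8 bytes
CUSTOMER_ID = 4             # INT

results = {}
for label, pk_bytes in [("explicit INT PK", 4), ("hidden 6-byte PK", 6)]:
    data = estimate_btree_bytes([pk_bytes] + NON_PK_COLUMNS, ROWS)
    # A secondary index entry holds its own column(s) plus the PK column(s).
    secondary = estimate_btree_bytes([CUSTOMER_ID, pk_bytes], ROWS)
    results[label] = (data, secondary)
    print(f"{label}: data {data[0] / 1e6:.0f}-{data[1] / 1e6:.0f} MB, "
          f"secondary {secondary[0] / 1e6:.0f}-{secondary[1] / 1e6:.0f} MB")
# explicit INT PK: data 38-57 MB, secondary 16-24 MB
# hidden 6-byte PK: data 42-63 MB, secondary 20-30 MB
```

The 2-byte difference per row is small, but it is paid in the data BTree and again in every secondary index.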
