MySQL Clustered vs Non Clustered Index Performance

Question

I'm running a couple of tests on MySQL clustered vs. non-clustered indexes, using a table, 100gb_table, that contains ~60 million rows:

100gb_table schema:
CREATE TABLE 100gb_table (
    id int PRIMARY KEY NOT NULL AUTO_INCREMENT,
    c1 int,
    c2 text,
    c3 text,
    c4 blob NOT NULL,
    c5 text,
    c6 text,
    ts timestamp NOT NULL default(CURRENT_TIMESTAMP)
);

and I'm executing a query that only reads the clustered index:

SELECT id FROM 100gb_table ORDER BY id;

I'm seeing that this query takes about 55 minutes to complete, which seems strangely slow. I then added another index on the primary key column and ran the following query, which tells the optimizer to use that non-clustered index:

SELECT id FROM 100gb_table USE INDEX (non_clustered_key) ORDER BY id;
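The question does not show how non_clustered_key was created. A minimal sketch, assuming it is a plain (non-unique) secondary index on id (that definition is an assumption), followed by EXPLAIN to check which index each query actually scans:

```sql
-- ASSUMED definition of the secondary index from the question:
-- a plain index on the same column that is already the PRIMARY KEY.
ALTER TABLE `100gb_table` ADD INDEX non_clustered_key (id);

-- The "key" column of the EXPLAIN output names the index the optimizer
-- chose; PRIMARY means the clustered index (i.e. the table data itself).
EXPLAIN SELECT id FROM `100gb_table` ORDER BY id;
EXPLAIN SELECT id FROM `100gb_table` USE INDEX (non_clustered_key) ORDER BY id;
```

USE INDEX is only a hint; if EXPLAIN shows the optimizer ignoring it, FORCE INDEX (non_clustered_key) is the stronger form.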

This finished in under 10 minutes, much faster than reading through the clustered index. Why is there such a large discrepancy between the two? My understanding is that both indexes store the indexed column's values in a tree structure, except that the clustered index also holds the table data in its leaf nodes, so I would expect both queries to perform similarly. Could the BLOB column be distorting the clustered index structure?

Answer 1

The answer lies in how the data is laid out.

The PRIMARY KEY is "clustered" with the data; that is, the data is ordered by the PK in a B+Tree structure. To read all of the ids, the entire BTree must be read.
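One way to see how big that clustered BTree is: for InnoDB, DATA_LENGTH in information_schema.TABLES approximates the space allocated to the clustered index (the rows themselves), while INDEX_LENGTH approximates the space for all secondary indexes combined. A sketch, assuming the table lives in the current default database:

```sql
-- Approximate sizes from table statistics (not an exact byte count).
SELECT table_name,
       ROUND(data_length  / POW(1024, 3), 1) AS clustered_gb,  -- PRIMARY (data)
       ROUND(index_length / POW(1024, 3), 1) AS secondary_gb,  -- all other indexes
       table_rows                                              -- estimated row count
FROM information_schema.TABLES
WHERE table_schema = DATABASE()
  AND table_name = '100gb_table';
```

In MySQL 8.0 these values may be cached; running ANALYZE TABLE on the table first refreshes them.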

Any secondary index is also in a B+Tree structure, but it contains only (1) the columns of the index, and (2) any PK columns not already in the index.

In your example (with lots of [presumably] bulky columns), the data BTree is a lot bigger than the secondary index (on just id). Both tests probably had to read all of the relevant blocks from disk, so the difference in run time comes down to how many blocks each tree occupies.
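To compare the two trees index by index, InnoDB's persistent statistics table mysql.innodb_index_stats records page counts per index. A sketch, assuming persistent statistics are enabled (the default since MySQL 5.6) and the table is in the current database; the numbers are estimates:

```sql
-- 'size' = total pages in the index, 'n_leaf_pages' = pages in its leaf level.
-- Multiply by @@innodb_page_size (16KB by default) to get bytes.
SELECT index_name,
       MAX(CASE WHEN stat_name = 'size'         THEN stat_value END) AS total_pages,
       MAX(CASE WHEN stat_name = 'n_leaf_pages' THEN stat_value END) AS leaf_pages
FROM mysql.innodb_index_stats
WHERE database_name = DATABASE()
  AND table_name = '100gb_table'
  AND stat_name IN ('size', 'n_leaf_pages')
GROUP BY index_name;
```

Comparing the PRIMARY row with the non_clustered_key row should show how much more data the clustered scan has to walk through.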

A side note: this is not as bad as it could be. InnoDB limits a row to about 8KB (roughly half of a default 16KB page). TEXT and BLOB columns, when short enough, are stored within that 8KB. But when one is bulky, it is stored elsewhere ("off-page"), leaving behind a pointer to the TEXT/BLOB. Hence, the main part of the data BTree is smaller than it would be if all of the TEXT/BLOB data were stored inline.
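How much of a bulky TEXT/BLOB value stays inside the row depends on the table's ROW_FORMAT. With DYNAMIC (the default since MySQL 5.7) or COMPRESSED, a value stored off-page leaves only a 20-byte pointer in the row; with COMPACT or REDUNDANT, the first 768 bytes stay in the row along with the pointer. A sketch for checking this, assuming the table is in the current database:

```sql
-- ROW_FORMAT: DYNAMIC/COMPRESSED -> only a 20-byte pointer per off-page value;
--             COMPACT/REDUNDANT  -> a 768-byte prefix stays in the row as well.
-- AVG_ROW_LENGTH is an estimate (DATA_LENGTH / TABLE_ROWS).
SELECT row_format, avg_row_length
FROM information_schema.TABLES
WHERE table_schema = DATABASE()
  AND table_name = '100gb_table';
```

Under COMPACT or REDUNDANT, each bulky column keeps up to 768 bytes in the clustered index leaves, which makes a full clustered-index scan correspondingly larger.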

Since SELECT id FROM tbl is rarely a useful query in practice, InnoDB's design does not try to avoid the inefficiency you discovered.

Tack on ORDER BY, WHERE, etc., and there are many different optimizations that could come into play. You might even find that INDEX(c1) lets your query run in not much more than 10 minutes. (I think I have given you all the clues for 'why'.)
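The INDEX(c1) experiment could be set up as below. Because every secondary index entry also carries the PK, the leaves of INDEX(c1) hold (c1, id) pairs, so the index alone can answer SELECT id. Its order is by c1 rather than id, though, so a sort is needed afterwards. The index name idx_c1 is hypothetical:

```sql
-- Hypothetical index name. Each leaf entry holds (c1, id) because InnoDB
-- appends the PK column(s) to every secondary index entry.
ALTER TABLE `100gb_table` ADD INDEX idx_c1 (c1);

-- In the Extra column, "Using index" indicates a covering scan of idx_c1
-- (no trip to the clustered index); "Using filesort" indicates the
-- extra step that re-sorts the ids.
EXPLAIN SELECT id FROM `100gb_table` FORCE INDEX (idx_c1) ORDER BY id;
```

The scan reads a tree about as small as non_clustered_key; the sort of ~60 million ints is the "not much more" on top.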

Also, if you had run SELECT * FROM tbl, it might have taken much longer than 55 minutes. This is because of the extra [random] fetches needed to get the TEXTs/BLOBs from the "off-record" storage, plus the network time to shovel far more data.

Answer 1 (accepted), 2022-07-08 16:40:07

MySQL Clustered vs Non Clustered Index Performance

Question

1 answers

solution1 1 ACCPTED 2022-07-08 16:40:07

solution1
1 ACCPTED 2022-07-08 16:40:07