
MySQL not picking up the optimal index

Here's my table:

CREATE TABLE `idx_weight` (
  `ID` bigint(20) NOT NULL AUTO_INCREMENT,
  `SECURITY_ID` bigint(20) NOT NULL,
  `CONS_ID` bigint(20) NOT NULL,
  `EFF_DATE` date NOT NULL,
  `WEIGHT` decimal(9,6) DEFAULT NULL,
  PRIMARY KEY (`ID`),
  UNIQUE KEY `BPK_AK` (`SECURITY_ID`,`CONS_ID`,`EFF_DATE`),
  KEY `idx_weight_ix` (`SECURITY_ID`,`EFF_DATE`)
) ENGINE=InnoDB AUTO_INCREMENT=75334536 DEFAULT CHARSET=utf8

For query 1:

explain select SECURITY_ID, min(EFF_DATE) as startDate, max(EFF_DATE) as endDate from idx_weight where security_id = 1782;

+----+-------------+------------+------+----------------------+---------------+---------+-------+--------+-------------+
| id | select_type | table      | type | possible_keys        | key           | key_len | ref   | rows   | Extra       |
+----+-------------+------------+------+----------------------+---------------+---------+-------+--------+-------------+
|  1 | SIMPLE      | idx_weight | ref  | BPK_AK,idx_weight_ix | idx_weight_ix | 8       | const | 887856 | Using index |
+----+-------------+------------+------+----------------------+---------------+---------+-------+--------+-------------+

This query runs fine.

Now Query 2 (the only thing changed is the security_id param):

explain select SECURITY_ID, min(EFF_DATE) as startDate, max(EFF_DATE) as endDate from idx_weight where security_id = 26622;

+----+-------------+------------+------+----------------------+--------+---------+-------+----------+-------------+
| id | select_type | table      | type | possible_keys        | key    | key_len | ref   | rows     | Extra       |
+----+-------------+------------+------+----------------------+--------+---------+-------+----------+-------------+
|  1 | SIMPLE      | idx_weight | ref  | BPK_AK,idx_weight_ix | BPK_AK | 8       | const | 10700002 | Using index |
+----+-------------+------------+------+----------------------+--------+---------+-------+----------+-------------+

Notice that it picks up the index BPK_AK, and the actual query ran for over 1 minute the first time.

Correction: the second run took a little over 10 seconds. I'm guessing the index was not in the buffer pool on the first run.

I can get a workaround by appending group by security_id :

explain select SECURITY_ID, min(EFF_DATE) as startDate, max(EFF_DATE) as endDate from idx_weight where security_id = 26622 group by security_id;

+----+-------------+------------+-------+----------------------+---------------+---------+------+-------+---------------------------------------+
| id | select_type | table      | type  | possible_keys        | key           | key_len | ref  | rows  | Extra                                 |
+----+-------------+------------+-------+----------------------+---------------+---------+------+-------+---------------------------------------+
|  1 | SIMPLE      | idx_weight | range | BPK_AK,idx_weight_ix | idx_weight_ix | 8       | NULL | 10314 | Using where; Using index for group-by |
+----+-------------+------------+-------+----------------------+---------------+---------+------+-------+---------------------------------------+

But I still don't understand why MySQL would not pick idx_weight_ix for some security_id values, even though it is a covering index for this query (and a lot cheaper). Any ideas?

=========================================================================

Update: @oysteing Learned a new trick, cool! :)

Here's the optimizer trace:

Query 1: https://gist.github.com/aping/c4388d49d666c43172a856d77001f4ce

Query 2: https://gist.github.com/aping/1af5504b428ca136a8b1c41c40d763e4

And some extra information that might be useful:

From INFORMATION_SCHEMA.STATISTICS :

+------------+---------------+--------------+-------------+-------------+
| NON_UNIQUE | INDEX_NAME    | SEQ_IN_INDEX | COLUMN_NAME | CARDINALITY |
+------------+---------------+--------------+-------------+-------------+
|          0 | BPK_AK        |            1 | SECURITY_ID |       74134 |
|          0 | BPK_AK        |            2 | CONS_ID     |      638381 |
|          0 | BPK_AK        |            3 | EFF_DATE    |    68945218 |
|          1 | idx_weight_ix |            1 | SECURITY_ID |       61393 |
|          1 | idx_weight_ix |            2 | EFF_DATE    |      238564 |
+------------+---------------+--------------+-------------+-------------+

The CARDINALITY values for SECURITY_ID differ between the two indexes, but technically they should be exactly the same, am I right?

From this: https://dba.stackexchange.com/questions/49656/find-the-size-of-each-index-in-a-mysql-table?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa

+---------------+-------------------+
| index_name    | indexentry_length |
+---------------+-------------------+
| BPK_AK        |        1376940279 |
| idx_weight_ix |         797175951 |
+---------------+-------------------+

The index sizes are about 1.3GB for BPK_AK vs 800MB for idx_weight_ix.
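As a rough cross-check of these sizes, here is a sketch that reads the page counts InnoDB keeps in its persistent statistics. It assumes persistent statistics are enabled (the default since MySQL 5.6.6) and is not the query that produced the numbers above:

```sql
-- For stat_name = 'size', stat_value is the number of pages in the index;
-- multiplying by the page size gives an approximate size in bytes.
SELECT index_name,
       stat_value * @@innodb_page_size AS approx_index_bytes
FROM mysql.innodb_index_stats
WHERE database_name = DATABASE()
  AND table_name = 'idx_weight'
  AND stat_name = 'size';
```

The figures are estimates taken when the statistics were last updated, so they will not match the byte counts above exactly.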

Running select count(*) from idx_weight where security_id = 1782 returns 509994

and select count(*) from idx_weight where security_id = 26622 returns 5828054

Then force using BPK_AK for query 1:

select SQL_NO_CACHE SECURITY_ID, min(EFF_DATE) as startDate, max(EFF_DATE) as endDate from idx_weight use index (BPK_AK) where security_id = 1782 took 0.2 sec.

So basically, 26622 has about 10 times more rows than 1782, but using the same index it took 50 times longer.

PS: buffer pool size is 25GB.

When you mix normal columns (SECURITY_ID) and aggregate functions (min and max in your case), you should use GROUP BY. If you do not, MySQL is free to give any result it pleases. With GROUP BY, you will get the correct result. Newer MySQL versions enforce this behavior by default.

The reason the second index is not selected when you leave out the GROUP BY is most likely that the aggregate functions are not limited to a single group (= security_id) and therefore cannot be used as a limiter.
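The remark about newer versions refers, as far as I can tell, to the ONLY_FULL_GROUP_BY SQL mode, which is part of the default sql_mode from MySQL 5.7.5 onward. A quick way to see whether it is active on a given server:

```sql
-- Show the SQL modes in effect for the server and for this session.
SELECT @@GLOBAL.sql_mode, @@SESSION.sql_mode;

-- Returns 1 if ONLY_FULL_GROUP_BY is enabled for this session, 0 otherwise.
SELECT FIND_IN_SET('ONLY_FULL_GROUP_BY', @@SESSION.sql_mode) > 0
       AS only_full_group_by_on;
```

Whether the query from the question is actually rejected under that mode is easiest to confirm by running it on the server in question.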

The optimizer traces show that the difference in index selection is due to the estimates received from InnoDB. For each potential index, the optimizer asks the storage engine for an estimate of how many records are in the range. For the first query it gets the following estimates:

BPK_AK:       1031808
idx_weight_ix: 887856

So the estimated read cost is lowest for idx_weight_ix, and this index is chosen. For the second query the estimates are:

BPK_AK:        11092112
idx_weight_ix: 12003098

And the estimated read cost of BPK_AK is lowest due to the lower number of rows. You could say that MySQL should know that the real number of rows in the range is the same in both cases, but that logic has not been implemented.
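For anyone wanting to reproduce these numbers, here is a minimal sketch of capturing an optimizer trace in a session (MySQL 5.6 or later). Running the statement under EXPLAIN also produces a trace, which avoids waiting for the slow query. The exact layout of the JSON varies by version, so treat the comments as a guide only:

```sql
-- Turn tracing on for this session only.
SET optimizer_trace = 'enabled=on';

EXPLAIN SELECT SECURITY_ID, MIN(EFF_DATE) AS startDate, MAX(EFF_DATE) AS endDate
FROM idx_weight
WHERE security_id = 26622;

-- Trace of the most recent statement; look for the per-index row estimates.
-- A non-zero MISSING_BYTES_BEYOND_MAX_MEM_SIZE means the trace was truncated.
SELECT TRACE, MISSING_BYTES_BEYOND_MAX_MEM_SIZE
FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;

SET optimizer_trace = 'enabled=off';
```

If the trace is truncated, raising the session variable optimizer_trace_max_mem_size before running the statement should help.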

I do not know the details of how InnoDB computes these estimates, but it basically does two "index dives" to find the first and last row in the range, and then somehow computes the "distance" between the two. It could be that the estimates are affected by unused space in index pages, and that OPTIMIZE TABLE could fix this, but running OPTIMIZE TABLE will probably take a very long time on such a large table.
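A sketch of the maintenance statements in question. ANALYZE TABLE is my own addition rather than something suggested above: it is much cheaper, but it only refreshes the stored index statistics (the CARDINALITY figures in the question), so it may well leave the dive-based range estimates unchanged:

```sql
-- Rebuilds the table and its indexes, compacting unused space in index pages.
-- Expect this to run for a long time on a table of this size.
OPTIMIZE TABLE idx_weight;

-- Cheaper alternative (assumption: may not affect the range estimates):
-- re-sample the index statistics only.
ANALYZE TABLE idx_weight;

-- Re-check the stored cardinalities afterwards.
SELECT INDEX_NAME, SEQ_IN_INDEX, COLUMN_NAME, CARDINALITY
FROM INFORMATION_SCHEMA.STATISTICS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'idx_weight';
```

Comparing the EXPLAIN row estimates before and after is the simplest way to tell whether either statement made a difference.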

The quickest way to solve this is to add a GROUP BY clause, as mentioned by a few other people here. Then MySQL will only need to read 2 rows per group, the first and the last, since the index is ordered by EFF_DATE for each value of security_id. Alternatively, you could use FORCE INDEX to force a particular index.
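Both options, written out against the table from the question (the security_id value is just the slow one from the example):

```sql
-- Option 1: GROUP BY lets MySQL read only the first and last index entry
-- for the group ("Using index for group-by" in EXPLAIN).
SELECT SECURITY_ID, MIN(EFF_DATE) AS startDate, MAX(EFF_DATE) AS endDate
FROM idx_weight
WHERE security_id = 26622
GROUP BY SECURITY_ID;

-- Option 2: keep the query as is, but force the narrower covering index.
SELECT SECURITY_ID, MIN(EFF_DATE) AS startDate, MAX(EFF_DATE) AS endDate
FROM idx_weight FORCE INDEX (idx_weight_ix)
WHERE security_id = 26622;
```

Option 2 still scans every index entry for that security_id, so Option 1 should be the cheaper of the two. Prefixing either statement with EXPLAIN shows which access path is actually used.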

It may also be that MySQL 8.0 will handle this query better. The cost model has changed somewhat, and it will put a higher cost on "cold" indexes that are not cached in the buffer pool.

"I can get a workaround by appending group by security_id"

Well, yes. I wouldn't do it any other way, since when you use aggregate functions you NEED to group by something. I didn't even know that MySQL allowed you to work around it.

I think @slaakso is right. Upvote him.
