MySQL EXPLAIN: type `index` vs. `ALL` performance issue

Question

I have the following table, with 3.5 million records:

```sql
CREATE TABLE `video_downloads` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `user_id` bigint(20) NOT NULL,
  `video_id` bigint(20) NOT NULL,
  `download_at` datetime NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3573041 DEFAULT CHARSET=latin1
```

Only `id` and `user_id` are indexed.

Here is my query:

```sql
select max(video_id), user_id
from video_downloads
group by user_id
```

With the current table setup, this query took more than 10 minutes to run. Here is the `EXPLAIN` output:

| id | select_type | table           | type  | possible_keys | key     | key_len | ref | rows    | Extra |
|----|-------------|-----------------|-------|---------------|---------|---------|-----|---------|-------|
| 1  | SIMPLE      | video_downloads | index |               | user_id | 8       |     | 3562709 |       |

```json
{
  "query_block": {
    "select_id": 1,
    "table": {
      "table_name": "video_downloads",
      "access_type": "index",
      "key": "user_id",
      "key_length": "8",
      "used_key_parts": ["user_id"],
      "rows": 3562709,
      "filtered": 100
    }
  }
}
```

Then I dropped the index on `user_id`, ran the same query, and it took about 1.5 s.

Here is the `EXPLAIN` output without the `user_id` index:

| id | select_type | table           | type | possible_keys | key | key_len | ref | rows    | Extra                           |
|----|-------------|-----------------|------|---------------|-----|---------|-----|---------|---------------------------------|
| 1  | SIMPLE      | video_downloads | ALL  |               |     |         |     | 3562709 | Using temporary; Using filesort |

```json
{
  "query_block": {
    "select_id": 1,
    "filesort": {
      "sort_key": "video_downloads.user_id",
      "temporary_table": {
        "table": {
          "table_name": "video_downloads",
          "access_type": "ALL",
          "rows": 3562709,
          "filtered": 100
        }
      }
    }
  }
}
```

I think my main question is why there is such a huge difference in run time with and without the index on `user_id`. When there is an index on `user_id`, `type` is `index`, which means the index is being used, yet the query is very slow.

I am a bit confused by this result. I don't understand why it happens, and after checking the official documentation I still don't fully understand it.

Update: I think the main reason could be that it uses the index data to fetch each row from disk, one by one and in random order. That would be 3.5 million random disk reads. That's the only reason I can think of. However, would that really be so slow (more than 10 minutes vs. 1.5 s)?
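One way to test this theory is to compare InnoDB's buffer pool read counters before and after running the query. This is sketched here as an assumption and is not from the original post. `Innodb_buffer_pool_reads` counts reads that could not be served from the buffer pool and had to go to disk. These are server-wide (`GLOBAL`) counters, so other activity on the same server will also show up in the difference.

```sql
-- Snapshot the InnoDB read counters before the query.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';

SELECT MAX(video_id), user_id
FROM video_downloads
GROUP BY user_id;

-- Snapshot again. A jump in Innodb_buffer_pool_reads close to the row count
-- (reads that missed the buffer pool and went to disk) would support the
-- random-read explanation; Innodb_buffer_pool_read_requests counts all
-- logical reads, cached or not.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
```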

However, the MySQL documentation says:

Sometimes MySQL does not use an index, even if one is available. One circumstance under which this occurs is when the optimizer estimates that using the index would require MySQL to access a very large percentage of the rows in the table.

In my case, MySQL does not seem to make the right decision. I can see that `possible_keys` is `NULL`, but `key` still shows the `user_id` index. Why? Is it because of the `GROUP BY`?
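Both plans can be compared without dropping the index. This sketch assumes the index is named `user_id`, as the `key` column in the question's `EXPLAIN` output suggests. MySQL's `IGNORE INDEX` hint tells the optimizer to skip that index for this one statement only.

```sql
-- Plan the optimizer picks by itself (type = index, key = user_id):
-- a full scan of the secondary index in user_id order, which avoids the sort
-- but needs a clustered-index lookup per entry to read video_id.
EXPLAIN FORMAT=JSON
SELECT MAX(video_id), user_id
FROM video_downloads
GROUP BY user_id;

-- Same query with the user_id index hidden from the optimizer; this is
-- expected to fall back to a full table scan (type = ALL) with a temporary
-- table and filesort, like the plan shown after the index was dropped.
EXPLAIN FORMAT=JSON
SELECT MAX(video_id), user_id
FROM video_downloads IGNORE INDEX (user_id)
GROUP BY user_id;
```

Running the two `SELECT` statements without `EXPLAIN` would then time both plans against the same table and the same cache state.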

Answer 1

The "statistics" that the optimizer uses are not always perfect. However, "10 min" vs. "1.5 sec" is quite spectacular. I wonder if there was outside interference. Oh, and what engine is being used?

When it used the single-column index, it probably had to bounce between the index and the data, fetching 3.5M rows one at a time, in random order.

When it did the table scan (`ALL`), it also read 3.5M rows, but sequentially. Then it had to follow up with a sort.
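The following is not part of the original answer, but it follows from the same reasoning. The per-row lookups happen because the `user_id` index does not contain `video_id`. This is a minimal sketch, assuming you are able to add an index; the name `idx_user_video` is hypothetical. A composite index on `(user_id, video_id)` holds both columns the query needs, so MySQL can compute `MAX(video_id)` per user from the index alone.

```sql
-- Hypothetical index name. video_id immediately follows the GROUP BY column,
-- so the index covers the query and no clustered-index lookups are needed.
ALTER TABLE video_downloads ADD INDEX idx_user_video (user_id, video_id);

-- Check the new plan: with this index, Extra may show
-- "Using index for group-by" (a loose index scan that reads only the
-- last entry per user_id instead of every row).
EXPLAIN
SELECT MAX(video_id), user_id
FROM video_downloads
GROUP BY user_id;
```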

Buffer pool

A 16M `innodb_buffer_pool_size` is the problem. Set it to about 70% of RAM unless you have an especially small machine.
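A minimal sketch of checking and raising the setting. The 16 GB server is a hypothetical example (70% of 16 GB is roughly 11 GB). Resizing at runtime with `SET GLOBAL` requires MySQL 5.7.5 or later. On older versions, the value has to be changed in the config file, followed by a restart.

```sql
-- Current size in MB (the answer refers to a 16M setting).
SELECT @@innodb_buffer_pool_size / 1024 / 1024 AS buffer_pool_mb;

-- Hypothetical: about 70% of a dedicated 16 GB server.
-- InnoDB adjusts the value to a multiple of
-- innodb_buffer_pool_chunk_size * innodb_buffer_pool_instances.
SET GLOBAL innodb_buffer_pool_size = 11 * 1024 * 1024 * 1024;

-- SET GLOBAL does not survive a restart; to keep it, also put this in my.cnf:
--   [mysqld]
--   innodb_buffer_pool_size = 11G
```

On MySQL 8.0, `SET PERSIST` is an alternative to editing `my.cnf`.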

The 10-minute query was probably almost entirely I/O. With a 16M buffer pool the table cannot stay cached, so its pages were read, evicted, and reread from disk in random order.

On a spinning disk (HDD, not SSD), 3.5M random reads at 100 blocks/second take several hours. So you were lucky it finished in only 10 minutes. The 1.5 s shows how useful a big enough RAM cache is.
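The arithmetic behind that estimate, assuming one random block read per row and the answer's figure of roughly 100 random reads per second for an HDD:

$$
t \approx \frac{3.5 \times 10^{6}\ \text{reads}}{100\ \text{reads/s}} = 3.5 \times 10^{4}\ \text{s} \approx 9.7\ \text{h}
$$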

The 1.5 s may be all it took to read straight (not randomly) through the entire table once.

Answer 1: accepted, 2020-09-04 18:09:29
