MySQL EXPLAIN type index vs ALL performance question
I have the following table, which holds 3.5 million records:
CREATE TABLE `video_downloads` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`user_id` bigint(20) NOT NULL,
`video_id` bigint(20) NOT NULL,
`download_at` datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3573041 DEFAULT CHARSET=latin1
Only `id` and `user_id` are indexed.
Here is my query:
select max(video_id), user_id
from video_downloads
group by user_id
With the current table setup, this query took more than 10 minutes to run. Here is the `explain` output:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|----|-------------|-----------------|-------|---------------|---------|---------|-----|---------|-------|
| 1 | SIMPLE | video_downloads | index | | user_id | 8 | | 3562709 | |
{
"query_block": {
"select_id": 1,
"table": {
"table_name": "video_downloads",
"access_type": "index",
"key": "user_id",
"key_length": "8",
"used_key_parts": ["user_id"],
"rows": 3562709,
"filtered": 100
}
}
}
Then I removed the index on `user_id` and ran the same query. This time it took about 1.5 s.
Here is the `explain` output without the `user_id` index:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|----|-------------|-----------------|------|---------------|-----|---------|-----|---------|---------------------------------|
| 1 | SIMPLE | video_downloads | ALL | | | | | 3562709 | Using temporary; Using filesort |
{
"query_block": {
"select_id": 1,
"filesort": {
"sort_key": "video_downloads.user_id",
"temporary_table": {
"table": {
"table_name": "video_downloads",
"access_type": "ALL",
"rows": 3562709,
"filtered": 100
}
}
}
}
}
My main question is why there is such a huge difference in run time with and without the index on `user_id`. When there is an index on `user_id`, the `type` is `index`, which means the query is using the index, yet it is very slow.
I am a bit confused by the result and do not understand why this is happening. I checked the official documentation and still do not fully understand it.
Update: I think the main reason could be that MySQL uses the index data to fetch each row from disk, one by one and in random order. That makes 3.5 million random reads from disk. That's the only reason I can think of. However, would that really be so slow (more than 10 mins vs 1.5 s)?
However, the MySQL documentation says:

"Sometimes MySQL does not use an index, even if one is available. One circumstance under which this occurs is when the optimizer estimates that using the index would require MySQL to access a very large percentage of the rows in the table."
In my case, MySQL does not seem to make the right decision. I can see that `possible_keys` is null, but `key` still shows the index being used. Why? Is it because of the `group by`?
The "statistics" that the Optimizer uses are not always perfect. However, "10 min" vs "1.5 sec" is quite a spectacular difference. I wonder if there was outside interference. Oh, and which Engine is being used?
When it used the single-column index, it probably had to bounce between the index and the data, fetching 3.5M rows one at a time, but randomly.
When it did the table scan ("ALL"), it also read 3.5M rows, but sequentially. It then had to follow up with a sort.
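The two access paths can be compared without dropping the index. The following is a minimal sketch, assuming the index is named `user_id` (as the `key` column of the first EXPLAIN suggests) and using MySQL's `IGNORE INDEX` hint to steer the optimizer away from it:

```sql
-- Plan the optimizer picks by default (type = index on the user_id key).
EXPLAIN
SELECT MAX(video_id), user_id
FROM video_downloads
GROUP BY user_id;

-- Same query, but the optimizer is told not to use the user_id index,
-- which should give the full table scan plan (type = ALL).
EXPLAIN
SELECT MAX(video_id), user_id
FROM video_downloads IGNORE INDEX (user_id)
GROUP BY user_id;
```

Running each SELECT without the EXPLAIN prefix would let you time both plans on the same table, keeping in mind that the second run may benefit from data already cached by the first.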
Buffer_pool
16M for `innodb_buffer_pool_size` is the problem. Set that to about 70% of RAM size unless you have an especially small machine.
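As a rough sketch of how the setting could be checked and changed, assuming a hypothetical server with 8 GB of RAM (the original does not state the machine's memory), so that 70% comes to roughly 5.6 GB:

```sql
-- Show the current buffer pool size in bytes (16M appears as 16777216).
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Assumption: 8 GB of RAM, so about 70% is 5734 MB.
-- Resizing at runtime requires MySQL 5.7.5 or later; on older versions,
-- set innodb_buffer_pool_size in the server configuration file and restart.
-- MySQL may round the value to a multiple of its buffer pool chunk size.
SET GLOBAL innodb_buffer_pool_size = 5734 * 1024 * 1024;
```

A value set with `SET GLOBAL` does not survive a restart, so the same value would also need to go into the configuration file.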
The 10-minute query was probably spent almost entirely on I/O, reading and rereading the data from the table randomly.
On a spinning disk (HDD, not SSD), 3.5M reads at 100 blocks/second would take several hours. So you were lucky to finish in only 10 minutes. The 1.5 s shows how useful a big enough RAM cache is.
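The "several hours" estimate can be checked with simple arithmetic. This sketch takes the 100 random block reads per second from the text as a rough assumption for an HDD and treats every row fetch as one uncached read, which is the worst case:

```sql
-- 3.5M row fetches, each costing one random read at ~100 reads/second.
SELECT 3500000 / 100        AS seconds_needed,  -- 35,000 seconds
       3500000 / 100 / 3600 AS hours_needed;    -- roughly 9.7 hours
```

Finishing in about 10 minutes instead suggests that many of the lookups were served from cache rather than from disk.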
The 1.5 s may be all it took to read straight (not randomly) through the entire table once.