Optimizing query in MySQL 5.6

I have an InnoDB table, levels:

+--------------------+--------------+------+-----+---------+-------+
| Field              | Type         | Null | Key | Default | Extra |
+--------------------+--------------+------+-----+---------+-------+
| id                 | int(9)       | NO   | PRI | NULL    |       |
| level_name         | varchar(20)  | NO   |     | NULL    |       |
| user_id            | int(10)      | NO   |     | NULL    |       |
| user_name          | varchar(45)  | NO   |     | NULL    |       |
| rating             | decimal(5,4) | NO   |     | 0.0000  |       |
| votes              | int(5)       | NO   |     | 0       |       |
| plays              | int(5)       | NO   |     | 0       |       |
| date_published     | date         | NO   | MUL | NULL    |       |
| user_comment       | varchar(255) | NO   |     | NULL    |       |
| playable_character | int(2)       | NO   |     | 1       |       |
| is_featured        | tinyint(1)   | NO   | MUL | 0       |       |
+--------------------+--------------+------+-----+---------+-------+

There are ~4 million rows. Because of the front-end functionality, I need to query this table with a variety of filters and sorts. The filters and sorts are on playable_character, rating, plays, and date_published. date_published can be filtered to show the last day, week, month, or anytime (the last 3 years). There's also paging. So, depending on the user's choices, the queries can look like one of these:

SELECT * FROM levels
WHERE playable_character = 0 AND
    date_published BETWEEN date_sub(now(), INTERVAL 3 YEAR) AND now()
ORDER BY date_published DESC
LIMIT 0, 1000;

SELECT * FROM levels
WHERE playable_character = 4 AND
    date_published BETWEEN date_sub(now(), INTERVAL 1 WEEK) AND now()
ORDER BY rating DESC
LIMIT 4000, 1000;

SELECT * FROM levels
WHERE playable_character = 5 AND
    date_published BETWEEN date_sub(now(), INTERVAL 1 MONTH) AND now()
ORDER BY plays DESC
LIMIT 1000, 1000;

I started out with an index idx_date_char(date_published, playable_character) that works great on the first example query here, and on basically anything that orders by date_published. Using EXPLAIN, I get 'Using index condition', which is good. I think I understand why the index works: the same two indexed columns appear in the WHERE and ORDER BY clauses.

My problem is with queries that ORDER BY plays or rating. I understand I'm introducing a third column, but for the life of me I can't get an index that works well, despite trying just about every variation I could think of: composite indexes of all three or four columns in every order, and so on. Maybe the query could be written differently?

I should add that rating and plays are always sorted DESC. Only date_published may be either DESC or ASC.

Any suggestions greatly appreciated. TIA.

The columns used in your WHERE clause and ORDER BY should be part of the index. I would have an index on

( playable_character, date_published DESC, rating DESC, plays DESC )

The reason I would put playable_character first is that you want that ID as the leading key, followed by all the dates in question. The rating and plays columns are just along for the ride, to assist the ORDER BY clause.

Think of the index like this. If it is ordered by date_published, then playable_character, picture a room full of boxes. Each box holds one date, and within that box the rows are in order of character. So if you have 3 years' worth of data to go through, you have to open every box for the last 3 years and find the character you are looking for in each one.

Now, think of it the other way around. Each box holds one character, and within it all the dates are pre-sorted. So you go to one box, open it, move to the date in question, and grab the records from the X-to-Y range you want. Then you can apply a simple ORDER BY to those records.
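The box analogy can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how InnoDB is implemented: each index is modeled as a sorted list of tuples, and the data (one level per character per day), the variable names, and the dates are all made up for the example.

```python
from bisect import bisect_left, bisect_right
from datetime import date, timedelta

INF = float("inf")  # sorts after any integer, used to close a key range

# Hypothetical data: one level per character (0-5) per day for 365 days.
start = date(2024, 1, 1)
rows = []  # (id, playable_character, date_published)
for day in range(365):
    for character in range(6):
        rows.append((len(rows), character, start + timedelta(days=day)))

# Each index is modeled as a sorted list of (key columns..., id).
idx_date_char = sorted((d, c, i) for i, c, d in rows)
idx_char_date = sorted((c, d, i) for i, c, d in rows)

wanted_char = 4
first_day, last_day = date(2024, 3, 1), date(2024, 5, 31)

# (date_published, playable_character): the date range is contiguous, but
# every character inside it has to be examined and filtered.
lo = bisect_left(idx_date_char, (first_day,))
hi = bisect_right(idx_date_char, (last_day, INF))
examined_date_first = idx_date_char[lo:hi]
matches = [entry for entry in examined_date_first if entry[1] == wanted_char]

# (playable_character, date_published): one contiguous run, nothing to discard.
lo = bisect_left(idx_char_date, (wanted_char, first_day))
hi = bisect_right(idx_char_date, (wanted_char, last_day, INF))
examined_char_first = idx_char_date[lo:hi]

print(len(examined_date_first), len(matches), len(examined_char_first))

# The run is already in date order; reading it backwards gives DESC order.
newest_first = [entry[1] for entry in reversed(examined_char_first)]
```

With this made-up data the date-first layout examines 552 entries to find 92 matches, while the character-first layout examines exactly the 92 it needs.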

It seems each of your queries would make good use of data sorted in one of these ways:

  1. playable_character, date_published
  2. playable_character, date_published, rating
  3. playable_character, date_published, plays

Bear in mind that the ordering the first query needs is a leading prefix of the orderings the second and third queries need, so a separate index for it is unnecessary and we can get rid of it.

Also note that adding DESC or ASC to an index definition is syntactically valid but doesn't change anything, because the feature is not supported in MySQL 5.6 (the keywords are accepted in anticipation of future support). All indexes are stored in ascending order. More information here.

So these are the indexes that you should create:

ALTER TABLE levels ADD INDEX (playable_character, date_published, rating);
ALTER TABLE levels ADD INDEX (playable_character, date_published, plays);

That should make the three queries above run faster than Forrest Gump.

When your query includes a range predicate like BETWEEN, the order of columns in your index is important.

  • First, include one or more columns referenced by equality predicates.
  • Next, include one column referenced by a range predicate.
  • Any further columns in the index after the column referenced by a range predicate cannot be used for other range predicates or for sorting.
  • If you have no range predicate, you can add a column for sort order.

So your first query can benefit from an index on (playable_character, date_published). The sorting should be a no-op because the optimizer will just fetch rows in index order.

The second and third queries are bound to do a filesort, because you have a range predicate and then you're sorting by a different column. If you had only equality predicates, you would be able to use the third column to avoid the filesort, but that doesn't work when you have a range predicate.
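To see why the range predicate defeats the third column, here is a small Python sketch. It models an index on (playable_character, date_published, rating) as a sorted list of tuples; the data, the helper ratings_between, and the dates are hypothetical, and this is not a description of MySQL internals.

```python
from bisect import bisect_left, bisect_right
from datetime import date, timedelta

INF = float("inf")  # sorts after any rating, used to close a key range

# Hypothetical index on (playable_character, date_published, rating):
# three levels per character per day, modeled as a sorted list of tuples.
start = date(2024, 1, 1)
index = sorted(
    (character, start + timedelta(days=day), rating)
    for character in range(6)
    for day in range(30)
    for rating in (1.5, 3.0, 4.5)
)

def ratings_between(character, first_day, last_day):
    """Ratings in index order for one character and an inclusive date range."""
    lo = bisect_left(index, (character, first_day))
    hi = bisect_right(index, (character, last_day, INF))
    return [rating for _, _, rating in index[lo:hi]]

# Range predicate on the date: ratings restart at every new date.
week = ratings_between(4, date(2024, 1, 10), date(2024, 1, 16))
# Equality on the date: the third column is already ordered.
one_day = ratings_between(4, date(2024, 1, 10), date(2024, 1, 10))

print(week[:6])
print(one_day)
print(sorted(week, reverse=True)[:3])  # the extra sort ORDER BY rating DESC needs
```

Across a week the ratings come out as 1.5, 3.0, 4.5, 1.5, 3.0, 4.5 and so on, ordered within each date but not across dates, so a separate sort is still required. For a single date they are already in order.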

The best you can hope for is that the conditions reduce the size of the result set so that it can be sorted in memory without too many sort merge passes. You can help this by increasing sort_buffer_size, but be careful not to increase it too much, because it's allocated per thread.
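As a rough picture of the work involved, the sketch below ranks an already-filtered result set for a LIMIT offset, count page. The function page_by and the data are hypothetical, and this is an illustration of the idea rather than MySQL's filesort: the point is only that the sort has to rank offset + count rows before the page can be returned, so deep pages over large filtered sets cost more.

```python
import heapq

def page_by(rows, key, offset, count):
    """Return the rows ranked offset+1 .. offset+count by key, highest first."""
    # Only the best offset + count rows need to be kept while scanning.
    top = heapq.nlargest(offset + count, rows, key=key)
    return top[offset:]

# Hypothetical filtered result set: (id, plays), with distinct play counts.
filtered = [(i, (i * 37) % 101) for i in range(100)]

# Equivalent in spirit to ORDER BY plays DESC LIMIT 5, 2.
page = page_by(filtered, key=lambda row: row[1], offset=5, count=2)
print([plays for _, plays in page])
```

The smaller the filtered set and the shallower the page, the less there is to keep and rank.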

The ASC / DESC keywords in index definitions make no difference in MySQL 5.6.
See http://dev.mysql.com/doc/refman/5.6/en/create-index.html :

"These keywords are permitted for future extensions for specifying ascending or descending index value storage. Currently, they are parsed but ignored; index values are always stored in ascending order."
