Optimize Indexes for Particular Query in MySQL

I have a fairly simple query that takes about 14 seconds to complete, and I would like to speed it up. I think I have the correct indexes in place, but I'm not sure.

Here is the query:

SELECT *
FROM opportunities
WHERE cid = 7785
  AND STATUS != 4
  AND otype != 200
  AND links > 0
  AND ontopic != 'F'
ORDER BY links DESC
LIMIT 0, 100;

Here is the table schema:

CREATE TABLE `opportunities` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `cid` int(11) NOT NULL,
  `url` varchar(900) CHARACTER SET utf8 NOT NULL,
  `status` tinyint(4) NOT NULL,
  `links` int(11) NOT NULL,
  `otype` int(11) NOT NULL,
  `reserved` tinyint(4) NOT NULL,
  `ontopic` varchar(3) CHARACTER SET utf8 NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `cid` (`cid`,`url`),
  KEY `cid1` (`cid`),
  KEY `url` (`url`),
  KEY `otype` (`otype`),
  KEY `reserved` (`reserved`),
  KEY `ontopic` (`ontopic`),
  KEY `status` (`status`),
  KEY `links` (`links`),
  KEY `ontopic_links` (`ontopic`,`links`),
  KEY `cid_status_otype_links_ontopic` (`cid`,`status`,`otype`,`links`,`ontopic`)
) ENGINE=InnoDB AUTO_INCREMENT=13022832 DEFAULT CHARSET=latin1

Here is the result of the EXPLAIN command:

id: 1
select_type: Simple
table: opportunities
partitions: null
type: range
possible_keys: cid,cid1,otype,ontopic,status,links,ontopic_links,cid_status_otype_links_ontopic
key: links
key_len: 4
ref: null
rows: 1531552
filtered: 0.33
Extra: Using index condition; Using where

Thoughts / Questions

Am I reading it correctly that it is using the "links" key to run the query? Why wouldn't it use a more complete index, like cid_status_otype_links_ontopic, which covers all the conditions of my query?

Thanks in advance!

As requested

There are 30,961 rows that match the query when you remove the LIMIT 0,100. Interestingly, the "count()" command returns almost instantaneously.

  • What you have must plow through all of the rows, using your 5-column index, then sort the results and deliver 100 rows.

  • The only index likely to be useful is INDEX(cid, links). This is because cid is the only column being tested with =, and having links next might then be useful for the ORDER BY and LIMIT. There is still the risk that the != tests will require filtering out a lot of rows.

  • Are status and otype multi-valued? If either has only 2 values, then turning the != into = and adding that column to the index would be beneficial.

  • Do you really need all the columns ( SELECT * )? If not, and if you don't need any big columns ( url ), then you could go with a 'covering' index.
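A minimal sketch of the INDEX(cid, links) suggestion from the list above. The index name `cid_links` is made up for illustration, and whether the optimizer actually picks it depends on your data, so re-run EXPLAIN after adding it.

```sql
-- Hypothetical index name. Equality column (cid) first, then the ORDER BY column (links).
ALTER TABLE opportunities ADD INDEX cid_links (cid, links);

-- Re-check the plan: compare the `key`, `rows`, and `Extra` columns with the earlier EXPLAIN.
EXPLAIN
SELECT *
FROM opportunities
WHERE cid = 7785
  AND status != 4
  AND otype != 200
  AND links > 0
  AND ontopic != 'F'
ORDER BY links DESC
LIMIT 0, 100;
```

The idea is that, with cid pinned by the equality test, the index entries for that cid are already ordered by links, so MySQL may be able to read them in descending order, filter on the remaining != conditions, and stop once it has 100 rows. If many rows fail the != tests, it will still have to skip over them.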

More on writing indexes.

It's a funny thing about inequality comparisons: they count as range conditions.

That is, an equality comparison matches exactly one value, but anything other than equality ( != , > , < , IN , BETWEEN ) matches multiple values.

Because a range condition matches multiple values, only the first index column used in a range condition is going to be optimized. You'd think that your index cid_status_otype_links_ontopic has all the columns mentioned in the conditions of your query, but only the first two will be used. The first because you have an equality comparison for cid. The second because status, the next column, is used in an inequality comparison, and that's where it stops using columns from the index.*

Evidence: if you can force that index to be used, you should see the key_len field of the EXPLAIN result show only 5, which is the size of cid (4 bytes) + status (1 byte).
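One way to check this, as a sketch: FORCE INDEX is a standard MySQL index hint that makes the optimizer strongly prefer the named index, and the query is the one from the question. I have not run this against your data, so treat the key_len of 5 as the prediction to verify, not as a guaranteed result.

```sql
-- Force the 5-column index and inspect the plan.
-- If only cid and status are used for the lookup, key_len should be
-- 4 bytes (cid, INT) + 1 byte (status, TINYINT).
EXPLAIN
SELECT *
FROM opportunities FORCE INDEX (cid_status_otype_links_ontopic)
WHERE cid = 7785
  AND status != 4
  AND otype != 200
  AND links > 0
  AND ontopic != 'F'
ORDER BY links DESC
LIMIT 0, 100;
```

With this index the rows no longer come back in links order, so expect "Using filesort" to appear in the Extra column.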

The MySQL optimizer has apparently predicted that it would be more beneficial to use your links index, because that allows it to access the rows in index order, which is the same as the sort order you requested with your ORDER BY.

Evidence: you don't see "Using filesort" in the Extra field of your EXPLAIN.

Is that really better than using one of the other indexes? Maybe, maybe not. The optimizer's predictions aren't always perfect.

You can use an index hint to override the optimizer's choice:

SELECT * FROM opportunities USE INDEX (cid_status_otype_links_ontopic) WHERE ...

Try that out, run EXPLAIN on that query, and compare it to your other EXPLAIN. Then execute both queries and see which is reliably faster.
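Spelled out with the conditions from the question, the comparison could look something like this. It is only a sketch: the hint on the first query pins the index the optimizer already chose, so both runs are explicit about what they use.

```sql
-- Variant A: the optimizer's current choice, read in links order (no filesort).
SELECT *
FROM opportunities USE INDEX (links)
WHERE cid = 7785
  AND status != 4
  AND otype != 200
  AND links > 0
  AND ontopic != 'F'
ORDER BY links DESC
LIMIT 0, 100;

-- Variant B: the 5-column index, narrowed by cid and status, then sorted.
SELECT *
FROM opportunities USE INDEX (cid_status_otype_links_ontopic)
WHERE cid = 7785
  AND status != 4
  AND otype != 200
  AND links > 0
  AND ontopic != 'F'
ORDER BY links DESC
LIMIT 0, 100;
```

Run each variant several times and compare the steady timings, since the first run of either one may be slower while pages are loaded into the buffer pool.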

(* Actually, I have to add a footnote about the index column usage. MySQL 5.6 and later can do a little bit better than just the two columns, when you see the note "Using index condition" in the EXPLAIN. But it's not quite the same. You can read more about that here: https://dev.mysql.com/doc/refman/5.6/en/index-condition-pushdown-optimization.html )
