MySQL: Difference between LIKE 123 and = 123 regarding INDEX usage

I am seeing some very strange behaviour that turned out to depend on which operator I use in my WHERE condition.

Assume the following table structure with several million entries:

CREATE TABLE `obj` (
  `obj__id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `obj__obj_type__id` int(10) unsigned DEFAULT NULL,
  `obj__title` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `obj__const` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `obj__description` text COLLATE utf8_unicode_ci,
  `obj__created` datetime DEFAULT NULL,
  `obj__created_by` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `obj__updated` datetime DEFAULT NULL,
  `obj__updated_by` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `obj__property` int(10) unsigned DEFAULT '0',
  `obj__status` int(10) unsigned DEFAULT '1',
  `obj__sysid` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `obj__scantime` datetime DEFAULT NULL,
  `obj__imported` datetime DEFAULT NULL,
  `obj__hostname` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `obj__undeletable` int(1) unsigned NOT NULL DEFAULT '0',
  `obj__rt_cf__id` int(11) unsigned DEFAULT NULL,
  `obj__cmdb_status__id` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`obj__id`),
  KEY `obj_FKIndex1` (`obj__obj_type__id`),
  KEY `obj_ibfk_2` (`obj__cmdb_status__id`),
  KEY `obj__sysid` (`obj__sysid`),
  KEY `obj__title` (`obj__title`),
  KEY `obj__const` (`obj__const`),
  KEY `obj__hostname` (`obj__hostname`),
  KEY `obj__status` (`obj__status`),
  KEY `obj__updated_by` (`obj__updated_by`)
) ENGINE=InnoDB AUTO_INCREMENT=7640131 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

A very simple SELECT with two conditions, ordered by obj__title with a limit of 500, performs quite slowly (500 ms):

SELECT SQL_NO_CACHE * FROM obj WHERE (obj__status = 2) AND (obj__obj_type__id = 59) ORDER BY obj__title ASC LIMIT 0, 500;

Without the "ORDER BY obj__title" it runs like a charm (<1 ms).

EXPLAIN SELECT tells me that MySQL is performing a filesort and not using the obj__title index. So it is quite obvious why this query is slow:

id  select_type table   type    possible_keys   key key_len ref rows    Extra
1   SIMPLE  obj index_merge obj_FKIndex1,obj__status    obj_FKIndex1,obj__status    5,5 NULL    1336    Using intersect(obj_FKIndex1,obj__status); Using where; Using filesort

When I force the obj__title index with FORCE INDEX or USE INDEX, MySQL does not use the other indexes, which again results in very poor performance. But never mind; it is quite obvious that the poor performance has something to do with the combination of the two conditions and the ORDER BY.

After spending hours investigating how to optimize this query, I came up with a very simple change: I switched the operator in my conditions from = to LIKE. So my query now looks like this:
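The exact hinted statement is not shown in the question. As an assumption, the forced-index attempt presumably looked something like the following sketch, which uses the standard MySQL index-hint syntax on the query above:

```sql
-- Assumed reconstruction of the hinted query; the original is not shown.
-- FORCE INDEX restricts the optimizer to the named index, so the two
-- WHERE conditions are checked row by row while walking obj__title.
EXPLAIN
SELECT SQL_NO_CACHE *
FROM obj FORCE INDEX (obj__title)
WHERE (obj__status = 2) AND (obj__obj_type__id = 59)
ORDER BY obj__title ASC
LIMIT 0, 500;
```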

EXPLAIN SELECT SQL_NO_CACHE * FROM obj WHERE (obj__status LIKE 2) AND (obj__obj_type__id LIKE 59) ORDER BY obj__title ASC LIMIT 0, 500;

This is what happened:

id  select_type table   type    possible_keys   key key_len ref rows    Extra
1   SIMPLE  obj index   obj_FKIndex1,obj__status    obj__title  768 NULL    500 Using where

Query time is 150 ms. I was actually shocked.

I am not really happy with the speed, but at least it is performing OK.

But what I would really like to know is why the LIKE version uses the index and the = version does not. I did not find any hints about that in the MySQL documentation, only a few notes about LIKE being case insensitive and LIKE acting a bit differently for VARCHARs > 255 or any other CHAR or TEXT fields. Not a single word about its behaviour with integers.

Can someone shed light on this situation? Any database design or query tips to speed up the query further are very welcome as well!

For this query:

SELECT SQL_NO_CACHE *
FROM obj
WHERE (obj__status = 2) AND (obj__obj_type__id = 59)
ORDER BY obj__title ASC
LIMIT 0, 500;

The best index is obj(obj__status, obj__obj_type__id, obj__title).

Otherwise, I would expect MySQL to use an index on one of the two WHERE columns.

However, when you use LIKE, you are comparing numbers to strings. This generally prevents an index from being used. The only possible index is the one for the ORDER BY, which happens to work in your case.

But the proper index should perform better.
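As a minimal sketch, the composite index suggested above could be created and checked as follows. The index name obj_status_type_title is hypothetical; only the column list and order come from the answer.

```sql
-- Hypothetical index name; the column order follows the suggestion above:
-- the two equality columns first, then the ORDER BY column.
CREATE INDEX obj_status_type_title
    ON obj (obj__status, obj__obj_type__id, obj__title);

-- Re-run EXPLAIN on the original query to see which key is chosen
-- and whether "Using filesort" still appears in the Extra column.
EXPLAIN
SELECT SQL_NO_CACHE *
FROM obj
WHERE (obj__status = 2) AND (obj__obj_type__id = 59)
ORDER BY obj__title ASC
LIMIT 0, 500;
```

Building an index on a table with several million rows takes time, so this is best tried on a copy of the data first.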

The ORDER BY has to be satisfied before the LIMIT. If there are a boatload of rows and MySQL performs a sort operation (shown as "Using filesort" in the Extra column), that can be expensive.

MySQL can also satisfy an ORDER BY obj__title without performing a sort operation, by making use of an index with a leading column of obj__title. And that's what you see happening when you change the predicates. EXPLAIN shows that the index on obj__title is being used and there is no sort operation. But MySQL has to inspect each row to see whether it satisfies the predicates.

The LIKE predicate causes the column to be evaluated in a string context rather than a numeric one. That is, MySQL has to perform an implicit conversion from integer to varchar, and that prevents MySQL from using the index to satisfy the predicates. MySQL is basically forced to do the conversion for every row it examines in order to evaluate the predicate.
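The string-context behaviour described above can be illustrated with a short sketch. The first statement is the example the MySQL manual gives for LIKE on numeric expressions; the CAST form in the second statement is only an approximation of what the implicit conversion amounts to, written out for illustration.

```sql
-- MySQL permits LIKE on numeric expressions as an extension to standard SQL;
-- the number is compared as a string.
SELECT 10 LIKE '1%';   -- 1

-- Rough equivalent of the LIKE version of the query, with the implicit
-- integer-to-string conversion made explicit (an approximation only).
-- A converted column value cannot be looked up in the integer indexes.
EXPLAIN
SELECT SQL_NO_CACHE *
FROM obj
WHERE CAST(obj__status AS CHAR) LIKE '2'
  AND CAST(obj__obj_type__id AS CHAR) LIKE '59'
ORDER BY obj__title ASC
LIMIT 0, 500;
```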


For best performance of that first query:

  SELECT SQL_NO_CACHE * 
    FROM obj 
   WHERE obj__status = 2
     AND obj__obj_type__id = 59
   ORDER BY obj__title ASC
   LIMIT 0, 500

You'd want an index with these leading columns:

 .... ON obj (obj__status, obj__obj_type__id, obj__title)

Then MySQL could satisfy both equality predicates and the ORDER BY by making use of that single index.

Note that this makes the index on just the single column obj__status redundant. Any query making use of the index on obj__status could make use of the new index.

Your first SELECT needs this composite index. (I take the liberty of removing the "obj_" prefix, which just clutters the SQL.)
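A sketch of adding the composite index and dropping the redundant single-column one in a single statement. The name obj_status_type_title is hypothetical; obj__status is the existing index name from the table definition in the question.

```sql
-- Hypothetical name for the new index; obj__status is the existing
-- single-column index from the CREATE TABLE statement.
ALTER TABLE obj
    ADD INDEX obj_status_type_title (obj__status, obj__obj_type__id, obj__title),
    DROP INDEX obj__status;
```

Before dropping the old index, it is worth running EXPLAIN on the other queries that filter on obj__status to confirm they pick up the new index as expected.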

INDEX(type_id, status, title)

MySQL rarely uses more than one index in a query; this 3-column index is suited for WHERE status=(const) AND type_id=(const) ORDER BY title. I see that it used "index intersect" to try to compensate for the lack of a suitable composite index, but that only partially helps.

Perhaps the optimizer looked at LIKE and said "Punt! I give up on using numeric comparisons, so let's not use either index on type_id or status. Instead, let's see if we can avoid the filesort by using INDEX(title)". And it happened to be better.

There is another thing that makes that filesort especially costly. "Using temporary" and "Using filesort" prefer to do everything in RAM via a MEMORY table. But several things can prevent that. One is fetching a TEXT field, which you do (SELECT * includes description TEXT). I doubt if the optimizer noticed that. But the timings seem to have.

For more tips on indexing, see my index cookbook. Meanwhile, use LIKE only on strings, not numeric values.
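One way to test whether the TEXT column contributes to the cost is to name the needed columns instead of using SELECT *. This is only a sketch: the column list below is an assumption about what the application actually needs, not something stated in the question.

```sql
-- Assumed column list; the point is only that obj__description (TEXT)
-- is left out, so the sort does not have to carry it.
SELECT SQL_NO_CACHE obj__id, obj__title, obj__status, obj__obj_type__id
FROM obj
WHERE (obj__status = 2) AND (obj__obj_type__id = 59)
ORDER BY obj__title ASC
LIMIT 0, 500;
```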
