Why are greater-than and equals different in a MySQL SELECT?

Question

I have a large MyISAM table. It's approaching 1 million rows. It's basically a list of items and some information about them.

There are two indices:

- primary: the item ID
- a composite index on date (date) and col (int)

I run two queries:

SELECT * FROM table WHERE date = '2011-02-01' AND col < 5 LIMIT 10

SELECT * FROM table WHERE date < '2011-02-01' AND col < 5 LIMIT 10

The first one finishes in ~0.0005 seconds and the second in ~0.05 seconds. That is a 100x difference. Is it wrong of me to expect both of these to run at roughly the same speed? I must not be understanding indices very well. How can I speed up the second query?

Answer 1

Regardless of MySQL, it boils down to basic algorithm theory.

Greater-than and less-than operations on a large set are slower than identity (equality) operations. With a large data set, the ideal data structure for answering less-than or greater-than queries is a self-balancing tree (binary or n-ary). On a self-balancing tree, the worst case for locating the boundary is O(log n). Returning all k smaller/greater elements then costs an additional O(k).

The ideal data structure for identity lookup is a hashtable. Hashtable lookups are generally O(1), i.e. constant time. A hashtable, however, is no good for greater-than/less-than queries.
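A minimal Python sketch of the trade-off described above. A `dict` stands in for the hash table and a sorted list searched with `bisect` stands in for a balanced tree. Both are illustrative assumptions, not how MySQL implements its indexes.

```python
import bisect

# Hypothetical data: 1,000 sorted integer keys, each mapped to a row.
keys = list(range(0, 2000, 2))          # 0, 2, 4, ..., 1998
rows = {k: f"row-{k}" for k in keys}    # hash table: key -> row

def equality_lookup(table, key):
    """Average O(1): a single hash probe, no ordering needed."""
    return table.get(key)

def less_than_lookup(sorted_keys, bound):
    """O(log n) to find the boundary, then O(k) to copy out the k matches."""
    cut = bisect.bisect_left(sorted_keys, bound)  # first position with key >= bound
    return sorted_keys[:cut]

def less_than_via_hash(table, bound):
    """A hash table keeps no useful key order, so a range query must test every key: O(n)."""
    return sorted(k for k in table if k < bound)

print(equality_lookup(rows, 10))     # row-10
print(less_than_lookup(keys, 10))    # [0, 2, 4, 6, 8]
```

The sorted structure answers both kinds of query reasonably well. The hash table is only fast for equality.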

Generally, a well-balanced tree performs only slightly worse than a hashtable (which is how Haskell gets away with using a tree in place of a hashtable).

Thus, regardless of what MySQL does, it's no surprise that < and > are slower than =.

Old answer below:

Because the first query uses '=', it is like a hashtable lookup (particularly if your index is a hash index). So it will be faster than the second one, which might work better with a tree-like index.

Since MySQL lets you configure the index type, you can try changing it, but I'm fairly sure the first query will always run faster than the second.

Answer 2

I'm assuming you have an index on the date column. The first query uses the index; the second query probably does a linear scan (at least over part of the data). A direct fetch is always faster than a linear scan.

Answer 3

MySQL stores its indexes in a BTREE by default. In general, there is no hashing.

The short answer for the performance difference is that the < form evaluates more nodes than the = form.

The (date, col) index you have stores its values roughly like a phone book:

2011-01-01, col=1, row_ptr
2011-01-01, col=2, row_ptr
2011-01-01, col=3, row_ptr
etc...
2011-02-01, col=1, row_ptr
2011-02-01, col=2, row_ptr
2011-02-01, col=3, row_ptr
etc...
2011-02-02, col=1, row_ptr
2011-02-02, col=2, row_ptr
etc...

...sorted in ascending order in tree nodes of size B: (2011-01-01, col=1) < (2011-01-01, col=2) < (2011-01-02, col=1).
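Python tuples compare lexicographically, first by the first element and then by the second. Sorting `(date, col, row_ptr)` tuples therefore reproduces the ordering shown above. This is a sketch only: the row pointers are made-up values, and a real B-tree spreads these keys across nodes rather than one flat list.

```python
from datetime import date

# Hypothetical (date, col, row_ptr) entries, deliberately listed out of order.
entries = [
    (date(2011, 2, 1), 2, 104),
    (date(2011, 1, 1), 3, 102),
    (date(2011, 2, 2), 1, 106),
    (date(2011, 1, 1), 1, 100),
    (date(2011, 2, 1), 1, 103),
    (date(2011, 1, 1), 2, 101),
]

# Lexicographic tuple order: by date first, then by col,
# which mirrors how a (date, col) index orders its keys.
index = sorted(entries)

for d, col, row_ptr in index:
    print(d.isoformat(), f"col={col}", f"row_ptr={row_ptr}")
```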

Your question is essentially asking for the difference between:

1. Find all phone numbers with last name 'Smith' and a first name starting with 'A'.
2. Find all phone numbers that come before 'Smith' and have a first name starting with 'A'.

It should be obvious why #1 is so much faster than #2.
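A rough Python model of the phone-book argument. It counts how many index entries each query touches before it can return its rows. The data (41 days, col values 0 to 99 on each day) is invented, and MySQL's real execution differs in detail. The point is only this: `=` on the leading column turns `col < 5` into one contiguous run, while `<` forces a col filter over every earlier entry.

```python
import bisect
from datetime import date, timedelta

# Hypothetical (date, col) index: 41 days starting 2011-01-01, col = 0..99 per day.
start = date(2011, 1, 1)
index = sorted((start + timedelta(days=d), col)
               for d in range(41) for col in range(100))

def query_equal(index, day, col_max, limit):
    """date = day AND col < col_max: seek to the first (day, *) key, then walk a short run."""
    i = bisect.bisect_left(index, (day,))   # (day,) sorts before every (day, col)
    hits, examined = [], 0
    while i < len(index) and len(hits) < limit:
        d, col = index[i]
        examined += 1
        if d != day or col >= col_max:
            break                           # past the contiguous matching run
        hits.append(index[i])
        i += 1
    return hits, examined

def query_less(index, day, col_max, limit):
    """date < day AND col < col_max: walk from the start, filtering col on every entry."""
    end = bisect.bisect_left(index, (day,))  # every key before this has date < day
    hits, examined = [], 0
    for i in range(end):
        if len(hits) == limit:
            break
        examined += 1
        if index[i][1] < col_max:
            hits.append(index[i])
    return hits, examined

day = date(2011, 2, 1)
print(query_equal(index, day, 5, 10)[1])   # 6
print(query_less(index, day, 5, 10)[1])    # 105
```

With this distribution, the `=` query examines 6 entries and the `<` query examines 105. If small col values were common on every date, the gap would shrink. That matches the point below that the cost depends on the distribution of the data.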

There are also considerations of memory/disk transfer efficiency and heap allocations (= does WAY fewer transfers than <). These account for a not-insignificant amount of time, but they depend largely on the distribution of the data and on the specific location of the (2011-02-01, col=min(col)) key record.


Answer 4

The first query performs a seek on the data, whereas the second one performs a scan. Scans are always costlier than seeks, hence the time difference.

Think of it like a book: a scan means running through all the pages of the book, whereas a seek jumps directly to a page number.
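To make the book analogy concrete, here is a minimal Python sketch that counts pages read. The page layout and the table of contents (`first_keys`) are hypothetical. A real B-tree has several levels of such "contents pages" rather than one flat list.

```python
import bisect

# Hypothetical "book": 1,000 pages, each holding 100 consecutive sorted keys.
PAGE_SIZE = 100
pages = [list(range(p * PAGE_SIZE, (p + 1) * PAGE_SIZE)) for p in range(1000)]
first_keys = [page[0] for page in pages]   # acts like a table of contents

def scan(pages, key):
    """Read pages front to back until the key turns up; returns (found, pages_read)."""
    for read, page in enumerate(pages, start=1):
        if key in page:
            return True, read
    return False, len(pages)

def seek(pages, first_keys, key):
    """Binary-search the table of contents, then read the single candidate page."""
    p = bisect.bisect_right(first_keys, key) - 1
    if p < 0:
        return False, 0                    # key sorts before the first page
    return key in pages[p], 1

print(scan(pages, 73_456))               # (True, 735)
print(seek(pages, first_keys, 73_456))   # (True, 1)
```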

Hope this helps.

4 answers

Answer 1: score 2, accepted, 2011-02-04 03:56:04
Answer 2: score 2, 2011-02-04 06:19:48
Answer 3: score 2, 2011-10-18 23:39:14
Answer 4: score 1, 2011-02-04 06:06:00