Select MAX or Order By Limit 1

MIN/MAX vs ORDER BY and LIMIT

To follow up on this question: I found some results that are very different from what Sean McSomething describes:

I have a table with about 300M rows.

Select max(foo) from bar; takes about 15 sec. to run

Select foo from bar order by foo desc limit 1; takes 3 sec. to run

Sean's statement "It looks like MIN() is the way to go - it's faster in the worst case, indistinguishable in the best case" just doesn't hold for this case... but I have no idea why. Can anyone offer an explanation?

Edit: Since I am unable to show the table's structure here: assume that bar is a table in an ndb_cluster with no relations, and foo is an arbitrary data point with no index.

To avoid a full pass over the table, add an INDEX on the foo column.
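A minimal sketch of that suggestion, using the bar table and foo column from the question. The index name idx_foo is hypothetical, and building an index on a table of roughly 300M rows can itself take a long time, so treat this as something to try on a copy first. What EXPLAIN reports afterwards depends on the storage engine and server version, so no particular plan is assumed here.

```sql
-- idx_foo is a hypothetical name; any unused index name works.
ALTER TABLE bar ADD INDEX idx_foo (foo);

-- Compare the plans the optimizer picks for the two variants.
EXPLAIN SELECT MAX(foo) FROM bar;
EXPLAIN SELECT foo FROM bar ORDER BY foo DESC LIMIT 1;
```

If both plans now show the index in the key column instead of a full table scan, the remaining difference between the two queries (if any) comes from how the index is read, not from the scan itself.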

I came across this question and thought I'd add what I've found. Note that the columns are indexed. I'm running on MariaDB 10.2.14.

I have a query which looks like SELECT MAX(created) FROM tbl WHERE group=0 AND created IS NOT NULL. There's an index on (group,created) (both are ints, but created can be NULL). There are many entries with group=0, not many where created IS NULL. tbl is using the Aria storage engine.

EXPLAIN shows the index is being used and gives a row count of 46312, with extra saying "Using where; Using index".

Running the query takes around 0.692s, but the status has something interesting:

Handler_read_key: 1 Handler_read_next: 45131 Handler_read_prev: 0

This seems to suggest that the key is being fully scanned for the maximum; using MIN instead of MAX seems to give similar results. In other words, MIN/MAX apparently can't make use of the optimisation to just pick the first/last entry of the index here.

However, if the query is changed to SELECT created FROM tbl WHERE group=0 AND created IS NOT NULL ORDER BY created DESC LIMIT 1, the query seems to take about the same amount of time to run and EXPLAIN shows the same info, but the status shows:

Handler_read_key: 1 Handler_read_next: 0 Handler_read_prev: 0

I get similar results if the ORDER BY is changed to ASC. If my understanding is correct, using ORDER BY...LIMIT can skip an index scan, which could lead to faster queries when many rows match the index condition.
Note that for the above results, there's enough RAM and cache allocated to hold all indexes in cache, so, presumably, index scans are fast.

I haven't done any experiments with other conditions (different MySQL versions and storage engines), but I suppose the moral of this story is that checking the status of queries via SHOW SESSION STATUS may help provide answers to these things.
At least in this case, ORDER BY...LIMIT may be more efficient than MIN/MAX even when an index can be used.
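For anyone wanting to repeat the comparison, here is a rough sketch of how the Handler_read counters can be collected for each variant. It reuses the tbl, group and created names from the queries above; group is backtick-quoted because GROUP is a reserved word. FLUSH STATUS normally needs the RELOAD privilege, so if that isn't available, an alternative is to run SHOW SESSION STATUS before and after each query and subtract.

```sql
-- Reset the session status counters so only the next query is counted.
FLUSH STATUS;

-- Variant 1: aggregate.
SELECT MAX(created) FROM tbl
WHERE `group` = 0 AND created IS NOT NULL;

-- Handler_read_next / Handler_read_prev show how many index entries were walked.
SHOW SESSION STATUS LIKE 'Handler_read%';

FLUSH STATUS;

-- Variant 2: sort and take the first row.
SELECT created FROM tbl
WHERE `group` = 0 AND created IS NOT NULL
ORDER BY created DESC LIMIT 1;

SHOW SESSION STATUS LIKE 'Handler_read%';
```

A large Handler_read_next (or Handler_read_prev) value for one variant and a near-zero value for the other is the pattern described above. The counts will differ on other versions and storage engines.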

Index or no index makes no difference for relative comparisons. Of course, one should add suitable indexes to get the best performance when reading ("selecting") data.

I have a table where each row is a version of a user. New rows are added for new users, but also for updates to a user.

Listing all users' names on MariaDB 10.3:

ORDER BY... LIMIT 1: 166, 157, 156, 169, 153, 158 ms

SELECT u.displayName
FROM users u
WHERE u.version = (
    SELECT u2.version
    FROM users u2
    WHERE u2.username = u.username
    ORDER BY u2.version DESC
    LIMIT 1)
ORDER BY u.displayName

MAX(...): 729, 724, 723, 721, 722 ms

SELECT u.displayName
FROM users u
WHERE u.version = (
    SELECT MAX(u2.version)
    FROM users u2
    WHERE u2.username = u.username)
ORDER BY u.displayName

Each millisecond value is a separate run of the code, busting the cache by varying the table aliases randomly (when I don't vary the aliases, subsequent runs are multiple orders of magnitude faster).
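As an aside, if the cache being hit here is the MariaDB query cache (an assumption; the post doesn't say which cache it is), the SQL_NO_CACHE modifier is another way to keep a result from being served from it without rewriting aliases. A sketch on the MAX(...) variant:

```sql
-- SQL_NO_CACHE goes right after the outer SELECT and only affects the query cache;
-- it does not evict index or data pages already held in memory.
SELECT SQL_NO_CACHE u.displayName
FROM users u
WHERE u.version = (
    SELECT MAX(u2.version)
    FROM users u2
    WHERE u2.username = u.username)
ORDER BY u.displayName;
```

Because storage-engine caches still stay warm, timings taken this way are best read as warm-cache numbers, the same as with the alias trick.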

I'm very surprised the difference is that large; you'd think this is a fairly common and easy thing to optimize. I'm not a database developer, so I can't say that I would have done any better, but if someone here is a database developer and wants to weigh in, I would certainly be interested in an answer explaining the technical difference!

I have a similar situation: there is an index on the column in question, and yet the ORDER BY and LIMIT solution seems quicker. How good is that :)
