Index to improve sorting performance?

I have a fairly complex query which includes an ORDER BY and a LIMIT clause. When the ORDER BY uses the primary key, the query takes less than 5 milliseconds. However, if I change the query so that the ORDER BY uses a different column (of type FLOAT), the response time balloons to more than 50 seconds (four orders of magnitude higher!).

Now, I presume the problem is that the query ordered by the primary key performs an index scan, whereas the query ordered by the float column does a sequential scan and has to sort all the rows at the end.
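To make that presumption concrete, here is a toy model in Python (not PostgreSQL internals) of the two plan shapes. The names `make_rows`, `limit_via_index` and `limit_via_sort` are hypothetical. Walking rows that are already in the requested order can stop after LIMIT rows. A sequential scan feeding a sort must read every row first, even when, as with PostgreSQL's "top-N heapsort" sort method, only LIMIT rows are kept in memory.

```python
import heapq
import random

def make_rows(n, seed=0):
    """Hypothetical table: (id, score) tuples stored in primary-key order."""
    rng = random.Random(seed)
    return [(i, rng.random()) for i in range(n)]

def limit_via_index(ordered_entries, k):
    """Walk entries already in the wanted order (like an index scan)
    and stop as soon as k rows have been produced."""
    out = []
    read = 0
    for entry in ordered_entries:
        read += 1
        out.append(entry)
        if len(out) == k:
            break
    return out, read

def limit_via_sort(rows, k, key):
    """Sequential scan feeding a top-N sort: only k rows are kept in a heap,
    but every row must be read before the first result is known."""
    read = 0

    def scan():
        nonlocal read
        for row in rows:
            read += 1
            yield row

    out = heapq.nsmallest(k, scan(), key=key)
    return out, read

rows = make_rows(100_000)
# Stands in for an index on score; its cost is paid once, not per query.
by_score = sorted(rows, key=lambda r: r[1])

pk_result, pk_read = limit_via_index(rows, 10)        # ORDER BY id LIMIT 10
idx_result, idx_read = limit_via_index(by_score, 10)  # ORDER BY score, via index
sort_result, sort_read = limit_via_sort(rows, 10, key=lambda r: r[1])  # no index
print(pk_read, idx_read, sort_read)  # 10 10 100000
```

Both strategies return the same ten rows. Only the amount of work needed to find them differs.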

I thought that simply adding an index on the float column would be enough for PostgreSQL to plan this query more intelligently. Apparently I was wrong. What might I have missed?

EDIT: I did run EXPLAIN ANALYZE before posting the question, so my presumption is not just a wild guess. However, the EXPLAIN ANALYZE output runs to more than 30 lines, and it's not immediately clear why one query uses the index whereas the other has to sort all the rows.

  1. Run EXPLAIN ANALYZE on the query, so you don't have to guess what happens.
  2. To optimize a query you generally have to read the EXPLAIN ANALYZE output and the query, then figure out the best course of action. Sometimes that means adding an index, sometimes rewriting the query. But it's not possible to tell which is best in your case, since we can see neither the plan nor the query.
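The actual plans aren't in the thread, so as a stand-in, here is what the telltale difference looks like in SQLite's `EXPLAIN QUERY PLAN`, driven from Python's built-in `sqlite3` module. This is not PostgreSQL's `EXPLAIN ANALYZE`, and the table `items`, its columns and the index name `items_score` are made up. The signal to look for is the same, though: a separate sort step before the index exists (`USE TEMP B-TREE FOR ORDER BY` in SQLite, a `Sort` node under the `Limit` in PostgreSQL), and an index scan feeding the limit afterwards.

```python
import sqlite3

def plan(conn, sql):
    """Return the 'detail' column (last field) of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, score REAL, payload TEXT)")
conn.executemany(
    "INSERT INTO items (score, payload) VALUES (?, ?)",
    [((i * 37) % 1000 / 1000.0, "x") for i in range(1000)],
)

query = "SELECT * FROM items ORDER BY score LIMIT 10"
before = plan(conn, query)  # expect a scan plus an explicit sort step
conn.execute("CREATE INDEX items_score ON items (score)")
after = plan(conn, query)   # expect the index to supply the order instead
print(before)
print(after)
```

The exact detail strings differ between SQLite versions (older ones print `SCAN TABLE items`), so check for the presence or absence of the sort step rather than the exact text. Also, SQLite's planner happily uses the non-covering index here. PostgreSQL's cost-based planner weighs the per-row table lookups and may still prefer the sort.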

It is very hard to decipher what is happening without seeing the query. My guess is that the query plan can do the joins driven by the table with the primary key, keeping the data in the proper order. The plan then basically fetches a row, looks up values in the other tables, massages them, and returns the rows in order. Processing stops as soon as the LIMIT is reached.

When you replace this with another column in the ORDER BY, all the rows have to be processed, then sorted and returned. The longer processing time might come from the size of the underlying tables or from the size of the result set, but the fundamental reason is that all rows need to be generated before the first one can be returned.
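A toy illustration of this reasoning, in Python rather than SQL: `orders`, `customers` and the `joined` generator are hypothetical, and the dictionary lookup stands in for the per-row work of a nested-loop join. When rows already arrive in the requested order, `itertools.islice` stops the pipeline at the limit. When the order comes from another column, every row must be joined before the sort can pick the first few.

```python
from itertools import islice

# Hypothetical data: orders in primary-key order, each joined to a customer.
orders = [(oid, oid % 50, (oid * 7919) % 1000 / 10.0) for oid in range(1, 10_001)]
customers = {cid: f"customer-{cid}" for cid in range(50)}

def joined(rows, customers, counter):
    """Lazily join each order to its customer, counting lookups."""
    for oid, cid, amount in rows:
        counter["lookups"] += 1
        yield oid, customers[cid], amount

# ORDER BY id LIMIT 5: rows already come in id order, so stop after 5.
by_id_counter = {"lookups": 0}
first_by_id = list(islice(joined(orders, customers, by_id_counter), 5))

# ORDER BY amount LIMIT 5: every row must be joined, then sorted.
by_amount_counter = {"lookups": 0}
first_by_amount = sorted(
    joined(orders, customers, by_amount_counter), key=lambda r: r[2]
)[:5]

print(by_id_counter["lookups"], by_amount_counter["lookups"])  # 5 10000
```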

For a query that returns many rows, it's unusual for a database to use a non-covering index. The cost of the table lookup (going from each index entry to the table row) is too high, so a table scan is used instead.

For example,

select name from people where name > 'N' order by birthdate

Would the database use an index on (birthdate)? On the plus side, the rows would be returned in the right order. On the down side, every row would need a table lookup to fetch the name column. Those lookups are much more expensive than the sort they save, so the index would not be used.

An index on (birthdate, name) is different. It includes the name, so no table lookup is required, and the database can use the index to quickly return rows in the right order.

An index that includes all columns required by a query is called a covering index. Make sure your index includes all columns used by your query, then try again.
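One way to check that an engine really answers the example query from the covering index alone is to try the same schema in SQLite, again via Python's `sqlite3` as a stand-in for PostgreSQL. The table contents and the index name `people_birthdate_name` are hypothetical. SQLite should report `USING COVERING INDEX` with no separate sort step. In PostgreSQL the corresponding plan node is an `Index Only Scan`, which also relies on the table's visibility map being up to date (i.e. the table having been vacuumed). Since PostgreSQL 11, extra non-key columns can also be added to an index with `CREATE INDEX ... INCLUDE (...)`.

```python
import sqlite3

def plan(conn, sql):
    """Return the 'detail' column (last field) of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, birthdate TEXT)")
conn.executemany(
    "INSERT INTO people (name, birthdate) VALUES (?, ?)",
    [("Nora", "1990-05-01"), ("Adam", "1985-02-11"), ("Zoe", "2001-09-30")],
)
# Covering index: the ORDER BY column first, then the selected column.
conn.execute("CREATE INDEX people_birthdate_name ON people (birthdate, name)")

query = "SELECT name FROM people WHERE name > 'N' ORDER BY birthdate"
details = plan(conn, query)
result = conn.execute(query).fetchall()
print(details)
print(result)  # [('Nora',), ('Zoe',)]
```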
