Index not used when LIMIT is used in postgres

I have a words table with an index on (language_id, state). Here are the results for EXPLAIN ANALYZE:

No limit

explain analyze SELECT "words".* FROM "words" WHERE (words.language_id = 27) AND (state IS NULL);

Bitmap Heap Scan on words  (cost=10800.38..134324.10 rows=441257 width=96) (actual time=233.257..416.026 rows=540556 loops=1)
  Recheck Cond: ((language_id = 27) AND (state IS NULL))
  ->  Bitmap Index Scan on ls  (cost=0.00..10690.07 rows=441257 width=0) (actual time=230.849..230.849 rows=540556 loops=1)
        Index Cond: ((language_id = 27) AND (state IS NULL))
Total runtime: 460.277 ms
(5 rows)

Limit 100

explain analyze SELECT "words".* FROM "words" WHERE (words.language_id = 27) AND (state IS NULL) LIMIT 100;

Limit  (cost=0.00..51.66 rows=100 width=96) (actual time=0.081..0.184 rows=100 loops=1)
  ->  Seq Scan on words  (cost=0.00..227935.59 rows=441257 width=96) (actual time=0.080..0.160 rows=100 loops=1)
        Filter: ((state IS NULL) AND (language_id = 27))
Total runtime: 0.240 ms
(4 rows)

Why is this happening? How can I get the index to be used in all cases?

Thanks.

I think that the PostgreSQL query planner simply decides that in the second case - the one with the LIMIT - it's not worth applying the index because the LIMIT is so small. So it's not an issue.

Take a look at the PostgreSQL documentation about Using EXPLAIN and Query Planning. The reason the query planner prefers a sequential scan over an index scan in the LIMIT 100 case is simply that the sequential scan is cheaper.

There is no ORDER BY clause in the query, so the planner is happy with the first 100 (arbitrary) rows that match the filter condition. An index scan would require reading the index pages first and then reading the data pages to fetch the corresponding rows. The sequential scan only needs to read the data pages to fetch the rows. In your case the table statistics seem to suggest that enough rows match the filter condition that the cost of sequentially reading pages until 100 matching rows are found is considered cheaper than reading the index first and then fetching the actual rows. You might see a different plan when you raise the limit or when fewer rows match the filter condition.
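To see this for yourself, you can vary the two factors the planner weighs. The following sketch reuses the table and index columns from the question; the exact plans you get depend on your statistics and settings:

```sql
-- A much larger LIMIT makes the sequential scan less attractive,
-- since many more pages must be read before enough rows match:
EXPLAIN ANALYZE
SELECT * FROM words
WHERE language_id = 27 AND state IS NULL
LIMIT 100000;

-- An ORDER BY over the indexed columns gives the planner an extra
-- reason to consider the (language_id, state) index even with a
-- small LIMIT, because the index delivers rows in that order:
EXPLAIN ANALYZE
SELECT * FROM words
WHERE language_id = 27 AND state IS NULL
ORDER BY language_id, state
LIMIT 100;
```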

With the default settings the planner considers the cost of a random page read (random_page_cost) to be four times the cost of a sequential page read (seq_page_cost). These settings can be adjusted to tune query plans (e.g. when the whole database fits in RAM, a random page read is no more expensive than a sequential page read, and an index scan should be preferred). You can also try out different query plans by enabling/disabling certain kinds of scans, e.g.:

set enable_seqscan = [on | off]
set enable_indexscan = [on | off]

While it is possible to enable/disable certain kinds of scans on a global basis, this should only be used ad hoc, for debugging or troubleshooting, on a per-session basis.
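A per-session experiment along those lines might look like this (again using the query from the question; the random_page_cost value is just an illustrative choice for a mostly-cached database):

```sql
-- Force the planner away from the sequential scan for this session
-- only, then compare the estimated and actual costs:
SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT * FROM words
WHERE language_id = 27 AND state IS NULL
LIMIT 100;
RESET enable_seqscan;

-- Or make random page reads look cheaper relative to sequential
-- reads (default random_page_cost is 4.0), which nudges the planner
-- toward index scans without forbidding anything:
SET random_page_cost = 1.1;
```

SET only affects the current session; use ALTER SYSTEM or postgresql.conf if a tuned value proves itself and should become permanent.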

Also run VACUUM ANALYZE words before you test the query plans, otherwise an automatic vacuum (autovacuum) run between the tests might influence the results.

Without limit: rows=540556 loops=1, Total runtime: 460.277 ms

With limit: rows=100 loops=1, Total runtime: 0.240 ms

I don't see a problem here. If your query yields 500K rows, it will need more time.

It's also weird that the two queries report a different number of rows. I guess you have been inserting in the meantime... Uhm, what if you do a sub-select?

select * from (select ...) as sub limit 100;
