
Postgres inconsistent use of Index vs Seq Scan

I'm having difficulty understanding what I perceive as an inconsistency in how Postgres chooses to use indexes. We have a query based on NOT IN against an indexed column that Postgres executes as a sequential scan, but when we run the same query with IN, it uses the index.

I've created a simplified example that I believe demonstrates the issue. Notice that this first query uses a sequential scan:

CREATE TABLE node
(
  id SERIAL PRIMARY KEY,
  vid INTEGER
);
CREATE INDEX x ON node(vid);

INSERT INTO node(vid) VALUES (1),(2);

EXPLAIN ANALYZE
SELECT *
FROM node
WHERE NOT vid IN (1);

Seq Scan on node  (cost=0.00..36.75 rows=2129 width=8) (actual time=0.009..0.010 rows=1 loops=1)
  Filter: (vid <> 1)
  Rows Removed by Filter: 1
Total runtime: 0.025 ms

But if we invert the query to IN, you'll notice that it now decides to use the index:

EXPLAIN ANALYZE
SELECT *
FROM node
WHERE vid IN (1);

Bitmap Heap Scan on node  (cost=4.34..15.01 rows=11 width=8) (actual time=0.017..0.017 rows=1 loops=1)
  Recheck Cond: (vid = 1)
  ->  Bitmap Index Scan on x  (cost=0.00..4.33 rows=11 width=0) (actual time=0.012..0.012 rows=1 loops=1)
        Index Cond: (vid = 1)
Total runtime: 0.039 ms

Can anyone shed any light on this? Specifically, is there a way to rewrite our NOT IN to work with the index (when, obviously, the real result set is not as simplistic as just 1 or 2)?

We are using Postgres 9.2 on CentOS 6.6.

PostgreSQL will use an index when it makes sense. It is likely that the statistics indicate that your NOT IN returns too many tuples for an index to be effective.
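
As a rough sketch of how to see what the planner is working from: the estimate of 2129 rows for a two-row table in the question's plan suggests the table has not been analyzed yet, so the planner may be relying on default assumptions. The following assumes the `node` table from the question; `pg_class` and `pg_stats` are the standard catalog and statistics view, and the values you see will depend on your data.

```sql
-- Refresh the planner statistics for the example table.
ANALYZE node;

-- Row and page counts the planner uses as its starting point.
SELECT relname, reltuples, relpages
FROM pg_class
WHERE relname = 'node';

-- Per-column statistics gathered for vid.
SELECT attname, null_frac, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'node' AND attname = 'vid';

-- Re-check the row estimate now that statistics exist.
EXPLAIN ANALYZE
SELECT *
FROM node
WHERE vid NOT IN (1);
```

If the estimated row count for NOT IN is still most of the table after this, a sequential scan is probably the cheaper plan.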

You can test this by doing the following:

set enable_seqscan to false;
explain analyze .... NOT IN
set enable_seqscan to true;
explain analyze .... NOT IN

The results will tell you whether PostgreSQL is making the correct decision. If it isn't, you can adjust the statistics of the column and/or the costs (random_page_cost) to get the desired behavior.
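
A minimal sketch of those adjustments, again assuming the `node` table from the question. The statistics target of 1000 and the `random_page_cost` of 2.0 are illustrative values only, not recommendations, and `SET` here affects only the current session.

```sql
-- Collect more detailed statistics for vid (the target value is illustrative).
ALTER TABLE node ALTER COLUMN vid SET STATISTICS 1000;
ANALYZE node;

-- Make random page reads look cheaper to the planner, for this session only
-- (the built-in default is 4.0; 2.0 is an assumption for illustration).
SET random_page_cost = 2.0;

-- Compare this plan and timing against the earlier runs.
EXPLAIN ANALYZE
SELECT *
FROM node
WHERE vid NOT IN (1);

-- Return to the configured default afterwards.
RESET random_page_cost;
```

Change one setting at a time and compare the actual times in the EXPLAIN ANALYZE output, so you can tell which adjustment made the difference.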
