SELECT that uses sequential scan instead of index scan

Question

I'm trying to optimize some of my SELECT queries using EXPLAIN ANALYZE, and I can't understand why PostgreSQL uses a sequential scan instead of an index scan:

explain analyze SELECT SUM(a.deure)-SUM(a.haver) as Value FROM assentaments a
LEFT JOIN comptes c ON a.compte_id = c.id WHERE c.empresa_id=2 AND c.nivell=11 AND
(a.data >='2007-01-01' AND a.data <='2007-01-31')  AND c.codi_compte LIKE '6%';


------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate  (cost=44250.26..44250.27 rows=1 width=12)
(actual time=334.054..334.054 rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..44249.20 rows=211 width=12)
      (actual time=65.277..333.179 rows=713 loops=1)
    ->  Seq Scan on comptes c  (cost=0.00..8001.72 rows=118 width=4)
        (actual time=0.053..64.287 rows=236 loops=1)
        Filter: (((codi_compte)::text ~~ '6%'::text) AND
        (empresa_id = 2) AND (nivell = 11))
      ->  Index Scan using index_compte_id on assentaments a
          (cost=0.00..307.16 rows=2 width=16) (actual time=0.457..1.138 rows=3 loops=236)
           Index Cond: (a.compte_id = c.id)
           Filter: ((a.data >= '2007-01-01'::date) AND (a.data <= '2007-01-31'::date))

  Total runtime: 334.104 ms
  (8 rows)

I've created a custom index:

CREATE INDEX "index_multiple" ON "public"."comptes" USING btree(codi_compte ASC NULLS LAST,
empresa_id ASC NULLS LAST, nivell ASC NULLS LAST);

I've also created separate indexes for these three fields on the comptes table, just to check whether the planner would switch to an index scan, but it doesn't; the result is the same:

CREATE INDEX "index_codi_compte" ON "public"."comptes" USING btree(codi_compte ASC NULLS LAST);
CREATE INDEX "index_comptes" ON "public"."comptes" USING btree(codi_compte ASC NULLS LAST);
CREATE INDEX "index_multiple" ON "public"."comptes" USING btree(codi_compte ASC NULLS LAST,     empresa_id ASC NULLS LAST, nivell ASC NULLS LAST);
CREATE INDEX "index_nivell" ON "public"."comptes" USING btree(nivell ASC NULLS LAST);

thanks!

m.

EDIT:

assentaments.id and assentaments.data also have their own indexes.

select count(*) FROM comptes => 148498
select count(*) from assentaments => 2128771

select count(distinct(codi_compte)) FROM comptes => 137008
select count(distinct(codi_compte)) FROM comptes WHERE codi_compte LIKE '6%' => 368
select count(distinct(codi_compte)) FROM comptes WHERE codi_compte LIKE '6%' AND empresa_id=2; => 303

Answer 1

If you want an index on a TEXT column to serve LIKE queries, you need to create it with the text_pattern_ops operator class (unless the database uses the C locale), like this:

test=> CREATE TABLE t AS SELECT n::TEXT FROM generate_series( 1,100000 ) n;
test=> CREATE INDEX tn ON t(n);
test=> VACUUM ANALYZE t;
test=> EXPLAIN ANALYZE SELECT * FROM t WHERE n LIKE '123%';
                                            QUERY PLAN                                            
--------------------------------------------------------------------------------------------------
 Seq Scan on t  (cost=0.00..1693.00 rows=10 width=5) (actual time=0.027..14.631 rows=111 loops=1)
   Filter: (n ~~ '123%'::text)
 Total runtime: 14.664 ms

test=> CREATE INDEX tn2 ON t(n text_pattern_ops);
CREATE INDEX
Time: 267.589 ms
test=> EXPLAIN ANALYZE SELECT * FROM t WHERE n LIKE '123%';
                                                  QUERY PLAN                                                   
---------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on t  (cost=5.25..244.79 rows=10 width=5) (actual time=0.089..0.121 rows=111 loops=1)
   Filter: (n ~~ '123%'::text)
   ->  Bitmap Index Scan on tn2  (cost=0.00..5.25 rows=99 width=0) (actual time=0.077..0.077 rows=111 loops=1)
         Index Cond: ((n ~>=~ '123'::text) AND (n ~<~ '124'::text))
 Total runtime: 0.158 ms

See the details here:

http://www.postgresql.org/docs/9.1/static/indexes-opclass.html

If you do not want to create an additional index, and the column is of type TEXT, you can replace "codi_compte LIKE '6%'" with "codi_compte >= '6' AND codi_compte < '7'", which is a simple index range condition.

test=> EXPLAIN ANALYZE SELECT * FROM t WHERE n >= '123' AND n < '124';
                                                QUERY PLAN                                                 
-----------------------------------------------------------------------------------------------------------
 Index Scan using tn on t  (cost=0.00..126.74 rows=99 width=5) (actual time=0.030..0.127 rows=111 loops=1)
   Index Cond: ((n >= '123'::text) AND (n < '124'::text))
 Total runtime: 0.153 ms

In your case this solution is probably better.
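
Applied to the tables from the question, the two alternatives above might look like the sketch below. It assumes codi_compte is a varchar column (the cast to text in the posted plan suggests so), and the index name index_codi_compte_pattern is made up for illustration.

```sql
-- Alternative 1: a pattern-ops index that the planner can use for LIKE 'prefix%'.
-- Assumption: codi_compte is varchar; use text_pattern_ops if it is text.
CREATE INDEX index_codi_compte_pattern
    ON comptes (codi_compte varchar_pattern_ops);

-- Alternative 2: keep the existing plain btree index and write the
-- prefix match as a range instead of LIKE.
-- Assumption: every code starting with '6' sorts between '6' and '7'
-- under the database collation.
EXPLAIN ANALYZE
SELECT SUM(a.deure) - SUM(a.haver) AS Value
FROM assentaments a
LEFT JOIN comptes c ON a.compte_id = c.id
WHERE c.empresa_id = 2
  AND c.nivell = 11
  AND a.data >= '2007-01-01' AND a.data <= '2007-01-31'
  AND c.codi_compte >= '6' AND c.codi_compte < '7';
```

Re-running EXPLAIN ANALYZE after either change shows whether the Seq Scan on comptes has actually been replaced.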

Answer 2

It appears that the DBMS is estimating that the JOIN on assentaments will be much more restrictive than filtering comptes, then joining.

Options could be:
1. Put an index on assentaments.compte_id.
2. Alter your index on comptes to include id as the first indexed field.

The first option may allow the execution plan to reverse: Filter comptes, then join to assentaments.

The second option may allow the execution plan to stay the same, but enable the use of the index.
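
As a rough sketch of the second option: the index name index_comptes_id_first is hypothetical, and the column order after id simply follows the existing index_multiple. The plan posted in the question already shows an index named index_compte_id being used on assentaments, so the first option may already be in place.

```sql
-- Option 2: same columns as index_multiple, but with id leading.
-- index_comptes_id_first is a made-up name for illustration.
CREATE INDEX index_comptes_id_first
    ON comptes (id, codi_compte, empresa_id, nivell);

-- Refresh statistics, then check the plan again with EXPLAIN ANALYZE.
ANALYZE comptes;
```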

Answer 3

This is most commonly caused by bad statistics on the index, i.e. the index is not selective enough (for example, it has many repeated values). In that case, reading the index and then filtering can take even longer than a sequential scan.

Are the values in c.codi_compte selective enough? Maybe you have too many NULL values?
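
One way to answer both questions is to look at the statistics the planner itself uses, via the standard pg_stats view. This is a minimal sketch; the schema name public is taken from the CREATE INDEX statements in the question.

```sql
-- Refresh planner statistics first so the numbers below are current.
ANALYZE comptes;

-- null_frac:  fraction of rows where the column is NULL.
-- n_distinct: estimated number of distinct values; a negative value is
--             minus the ratio of distinct values to the row count.
SELECT attname, null_frac, n_distinct
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename = 'comptes'
  AND attname IN ('codi_compte', 'empresa_id', 'nivell');
```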

Answer 4

I would try with

a compound index (data, compte_id) on table assentaments and
a compound index (empresa_id, nivell, codi_compte, id) on table comptes

You should also turn that LEFT JOIN into an INNER JOIN. The WHERE conditions you have make them equivalent. Perhaps the query planner is not aware of it.
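
A sketch of these suggestions, with hypothetical index names; the column orders are the ones proposed above.

```sql
-- Compound indexes as suggested (the index names are made up).
CREATE INDEX index_assentaments_data_compte
    ON assentaments (data, compte_id);
CREATE INDEX index_comptes_empresa_nivell_codi
    ON comptes (empresa_id, nivell, codi_compte, id);

-- The same query with the LEFT JOIN written as an INNER JOIN.
-- The WHERE clause filters on columns of c, so rows without a matching
-- comptes row are discarded either way and the result is unchanged.
EXPLAIN ANALYZE
SELECT SUM(a.deure) - SUM(a.haver) AS Value
FROM assentaments a
INNER JOIN comptes c ON a.compte_id = c.id
WHERE c.empresa_id = 2
  AND c.nivell = 11
  AND a.data >= '2007-01-01' AND a.data <= '2007-01-31'
  AND c.codi_compte LIKE '6%';
```

Whether the LIKE '6%' condition can use the codi_compte part of the compound index still depends on the operator class and locale, as described in Answer 1.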

Another suspicion is the type of the field comptes.codi_compte. If it is an integer and not char(), then the

WHERE c.codi_compte LIKE '6%'

is translated as:

WHERE CAST(c.codi_compte AS CHAR) LIKE '6%'

which means the index cannot be used. If that's the case, you can convert the field to a char type.
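
The declared type can be checked in the standard information_schema catalog. The sketch below assumes the table lives in the public schema, as in the question's CREATE INDEX statements. The conversion is left commented out because it is only relevant if the column turns out to be numeric.

```sql
-- Look up the declared type of comptes.codi_compte.
SELECT column_name, data_type, character_maximum_length
FROM information_schema.columns
WHERE table_schema = 'public'
  AND table_name = 'comptes'
  AND column_name = 'codi_compte';

-- Only if the column is numeric: convert it to a character type.
-- This rewrites the table and locks it while running, so try it on a copy first.
-- ALTER TABLE comptes
--     ALTER COLUMN codi_compte TYPE varchar USING codi_compte::varchar;
```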

Answer 5

There are a few things you could/should do. First:

SELECT SUM(a.deure)-SUM(a.haver) as Value

SUM() will touch every row that matches... no way to INDEX that operation.

FROM assentaments a, comptes c

When debugging queries, I find it easier to use an implicit (comma-style) join instead of an explicit JOIN. The query planner is freed up a bit more and often makes a better choice. That's not the case here; it's just a general comment. Here's where there are likely mismatches between your indexes and your query.

WHERE TRUE = TRUE
    AND a.compte_id  = c.id
    AND c.empresa_id = 2
    AND c.nivell     = 11

For those three conditions, you have the following index:

CREATE INDEX "index_multiple" ON "public"."comptes" USING btree(codi_compte ASC NULLS LAST, empresa_id ASC NULLS LAST, nivell ASC NULLS LAST);

Break that apart. Since this isn't a UNIQUE INDEX, you shouldn't see any change in the integrity of your data. The reason I'm suggesting this is that I'd guess codi_compte has a low cardinality and empresa_id a higher one. In general, create your indexes from highest cardinality to lowest.

I suspect that with three separate indexes the planner will be able to do a bitmap scan or hash join faster. The crux of the problem is that PostgreSQL (probably correctly) thinks that doing an index scan is more expensive than doing a sequential scan.

    AND (a.data >='2007-01-01' AND a.data <='2007-01-31')
    AND c.codi_compte LIKE '6%';

An INDEX on a.data could also be helpful because PostgreSQL would likely do an index_scan on the date given depending on the number of rows in the assentaments table.

CREATE INDEX "index_codi_compte" ON "public"."comptes" USING btree(codi_compte ASC NULLS LAST);
CREATE INDEX "index_comptes"     ON "public"."comptes" USING btree(codi_compte ASC NULLS LAST);

I don't know why you have this INDEX twice.

CREATE INDEX "index_multiple" ON "public"."comptes" USING btree(codi_compte ASC NULLS LAST,     empresa_id ASC NULLS LAST, nivell ASC NULLS LAST);

As per above, break that INDEX apart.

CREATE INDEX "index_nivell" ON "public"."comptes" USING btree(nivell ASC NULLS LAST);

That INDEX is fine.
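
Put together, the index changes suggested in this answer could be sketched as follows. The name index_empresa_id is hypothetical; the other names come from the question, and the unqualified names assume public is on the search path.

```sql
-- index_comptes duplicates index_codi_compte, so one of the two can go.
DROP INDEX index_comptes;

-- Replace the three-column index with single-column ones.
DROP INDEX index_multiple;

-- index_codi_compte and index_nivell already exist; only an index on
-- empresa_id is missing from the list posted in the question.
CREATE INDEX index_empresa_id ON comptes (empresa_id);

-- Refresh statistics before comparing plans.
ANALYZE comptes;
```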

Quick tip:

-- Fraction of rows matched by a predicate; lower means more selective.
SELECT m.matching, t.total, m.matching / t.total AS "Want this to be a small number"
FROM
    (SELECT count(*)::FLOAT AS matching FROM tbl WHERE col_id = 1) AS m,
    (SELECT count(*)::FLOAT AS total FROM tbl) AS t;


 matching rows | total rows | want this to be a small number 
---------------+------------+--------------------------------
             1 |         10 |                            0.1
(1 row)

Where the third column is ideally equal to 1/total.
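
The same check can be run against comptes for each of the three conditions in a single pass. This is a sketch that assumes codi_compte is a character column; the output column names are made up.

```sql
-- Casting a boolean to int gives 1 for true and 0 for false, so avg()
-- returns the fraction of rows that satisfy each condition.
-- Rows where a condition evaluates to NULL are ignored by avg().
SELECT count(*)                            AS total,
       avg((empresa_id = 2)::int)          AS frac_empresa,
       avg((nivell = 11)::int)             AS frac_nivell,
       avg((codi_compte LIKE '6%')::int)   AS frac_codi,
       avg((empresa_id = 2 AND nivell = 11
            AND codi_compte LIKE '6%')::int) AS frac_all
FROM comptes;
```

For reference, the plan in the question reports 236 rows passing all three filters out of 148498 rows in comptes, which is about 0.16%.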

5 answers

solution1
6 ACCEPTED 2011-06-04 08:30:50

solution2
2 2011-06-03 12:57:01

solution3
0 2011-06-03 13:03:12

solution4
0 2011-06-03 13:32:44

solution5
0 2011-06-03 15:09:48
