How to tweak index_scan cost in postgres?

Question

For the following query:

SELECT *
FROM "routes_trackpoint"
WHERE "routes_trackpoint"."track_id" = 593
ORDER BY "routes_trackpoint"."id" ASC
LIMIT 1;

Postgres is choosing a query plan that walks the "id" index to satisfy the ordering and then filters the rows one by one to find the entries with the correct track id:

Limit  (cost=0.43..511.22 rows=1 width=65) (actual time=4797.964..4797.966 rows=1 loops=1)
   Buffers: shared hit=3388505
   ->  Index Scan using routes_trackpoint_pkey on routes_trackpoint  (cost=0.43..719699.79 rows=1409 width=65) (actual time=4797.958..4797.958 rows=1 loops=1)
         Filter: (track_id = 75934)
         Rows Removed by Filter: 13005436
         Buffers: shared hit=3388505
 Total runtime: 4798.019 ms
(7 rows)

With index scans disabled ( SET enable_indexscan=OFF; ), Postgres picks a better plan and the response is much faster:

Limit  (cost=6242.46..6242.46 rows=1 width=65) (actual time=77.584..77.586 rows=1 loops=1)
   Buffers: shared hit=1075 read=6
   ->  Sort  (cost=6242.46..6246.64 rows=1674 width=65) (actual time=77.577..77.577 rows=1 loops=1)
         Sort Key: id
         Sort Method: top-N heapsort  Memory: 25kB
         Buffers: shared hit=1075 read=6
         ->  Bitmap Heap Scan on routes_trackpoint  (cost=53.41..6234.09 rows=1674 width=65) (actual time=70.384..74.782 rows=1454 loops=1)
               Recheck Cond: (track_id = 75934)
               Buffers: shared hit=1075 read=6
               ->  Bitmap Index Scan on routes_trackpoint_track_id  (cost=0.00..52.99 rows=1674 width=0) (actual time=70.206..70.206 rows=1454 loops=1)
                     Index Cond: (track_id = 75934)
                     Buffers: shared hit=2 read=6
 Total runtime: 77.655 ms
(13 rows)
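
To compare the two plans without leaving the planner setting switched off for the whole session, the toggle can be confined to a single transaction. A minimal sketch, assuming the same query as above; SET LOCAL reverts when the transaction ends:

```sql
BEGIN;
-- SET LOCAL lasts only until COMMIT/ROLLBACK, so later queries in the session are unaffected.
SET LOCAL enable_indexscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM routes_trackpoint
WHERE track_id = 593
ORDER BY id ASC
LIMIT 1;
ROLLBACK;
```

This is only a diagnostic aid for comparing plans, not a way to make the planner choose the better plan by itself.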

How can I get Postgres to select the better plan automatically?

I have tried the following:

ALTER TABLE routes_trackpoint ALTER COLUMN track_id SET STATISTICS 5000;
ALTER TABLE routes_trackpoint ALTER COLUMN id SET STATISTICS 5000;
ANALYZE routes_trackpoint;

But the query plan remained the same.
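
To see what the planner has to work with after the ANALYZE, the collected statistics can be read from the pg_stats view. A sketch, assuming the table lives in the public schema:

```sql
-- n_distinct is one input to the row estimate for an equality filter on the column;
-- correlation describes how closely the physical row order follows the column order.
SELECT attname, n_distinct, correlation
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename = 'routes_trackpoint'
  AND attname IN ('id', 'track_id');
```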

The table definition is:

watchdog2=# \d routes_trackpoint
                                   Table "public.routes_trackpoint"
  Column  |           Type           |                           Modifiers                            
----------+--------------------------+----------------------------------------------------------------
 id       | integer                  | not null default nextval('routes_trackpoint_id_seq'::regclass)
 track_id | integer                  | not null
 position | geometry(Point,4326)     | not null
 speed    | double precision         | not null
 bearing  | double precision         | not null
 valid    | boolean                  | not null
 created  | timestamp with time zone | not null
Indexes:
    "routes_trackpoint_pkey" PRIMARY KEY, btree (id)
    "routes_trackpoint_position_id" gist ("position")
    "routes_trackpoint_track_id" btree (track_id)
Foreign-key constraints:
    "track_id_refs_id_d59447ae" FOREIGN KEY (track_id) REFERENCES routes_track(id) DEFERRABLE INITIALLY DEFERRED

PS: We have forced Postgres to sort by "created" instead, which also made it use the index on "track_id".
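
For reference, the workaround from the PS would look roughly like this. It is a sketch and rests on an assumption: within a track, "created" increases together with "id". If it does not, this can return a different row than the original query.

```sql
-- There is no index on "created", so the planner cannot walk an index in
-- sort order and stop early; it has to fetch the track's rows and sort them.
SELECT *
FROM routes_trackpoint
WHERE track_id = 593
ORDER BY created ASC
LIMIT 1;
```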

Answer 1

Avoid LIMIT where you can. Plan #1: use NOT EXISTS() to select the row for which no row with a lower id exists in the same track:

EXPLAIN ANALYZE
SELECT * FROM routes_trackpoint tp
WHERE tp.track_id = 593
AND NOT EXISTS (
        SELECT * FROM routes_trackpoint nx
        WHERE nx.track_id = tp.track_id AND nx.id < tp.id
        );

Plan #2: use row_number() OVER a window to pick the first row of the group:

EXPLAIN ANALYZE
SELECT tp.*
FROM routes_trackpoint tp
JOIN (select track_id, id
        , row_number() OVER (partition BY track_id ORDER BY id) rn
        FROM routes_trackpoint tp2
        ) omg ON omg.id = tp.id
WHERE tp.track_id = 593
AND omg.rn = 1
        ;

Or, even better, move the WHERE clause into the subquery:

EXPLAIN ANALYZE
SELECT tp.*
FROM routes_trackpoint tp
JOIN (SELECT track_id, id
        , row_number() OVER (PARTITION BY track_id ORDER BY id) rn
        FROM routes_trackpoint tp2
        WHERE tp2.track_id = 593   -- filter applied before the window function runs
        ) omg ON omg.id = tp.id
WHERE omg.rn = 1                   -- the track_id filter on tp is no longer needed
        ;

Plan #3: use the Postgres-specific DISTINCT ON () construct (thanks to @a_horse_with_no_name):

-- EXPLAIN ANALYZE
SELECT DISTINCT ON (track_id) track_id, id
FROM routes_trackpoint tp2
WHERE tp2.track_id = 593
-- order by track_id, created desc
order by track_id, id
        ;
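
The query above returns only track_id and id. If the whole row is wanted, as in the original SELECT *, a variant might look like this. It is a sketch; DISTINCT ON keeps the first row per track_id in ORDER BY order:

```sql
-- The leading ORDER BY expression must match the DISTINCT ON expression;
-- the second sort key (id) decides which row of the track is kept.
SELECT DISTINCT ON (track_id) *
FROM routes_trackpoint
WHERE track_id = 593
ORDER BY track_id, id;
```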
