
Postgres not performing Index only scan

I have a query like this:

explain analyze
SELECT user_id, project_id, office_id, SUM(duration) AS tDuration
     FROM users
 WHERE date(start_datetime at TIME ZONE 'UTC') = '2020-05-01'
     GROUP BY project_id, user_id, office_id;

And I have created an index on the table like this:

CREATE INDEX i1_users on users (date(start_datetime at TIME ZONE 'UTC'), project_id, user_id, office_id) include (duration);

But it is not doing an index-only scan, even though all the data needed is present in the index itself.

The EXPLAIN result is as follows:

GroupAggregate  (cost=7.80..7.82 rows=1 width=36) (actual time=5.672..11.735 rows=298 loops=1)
  Group Key: project_id, user_id, office_id
  ->  Sort  (cost=7.80..7.80 rows=1 width=32) (actual time=5.632..7.527 rows=298 loops=1)
        Sort Key: project_id, user_id, office_id
        Sort Method: quicksort  Memory: 48kB
        ->  Index Scan using i2_users on users  (cost=0.56..7.79 rows=1 width=32) (actual time=0.034..2.616 rows=298 loops=1)
              Index Cond: (date(timezone('UTC'::text, start_datetime)) = '2020-05-01'::date)
Planning Time: 2.070 ms
Execution Time: 13.991 ms

I have tried VACUUM ANALYZE users as well, but no luck. And when there is a large amount of data in the table, it does a sequential scan and a sort. Since the index already holds the data in sorted order, why not just use that?

You are comparing a date, date(start_datetime at TIME ZONE 'UTC'), with a string, '2020-05-01', which can prevent index usage. This might help:

SELECT user_id, project_id, office_id, SUM(duration) AS tDuration
 FROM users
WHERE date(start_datetime at TIME ZONE 'UTC') = TO_DATE('2020-05-01','YYYY-MM-DD')
 GROUP BY project_id, user_id, office_id;

(The timezone might have to be added in to_date.)

But do you really need the timezone conversion? If the column only stores the date, use it directly (this avoids the function-based index and allows better statistics and optimisation):

CREATE INDEX i1_users on users (start_datetime, project_id, user_id, office_id) include (duration);
SELECT user_id, project_id, office_id, SUM(duration) AS tDuration
 FROM users WHERE start_datetime = to_date('2020-05-01','YYYY-MM-DD')
 GROUP BY project_id, user_id, office_id;

If start_datetime contains a true timestamp:

CREATE INDEX i1_users on users (date_trunc('day',start_datetime), project_id, user_id, office_id) include (duration);

SELECT user_id, project_id, office_id, SUM(duration) AS tDuration
 FROM users WHERE date_trunc('day',start_datetime) = to_date('2020-05-01','YYYY-MM-DD')
 GROUP BY project_id, user_id, office_id;
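
A hedged alternative, assuming start_datetime is a timestamptz column (the question's index on date(start_datetime at TIME ZONE 'UTC') suggests so): date_trunc on a timestamptz is STABLE rather than IMMUTABLE, so PostgreSQL may reject the expression index above. A half-open range on the raw column avoids the expression entirely, and every column the query needs is then a plain column in the index. The index name i3_users is hypothetical.

```sql
-- Assumption: start_datetime is timestamptz; i3_users is a hypothetical index name.
CREATE INDEX i3_users ON users (start_datetime, project_id, user_id, office_id)
    INCLUDE (duration);

-- Half-open range covering the whole of 2020-05-01 in UTC.
SELECT user_id, project_id, office_id, SUM(duration) AS tDuration
  FROM users
 WHERE start_datetime >= TIMESTAMPTZ '2020-05-01 00:00:00+00'
   AND start_datetime <  TIMESTAMPTZ '2020-05-02 00:00:00+00'
 GROUP BY project_id, user_id, office_id;
```

Within the range, rows come back ordered by start_datetime first, so the planner still has to sort or hash for the GROUP BY. The point of this sketch is only to make an index-only scan possible.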

The part of the planner that detects whether an index-only scan (IOS) is possible is a bit underwhelming here. It makes a list of all the columns it thinks it needs and checks that those are available in the index, and it includes start_datetime in that list. That part of the code doesn't understand that the presence of date(start_datetime at TIME ZONE 'UTC') in the index obviates the need for start_datetime itself.

You can "fix" this by adding start_datetime itself to the index, but of course at the cost of enlarging the index:

CREATE INDEX on users (date_trunc('day',start_datetime), project_id, user_id, office_id)
    include (duration,start_datetime);
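
For the query in the question, the index expression would need to match the WHERE clause exactly. The following is a sketch of the same idea using the question's expression; the index name i1_users_ios is hypothetical, and whether the planner actually picks an index-only scan still depends on its cost estimates and on how much of the table is marked all-visible.

```sql
-- Hypothetical index name; the leading expression matches the question's WHERE clause.
CREATE INDEX i1_users_ios ON users (
    date(start_datetime AT TIME ZONE 'UTC'), project_id, user_id, office_id
) INCLUDE (duration, start_datetime);

-- Index-only scans rely on the visibility map, which VACUUM maintains.
VACUUM ANALYZE users;

-- Look for "Index Only Scan" and a low "Heap Fetches" count in the output.
EXPLAIN (ANALYZE, BUFFERS)
SELECT user_id, project_id, office_id, SUM(duration) AS tDuration
  FROM users
 WHERE date(start_datetime AT TIME ZONE 'UTC') = DATE '2020-05-01'
 GROUP BY project_id, user_id, office_id;
```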

