
PostgreSQL generate_series strange behaviour

The two following queries produce the exact same output:

select
  ref_date::date
from generate_series('2020-10-01', '2020-10-01'::date, interval '1 day') ref_date
--   ref_date
-- 2020-10-01

select now()::date ref_date
--   ref_date
-- 2020-10-01

However, when running explain on each of them, we get different things:

# query 1
Function Scan on generate_series ref_date  (cost=0.01..12.51 rows=1000 width=4)

# query 2
Result  (cost=0.00..0.01 rows=1 width=4)
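
For reference, the two plans above can be reproduced by prefixing each query with EXPLAIN, as sketched below. The rows=1000 figure in the first plan is consistent with the planner's default guess for a set-returning function it cannot estimate (the default described under ROWS in the CREATE FUNCTION documentation), whereas the second plan is a single-row Result. Whether that estimate alone explains the slow join is an assumption, not something these two plans prove.

```sql
-- Query 1: the planner sees a Function Scan and has to guess the row count.
EXPLAIN
SELECT ref_date::date
FROM generate_series('2020-10-01', '2020-10-01'::date, interval '1 day') AS ref_date;

-- Query 2: a single expression with no scan, so the planner expects exactly one row.
EXPLAIN
SELECT now()::date AS ref_date;
```

Using EXPLAIN (ANALYZE) instead also runs the query and reports actual row counts next to the estimates, which makes a misestimate easy to spot.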

Things get worse when either one is included in a sequence of joins whose join conditions depend on ref_date:

select
  stuff
from (select ref_date::date from generate_series('2020-10-01', '2020-10-01'::date, interval '1 day') ref_date) ref_date
left join (other_stuff) x on true
left join (more_stuff) y on y.id = x.id and y.timestamp < ref_date
-- executes in 10 minutes
-- EXPLAIN is long and complex
-- query uses the index on more_stuff (id) only,
-- despite an index on (id, timestamp) being available

select
  stuff
from (select now()::date ref_date) ref_date
left join (other_stuff) x on true
left join (more_stuff) y on y.id = x.id and y.timestamp < ref_date
-- executes in 10 milliseconds
-- EXPLAIN is short and simple
-- query correctly uses the index on more_stuff (id, timestamp)

In reality I can't use now()::date, because I need generate_series() to produce multiple dates (for example, spanning 5 years).

Question:

Is there an alternative that produces a sequence of dates and is as efficient as now()::date in the examples above?

Notes:

  • the generate_series() method performs much worse than now()::date even when only one date is generated
  • using a pre-built table holding the output of generate_series (instead of calling generate_series directly in the query) performs the same as calling the function directly, even with an index on that table
  • the EXPLAIN ANALYZE output for both versions (now() and generate_series()) can be found here: https://gist.github.com/JivanRoquet/a4f1c82ecf54b420844e652584317c76

A correlated subquery, written as a LATERAL join, can do what you're asking.

select stuff
from generate_series('2020-09-01'::date, '2020-10-01'::date, interval '1 day') as ref_date
left join lateral
(select (other_stuff)) as x on true
left join (more_stuff) y on y.timestamp < ref_date

This should produce a nested loop join, with the plan for the inner side matching your fast query. The LATERAL keyword lets the right-hand side refer to columns of the current left-hand row, so the database evaluates the right-hand side afresh for each row on the left-hand side.
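
As a more concrete sketch of the same idea, the variant below puts LATERAL around the lookup into more_stuff, so that both join conditions from the question (y.id = x.id and the date comparison) sit inside the correlated subquery. It assumes, purely for illustration, that other_stuff and more_stuff are plain tables with columns other_stuff(id) and more_stuff(id, "timestamp"); in the question they are placeholders for arbitrary subqueries. Whether the planner then picks the (id, timestamp) index is something to confirm with EXPLAIN ANALYZE on the real data.

```sql
-- Hypothetical schema: other_stuff(id), more_stuff(id, "timestamp").
SELECT d.ref_date, x.id, y.*
FROM (
  -- One row per day in the requested range, cast to date up front.
  SELECT gs::date AS ref_date
  FROM generate_series(date '2020-09-01', date '2020-10-01', interval '1 day') AS gs
) AS d
CROSS JOIN other_stuff AS x
LEFT JOIN LATERAL (
  -- Correlated lookup: runs once per (ref_date, x) pair and filters on
  -- both columns of the composite index.
  SELECT m.*
  FROM more_stuff AS m
  WHERE m.id = x.id
    AND m."timestamp" < d.ref_date
) AS y ON true;
```

Keeping the LEFT JOIN ... ON true preserves rows from d and x even when no row of more_stuff matches, which mirrors the left joins in the question.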
