简体   繁体   中英

Do I use an index to speed up this Postgres query?

Here's the query itself and the EXPLAIN result. I've done a lot of reading, but it's still very complex for me and I don't understand if I can use indexes to speed anything up here. It's currently taking probably ten minutes to execute. If anyone has suggestions on optimization, I'd really appreciate it!

Update I upgraded to PostgresSQL 13 and used pgtune.leopard.in.ua to optimize the configuration for my server, and now it takes about 60 seconds to run this. Much, much better. But I am still curious if there's some way I should be indexing this or if it's not going to get any faster?

SELECT
    p.name as playlist_name,
    p.description as playlist_description,
    o.user_id as owner_user_id,
    o.display_name as owner_display_name,
    percentile_disc(0.80) WITHIN GROUP (ORDER BY t.release_date) as percentile_release_date
FROM tracks t
JOIN playlists_tracks pt ON pt.track_id = t.id
JOIN playlists p ON pt.playlist_id = p.id
JOIN owners o ON p.owner_id = o.id
LEFT JOIN contacts c ON o.id = c.owner_id
WHERE t.user_id IS NOT NULL AND t.no_analysis IS NOT true AND c.owner_id IS NULL
GROUP BY p.id, o.user_id, o.display_name
QUERY PLAN
GroupAggregate  (cost=1941954.50..5592792.71 rows=25667715 width=136) (actual time=31191.535..59412.822 rows=248285 loops=1)
  Group Key: p.id, o.user_id, o.display_name
  Buffers: shared hit=175465 read=475303, temp read=732236 written=732286
  ->  Gather Merge  (cost=1941954.50..5015269.13 rows=25667715 width=136) (actual time=31191.000..47967.494 rows=43504313 loops=1)
        Workers Planned: 4
        Workers Launched: 4
        Buffers: shared hit=175459 read=475302, temp read=732236 written=732286
        ->  Sort  (cost=1940954.44..1956996.76 rows=6416929 width=136) (actual time=31046.494..33961.990 rows=8700863 loops=5)
              Sort Key: p.id, o.user_id, o.display_name
              Sort Method: external merge  Disk: 1195696kB
              Buffers: shared hit=175459 read=475302, temp read=732236 written=732286
              Worker 0:  Sort Method: external merge  Disk: 1191408kB
              Worker 1:  Sort Method: external merge  Disk: 1179872kB
              Worker 2:  Sort Method: external merge  Disk: 1106536kB
              Worker 3:  Sort Method: external merge  Disk: 1184376kB
              ->  Parallel Hash Join  (cost=485715.22..958481.32 rows=6416929 width=136) (actual time=8106.581..24000.603 rows=8700863 loops=5)
                    Hash Cond: (p.owner_id = o.id)
                    Buffers: shared hit=175279 read=475302
                    ->  Parallel Hash Join  (cost=393044.42..825701.28 rows=6417848 width=76) (actual time=5358.248..18777.840 rows=8898524 loops=5)
                          Hash Cond: (pt.playlist_id = p.id)
                          Buffers: shared hit=175103 read=410111
                          ->  Parallel Hash Join  (cost=177558.29..593368.29 rows=6417848 width=8) (actual time=4287.577..15642.345 rows=8898524 loops=5)
                                Hash Cond: (pt.track_id = t.id)
                                Buffers: shared read=410111
                                ->  Parallel Seq Scan on playlists_tracks pt  (cost=0.00..381946.95 rows=12900195 width=8) (actual time=0.700..7507.414 rows=8909338 loops=5)
                                      Buffers: shared read=252945
                                ->  Parallel Hash  (cost=169739.28..169739.28 rows=625521 width=8) (actual time=4283.741..4283.744 rows=881202 loops=5)
                                      Buckets: 8388608 (originally 4194304)  Batches: 1 (originally 1)  Memory Usage: 270752kB
                                      Buffers: shared read=157166
                                      ->  Parallel Seq Scan on tracks t  (cost=0.00..169739.28 rows=625521 width=8) (actual time=2.441..3691.324 rows=881202 loops=5)
                                            Filter: ((user_id IS NOT NULL) AND (no_analysis IS NOT TRUE))
                                            Rows Removed by Filter: 11694
                                            Buffers: shared read=157166
                          ->  Parallel Hash  (cost=193051.06..193051.06 rows=1794806 width=72) (actual time=1062.430..1062.433 rows=1471675 loops=5)
                                Buckets: 8388608  Batches: 1  Memory Usage: 509184kB
                                Buffers: shared hit=175103
                                ->  Parallel Seq Scan on playlists p  (cost=0.00..193051.06 rows=1794806 width=72) (actual time=0.074..342.586 rows=1471675 loops=5)
                                      Buffers: shared hit=175103
                    ->  Parallel Hash  (cost=82895.64..82895.64 rows=782012 width=68) (actual time=2739.889..2739.896 rows=1299732 loops=5)
                          Buckets: 8388608 (originally 4194304)  Batches: 1 (originally 1)  Memory Usage: 441920kB
                          Buffers: shared hit=56 read=65191
                          ->  Hash Anti Join  (cost=24.08..82895.64 rows=782012 width=68) (actual time=591.483..1976.640 rows=1299732 loops=5)
                                Hash Cond: (o.id = c.owner_id)
                                Buffers: shared hit=56 read=65191
                                ->  Parallel Seq Scan on owners o  (cost=0.00..72998.24 rows=782124 width=68) (actual time=0.166..1116.619 rows=1299993 loops=5)
                                      Buffers: shared read=65177
                                ->  Hash  (cost=18.48..18.48 rows=448 width=4) (actual time=591.292..591.294 rows=1309 loops=5)
                                      Buckets: 2048 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 63kB
                                      Buffers: shared hit=56 read=14
                                      ->  Seq Scan on contacts c  (cost=0.00..18.48 rows=448 width=4) (actual time=589.213..591.081 rows=1309 loops=5)
                                            Buffers: shared hit=56 read=14
Planning:
  Buffers: shared hit=232 read=10
Planning Time: 5.765 ms
JIT:
  Functions: 189
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 29.901 ms, Inlining 346.416 ms, Optimization 1783.670 ms, Emission 813.419 ms, Total 2973.406 ms
Execution Time: 59598.360 ms

You are essentially selecting all the data, so an index won't help you. PostgreSQL's estimates are correct, so the plan is probably good.

There is very little you can do except caching more in RAM and getting faster storage. Perhaps setting jit=off can make a slight difference. Even throwing more cores at the query won't help much, because most of the time is spent gathering results and grouping the data.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM