
Speed up slow Postgres query with window functions

I'm trying to optimise a query generated by my ORM (Django), because it is causing timeouts. I've done everything I can within the ORM to run it as a single query, so now I'd like to know whether there are any Postgres tricks that can speed things up.

The table contains over 1 million relationship rows (id, source and target) and is growing. I need to filter them to exclude connections whose source doesn't appear at least 2 times.

This is the current query. The list of "target" ids can grow, which slows the query down dramatically.

SELECT * FROM
(SELECT
    "source",
    "target",
    count("id") OVER (PARTITION BY "source") AS "count_match"
FROM
    "database_name"
WHERE
    ("database_name"."target" IN (123, 456, 789))
) AS temp_data WHERE "temp_data"."count_match" >= 2

I've read about VIEWs and temporary tables, but that seems like a lot of setup and teardown for a one-off query.
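
For reference, an equivalent formulation that avoids the window function entirely would be something along these lines (a sketch, untested against the real data; it pre-aggregates per source and joins back):

-- count matches per source among the target-filtered rows,
-- keep only sources seen at least twice, then join back for the detail rows
SELECT d."source", d."target", c."count_match"
FROM "database_name" d
JOIN (
    SELECT "source", count("id") AS "count_match"
    FROM "database_name"
    WHERE "target" IN (123, 456, 789)
    GROUP BY "source"
    HAVING count("id") >= 2
) c ON c."source" = d."source"
WHERE d."target" IN (123, 456, 789);

Whether this plans any better would need checking with EXPLAIN ANALYSE; a GROUP BY can use a HashAggregate, whereas the window function forces a sort over all the filtered rows.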

EDIT: further info, and tests on a higher-memory instance

Result of EXPLAIN ANALYSE:

Subquery Scan on alias_test  (cost=622312.29..728296.62 rows=1177604 width=24) (actual time=10245.731..18019.237 rows=1604749 loops=1)
  Filter: (alias_test.count_match >= 2)
  Rows Removed by Filter: 2002738
  ->  WindowAgg  (cost=622312.29..684136.48 rows=3532811 width=20) (actual time=10245.687..16887.428 rows=3607487 loops=1)
        ->  Sort  (cost=622312.29..631144.32 rows=3532811 width=20) (actual time=10245.630..12455.796 rows=3607487 loops=1)
              Sort Key: database_name.source
              Sort Method: external merge  Disk: 105792kB
              ->  Bitmap Heap Scan on database_name  (cost=60934.74..238076.96 rows=3532811 width=20) (actual time=352.529..1900.162 rows=3607487 loops=1)
                    Recheck Cond: (target = ANY ('{5495502,80455548,10129504,2052517,11564026,1509187,1981101,1410001}'::bigint[]))
                    Heap Blocks: exact=33716
                    ->  Bitmap Index Scan on database_name_target_426d2f46_uniq  (cost=0.00..60051.54 rows=3532811 width=0) (actual time=336.457..336.457 rows=3607487 loops=1)
                          Index Cond: (target = ANY ('{5495502,80455548,10129504,2052517,11564026,1509187,1981101,1410001}'::bigint[]))
Planning time: 0.288 ms
Execution time: 18318.194 ms

Table structure:

    Column     |           Type           |                                     Modifiers
---------------+--------------------------+-----------------------------------------------------------------------------------
 created_date  | timestamp with time zone | not null
 modified_date | timestamp with time zone | not null
 id            | integer                  | not null default nextval('database_name_id_seq'::regclass)
 source        | bigint                   | not null
 target        | bigint                   | not null
 active        | boolean                  | not null
Indexes:
    "database_name_pkey" PRIMARY KEY, btree (id)
    "database_name_source_24c75675_uniq" btree (source)
    "database_name_target_426d2f46_uniq" btree (target)

Hardware:

I've tried upgrading the server to an instance with 8GB of memory and updated postgresql.conf with the following values from PGTune:

max_connections = 10
shared_buffers = 2GB
effective_cache_size = 6GB
work_mem = 209715kB
maintenance_work_mem = 512MB
min_wal_size = 1GB
max_wal_size = 2GB
checkpoint_completion_target = 0.7
wal_buffers = 16MB
default_statistics_target = 100

Despite the higher work_mem setting, the sort still spills to disk (the "external merge" in the plan), which is confusing to me. Perhaps the window function is causing this behaviour?
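
To rule the setting out, one session-scoped experiment (it changes nothing in the .conf file; the 512MB value is an arbitrary guess, not a measured requirement) would be:

-- override work_mem for this session only, then re-check how the sort is done
SET work_mem = '512MB';
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM
(SELECT
    "source",
    "target",
    count("id") OVER (PARTITION BY "source") AS "count_match"
FROM
    "database_name"
WHERE
    ("database_name"."target" IN (123, 456, 789))
) AS temp_data WHERE "temp_data"."count_match" >= 2;

If the plan then reports an in-memory sort (quicksort) instead of an external merge, memory was the limiting factor.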

Your query is already essentially optimal. There is no way to avoid reading every row that matches the filter to get the information you need, and the bitmap index scan in your plan is a good way to do that.

Make sure that work_mem is big enough that the sort feeding the window aggregation can be done in memory. Note that an in-memory sort needs noticeably more memory than the on-disk size EXPLAIN reports, often a few times as much, so a sort that spills ~105MB to disk can still fail to fit into a work_mem of ~205MB. You can set log_temp_files to monitor whether temporary files are used (which makes things much slower).
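
For example (a sketch; the value is in kilobytes: 0 logs every temporary file, a positive number logs only files at least that large):

-- log all temporary files server-wide so sort spills show up in the server log
ALTER SYSTEM SET log_temp_files = 0;
SELECT pg_reload_conf();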
