I'm trying to optimise a query, as the query generated by my ORM (Django) is causing timeouts. I've done everything possible within the ORM to run it as one query, so now I wanted to know if there are any Postgres tricks that can speed things up.
The table contains over 1 million relationship rows (id, source, target) and is growing. I need to filter it to exclude connections whose source appears fewer than 2 times.
This is the current query. The list of "target" ids can grow, and as it does the query slows down dramatically.
SELECT *
FROM (
    SELECT
        "source",
        "target",
        count("id") OVER (PARTITION BY "source") AS "count_match"
    FROM "database_name"
    WHERE "database_name"."target" IN (123, 456, 789)
) AS temp_data
WHERE "temp_data"."count_match" >= 2
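One possible rewrite worth benchmarking (a sketch, assuming the same schema; it may or may not beat the window function, so compare both with EXPLAIN ANALYSE) is to compute the per-source counts once with GROUP BY/HAVING and join back. This gives the planner the option of a HashAggregate instead of sorting every matching row for the window function:

```sql
-- Sketch: the same filter expressed without a window function.
-- The subquery keeps only sources that occur at least twice among the
-- matching targets; the outer query returns their rows. Since "id" is
-- NOT NULL, count(*) here is equivalent to count("id") in the original.
SELECT d."source", d."target"
FROM "database_name" AS d
JOIN (
    SELECT "source"
    FROM "database_name"
    WHERE "target" IN (123, 456, 789)
    GROUP BY "source"
    HAVING count(*) >= 2
) AS frequent USING ("source")
WHERE d."target" IN (123, 456, 789);
```

If the exact count_match value is still needed per row, it can be selected in the subquery (`count(*) AS "count_match"`) and carried through the join.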
I've read about VIEWs and temporary TABLEs, but that seems like a lot of setup and tear-down for a one-off query.
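For a one-off query, a plain CTE needs no setup or tear-down at all (a sketch of the same query; note that before PostgreSQL 12 a CTE is an optimisation fence, so this is mostly a readability change rather than a performance one):

```sql
-- Sketch: the original query as a CTE, no temporary table required.
WITH counted AS (
    SELECT "source", "target",
           count("id") OVER (PARTITION BY "source") AS "count_match"
    FROM "database_name"
    WHERE "target" IN (123, 456, 789)
)
SELECT * FROM counted WHERE "count_match" >= 2;
```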
Result of EXPLAIN ANALYSE:
Subquery Scan on alias_test  (cost=622312.29..728296.62 rows=1177604 width=24) (actual time=10245.731..18019.237 rows=1604749 loops=1)
  Filter: (alias_test.count_match >= 2)
  Rows Removed by Filter: 2002738
  ->  WindowAgg  (cost=622312.29..684136.48 rows=3532811 width=20) (actual time=10245.687..16887.428 rows=3607487 loops=1)
        ->  Sort  (cost=622312.29..631144.32 rows=3532811 width=20) (actual time=10245.630..12455.796 rows=3607487 loops=1)
              Sort Key: database_name.source
              Sort Method: external merge  Disk: 105792kB
              ->  Bitmap Heap Scan on database_name  (cost=60934.74..238076.96 rows=3532811 width=20) (actual time=352.529..1900.162 rows=3607487 loops=1)
                    Recheck Cond: (target = ANY ('{5495502,80455548,10129504,2052517,11564026,1509187,1981101,1410001}'::bigint[]))
                    Heap Blocks: exact=33716
                    ->  Bitmap Index Scan on database_name_target_426d2f46_uniq  (cost=0.00..60051.54 rows=3532811 width=0) (actual time=336.457..336.457 rows=3607487 loops=1)
                          Index Cond: (target = ANY ('{5495502,80455548,10129504,2052517,11564026,1509187,1981101,1410001}'::bigint[]))
Planning time: 0.288 ms
Execution time: 18318.194 ms
Table structure:
Column | Type | Modifiers
---------------+--------------------------+-----------------------------------------------------------------------------------
created_date | timestamp with time zone | not null
modified_date | timestamp with time zone | not null
id | integer | not null default nextval('database_name_id_seq'::regclass)
source | bigint | not null
target | bigint | not null
active | boolean | not null
Indexes:
"database_name_pkey" PRIMARY KEY, btree (id)
"database_name_source_24c75675_uniq" btree (source)
"database_name_target_426d2f46_uniq" btree (target)
Hardware:
I've tried upgrading the server to an 8GB memory instance and updated the .conf file with the following settings from PGTune:
max_connections = 10
shared_buffers = 2GB
effective_cache_size = 6GB
work_mem = 209715kB
maintenance_work_mem = 512MB
min_wal_size = 1GB
max_wal_size = 2GB
checkpoint_completion_target = 0.7
wal_buffers = 16MB
default_statistics_target = 100
Despite the higher work_mem setting, the sort is still spilling to disk (external merge), which is confusing to me. Perhaps the window function is causing this behaviour?
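One way to test whether the sort can be kept in memory is to raise work_mem for the current session only and re-run the plan (a sketch; given the ~105 MB spill in the plan above, something around 256MB is a plausible starting point, but the exact value is an assumption to be tuned):

```sql
-- Session-only override; does not change postgresql.conf.
SET work_mem = '256MB';
EXPLAIN ANALYSE SELECT ...;  -- re-run the query shown above
RESET work_mem;              -- restore the configured value
```

If the plan then reports "Sort Method: quicksort" instead of "external merge", the spill was purely a work_mem limit; keep in mind work_mem is allocated per sort node per connection, so a value this large is safer as a session setting than a global one.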
Your query is already optimal. There is no way to avoid scanning the whole table to get the information you need, and a sequential scan is the best way to do that.
Make sure that work_mem is big enough that the aggregation can be done in memory – you can set log_temp_files to monitor whether temporary files are used (which makes things much slower).
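For example (a sketch; log_temp_files takes a size threshold in kB, and 0 logs every temporary file):

```sql
-- Persist the setting (requires superuser), then reload the config:
ALTER SYSTEM SET log_temp_files = 0;
SELECT pg_reload_conf();

-- Or, for a quick check, set it for the current session only
-- (also superuser-only, since log_temp_files is a SUSET parameter):
SET log_temp_files = 0;
```

Each temporary file used by a sort or hash then shows up in the server log with its size, making it easy to see whether a given work_mem increase actually eliminated the spill.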