
Speed up slow Postgres query with window functions

I'm trying to optimise a query, as the query generated by my ORM (Django) is causing timeouts. I've done everything possible within the ORM to run it as one query, so now I wanted to know if there are any Postgres tricks that can speed things up.

The database contains 1m+ (and growing) relationships (id, source and target), which I need to filter to exclude connections whose source doesn't appear at least 2 times.

This is the current query - and the list of "target" ids can grow, which leads to exponential slowdowns.

SELECT * FROM
(SELECT
    "source",
    "target",
    count("id") OVER (PARTITION BY "source") AS "count_match"
FROM
    "database_name"
WHERE
    ("database_name"."target" IN (123, 456, 789))
) AS temp_data WHERE "temp_data"."count_match" >= 2
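
For comparison, the same "keep sources that appear at least twice among the selected targets" filter could also be written without a window function, as in the sketch below (same table and columns as above; whether the planner handles it any better is an open question):

-- Sketch: equivalent filter expressed with GROUP BY ... HAVING instead of a window function
SELECT d."source", d."target", f."count_match"
FROM "database_name" d
JOIN (
    SELECT "source", count(*) AS "count_match"
    FROM "database_name"
    WHERE "target" IN (123, 456, 789)
    GROUP BY "source"
    HAVING count(*) >= 2
) AS f USING ("source")
WHERE d."target" IN (123, 456, 789)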

I've read about VIEWS and temporary TABLES, but that seems like a lot of setup and tear-down for a one-off query.
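
For illustration, the temporary-table route could look something like the sketch below (temp_counts is just a placeholder name), which shows the setup and tear-down involved:

-- Sketch: materialise the per-source counts for this batch of targets...
CREATE TEMPORARY TABLE temp_counts AS
SELECT "source", count(*) AS "count_match"
FROM "database_name"
WHERE "target" IN (123, 456, 789)
GROUP BY "source";

-- ...join back to fetch the matching rows...
SELECT d."source", d."target", t."count_match"
FROM "database_name" d
JOIN temp_counts t USING ("source")
WHERE t."count_match" >= 2
  AND d."target" IN (123, 456, 789);

-- ...and clean up (temp tables are also dropped automatically at the end of the session).
DROP TABLE temp_counts;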

EDIT: Further info and tests on a higher-memory instance

Result of EXPLAIN ANALYSE:

Subquery Scan on alias_test  (cost=622312.29..728296.62 rows=1177604 width=24) (actual time=10245.731..18019.237 rows=1604749 loops=1)
  Filter: (alias_test.count_match >= 2)
  Rows Removed by Filter: 2002738
  ->  WindowAgg  (cost=622312.29..684136.48 rows=3532811 width=20) (actual time=10245.687..16887.428 rows=3607487 loops=1)
        ->  Sort  (cost=622312.29..631144.32 rows=3532811 width=20) (actual time=10245.630..12455.796 rows=3607487 loops=1)
              Sort Key: database_name.source
              Sort Method: external merge  Disk: 105792kB
              ->  Bitmap Heap Scan on database_name  (cost=60934.74..238076.96 rows=3532811 width=20) (actual time=352.529..1900.162 rows=3607487 loops=1)
                    Recheck Cond: (target = ANY ('{5495502,80455548,10129504,2052517,11564026,1509187,1981101,1410001}'::bigint[]))
                    Heap Blocks: exact=33716
                    ->  Bitmap Index Scan on database_name_target_426d2f46_uniq  (cost=0.00..60051.54 rows=3532811 width=0) (actual time=336.457..336.457 rows=3607487 loops=1)
                          Index Cond: (target = ANY ('{5495502,80455548,10129504,2052517,11564026,1509187,1981101,1410001}'::bigint[]))
Planning time: 0.288 ms
Execution time: 18318.194 ms

Table structure:

    Column     |           Type           |                                     Modifiers
---------------+--------------------------+-----------------------------------------------------------------------------------
 created_date  | timestamp with time zone | not null
 modified_date | timestamp with time zone | not null
 id            | integer                  | not null default nextval('database_name_id_seq'::regclass)
 source        | bigint                   | not null
 target        | bigint                   | not null
 active        | boolean                  | not null
Indexes:
    "database_name_pkey" PRIMARY KEY, btree (id)
    "database_name_source_24c75675_uniq" btree (source)
    "database_name_target_426d2f46_uniq" btree (target)

Hardware:

I've tried upgrading the server to an instance with 8GB of memory and updated the .conf file with the following settings from PGTune:

max_connections = 10
shared_buffers = 2GB
effective_cache_size = 6GB
work_mem = 209715kB
maintenance_work_mem = 512MB
min_wal_size = 1GB
max_wal_size = 2GB
checkpoint_completion_target = 0.7
wal_buffers = 16MB
default_statistics_target = 100

Despite the higher work_mem setting, the sort still spills to disk for the merge, which is confusing to me. Perhaps the window function is causing this behaviour?
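
For reference, a quick way to double-check this would be to look at the value the session actually uses and override it just for a test run (sketch only; 256MB is an arbitrary figure):

SHOW work_mem;            -- what this session is actually using
SET work_mem = '256MB';   -- per-session override, no restart needed
-- then re-run EXPLAIN (ANALYZE, BUFFERS) on the query above and check whether
-- the sort method switches from "external merge" to an in-memory quicksort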

Your query is already optimal. There is no way to avoid scanning the whole table to get the information you need, and a sequential scan is the best way to do that.

Make sure that work_mem is big enough that the aggregation can be done in memory – you can set log_temp_files to monitor whether temporary files are used (which makes things much slower).
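
For example, log_temp_files can be enabled without a restart; a setting of 0 logs every temporary file together with its size (a sketch of one way to do it):

ALTER SYSTEM SET log_temp_files = 0;  -- 0 = log all temp files; a positive value is a size threshold in kB
SELECT pg_reload_conf();              -- reload the configuration so the change takes effect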
