
Speed up slow Postgres query with window functions

I'm trying to optimise a query, as the query generated by my ORM (Django) is causing timeouts. I've done everything possible within the ORM to run it as one query, so now I wanted to know if there are any Postgres tricks that can speed things up.

The database contains 1m+ (and growing) relationships (id, source and target), which I need to filter to exclude connections whose source doesn't appear at least 2 times.

This is the current query - and the list of "target" ids can grow, which leads to exponential slowdowns.

SELECT * FROM
(SELECT
    "source",
    "target",
    count("id") OVER (PARTITION BY "source") AS "count_match"
FROM
    "database_name"
WHERE
    ("database_name"."target" IN (123, 456, 789))
) AS temp_data WHERE "temp_data"."count_match" >= 2
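
For comparison, the same "keep sources that appear at least twice among the selected targets" filter could also be written without a window function, as in the sketch below (same table and columns as above; whether the planner handles it any better is an open question):

-- Sketch: equivalent filter expressed with GROUP BY ... HAVING instead of a window function
SELECT d."source", d."target", f."count_match"
FROM "database_name" d
JOIN (
    SELECT "source", count(*) AS "count_match"
    FROM "database_name"
    WHERE "target" IN (123, 456, 789)
    GROUP BY "source"
    HAVING count(*) >= 2
) AS f USING ("source")
WHERE d."target" IN (123, 456, 789)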

I've read about VIEWS and temporary TABLES, but that seems like a lot of setup and tear-down for a one-off query.
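
For illustration, the temporary-table route could look something like the sketch below (temp_counts is just a placeholder name), which shows the setup and tear-down involved:

-- Sketch: materialise the per-source counts for this batch of targets...
CREATE TEMPORARY TABLE temp_counts AS
SELECT "source", count(*) AS "count_match"
FROM "database_name"
WHERE "target" IN (123, 456, 789)
GROUP BY "source";

-- ...join back to fetch the matching rows...
SELECT d."source", d."target", t."count_match"
FROM "database_name" d
JOIN temp_counts t USING ("source")
WHERE t."count_match" >= 2
  AND d."target" IN (123, 456, 789);

-- ...and clean up (temp tables are also dropped automatically at the end of the session).
DROP TABLE temp_counts;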

EDIT: Further info and tests on a higher-memory instance

Result of EXPLAIN ANALYSE:

Subquery Scan on alias_test  (cost=622312.29..728296.62 rows=1177604 width=24) (actual time=10245.731..18019.237 rows=1604749 loops=1)
  Filter: (alias_test.count_match >= 2)
  Rows Removed by Filter: 2002738
  ->  WindowAgg  (cost=622312.29..684136.48 rows=3532811 width=20) (actual time=10245.687..16887.428 rows=3607487 loops=1)
        ->  Sort  (cost=622312.29..631144.32 rows=3532811 width=20) (actual time=10245.630..12455.796 rows=3607487 loops=1)
              Sort Key: database_name.source
              Sort Method: external merge  Disk: 105792kB
              ->  Bitmap Heap Scan on database_name  (cost=60934.74..238076.96 rows=3532811 width=20) (actual time=352.529..1900.162 rows=3607487 loops=1)
                    Recheck Cond: (target = ANY ('{5495502,80455548,10129504,2052517,11564026,1509187,1981101,1410001}'::bigint[]))
                    Heap Blocks: exact=33716
                    ->  Bitmap Index Scan on database_name_target_426d2f46_uniq  (cost=0.00..60051.54 rows=3532811 width=0) (actual time=336.457..336.457 rows=3607487 loops=1)
                          Index Cond: (target = ANY ('{5495502,80455548,10129504,2052517,11564026,1509187,1981101,1410001}'::bigint[]))
Planning time: 0.288 ms
Execution time: 18318.194 ms

Table structure:

    Column     |           Type           |                                     Modifiers
---------------+--------------------------+-----------------------------------------------------------------------------------
 created_date  | timestamp with time zone | not null
 modified_date | timestamp with time zone | not null
 id            | integer                  | not null default nextval('database_name_id_seq'::regclass)
 source        | bigint                   | not null
 target        | bigint                   | not null
 active        | boolean                  | not null
Indexes:
    "database_name_pkey" PRIMARY KEY, btree (id)
    "database_name_source_24c75675_uniq" btree (source)
    "database_name_target_426d2f46_uniq" btree (target)

Hardware:

I've tried upgrading the server to an instance with 8GB of memory and updated the .conf file with the following settings from PGTune:

max_connections = 10
shared_buffers = 2GB
effective_cache_size = 6GB
work_mem = 209715kB
maintenance_work_mem = 512MB
min_wal_size = 1GB
max_wal_size = 2GB
checkpoint_completion_target = 0.7
wal_buffers = 16MB
default_statistics_target = 100

Despite the higher work_mem setting, the sort still spills to disk for the merge, which is confusing to me. Perhaps the window function is causing this behaviour?
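
For reference, a quick way to double-check this would be to look at the value the session actually uses and override it just for a test run (sketch only; 256MB is an arbitrary figure):

SHOW work_mem;            -- what this session is actually using
SET work_mem = '256MB';   -- per-session override, no restart needed
-- then re-run EXPLAIN (ANALYZE, BUFFERS) on the query above and check whether
-- the sort method switches from "external merge" to an in-memory quicksort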

Your query is already optimal. There is no way to avoid scanning the whole table to get the information you need, and a sequential scan is the best way to do that.

Make sure that work_mem is big enough that the aggregation can be done in memory – you can set log_temp_files to monitor whether temporary files are used (which makes things much slower).
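
For example, log_temp_files can be enabled without a restart; a setting of 0 logs every temporary file together with its size (a sketch of one way to do it):

ALTER SYSTEM SET log_temp_files = 0;  -- 0 = log all temp files; a positive value is a size threshold in kB
SELECT pg_reload_conf();              -- reload the configuration so the change takes effect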
