Is there a more efficient way to write this SQL?

Question

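Is there a way to write this more efficiently in Postgres? I will be reusing it in several other queries. Table A is large, table B is about 1/3 the size of A, and table C is small.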

SELECT a.field1, b.field2, c.field3
FROM A a
         LEFT JOIN B b on a.ref_id = b.id
         LEFT JOIN C c on b.other_ref_id = c.id
WHERE a.field1 IN (...)

The execution plan shows a large loops value for the first LEFT JOIN.

Explain plan:

Gather  (cost=1002.62..1290550.84 rows=856452 width=74) (actual time=0.495..1554.401 rows=850836 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=4022375 read=234277
  ->  Hash Left Join  (cost=2.62..1203905.64 rows=356855 width=74) (actual time=0.263..1441.760 rows=283612 loops=3)
        Hash Cond: (b.other_ref_id = c.id)
        Buffers: shared hit=4022375 read=234277
        ->  Nested Loop Left Join  (cost=1.13..1202967.39 rows=356855 width=44) (actual time=0.145..1402.434 rows=283612 loops=3)
              Buffers: shared hit=4022316 read=234277
              ->  Parallel Index Scan using some_existing_idx on A a  (cost=0.69..785157.53 rows=356855 width=30) (actual time=0.101..731.991 rows=283612 loops=3)
                    Index Cond: (field1 = ANY ('{1,2,3,4,5,6,7,8}'::bigint[]))
                    Buffers: shared hit=632106 read=225426
              ->  Index Scan using b_pkey on B b  (cost=0.44..1.17 rows=1 width=22) (actual time=0.002..0.002 rows=1 loops=850836)
                    Index Cond: (id = a.ref_id)
                    Buffers: shared hit=3390210 read=8851
        ->  Hash  (cost=1.22..1.22 rows=22 width=34) (actual time=0.024..0.024 rows=22 loops=3)
              Buckets: 1024  Batches: 1  Memory Usage: 10kB
              Buffers: shared hit=3
              ->  Seq Scan on C c  (cost=0.00..1.22 rows=22 width=34) (actual time=0.012..0.014 rows=22 loops=3)
                    Buffers: shared hit=3
Planning Time: 5.382 ms
Execution Time: 1581.816 ms

Answer 1

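The nested loop left join between a and b is probably the most efficient approach here. There are about 850,000 loops, but each execution takes only 0.002 milliseconds, which adds up to roughly 1.7 seconds. That work is split across three parallel processes (two workers plus the leader, hence loops=3 in the plan), so the elapsed time is about 0.6 seconds.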

Together with the 0.7 seconds spent in the parallel index scan on a, that accounts for the execution time.
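The estimate above can be reproduced as plain arithmetic. This is only a rough sketch: every figure is copied from the EXPLAIN (ANALYZE) output in the question, the column aliases are made up for illustration, and the 0.002 ms per-loop time is a rounded value, so the result is coarse.

```sql
-- All numbers come from the plan posted in the question.
SELECT
    -- loops x per-loop time of the index scan on B, summed over all processes
    round(850836 * 0.002)     AS inner_scan_total_ms,
    -- the same work divided among 3 processes (2 workers + leader)
    round(850836 * 0.002 / 3) AS inner_scan_per_process_ms,
    -- per-process time of the nested loop minus its outer index scan on A
    1402.434 - 731.991        AS nested_loop_minus_outer_ms;
```

The first two columns come out at about 1.7 seconds and just under 0.6 seconds. The third, about 670 ms, is the same quantity read directly from the per-process timings in the plan, and the two estimates agree reasonably well.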

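The alternative would be a hash join between a and b, which requires a sequential scan of b and a large hash table. The planner did not choose that, so either the sequential scan would be more expensive, or work_mem is configured too small to hold the resulting hash table.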

The only chance for improvement is to raise work_mem and see whether execution becomes slightly faster.
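A minimal sketch of that experiment follows. The value 256MB is an arbitrary assumption for illustration, not a recommendation, and the IN list is taken from the Index Cond shown in the posted plan because the question elides it.

```sql
-- Raise work_mem for this session only; 256MB is an illustrative value.
SET work_mem = '256MB';

-- Re-run the query and compare the plan and timing with the original.
EXPLAIN (ANALYZE, BUFFERS)
SELECT a.field1, b.field2, c.field3
FROM A a
         LEFT JOIN B b ON a.ref_id = b.id
         LEFT JOIN C c ON b.other_ref_id = c.id
WHERE a.field1 IN (1, 2, 3, 4, 5, 6, 7, 8);  -- values from the posted plan

-- Restore the session default afterwards.
RESET work_mem;
```

If the planner still picks the nested loop with the higher setting, work_mem was probably not the limiting factor.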

To test whether my analysis is correct, try

SET enable_nestloop = off;

Then run the query again. If that makes execution slower, PostgreSQL did the right thing.
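One way to run this test without leaving the setting changed for the rest of the session is to use SET LOCAL inside a transaction. This is a sketch, not part of the original answer; the IN list is again taken from the posted plan.

```sql
BEGIN;

-- SET LOCAL lasts only until the end of this transaction.
SET LOCAL enable_nestloop = off;

-- Compare this plan and its Execution Time with the original 1581.816 ms.
EXPLAIN (ANALYZE, BUFFERS)
SELECT a.field1, b.field2, c.field3
FROM A a
         LEFT JOIN B b ON a.ref_id = b.id
         LEFT JOIN C c ON b.other_ref_id = c.id
WHERE a.field1 IN (1, 2, 3, 4, 5, 6, 7, 8);  -- values from the posted plan

-- The query only reads data, so rolling back just discards the setting.
ROLLBACK;
```

Because caching affects timings, it is worth running each variant more than once before comparing.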

Answer 1: accepted, 2019-09-23 21:31:45
