
FULL JOIN with =any doesn't use indexes

Using Postgres 9.3.5, I can't seem to get a full outer join with an =any where-clause to use the relevant indexes.

A minimal example:

create table t1(i int primary key, j int);
create table t2(i int primary key, j int);

insert into t1 (select x,x from generate_series(1,1000000) x);
insert into t2 (select x,x from generate_series(1,1000000) x);

vacuum analyze;

explain analyze
    select * 
        from t1 full join t2 using(i) 
        where i =any (array[1,2]);

(In my real query, the array is a parameter and has variable length.)

I get the following seq-scanning query plan:

 Hash Full Join  (cost=26925.00..66350.00 rows=10000 width=16) (actual time=178.308..1251.221 rows=2 loops=1)
   Hash Cond: (t1.i = t2.i)
   Filter: (COALESCE(t1.i, t2.i) = ANY ('{1,2}'::integer[]))
   Rows Removed by Filter: 999998
   ->  Seq Scan on t1  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.011..59.463 rows=1000000 loops=1)
   ->  Hash  (cost=14425.00..14425.00 rows=1000000 width=8) (actual time=178.212..178.212 rows=1000000 loops=1)
         Buckets: 131072  Batches: 1  Memory Usage: 39063kB
         ->  Seq Scan on t2  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.012..57.751 rows=1000000 loops=1)
 Total runtime: 1255.734 ms

Unsuccessful things I tried:

  • Using i in (1,2), or i=1 or i=2, instead of =any
  • set enable_seqscan to f

Simulating the full join with a left join and an anti-join works:

explain analyze 
    select * from
        (select i, t1.j, t2.j from t1 left join t2 using(i) 
         union all
         select i, null, j from t2 
             where not exists (select 1 from t1 where t1.i = t2.i)) sub
    where i =any (array[1,2]);


 Append  (cost=0.85..51.61 rows=3 width=12) (actual time=0.007..0.018 rows=2 loops=1)
   ->  Nested Loop Left Join  (cost=0.85..29.79 rows=2 width=12) (actual time=0.007..0.010 rows=2 loops=1)
         ->  Index Scan using t1_pkey on t1  (cost=0.42..12.88 rows=2 width=8) (actual time=0.003..0.005 rows=2 loops=1)
               Index Cond: (i = ANY ('{1,2}'::integer[]))
         ->  Index Scan using t2_pkey on t2  (cost=0.42..8.44 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=2)
               Index Cond: (t1.i = i)
   ->  Nested Loop Anti Join  (cost=0.85..21.79 rows=1 width=8) (actual time=0.008..0.008 rows=0 loops=1)
         ->  Index Scan using t2_pkey on t2 t2_1  (cost=0.42..12.88 rows=2 width=8) (actual time=0.001..0.002 rows=2 loops=1)
               Index Cond: (i = ANY ('{1,2}'::integer[]))
         ->  Index Only Scan using t1_pkey on t1 t1_1  (cost=0.42..4.44 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=2)
               Index Cond: (i = t2_1.i)
               Heap Fetches: 0
 Total runtime: 0.065 ms

This approach would considerably complicate my real query and add duplication to it, though. Is there a better way to get Postgres to use the indexes?

Pushing the predicate down into subqueries does the trick:

EXPLAIN ANALYZE
SELECT * 
FROM      (SELECT * FROM t1 WHERE i = ANY ('{1,2}')) t1
FULL JOIN (SELECT * FROM t2 WHERE i = ANY ('{1,2}')) t2 USING (i);

 QUERY PLAN
 Merge Full Join  (cost=0.58..25.26 rows=2 width=16) (actual time=0.084..0.100 rows=2 loops=1)
   Merge Cond: (t1.i = t2.i)
   ->  Index Scan using t1_pkey on t1  (cost=0.29..12.62 rows=2 width=8) (actual time=0.044..0.048 rows=2 loops=1)
         Index Cond: (i = ANY ('{1,2}'::integer[]))
   ->  Index Scan using t2_pkey on t2  (cost=0.29..12.62 rows=2 width=8) (actual time=0.028..0.033 rows=2 loops=1)
         Index Cond: (i = ANY ('{1,2}'::integer[]))
 Total runtime: 0.256 ms

SQL Fiddle (with 100k rows).
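Since the real query takes the array as a parameter of variable length, the same pushdown can be sketched as a prepared statement. This is only an illustration: the statement name full_join_by_ids is made up, and the plan it produces on 9.3 has not been verified here.

```sql
-- Hypothetical statement name; $1 is the variable-length integer array.
-- The predicate is repeated in both subqueries so that each base table
-- can be filtered on its own primary key before the full join.
PREPARE full_join_by_ids(int[]) AS
SELECT *
FROM      (SELECT * FROM t1 WHERE i = ANY ($1)) t1
FULL JOIN (SELECT * FROM t2 WHERE i = ANY ($1)) t2 USING (i);

-- Inspect the plan for one concrete array, then run with another.
EXPLAIN ANALYZE EXECUTE full_join_by_ids(ARRAY[1,2]);
EXECUTE full_join_by_ids('{1,2,3}');

DEALLOCATE full_join_by_ids;
```

The array is passed once and referenced twice, so the duplication is limited to the predicate itself rather than the parameter.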

Evidently, the query planner is not smart enough to conclude that a predicate on the merged column of a full join can use the indexes on the underlying tables. The first plan shows the condition applied as a Filter on COALESCE(t1.i, t2.i) after the join, rather than as an Index Cond on either table. This could be improved.

I can't test pg 9.4 right now. Maybe it has been improved.
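The merged column that USING (i) produces for a full join behaves like COALESCE(t1.i, t2.i), which is what the Filter line of the first plan shows. A minimal sketch of the original query spelled out without USING makes it visible that the WHERE clause refers to an expression over both tables rather than to either indexed column. I would expect it to get the same seq-scanning plan, but that is an assumption and has not been run here.

```sql
-- Explicit form of "t1 full join t2 using(i) where i = any (...)".
-- The predicate is on an expression spanning both tables, so it is
-- evaluated after the join instead of against t1_pkey or t2_pkey.
EXPLAIN
SELECT COALESCE(t1.i, t2.i) AS i, t1.j, t2.j
FROM   t1
FULL   JOIN t2 ON t1.i = t2.i
WHERE  COALESCE(t1.i, t2.i) = ANY (ARRAY[1,2]);
```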

BTW, most clients can't deal with multiple columns of the same name in the result (even though Postgres can). Your two instances of j would be a problem, and you'd have to use at least one column alias, forcing you to list columns explicitly.
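For example, the pushed-down query could list its columns explicitly like this. The alias names j1 and j2 are only placeholders:

```sql
-- j1 and j2 are illustrative aliases; any distinct names work.
SELECT i, t1.j AS j1, t2.j AS j2
FROM      (SELECT * FROM t1 WHERE i = ANY ('{1,2}')) t1
FULL JOIN (SELECT * FROM t2 WHERE i = ANY ('{1,2}')) t2 USING (i);
```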
