
Postgres-XL left join takes too long to execute

The Postgres-XL 9.5r1.6 cluster consists of a GTM, a coordinator and two datanodes.

There are three tables, a, b and c, which implement a many-to-many relationship:

create table a(id int, name text, uid int) distribute by hash(uid);
create table b(id int, name text, uid int) distribute by hash(uid);
create table c(id int, aname text, bname text, uid int) distribute by hash(uid);

When I run the following query on the coordinator, it takes an inexplicable 20000 milliseconds! But on either datanode the execution time is hardly more than 20 milliseconds.

select a.name, b.name
from
       a left join c
       on a.name=c.aname
          left join b
          on c.bname=b.name
where
       a.name='cf82c96b77b8aa5277da6d55c4e4e66e';

EXPLAIN plan on the coordinator:

Remote Subquery Scan on all (dn_1,dn_2)  (cost=8.33..17.78 rows=1 width=66)
 ->  Nested Loop Left Join  (cost=8.33..17.78 rows=1 width=66)
         Join Filter: ((a.name)::text = (c.aname)::text)
         ->  Remote Subquery Scan on all (dn_1,dn_2)  (cost=100.15..108.21 rows=1 width=33)
               Distribute results by H: name
               ->  Index Only Scan using code_idx on a  (cost=0.15..8.17 rows=1 width=33)
                     Index Cond: (name = 'cf82c96b77b8aa5277da6d55c4e4e66e'::text)
         ->  Materialize  (cost=108.18..109.72 rows=1 width=115)
               ->  Remote Subquery Scan on all (dn_1,dn_2)  (cost=108.18..109.72 rows=1 width=115)
                     Distribute results by H: aname
                     ->  Hash Right Join  (cost=8.18..9.60 rows=1 width=115)
                           Hash Cond: ((b.name)::text = (c.bname)::text)
                           ->  Remote Subquery Scan on all (dn_1,dn_2)  (cost=100.00..102.44 rows=30 width=33)
                                 Distribute results by H: name
                                 ->  Seq Scan on b  (cost=0.00..1.30 rows=30 width=33)
                           ->  Hash  (cost=108.41..108.41 rows=1 width=244)
                                 ->  Remote Subquery Scan on all (dn_1,dn_2)  (cost=100.15..108.41 rows=1 width=244)
                                       Distribute results by H: bname
                                       ->  Index Only Scan using code_idxcfc on c  (cost=0.15..8.17 rows=1 width=244)
                                             Index Cond: (aname = 'cf82c96b77b8aa5277da6d55c4e4e66e'::text)

Someone else already hit this problem and asked here, but got no answer or hint. I'm just hoping this time the question gets some insight.

PS: I tried to fill the three tables so that the related rows from a and b that make up each row of c all come from the same datanode, but the execution time showed no improvement. Another point worth noting: when the condition in the where clause ( a.name='cf82c96b77b8aa5277da6d55c4e4e66e' ) is always false, the execution time drops to under a few milliseconds.

For this query:

select a.name, b.name
from a left join
     c
     on a.name = c.aname left join
     b
     on c.bname = b.name
where a.name = 'cf82c96b77b8aa5277da6d55c4e4e66e';

You want indexes on a(name), b(name), and c(aname). Your partitioning (distribution by hash(uid)) is not going to help this query, because it filters and joins on the name columns rather than on uid; you should only keep it if the tables are really big.
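As a rough sketch of the indexes this answer has in mind: the index names below are hypothetical, and since table c has the columns aname and bname rather than name, the sketch indexes c(aname), the column used in the join. The plan in the question already shows index scans on a (code_idx) and on c (code_idxcfc), so equivalent indexes may exist there already; b is the table being scanned sequentially.

-- Hypothetical index names; skip any index that already exists on the same column.
create index idx_a_name on a (name);
create index idx_b_name on b (name);
-- Assumption: c has no "name" column, so the join column aname is indexed instead.
create index idx_c_aname on c (aname);

With only about 30 rows estimated in b, the planner may still prefer a sequential scan there, so it is worth checking the plan again after creating the indexes.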

This is due to the nested loop. Set enable_nestloop to false; XL will then use a hash join and return the results quickly.
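A minimal sketch of trying this out, assuming the setting meant here is the standard PostgreSQL planner parameter enable_nestloop and that it is changed for the current coordinator session only:

-- Discourage nested-loop joins for this session only.
-- Note: this does not forbid nested loops outright; it makes the planner avoid them when another join method is available.
set enable_nestloop = off;

-- Re-run the query with timing to see which join method is chosen now.
explain analyze
select a.name, b.name
from a
     left join c on a.name = c.aname
     left join b on c.bname = b.name
where a.name = 'cf82c96b77b8aa5277da6d55c4e4e66e';

-- Restore the default afterwards.
reset enable_nestloop;

Scoping the change to one session keeps the experiment from affecting other queries; if the hash join plan turns out to be faster, that is a hint that the planner's cost estimates for the remote scans are off.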
