
Subquery or union of joins in postgres?

I have so-called links that can have tags assigned to them, so I store them in three tables:

  • tag: id, name
  • tag_in_link: tag_id, link_id
  • link: id, url

Now I need to get basic tag counts: how many times a tag was used (including 0 times). I have two queries:

select t.id, t.name, count(*)
from tag as t inner join tag_in_link as tl
    on tl.tag_id = t.id
group by t.id, t.name
union
select t.id, t.name, 0
from tag as t left outer join tag_in_link as tl
    on tl.tag_id = t.id where tl.tag_id is null

[Image: EXPLAIN output for the union of joins]

and

select t.id, t.name,
       (select count(*) from tag_in_link as tl
              where tl.tag_id = t.id
       ) as count from tag as t

[Image: correlated subquery]

They both give the same results (up to the order of records) and run almost equally fast.

The problem is that I don't have much data to test with, but I need to pick one approach today. All I know is that there will be:

  • up to 100 tags
  • millions of links

So my question:

  • Which approach, a dependent subquery or a union of joins, performs better on large tables in Postgres?

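The first query should scale better on large data sets, because it does not force a nested loop; the correlated subquery in the second query is executed once for every row of tag.

But why not use a single outer join? Count tl.tag_id rather than *, so that tags without links yield 0 instead of 1:

SELECT t.id, t.name, count(tl.tag_id)
FROM tag AS t LEFT JOIN tag_in_link AS tl
    ON tl.tag_id = t.id
GROUP BY t.id, t.name;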
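
One detail worth checking in any LEFT JOIN count is which expression gets counted: count(*) counts every joined row, including the NULL-extended row produced for a tag with no links, whereas count(tl.tag_id) skips NULLs. A minimal sketch with made-up inline rows (the two tags and two link assignments are assumptions for illustration, not data from the question):

```sql
-- Self-contained illustration: the inline rows below are invented for this example.
WITH tag (id, name) AS (
    VALUES (1, 'used'), (2, 'unused')
),
tag_in_link (tag_id, link_id) AS (
    VALUES (1, 10), (1, 11)
)
SELECT t.id, t.name,
       count(*)         AS count_star,    -- also counts the NULL-extended row
       count(tl.tag_id) AS count_column   -- ignores NULLs, so unused tags get 0
FROM tag AS t
LEFT JOIN tag_in_link AS tl
    ON tl.tag_id = t.id
GROUP BY t.id, t.name
ORDER BY t.id;
```

For the 'unused' tag, count_star is 1 while count_column is 0, which is why the column form matches the "including 0 times" requirement.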

Consider replacing the UNION with a conditional aggregation over a single LEFT JOIN, which still avoids running the correlated subquery for every row.

select t.id, t.name, 
       sum(case when tl.tag_id is null then 0 else 1 end) as tag_count
from tag as t 
left join tag_in_link as tl
    on tl.tag_id = t.id
group by t.id, t.name
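
Since the question mentions having little data to test with, one option is to generate synthetic rows at roughly the expected scale and compare the plans with EXPLAIN ANALYZE. The sketch below assumes the table layout from the question; the column types, the row counts, the example.com URLs, and the choice to leave tags 91 to 100 unused are all assumptions made for illustration.

```sql
-- Hypothetical test schema: column names follow the question, types are assumed.
CREATE TABLE tag  (id int PRIMARY KEY, name text NOT NULL);
CREATE TABLE link (id int PRIMARY KEY, url  text NOT NULL);
CREATE TABLE tag_in_link (
    tag_id  int NOT NULL REFERENCES tag (id),
    link_id int NOT NULL REFERENCES link (id)
);

-- 100 tags and 1,000,000 links (row counts are illustrative).
INSERT INTO tag
SELECT g, 'tag' || g FROM generate_series(1, 100) AS g;

INSERT INTO link
SELECT g, 'http://example.com/' || g FROM generate_series(1, 1000000) AS g;

-- One tag per link, drawn from tags 1..90 so that tags 91..100 stay unused.
INSERT INTO tag_in_link (tag_id, link_id)
SELECT 1 + (g % 90), g FROM generate_series(1, 1000000) AS g;

-- Refresh planner statistics before comparing plans.
ANALYZE;

-- Prefix each candidate query with EXPLAIN (ANALYZE, BUFFERS) to see the
-- chosen plan and the actual timings, for example:
EXPLAIN (ANALYZE, BUFFERS)
SELECT t.id, t.name,
       (SELECT count(*) FROM tag_in_link AS tl WHERE tl.tag_id = t.id) AS count
FROM tag AS t;
```

Whether an index on tag_in_link (tag_id) exists will likely change the comparison, so it may be worth running each query with and without one.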
