Subquery or union of joins in postgres?

Question

I have so-called links that can have tags assigned to them, so I store it in 3 tables:

tag: id, name
tag_in_link: tag_id, link_id
link: id, url

Now I need to get basic tag counts: how many times a tag was used (including 0 times). I have two queries:

select t.id, t.name, count(*)
from tag as t inner join tag_in_link as tl
    on tl.tag_id = t.id
group by t.id, t.name
union
select t.id, t.name, 0
from tag as t left outer join tag_in_link as tl
    on tl.tag_id = t.id where tl.tag_id is null

and

select t.id, t.name,
       (select count(*) from tag_in_link as tl
              where tl.tag_id = t.id
       ) as count from tag as t

they both give the same (up to the order of records) results and work almost as fast

Problem is that I don't have much data to test it, but I need to pick one way or another today. All I know is that, there will be:

up to 100 tags
millions of links

So my question:

which approach : a dependent subquery or union of joins has better performance on large tables in postgres?

Answer 1

The first query will be better for large data sets, because it does not force a nested loop.

But why don't you use the optimal query:

SELECT t.id, t.name, count(*)
FROM tag AS t LEFT JOIN tag_in_link AS tl
    ON tl.tag_id = t.id
GROUP BY t.id, t.name;

Answer 2

Consider combining UNION with a conditional aggregation, still avoiding the correlated subquery run for every row.

select t.id, t.name, 
       sum(case when tl.tag_id is null then 0 else 1 end) as tag_count
from tag as t 
left join tag_in_link as tl
    on tl.tag_id = t.id
group by t.id, t.name

Subquery or union of joins in postgres?

Question

2 answers

solution1
1 2018-01-07 12:26:19

solution2
0 2018-01-07 14:40:35

Subquery or union of joins in postgres?

Question

2 answers

solution1 1 2018-01-07 12:26:19

solution2 0 2018-01-07 14:40:35

solution1
1 2018-01-07 12:26:19

solution2
0 2018-01-07 14:40:35