Selected columns are neither in the GROUP BY clause nor in an aggregate function?

Question

I have a database with the tables cats and toys, and their relationship table cat_toys.

To find the names of the cats with more than 7 toys, I have the following query:

select
  cats.name
from
  cats
join
  cat_toys on cats.id = cat_toys.cat_id
group by
  cats.id
having
  count(cat_toys.toy_id) > 7
order by
  cats.name

The column cats.name appears neither in the group by clause nor in an aggregate function, yet this query works. In contrast, I cannot select any column from the cat_toys table.

Is this something specific to PostgreSQL?

Answer 1

This is what the error message is trying to tell you: it is a general requirement in SQL that every non-aggregated column in the select clause must also be listed in the group by clause.

Postgres, unlike most other databases, is a bit more clever about that and understands the notion of functionally dependent columns: since you are grouping by the primary key of the cats table, you are free to select any other column from that table (since they are functionally dependent on the primary key). This is why your existing query works.
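For engines that do not track functional dependencies, a portable way to write the same query is to list every selected column in the group by clause. A minimal sketch, reusing the tables from the question:

```sql
-- Portable variant: cats.name is listed explicitly in group by,
-- so the query does not rely on functional-dependency detection.
select
  c.name
from
  cats c
join
  cat_toys ct on c.id = ct.cat_id
group by
  c.id, c.name
having
  count(ct.toy_id) > 7
order by
  c.name;
```

Because c.id is unique, adding c.name to the grouping does not change which groups are formed.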

Now if you want to bring in values from the cat_toys table, it is different. There are potentially multiple rows in this table for each row in cats, so their values are not functionally dependent on cats.id. If you still want one row per cat, you need to use an aggregate function.
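To make the contrast concrete, here is a sketch of the kind of query that should be rejected. The error text in the comment is how Postgres typically words it; the exact message may differ between versions:

```sql
-- Expected to fail in Postgres: ct.toy_id is neither grouped nor aggregated,
-- and it is not functionally dependent on c.id.
select c.name, ct.toy_id
from cats c
inner join cat_toys ct on c.id = ct.cat_id
group by c.id;
-- Postgres rejects this with an error along the lines of:
--   column "ct.toy_id" must appear in the GROUP BY clause
--   or be used in an aggregate function
```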

As an example, this generates a comma-separated list of all toy_id values that relate to each cat:

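-- the ::text cast assumes toy_id is a non-text type such as integer; string_agg needs text input
select c.name, string_agg(ct.toy_id::text, ', ') toy_ids
from cats c
inner join cat_toys ct on c.id = ct.cat_id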
group by c.id
having count(*) > 7
order by c.name

Side notes:

Table aliases make the query easier to write and read.
For this query, I recommend count(*) instead of count(cat_toys.toy_id); this produces the same result (unless you have null values in cat_toys.toy_id, which seems unlikely here), and incurs less work for the database (since it does not need to check each value in the column against null).
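The difference between the two forms of count only shows up when nulls are present. A small self-contained sketch, using an inline three-row sample rather than the tables from the question:

```sql
-- Inline three-row sample; x is null in one row.
select
  count(*) as all_rows,      -- counts every row: 3
  count(x) as non_null_x     -- skips the null: 2
from (values (1), (null), (3)) as t(x);
```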

Answer 2

This is your query:

select c.name
from cats c join
     cat_toys ct
     on c.id = ct.cat_id
group by c.id
having count(ct.toy_id) > 7
order by c.name;

You are asking why it works. You rightly observe that c.id is in the group by but not in the select, while another column is in the select. That seems wrong, but it isn't. Postgres supports a little-known part of the standard, related to functional dependency in aggregation queries.

Let me avoid the technical jargon. cats.id is the primary key of cats. That means the id is unique, so knowing the id determines all other columns from cats. The database knows this; that is, it knows that the value of name is always the same for a given id. So, by aggregating on the primary key, you can access the other columns without using aggregation functions, and this is consistent with the standard.

This is explained in the documentation:

When GROUP BY is present, or any aggregate functions are present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions or when the ungrouped column is functionally dependent on the grouped columns, since there would otherwise be more than one possible value to return for an ungrouped column. A functional dependency exists if the grouped columns (or a subset thereof) are the primary key of the table containing the ungrouped column.
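The rule in the quoted passage hinges on the primary key declaration. As a sketch, here is a schema under which the query in question would be accepted; the column types and the composite key on cat_toys are assumptions, since the question does not show its table definitions:

```sql
-- Hypothetical schema; column types are assumptions.
create table cats (
  id   integer primary key,  -- the primary key is what makes name dependent on id
  name text not null
);

create table toys (
  id   integer primary key,
  name text not null
);

create table cat_toys (
  cat_id integer not null references cats (id),
  toy_id integer not null references toys (id),
  primary key (cat_id, toy_id)
);

-- Accepted: c.name is functionally dependent on the grouped primary key c.id.
select c.id, c.name, count(*) as toy_count
from cats c
join cat_toys ct on c.id = ct.cat_id
group by c.id;
```

If cats.id were declared without the primary key constraint, the same select would be expected to fail, because the dependency described in the documentation would no longer be established.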

2 answers

solution1
2 2020-05-14 23:03:50

solution2
2 2020-05-14 23:19:23
