I have a database with the tables cats, toys and their relationship table cat_toys.
To find the names of the cats with more than 7 toys, I have the following query:
select cats.name
from cats
join cat_toys on cats.id = cat_toys.cat_id
group by cats.id
having count(cat_toys.toy_id) > 7
order by cats.name
The column cats.name does not appear in the group by clause and is not used in an aggregate function, but this query works. In contrast, I cannot select any column from the cat_toys table.
Is this something special about psql?
This is what the error message is telling you. As a general rule in SQL, every non-aggregated column in the select clause must also be listed in the group by clause.
Postgres, unlike many other databases, is a bit more clever about this and understands the notion of functionally dependent columns: since you are grouping by the primary key of the cats table, you are free to add any other column from that table, because those columns are functionally dependent on the primary key. This is why your existing query works.
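The question does not show the table definitions, so here is a minimal sketch of a schema under which the query behaves as described. The column types, the toys.name column and the composite key on cat_toys are assumptions. The part that matters is that cats.id is declared as the primary key: that declaration is what lets Postgres treat cats.name as functionally dependent on the grouped column.

```sql
-- Hypothetical minimal schema: the real DDL is not shown in the question.
create table cats (
    id   integer primary key,  -- grouping by this key makes cats.name functionally dependent
    name text not null
);

create table toys (
    id   integer primary key,
    name text not null         -- hypothetical column
);

-- One row per (cat, toy) pair, so the same cat can appear in many rows here.
create table cat_toys (
    cat_id integer not null references cats (id),
    toy_id integer not null references toys (id),
    primary key (cat_id, toy_id)
);

-- Accepted by Postgres: cats.name is not listed in the group by,
-- but it is functionally dependent on the grouped primary key cats.id.
select cats.name
from cats
join cat_toys on cats.id = cat_toys.cat_id
group by cats.id
having count(cat_toys.toy_id) > 7
order by cats.name;
```

Adding cat_toys.toy_id to the select list of that same query would still be rejected with an error of the form "column "cat_toys.toy_id" must appear in the GROUP BY clause or be used in an aggregate function", because the primary key of cats says nothing about how many cat_toys rows belong to each cat.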
Now, if you want to bring in values from the cat_toys table, things are different. There are potentially multiple rows in that table for each row in cats, so their values are not functionally dependent on cats.id. If you still want one row per cat, you need to use an aggregate function.
As an example, this generates a comma-separated list of all toy_ids that relate to each cat:
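-- toy_id is cast to text because string_agg() takes text arguments (harmless if it is already text)
select c.name, string_agg(ct.toy_id::text, ', ') as toy_ids
from cats c
inner join cat_toys ct on c.id = ct.cat_id
group by c.id
having count(*) > 7
order by c.name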
Side notes:
- table aliases make the query easier to write and read
- for this query, I recommend count(*) instead of count(cat_toys.toy_id); this produces the same result (unless you have null values in cat_toys.toy_id, which seems unlikely here) and incurs less work for the database, since it does not need to check each value in the column against null
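The two count forms only differ when nulls are present. A small self-contained query over an inline VALUES list (the toy_id values are made up for illustration) shows the difference:

```sql
-- count(*) counts every row; count(expression) skips rows where the expression is null.
select count(*)        as all_rows,          -- 3
       count(v.toy_id) as non_null_toy_ids   -- 2
from (values (1), (2), (null)) as v(toy_id);
```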
This is your query:
select c.name
from cats c join
cat_toys ct
on c.id = ct.cat_id
group by c.id
having count(ct.toy_id) > 7
order by c.name;
You are asking why it works. You are rightly observing that c.id is in the group by but not in the select, and that another column is in the select. That seems wrong, but it isn't. Postgres supports a little-known part of the standard related to functional dependencies in aggregation queries.
Let me avoid the technical jargon. cats.id is the primary key of cats. That means the id is unique, so knowing the id determines all other columns from cats. The database knows this; that is, it knows that the value of name is always the same for a given id. So, by grouping on the primary key, you can access the other columns without using aggregation functions, and this is consistent with the standard.
This is explained in the documentation:
When GROUP BY is present, or any aggregate functions are present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions or when the ungrouped column is functionally dependent on the grouped columns, since there would otherwise be more than one possible value to return for an ungrouped column. A functional dependency exists if the grouped columns (or a subset thereof) are the primary key of the table containing the ungrouped column.
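The quoted rule is about the primary key specifically: Postgres only infers the dependency from a declared primary key, so a column that is merely unique and not null is not enough. As a sketch (the table name cats_no_pk is hypothetical), the DO block below runs the equivalent grouping query against such a table and catches the resulting grouping error so the script keeps running:

```sql
-- Hypothetical table: id is unique and not null, but it is NOT declared as the primary key.
create table cats_no_pk (
    id   integer not null unique,
    name text not null
);

-- Without a primary key, grouping by id does not make name selectable,
-- so the query raises a grouping error (SQLSTATE 42803), which is reported as a notice.
do $$
begin
    perform name from cats_no_pk group by id;
    raise notice 'query accepted';
exception
    when grouping_error then
        raise notice 'query rejected: %', sqlerrm;
end $$;
```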
The technical post webpages of this site follow the CC BY-SA 4.0 protocol.