In a recent programming interview, I was asked an SQL question. I gave what I thought was a reasonable answer, but it drew strong disapproval from the DBA, and I couldn't figure out why.
Since then, I have thought about the problem some more and still can't see what was so bad about my answer. So I am asking here for the right way (or, failing that, better ways) to produce a report of libraries and the number of books in each, from a database containing a table of libraries and a table of books.
I should note that I have changed the scenario a bit so that the wording is not identical to the interview question, but the task is the same.
Here is a minimal schema for the problem:
create table library (
    id integer primary key,
    name char(8)
);
create table book (
    id integer primary key,
    name char(8),
    library_id integer,
    foreign key (library_id) references library(id)
);
The task is to list names of libraries and the number of books in them for libraries with two or more books.
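To have something concrete to check the queries below against, here is a minimal sketch that loads the schema into an in-memory SQLite database using Python's standard `sqlite3` module and computes the expected report in plain Python, without any SQL grouping. The choice of SQLite and the sample rows (libraries "Central", "East", "West") are assumptions for illustration; the question does not name a database.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table library (id integer primary key, name char(8));
    create table book (id integer primary key, name char(8), library_id integer,
                       foreign key (library_id) references library(id));
    -- Hypothetical sample data: Central has 3 books, East has 1, West has 2.
    insert into library values (1, 'Central'), (2, 'East'), (3, 'West');
    insert into book values (1, 'A', 1), (2, 'B', 1), (3, 'C', 1),
                            (4, 'D', 2), (5, 'E', 3), (6, 'F', 3);
""")

# Reference answer computed in Python: count books per library id,
# keep libraries with two or more books, and attach the library name.
names = dict(conn.execute("select id, name from library"))
counts = Counter(lib_id for (lib_id,) in conn.execute("select library_id from book"))
expected = sorted((names[lib_id], n) for lib_id, n in counts.items() if n >= 2)
print(expected)  # [('Central', 3), ('West', 2)]
```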
And, here is my proposed solution:
select
    a.name as name,
    b.nbooks as nbooks
from
    library as a,
    (
        select
            min(library_id) as library,
            count(id) as nbooks
        from
            book
        group by
            library_id
    ) as b
where
    ( nbooks > 1 ) and (a.id = b.library)
;
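As a sanity check that this query does produce the right report, here is a sketch that runs it unchanged against an in-memory SQLite database with made-up sample data (SQLite is an assumption; the question does not say which database the interview used). Since the query has no ORDER BY, the rows are sorted in Python before comparing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table library (id integer primary key, name char(8));
    create table book (id integer primary key, name char(8), library_id integer,
                       foreign key (library_id) references library(id));
    -- Hypothetical sample data: Central has 3 books, East has 1, West has 2.
    insert into library values (1, 'Central'), (2, 'East'), (3, 'West');
    insert into book values (1, 'A', 1), (2, 'B', 1), (3, 'C', 1),
                            (4, 'D', 2), (5, 'E', 3), (6, 'F', 3);
""")

# The query from the question, unchanged apart from whitespace.
query = """
select a.name as name, b.nbooks as nbooks
from library as a,
     (select min(library_id) as library, count(id) as nbooks
      from book
      group by library_id) as b
where (nbooks > 1) and (a.id = b.library)
"""
rows = sorted(conn.execute(query).fetchall())  # no ORDER BY, so sort for a stable result
print(rows)  # [('Central', 3), ('West', 2)]
```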
On second thought, using an explicit inner join might have been better. Other than that, could you please point out the potential pitfalls (either in general or with respect to a particular database) and the correct way to generate this report?
Here is a simple way of doing this:
select l.name, count(*) as numbooks
from library l join
     book b
     on l.id = b.library_id
group by l.name
having count(*) > 1
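Here is a sketch that runs this join-plus-HAVING query against hypothetical sample data in an in-memory SQLite database (the database choice and the rows are assumptions; note that the question's schema names the table `book`, singular).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table library (id integer primary key, name char(8));
    create table book (id integer primary key, name char(8), library_id integer,
                       foreign key (library_id) references library(id));
    -- Hypothetical sample data: Central has 3 books, East has 1, West has 2.
    insert into library values (1, 'Central'), (2, 'East'), (3, 'West');
    insert into book values (1, 'A', 1), (2, 'B', 1), (3, 'C', 1),
                            (4, 'D', 2), (5, 'E', 3), (6, 'F', 3);
""")

# Join first, then group the joined rows and filter the groups with HAVING.
rows = sorted(conn.execute("""
    select l.name, count(*) as numbooks
    from library l join book b on l.id = b.library_id
    group by l.name
    having count(*) > 1
""").fetchall())
print(rows)  # [('Central', 3), ('West', 2)]
```

The HAVING clause filters after aggregation, which is what removes the need for the derived table in the question.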
Your answer is technically OK. The DBA probably doesn't care about certain stylistic things that others might (such as using "a" as the alias for library rather than "l"). The subquery is unnecessary, and min(library_id) sticks out: you can apply aggregate functions to the GROUP BY columns, but that is rarely done, because within each group that column already has a single value.
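To illustrate why `min(library_id)` is redundant, here is a sketch (SQLite via Python's `sqlite3`, with hypothetical sample rows) showing that aggregating the GROUP BY column returns exactly the same values as selecting it directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table book (id integer primary key, name char(8), library_id integer);
    -- Hypothetical sample data: library 1 has 3 books, library 2 has 1, library 3 has 2.
    insert into book values (1, 'A', 1), (2, 'B', 1), (3, 'C', 1),
                            (4, 'D', 2), (5, 'E', 3), (6, 'F', 3);
""")

# Every row in a group shares the same library_id, so min() just returns that value.
with_min = sorted(conn.execute(
    "select min(library_id), count(id) from book group by library_id").fetchall())
plain = sorted(conn.execute(
    "select library_id, count(id) from book group by library_id").fetchall())
print(with_min == plain)  # True
print(plain)              # [(1, 3), (2, 1), (3, 2)]
```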
The biggest problem, which the DBA may have been reacting to, is putting the join condition in the WHERE clause rather than in an ON clause. This is dangerous: if you leave the condition out, or make what seems like an innocent modification, the query can silently become a CROSS JOIN instead of an INNER JOIN.
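Here is a sketch of that failure mode, using SQLite and hypothetical sample data (both assumptions). With a comma join, dropping the join predicate from WHERE still leaves a valid query; it just pairs every library with every qualifying group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table library (id integer primary key, name char(8));
    create table book (id integer primary key, name char(8), library_id integer,
                       foreign key (library_id) references library(id));
    -- Hypothetical sample data: Central has 3 books, East has 1, West has 2.
    insert into library values (1, 'Central'), (2, 'East'), (3, 'West');
    insert into book values (1, 'A', 1), (2, 'B', 1), (3, 'C', 1),
                            (4, 'D', 2), (5, 'E', 3), (6, 'F', 3);
""")

subquery = "(select library_id, count(id) as nbooks from book group by library_id)"

# Join condition kept in ON, next to the join it belongs to.
correct = conn.execute(f"""
    select a.name, b.nbooks
    from library as a join {subquery} as b on a.id = b.library_id
    where b.nbooks > 1
""").fetchall()

# Comma join where the join predicate was accidentally dropped from WHERE:
# 3 libraries x 2 qualifying groups = 6 rows of nonsense.
broken = conn.execute(f"""
    select a.name, b.nbooks
    from library as a, {subquery} as b
    where b.nbooks > 1
""").fetchall()

print(len(correct), len(broken))  # 2 6
```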
I see at least two serious issues: 1) not using explicit ANSI JOIN syntax, and 2) grouping by library_id while also applying an aggregate function to it.
I would write it like this, to show that I knew how to do the query while leaving room to return additional library columns if necessary:
select l.id, l.name, b.nbooks
from library l
inner join (
    select library_id, count(*) as nbooks
    from book
    group by library_id
    having count(*) > 1
) b on l.id = b.library_id
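Here is a sketch of this derived-table approach run against hypothetical data in SQLite (an assumption), using the schema's column and table names (`library.id`, `book`). It repeats `count(*)` in the HAVING clause instead of referring to the select-list alias: SQLite and MySQL accept an alias there, but standard SQL, PostgreSQL and SQL Server do not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table library (id integer primary key, name char(8));
    create table book (id integer primary key, name char(8), library_id integer,
                       foreign key (library_id) references library(id));
    -- Hypothetical sample data: Central has 3 books, East has 1, West has 2.
    insert into library values (1, 'Central'), (2, 'East'), (3, 'West');
    insert into book values (1, 'A', 1), (2, 'B', 1), (3, 'C', 1),
                            (4, 'D', 2), (5, 'E', 3), (6, 'F', 3);
""")

# Aggregate the book table first, then join the small per-library result
# back to library, so any library column can be added to the select list.
rows = sorted(conn.execute("""
    select l.id, l.name, b.nbooks
    from library l
    inner join (
        select library_id, count(*) as nbooks
        from book
        group by library_id
        having count(*) > 1
    ) b on l.id = b.library_id
""").fetchall())
print(rows)  # [(1, 'Central', 3), (3, 'West', 2)]
```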
I would also point out that I specifically did not group by library name in case two libraries had the same name.
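To see why grouping by name is risky, here is a sketch with a second, hypothetical library that is also called 'Central' and owns one book (SQLite and all sample rows are assumptions). Grouping by name merges the two libraries into one row; grouping by id keeps them apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table library (id integer primary key, name char(8));
    create table book (id integer primary key, name char(8), library_id integer,
                       foreign key (library_id) references library(id));
    -- Hypothetical data: two different libraries share the name 'Central'.
    insert into library values (1, 'Central'), (2, 'East'), (3, 'West'), (4, 'Central');
    insert into book values (1, 'A', 1), (2, 'B', 1), (3, 'C', 1),
                            (4, 'D', 2), (5, 'E', 3), (6, 'F', 3), (7, 'G', 4);
""")

join = "from library l join book b on l.id = b.library_id"

# Library 4's single book is wrongly added to library 1's three.
by_name = sorted(conn.execute(
    f"select l.name, count(*) {join} group by l.name having count(*) > 1").fetchall())

# Grouping by the primary key counts each library separately.
by_id = sorted(conn.execute(
    f"select l.id, l.name, count(*) {join} group by l.id, l.name having count(*) > 1").fetchall())

print(by_name)  # [('Central', 4), ('West', 2)]
print(by_id)    # [(1, 'Central', 3), (3, 'West', 2)]
```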
What the DBA didn't like was probably the sub-SELECT. These are best avoided when a plain join will do, because on some database engines they can perform noticeably worse than the equivalent join (they also look cluttered in code).
In this case it would have been better to use a JOIN.
SELECT library.name AS library,
       count( book.id ) AS books
FROM library
JOIN book ON book.library_id = library.id
GROUP BY library.id, library.name
HAVING count( book.id ) > 1
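The grouping key matters here: each book.id is unique, so grouping by it produces one-row groups whose count is always 1, and the HAVING filter then removes everything. Here is a sketch comparing that with grouping by library, on hypothetical data in SQLite (an assumption; SQLite allows the ungrouped `library.name` column, while stricter databases would reject the first query outright).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table library (id integer primary key, name char(8));
    create table book (id integer primary key, name char(8), library_id integer,
                       foreign key (library_id) references library(id));
    -- Hypothetical sample data: Central has 3 books, East has 1, West has 2.
    insert into library values (1, 'Central'), (2, 'East'), (3, 'West');
    insert into book values (1, 'A', 1), (2, 'B', 1), (3, 'C', 1),
                            (4, 'D', 2), (5, 'E', 3), (6, 'F', 3);
""")

join = "from library join book on book.library_id = library.id"

# One group per book: every count is 1, so HAVING > 1 keeps nothing.
per_book = conn.execute(
    f"select library.name, count(book.id) {join} group by book.id having count(book.id) > 1"
).fetchall()

# One group per library: the intended report.
per_library = sorted(conn.execute(
    f"select library.name, count(book.id) {join} "
    f"group by library.id, library.name having count(book.id) > 1"
).fetchall())

print(per_book)     # []
print(per_library)  # [('Central', 3), ('West', 2)]
```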