
In SQL, how can I create a list of libraries with a count of books from a database containing a table of libraries and a table of books?

In a recent programming interview, I was asked an SQL question. I gave what I thought was a reasonable answer, but it elicited strong disapproval from the DBA, and I wasn't able to figure out why.

Since then, I have thought about the problem some more and still cannot see what was so horrible about my answer. I am seeking enlightenment here: what is the right way, or failing that, a better way, to produce a report of libraries and the number of books in them from a database containing a table of libraries and a table of books?

I should note that I have changed the scenario a bit so that the wording is not identical to the interview question, but the task is the same.

Here is a minimal schema for the problem:

create table library (
  id integer primary key,
  name char(8)
);

create table book (
  id integer primary key,
  name char(8),
  library_id integer,
  foreign key (library_id) references library(id)
);

The task is to list names of libraries and the number of books in them for libraries with two or more books.

And, here is my proposed solution:

select
  a.name as name,
  b.nbooks as nbooks
from
    library as a,
    (
        select
            min(library_id) as library,
            count(id) as nbooks
        from
            book
        group by 
            library_id
    ) as b
where
    ( nbooks > 1 ) and (a.id = b.library)
;

On second thought, using an explicit inner join might have been better. Other than that, could you please point out to me the potential pitfalls (either in general or in relation to a particular database) and the correct way to generate this report?

Here is a simple way of doing this:

select l.name, count(*) as numbooks
from library l join
     book b
     on l.id = b.library_id
group by l.name
having count(*) > 1

Your answer is technically ok. The DBA probably doesn't care about certain stylistic things that others might (such as using "a" as the alias for library rather than "l"). The subquery is unnecessary, and the min(library_id) sticks out as unnecessary. You can apply aggregate functions to the group by columns, but that is typically not done.
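To check that the original query really is technically fine, here is a minimal sketch that runs both versions against the schema from the question using Python's built-in sqlite3 module. The sample rows are made up for illustration, and other database engines may differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table library (id integer primary key, name char(8));
create table book (
  id integer primary key,
  name char(8),
  library_id integer,
  foreign key (library_id) references library(id)
);
-- hypothetical sample data: 'north' has 3 books, 'south' 1, 'east' 2
insert into library values (1, 'north'), (2, 'south'), (3, 'east');
insert into book values (1, 'b1', 1), (2, 'b2', 1), (3, 'b3', 1),
                        (4, 'b4', 2), (5, 'b5', 3), (6, 'b6', 3);
""")

# The query from the question (subquery plus join condition in WHERE).
subquery_version = """
select a.name as name, b.nbooks as nbooks
from library as a,
     (select min(library_id) as library, count(id) as nbooks
      from book group by library_id) as b
where (nbooks > 1) and (a.id = b.library)
order by name
"""

# The simpler join with GROUP BY and HAVING.
join_version = """
select l.name, count(*) as numbooks
from library l join book b on l.id = b.library_id
group by l.name
having count(*) > 1
order by l.name
"""

rows_subquery = conn.execute(subquery_version).fetchall()
rows_join = conn.execute(join_version).fetchall()
print(rows_subquery)  # [('east', 2), ('north', 3)]
print(rows_join)      # [('east', 2), ('north', 3)]
```

On this data, where library names are unique, the two queries return the same rows.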

The biggest problem -- which the DBA may be responding to -- is having the join condition in the WHERE clause rather than in an ON clause. This is dangerous, because if you leave it out or make what seems like an innocent modification, the query can become a CROSS JOIN instead of an INNER JOIN.
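The cross join risk can be illustrated with a small sketch using Python's built-in sqlite3 module. The sample rows are hypothetical; the "broken" query is the one from the question with only the `a.id = b.library` condition removed from the WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table library (id integer primary key, name char(8));
create table book (
  id integer primary key,
  name char(8),
  library_id integer,
  foreign key (library_id) references library(id)
);
-- hypothetical sample data: 'north' has 3 books, 'south' 1, 'east' 2
insert into library values (1, 'north'), (2, 'south'), (3, 'east');
insert into book values (1, 'b1', 1), (2, 'b2', 1), (3, 'b3', 1),
                        (4, 'b4', 2), (5, 'b5', 3), (6, 'b6', 3);
""")

per_library = """(select min(library_id) as library, count(id) as nbooks
                  from book group by library_id) as b"""

# Join condition present in WHERE: one row per qualifying library.
correct = f"""select a.name, b.nbooks from library as a, {per_library}
              where (nbooks > 1) and (a.id = b.library)"""

# Join condition dropped: still valid SQL, but now every library is
# paired with every qualifying count (a cross join).
broken = f"""select a.name, b.nbooks from library as a, {per_library}
             where (nbooks > 1)"""

correct_rows = conn.execute(correct).fetchall()
broken_rows = conn.execute(broken).fetchall()
print(len(correct_rows))  # 2
print(len(broken_rows))   # 6 (3 libraries x 2 qualifying counts)
```

The broken query runs without any error and even reports 'south', which has only one book, as having 3 and 2 books. That silent failure is what makes the comma-join style risky.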

I see at least two serious issues: 1) not using ANSI JOIN syntax, 2) grouping by library_id and also using an aggregate function on it.

I would write it like this to demonstrate that I knew how to do the query while returning additional library columns if necessary:

select l.id, l.name, b.book_count
from library l 
inner join (
    select library_id, count(*) as book_count
    from book
    group by library_id
    -- repeat the aggregate: not every database accepts an alias in HAVING
    having count(*) > 1 
) b on l.id = b.library_id 

I would also point out that I specifically did not group by library name in case two libraries had the same name.
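The duplicate-name concern can be shown with a small sketch using Python's built-in sqlite3 module. The data is hypothetical: two distinct libraries both named 'central', each holding a single book, plus 'east' with two books.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table library (id integer primary key, name char(8));
create table book (
  id integer primary key,
  name char(8),
  library_id integer,
  foreign key (library_id) references library(id)
);
-- hypothetical sample data with a duplicated library name
insert into library values (1, 'central'), (2, 'central'), (3, 'east');
insert into book values (1, 'b1', 1), (2, 'b2', 2),
                        (3, 'b3', 3), (4, 'b4', 3);
""")

# Grouping by name merges the two 'central' libraries into one group.
by_name = """
select l.name, count(*) as numbooks
from library l join book b on l.id = b.library_id
group by l.name
having count(*) > 1
order by l.name
"""

# Counting per library_id keeps the two 'central' libraries apart.
by_id = """
select l.id, l.name, b.book_count
from library l
inner join (
    select library_id, count(*) as book_count
    from book
    group by library_id
    having count(*) > 1
) b on l.id = b.library_id
order by l.id
"""

rows_by_name = conn.execute(by_name).fetchall()
rows_by_id = conn.execute(by_id).fetchall()
print(rows_by_name)  # [('central', 2), ('east', 2)]
print(rows_by_id)    # [(3, 'east', 2)]
```

Grouping by name reports a 'central' library with two books even though neither library with that name has more than one.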

What the DBA didn't like was likely the sub-SELECT. Sub-SELECTs are often avoided where a plain join does the job, because some database optimizers handle them poorly (they also look ugly in code form).

In this case it would have been better to use a JOIN.

SELECT library.name AS library,
       count( book.id ) AS books
  FROM library
  JOIN book ON book.library_id = library.id
  GROUP BY library.id, library.name
  HAVING count( book.id ) > 1


 