简体   繁体   中英

How to remove duplicates in one column based on the value of other columns in psql

I have a database which is supposed to imitate a library management system. I want to write a query that, present a table that shows the top 3 borrowed books for each publisher, also showing their corresponding rank (so the book borrowed the most times from publisher X will show rank 1). I have a query that displays the information below - title of borrowed books together with their corresponding publisher, and the amount of times each book has been borrowed. As you can see; Bloomsbury (UK) is present 7 times (one for each Harry Potter book) but I want it to only display the 3 most popular Harry Potter books in regards to amount of times borrowed. I'm very thankful for any help.

                  title                   |       publisher        | times
------------------------------------------+------------------------+------
 Harry Potter and the Philosopher's Stone | Bloomsbury (UK)        |    2
 Harry Potter and the Deathly Hallows     | Bloomsbury (UK)        |    2
 Harry Potter the Goblet of Fire          | Bloomsbury (UK)        |    3
 The Fellowship of the Ring               | George Allen & Unwin   |    1
 Calculus                                 | Paerson Addison Wesley |    1
 Go Set a Watchman                        | HarperCollins          |    1
 Harry Potter the Half-Blood Prince       | Bloomsbury (UK)        |    4
 Harry Potter and the Chamber of Secrets  | Bloomsbury (UK)        |    3
 Harry Potter and Prisoner of Azkaban     | Bloomsbury (UK)        |    2
 Nineteen Eighty-Four                     | Secker & Warburg       |    1
 Harry Potter the Order of the Phoenix    | Bloomsbury (UK)        |    4
 To Kill a Mockingbird                    | J.B.Lippincott & Co    |    1

The query below will generate the view above.

SELECT title, publisher, COUNT(borrowed.resid) AS rank 
FROM borrowed 
  CROSS JOIN book 
  CROSS JOIN bookinfo 
WHERE borrowed.resid = book.resid 
  AND book.isbn = bookinfo.isbn 
  AND book.copynumber = borrowed.copynumber 
GROUP BY title, publisher;
SELECT title, publisher, times
FROM (
    SELECT *, RANK() OVER (PARTITION BY publisher ORDER BY times DESC) AS ranking
    FROM (
        SELECT title, publisher, COUNT(resid) AS times 
        FROM borrowed 
        JOIN book USING (resid, copynumber)
        JOIN bookinfo USING (isbn)
        GROUP BY title, publisher
    ) AS counts
) AS ranks
WHERE ranking <= 3
ORDER BY publisher, times DESC

counts is the part you wrote, adjusted to utilize USING to combine same named columns from both sides (makes it shorter)

ranks is the part that ranks per publisher using rank function (window fuction)

finally, we take the top 3 by picking ranking equal and lower than 3.

Fixing the joins and adding RANK:

select *
from 
 (
    SELECT title, publisher, COUNT(*) AS cnt,
       -- rank the counts
       rank() over (partition by publisher order by count(*) desc) as rnk 
    FROM borrowed 
      JOIN book 
        ON borrowed.resid = book.resid 
       AND book.copynumber = borrowed.copynumber 
      JOIN bookinfo 
        ON book.isbn = bookinfo.isbn 
    GROUP BY title, publisher
 ) as dt
where rnk <= 3

You might want to switch to ROW_NUMBER (exactly 3 rows) or DENSE_RANK (3 highest counts) instead of RANK (3 rows, maybe more if row #4+ got the same count as row #3).

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM