How to remove duplicates in one column based on the value of other columns in psql

I have a database that is supposed to imitate a library management system. I want to write a query that presents a table showing the top 3 borrowed books for each publisher, along with their corresponding rank (so the book borrowed the most times from publisher X shows rank 1). I have a query that displays the information below: the titles of borrowed books together with their publisher, and the number of times each book has been borrowed. As you can see, Bloomsbury (UK) appears 7 times (once for each Harry Potter book), but I want it to display only the 3 most popular Harry Potter books in terms of times borrowed. I'm very thankful for any help.

                  title                   |       publisher        | times
------------------------------------------+------------------------+------
 Harry Potter and the Philosopher's Stone | Bloomsbury (UK)        |    2
 Harry Potter and the Deathly Hallows     | Bloomsbury (UK)        |    2
 Harry Potter the Goblet of Fire          | Bloomsbury (UK)        |    3
 The Fellowship of the Ring               | George Allen & Unwin   |    1
 Calculus                                 | Paerson Addison Wesley |    1
 Go Set a Watchman                        | HarperCollins          |    1
 Harry Potter the Half-Blood Prince       | Bloomsbury (UK)        |    4
 Harry Potter and the Chamber of Secrets  | Bloomsbury (UK)        |    3
 Harry Potter and Prisoner of Azkaban     | Bloomsbury (UK)        |    2
 Nineteen Eighty-Four                     | Secker & Warburg       |    1
 Harry Potter the Order of the Phoenix    | Bloomsbury (UK)        |    4
 To Kill a Mockingbird                    | J.B.Lippincott & Co    |    1

The query below generates the output above.

SELECT title, publisher, COUNT(borrowed.resid) AS times
FROM borrowed 
  CROSS JOIN book 
  CROSS JOIN bookinfo 
WHERE borrowed.resid = book.resid 
  AND book.isbn = bookinfo.isbn 
  AND book.copynumber = borrowed.copynumber 
GROUP BY title, publisher;

SELECT title, publisher, times
FROM (
    SELECT *, RANK() OVER (PARTITION BY publisher ORDER BY times DESC) AS ranking
    FROM (
        SELECT title, publisher, COUNT(resid) AS times 
        FROM borrowed 
        JOIN book USING (resid, copynumber)
        JOIN bookinfo USING (isbn)
        GROUP BY title, publisher
    ) AS counts
) AS ranks
WHERE ranking <= 3
ORDER BY publisher, times DESC

counts is the part you wrote, adjusted to use USING, which merges the same-named join columns from both sides into a single column (this makes it shorter).
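To see what "merges the same-named columns" means in practice, here is a small sketch run through Python's sqlite3 module as a stand-in for psql (PostgreSQL also emits a single column per USING pair). The two tables are hypothetical, cut-down versions of borrowed and book; the real schema isn't shown in the question.

```python
import sqlite3

# Hypothetical, cut-down versions of two of the question's tables
# (the real schema is not shown and surely has more columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE borrowed (resid INTEGER, copynumber INTEGER);
    CREATE TABLE book (resid INTEGER, copynumber INTEGER, isbn TEXT);
    INSERT INTO borrowed VALUES (1, 1), (1, 2);
    INSERT INTO book VALUES (1, 1, 'A'), (1, 2, 'A');
""")

def column_names(sql):
    # cursor.description has one entry per result column; item 0 is its name.
    return [d[0] for d in conn.execute(sql).description]

# JOIN ... ON keeps both copies of resid and copynumber (5 columns).
print(column_names(
    "SELECT * FROM borrowed JOIN book "
    "ON borrowed.resid = book.resid AND borrowed.copynumber = book.copynumber"
))
# JOIN ... USING merges each same-named pair into one column (3 columns).
print(column_names("SELECT * FROM borrowed JOIN book USING (resid, copynumber)"))
```

Because the merged column is no longer ambiguous, the counts subquery can refer to plain resid without a table prefix.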

ranks is the part that ranks the books within each publisher using the RANK window function.

Finally, we take the top 3 by keeping only the rows whose ranking is less than or equal to 3.
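A minimal sketch of the counts → ranks → filter pipeline, run with Python's sqlite3 module as a stand-in for PostgreSQL (assumption: SQLite 3.25 or later, which supports window functions). Instead of the real borrowed/book/bookinfo join, it loads the borrow counts from the question's listing into a table named counts, then applies the ranks layer and the ranking <= 3 filter unchanged.

```python
import sqlite3

# Borrow counts copied from the question's listing (title, publisher, times).
ROWS = [
    ("Harry Potter and the Philosopher's Stone", "Bloomsbury (UK)", 2),
    ("Harry Potter and the Deathly Hallows", "Bloomsbury (UK)", 2),
    ("Harry Potter the Goblet of Fire", "Bloomsbury (UK)", 3),
    ("The Fellowship of the Ring", "George Allen & Unwin", 1),
    ("Calculus", "Paerson Addison Wesley", 1),
    ("Go Set a Watchman", "HarperCollins", 1),
    ("Harry Potter the Half-Blood Prince", "Bloomsbury (UK)", 4),
    ("Harry Potter and the Chamber of Secrets", "Bloomsbury (UK)", 3),
    ("Harry Potter and Prisoner of Azkaban", "Bloomsbury (UK)", 2),
    ("Nineteen Eighty-Four", "Secker & Warburg", 1),
    ("Harry Potter the Order of the Phoenix", "Bloomsbury (UK)", 4),
    ("To Kill a Mockingbird", "J.B.Lippincott & Co", 1),
]

def top3_per_publisher(conn):
    # "ranks" layer: RANK() restarts at 1 for every publisher;
    # the outer WHERE keeps rows with ranking <= 3.
    return conn.execute("""
        SELECT title, publisher, times, ranking
        FROM (
            SELECT *, RANK() OVER (PARTITION BY publisher ORDER BY times DESC) AS ranking
            FROM counts
        ) AS ranks
        WHERE ranking <= 3
        ORDER BY publisher, times DESC, title
    """).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counts (title TEXT, publisher TEXT, times INTEGER)")
conn.executemany("INSERT INTO counts VALUES (?, ?, ?)", ROWS)

for row in top3_per_publisher(conn):
    print(row)
```

With this data, Bloomsbury (UK) comes back with four rows rather than three: the two books borrowed 4 times both get rank 1, and the two borrowed 3 times both get rank 3, which still passes the ranking <= 3 filter.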

Fixing the joins (explicit JOIN ... ON instead of CROSS JOIN plus WHERE) and adding RANK:

select *
from 
 (
    SELECT title, publisher, COUNT(*) AS cnt,
       -- rank the counts
       rank() over (partition by publisher order by count(*) desc) as rnk 
    FROM borrowed 
      JOIN book 
        ON borrowed.resid = book.resid 
       AND book.copynumber = borrowed.copynumber 
      JOIN bookinfo 
        ON book.isbn = bookinfo.isbn 
    GROUP BY title, publisher
 ) as dt
where rnk <= 3
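A quick sanity check, sketched with Python's sqlite3 module standing in for PostgreSQL and a hypothetical minimal schema (the table columns and the rows Book A, Book B, Book C, Pub 1 and Pub 2 are made up for illustration). It shows that the question's CROSS JOIN plus WHERE form and the explicit JOIN ... ON form count the same borrows, and that rank() can sit in the same SELECT as the GROUP BY, because window functions are evaluated after aggregation.

```python
import sqlite3

# Hypothetical minimal schema and rows; the question does not show its real DDL,
# so the column sets and values here are assumptions for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bookinfo (isbn TEXT, title TEXT, publisher TEXT);
    CREATE TABLE book (resid INTEGER, copynumber INTEGER, isbn TEXT);
    CREATE TABLE borrowed (resid INTEGER, copynumber INTEGER);
    INSERT INTO bookinfo VALUES ('A', 'Book A', 'Pub 1'),
                                ('B', 'Book B', 'Pub 1'),
                                ('C', 'Book C', 'Pub 2');
    INSERT INTO book VALUES (1, 1, 'A'), (1, 2, 'A'), (2, 1, 'B'), (3, 1, 'C');
    INSERT INTO borrowed VALUES (1, 1), (1, 2), (1, 1), (2, 1), (3, 1);
""")

# The question's form: CROSS JOIN, with the join conditions in WHERE.
CROSS_JOIN_SQL = """
    SELECT title, publisher, COUNT(borrowed.resid) AS times
    FROM borrowed CROSS JOIN book CROSS JOIN bookinfo
    WHERE borrowed.resid = book.resid
      AND book.isbn = bookinfo.isbn
      AND book.copynumber = borrowed.copynumber
    GROUP BY title, publisher
    ORDER BY title
"""

# The answer's form: explicit JOIN ... ON, plus rank() in the same SELECT.
# The window is computed after GROUP BY, so it can order by COUNT(*).
JOIN_RANK_SQL = """
    SELECT title, publisher, COUNT(*) AS cnt,
           rank() OVER (PARTITION BY publisher ORDER BY COUNT(*) DESC) AS rnk
    FROM borrowed
    JOIN book ON borrowed.resid = book.resid
             AND book.copynumber = borrowed.copynumber
    JOIN bookinfo ON book.isbn = bookinfo.isbn
    GROUP BY title, publisher
    ORDER BY title
"""

cross = conn.execute(CROSS_JOIN_SQL).fetchall()
ranked = conn.execute(JOIN_RANK_SQL).fetchall()
print(cross)
print(ranked)
```

The explicit JOIN version mainly helps readability: each join condition sits next to the table it connects, so a forgotten condition is easier to spot than in a long WHERE clause.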

You might want to switch to ROW_NUMBER (at most 3 rows per publisher, with ties broken arbitrarily) or DENSE_RANK (every book with one of the 3 highest distinct counts) instead of RANK (3 rows, or more if the 4th and later rows have the same count as the 3rd).
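To make the difference concrete, here is a sketch that applies each of the three functions to the Bloomsbury (UK) counts from the question's listing and counts how many rows survive the <= 3 filter. It uses Python's sqlite3 module as a stand-in for PostgreSQL (assumption: SQLite 3.25 or later); the table name counts is hypothetical.

```python
import sqlite3

# Bloomsbury (UK) borrow counts copied from the question's listing.
BLOOMSBURY = [
    ("Harry Potter and the Philosopher's Stone", 2),
    ("Harry Potter and the Deathly Hallows", 2),
    ("Harry Potter the Goblet of Fire", 3),
    ("Harry Potter the Half-Blood Prince", 4),
    ("Harry Potter and the Chamber of Secrets", 3),
    ("Harry Potter and Prisoner of Azkaban", 2),
    ("Harry Potter the Order of the Phoenix", 4),
]

def rows_kept(conn, func):
    # func comes from the fixed tuple below, so formatting it into the SQL is safe here.
    sql = f"""
        SELECT title, times
        FROM (
            SELECT title, times,
                   {func}() OVER (PARTITION BY publisher ORDER BY times DESC) AS rnk
            FROM counts
        ) AS ranked
        WHERE rnk <= 3
    """
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counts (title TEXT, publisher TEXT, times INTEGER)")
conn.executemany("INSERT INTO counts VALUES (?, 'Bloomsbury (UK)', ?)", BLOOMSBURY)

for func in ("ROW_NUMBER", "RANK", "DENSE_RANK"):
    print(func, len(rows_kept(conn, func)))
```

On this data DENSE_RANK keeps all seven books, because there are only three distinct counts (4, 3 and 2). For a strict "three books per publisher", ROW_NUMBER is the one that fits; adding a tiebreaker such as title to its ORDER BY makes the choice among tied books deterministic.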

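Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.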

 
© 2020-2024 STACKOOM.COM