简体   繁体   中英

Postgresql SELECT DISTINCT ON

In the query below I'm trying to select data but need it to be distinct on the columns Book.Title and the OrderDate. I have tried using DISTINCT ON but also get multiple results back which I will also post below. Sorry if the question is a bit trivial. I'm currently in the process of learning SQL so I'm a bit of a noob when it comes to what is probably something really obvious I have missed.

How could I change the query to get the results to be formatted as so:

   month   |                    title                    | quantity | total_value 
-----------+---------------------------------------------+----------+-------------
 February  | Internet and World Wide Web: How to Program |       15 |      899.70
 March     | C How To Program                            |        4 |      183.92
 March     | Core Servlets and JavaServer Pages          |       13 |      856.70
 March     | Internet and World Wide Web: How to Program |       21 |     1071.58

The query I constructed:

SELECT DISTINCT ON (OrderDate, Book.Title) Book.Title, to_char(OrderDate, 'Month') AS "Order Date", 
OrderLine.Quantity AS "Order Quantity", (SUM(Quantity*UnitSellingPrice)) AS "Total Value"
FROM Book
INNER JOIN Publisher
ON Book.PublisherID=Publisher.PublisherID
INNER JOIN OrderLine
ON Book.BookID=OrderLine.BookID
INNER JOIN ShopOrder
ON OrderLine.ShopOrderID=ShopOrder.ShopOrderID
WHERE Publisher.Name='Prentice Hall'
GROUP BY book.title, OrderDate, orderline.quantity
ORDER BY OrderDate ASC;

The results I get:

                    title                    | Order Date | Order Quantity | Total Value 
---------------------------------------------+------------+----------------+-------------
 Internet and World Wide Web: How to Program | February   |             10 |      299.90
 Internet and World Wide Web: How to Program | February   |              5 |      149.95
 C How To Program                            | March      |              3 |       68.97
 Core Servlets and JavaServer Pages          | March      |             10 |      329.50
 Internet and World Wide Web: How to Program | March      |             20 |      519.80
 C How To Program                            | March      |              1 |       22.99
 Core Servlets and JavaServer Pages          | March      |              3 |       98.85
 Internet and World Wide Web: How to Program | March      |              1 |       15.99

Thanks so much for any help!

I think you're over-complicating things here: if you want the total quantity for each title and month, then you want to group by the title and month, and sum the quantity. You don't want to group by the quantity itself, because that would mean a new row for each distinct quantity.

That's easy to do in standard SQL ( DISTINCT ON is a Postgres extension, useful in a few cases instead of GROUP BY , but unnecessary here):

SELECT
    Book.Title,
    to_char(OrderDate, 'Month') AS "Order Date", 
    SUM(OrderLine.Quantity) AS "Order Quantity", 
    SUM(Quantity*UnitSellingPrice) AS "Total Value"
FROM [...]
WHERE Publisher.Name='Prentice Hall'
GROUP BY
    book.title,
    to_char(OrderDate, 'Month')
ORDER BY to_char(OrderDate, 'Month') ASC;

The rule of thumb is that everything in your SELECT clause - and your ORDER BY clause, which is considered to be "at the same level" - should be either:

  • listed in the GROUP BY clause because you want a new row every time it varies
  • or wrapped in an aggregate function (here SUM() ) because you want to "roll it up" according to some rule.

To see what is going on, consider:

SELECT DISTINCT ON (OrderDate, Book.Title)
    Book.Title,
    OrderDate as "Actual Order Date",
    to_char(OrderDate, 'Month') AS "Order Month", ..

That is, the result in the post SELECTs a different "Order Date", really the order month, but is not the actual OrderDate value used in the DISTINCT/GROUP BY process which explains both the duplicates and incorrect aggregate results.

I suspect the desired query goes something like the following, and I've yet to find a need to use DISTINCT with GROUP BY.

SELECT
  -- These are in the GROUP BY and will be DISTINCT in the result
  Book.Title,
  to_char(OrderDate, 'Month') AS "Order Month", 
  -- Non GROUP BY values should be aggregated;
  -- In this case the aggregates are over Title/Month
  SUM(OrderLine.Quantity) AS "Order Quantity",
  SUM(Quantity*UnitSellingPrice) AS "Total Value"
FROM Book
INNER JOIN Publisher
ON Book.PublisherID=Publisher.PublisherID
INNER JOIN OrderLine
ON Book.BookID=OrderLine.BookID
INNER JOIN ShopOrder
ON OrderLine.ShopOrderID=ShopOrder.ShopOrderID
WHERE Publisher.Name='Prentice Hall'
-- For each Title/Month
GROUP BY book.title,
         to_char(OrderDate, 'Month')       -- also grouped on Month
ORDER BY to_char(OrderDate, 'Month') ASC;  -- and sorted on Month

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM