SQL: Any straightforward way to order results FIRST, THEN group by another column?

I see that in SQL, the GROUP BY clause has to precede the ORDER BY clause. Does this imply that ordering is done after grouping discards identical rows/columns?

Because I seem to need to order rows by the timestamp column B first, THEN discard rows with identical values in column A. Not sure how to accomplish this...

I am using MySQL 5.1.41

create table tbl
(
    A int,
    B timestamp
);

The data could be:

+-----+-----------------------+
|  A  |  B                    |
+-----+-----------------------+
|  1  |  today                |
|  1  |  yesterday            |
|  2  |  yesterday            |
|  2  |  tomorrow             |
+-----+-----------------------+

The results I am aiming for would be:

+-----+-----------------------+
|  A  |  B                    |
+-----+-----------------------+
|  1  |  today                |
|  2  |  tomorrow             |
+-----+-----------------------+

Basically, I want the rows with the latest timestamp in column B (think ORDER BY), and only one row for each value in column A (think DISTINCT or GROUP BY).

My actual project details, if you need these:

In real life, I have two tables: users and payment_receipts.

create table users
(
    phone_nr int(10) unsigned not null,
    primary key (phone_nr)
);

create table payment_receipts
(
    phone_nr int(10) unsigned not null,
    payed_ts timestamp default current_timestamp not null,
    payed_until_ts timestamp not null,
    primary key (phone_nr, payed_ts, payed_until_ts)
);

The tables may include other columns; I have omitted everything that I think is irrelevant here. As part of a mobile-payment scheme, I have to send SMS messages to users across the mobile network at periodic intervals, depending of course on whether a payment is due. The payment is actualized when the SMS is sent, which is premium-taxed.

I keep records of all payments in the payment_receipts table for book-keeping. This simulates a real shop where both the buyer and the seller get a copy of the receipt of purchase for reference. This table stores my (the seller's) copy of each receipt; the customer's receipt is the received SMS itself. Each time an SMS is sent (and thus a payment is made), a receipt record is inserted into the table, stating who paid, when, and "until when".

To explain the latter, imagine a subscription service that runs indefinitely until a user explicitly opts out, at which point the user record is removed. A payment is made a month in advance, so as a rule the difference between payed_ts and payed_until_ts is 30 days.

Naturally, I have a batch job that runs every day and needs to select the users who are due a monthly payment as part of automatic subscription renewal. To link this to the dummy example earlier, the phone number column phone_nr is a and payed_until_ts is b, but in the actual code there are two tables, which brings me to the following behavior and its implications: when a user record is removed, the receipt remains, for bookkeeping. So, not only do I need to group payments by date and discard all but the latest payment receipt date, I also need to take care not to select receipts for which there is no longer a matching user record.

I am solving the problem of selecting records that are due for payment by finding, for each phone_nr, the receipt with the latest payed_until_ts value (in most cases there will be several receipts per phone number). Out of those rows I then need to keep only the phone numbers whose payed_until_ts is earlier than the time the batch job executes. I loop over the list of these numbers and send out payments, storing a new receipt for each sent SMS, where payed_ts is now() and payed_until_ts is now() + interval 30 days.

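select a, b from (select a, b from tbl order by b desc) as c group by a;

Note that this relies on MySQL's non-standard GROUP BY extension: b is neither aggregated nor grouped, so the server is free to pick any b from each group, and the ordering of the subquery is not guaranteed to influence that choice.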

Yes, grouping is done first, and it affects a single select, whereas ordering affects all the results from all select statements in a union, such as:

select a, 'max', max(b) from tbl group by a
union all select a, 'min', min(b) from tbl group by a
order by 1, 2

(using field numbers in order by since I couldn't be bothered to name my columns). Each group by affects only its own select; the order by affects the combined result set.

It seems that what you're after can be achieved with:

select A, max(B) from tbl group by A

This uses the max aggregation function to basically do your pre-group ordering (it doesn't actually sort in any decent DBMS; rather, it will simply choose the maximum from a suitable index if one is available).
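To make the effect of max with group by concrete, here is a minimal sketch that runs the same query on the question's sample rows. It uses an in-memory SQLite database through Python's sqlite3 module purely so the example is runnable; that choice, the table name tbl, and the concrete dates standing in for "yesterday", "today" and "tomorrow" are assumptions, and it has not been run against MySQL 5.1.41.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption, for runnability).
conn = sqlite3.connect(":memory:")
# B holds ISO-8601 date strings, which sort correctly as plain text.
conn.execute("CREATE TABLE tbl (A INTEGER, B TEXT)")
conn.executemany(
    "INSERT INTO tbl (A, B) VALUES (?, ?)",
    [
        (1, "2016-01-28"),  # today (illustrative date)
        (1, "2016-01-27"),  # yesterday
        (2, "2016-01-27"),  # yesterday
        (2, "2016-01-29"),  # tomorrow
    ],
)

# One row per A, carrying the latest B seen for that A.
latest = conn.execute(
    "SELECT A, MAX(B) AS latest_b FROM tbl GROUP BY A ORDER BY A"
).fetchall()
print(latest)
```

The outer ORDER BY A only fixes the order of the final rows; choosing the latest B within each group is done entirely by MAX.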

According to your new rules (tested with PostgreSQL)


Query You'd Want:

SELECT    pr.phone_nr, pr.payed_ts, pr.payed_until_ts
FROM      payment_receipts pr
JOIN      users
          ON (pr.phone_nr = users.phone_nr)
JOIN      (select phone_nr, max(payed_until_ts) as payed_until_ts
           from payment_receipts
           group by phone_nr
          ) sub
          ON (    pr.phone_nr       = sub.phone_nr
              AND pr.payed_until_ts = sub.payed_until_ts)
ORDER BY  pr.phone_nr, pr.payed_ts, pr.payed_until_ts;
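The question also asks to keep only users whose latest receipt has already expired when the batch job runs. One possible way to express that, sketched here under assumptions, is to join to users and filter the per-user maximum with HAVING. The function name users_due, the sample phone numbers and dates, and the use of SQLite through Python's sqlite3 module are all hypothetical choices made so the example is runnable; timestamps are stored as ISO-8601 text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL (assumption)
conn.executescript("""
    CREATE TABLE users (phone_nr INTEGER PRIMARY KEY);
    CREATE TABLE payment_receipts (
        phone_nr       INTEGER NOT NULL,
        payed_ts       TEXT    NOT NULL,
        payed_until_ts TEXT    NOT NULL,
        PRIMARY KEY (phone_nr, payed_ts, payed_until_ts)
    );
    -- Hypothetical data: user 300 has opted out, but the receipt remains.
    INSERT INTO users VALUES (100), (200);
    INSERT INTO payment_receipts VALUES
        (100, '2010-06-01', '2010-07-01'),
        (100, '2010-07-01', '2010-07-31'),
        (200, '2010-07-15', '2010-08-14'),
        (300, '2010-06-01', '2010-07-01');
""")

def users_due(conn, now):
    """Return (phone_nr, latest payed_until_ts) for existing users due at `now`."""
    sql = """
        SELECT pr.phone_nr, MAX(pr.payed_until_ts) AS paid_until
        FROM payment_receipts pr
        JOIN users u ON u.phone_nr = pr.phone_nr  -- drops receipts of removed users
        GROUP BY pr.phone_nr
        HAVING MAX(pr.payed_until_ts) < ?         -- latest receipt already expired
        ORDER BY pr.phone_nr
    """
    return conn.execute(sql, (now,)).fetchall()

due = users_due(conn, "2010-08-01")
print(due)
```

With this sample data only user 100 is due on 2010-08-01: user 200 is paid up until 2010-08-14, and the receipt for 300 is ignored because no matching user record exists.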


Original Answer (with updates):

CREATE TABLE foo (a NUMERIC, b TEXT, c DATE);

INSERT INTO foo VALUES 
   (1,'a','2010-07-30'),
   (1,'b','2010-07-30'),
   (1,'c','2010-07-31'),
   (1,'d','2010-07-31'),
   (1,'a','2010-07-29'),
   (1,'c','2010-07-29'),
   (2,'a','2010-07-29'),
   (2,'a','2010-08-01');

-- table contents
SELECT * FROM foo ORDER BY c,a,b;
 a | b |     c      
---+---+------------
 1 | a | 2010-07-29
 1 | c | 2010-07-29
 2 | a | 2010-07-29
 1 | a | 2010-07-30
 1 | b | 2010-07-30
 1 | c | 2010-07-31
 1 | d | 2010-07-31
 2 | a | 2010-08-01

-- The following solutions both retrieve records based on the latest date
--    they both return the same result set, solution 1 is faster, solution 2
--    is easier to read

-- Solution 1: 
SELECT    foo.a, foo.b, foo.c 
FROM      foo
JOIN      (select a, max(c) as c from foo group by a) bar
  ON      (foo.a=bar.a and foo.c=bar.c)
ORDER BY  foo.a, foo.b, foo.c;

-- Solution 2: 
SELECT    a, b, MAX(c) AS c 
FROM      foo main
GROUP BY  a, b
HAVING    MAX(c) = (select max(c) from foo sub where main.a=sub.a group by a)
ORDER BY  a, b;

 a | b |     c      
---+---+------------
 1 | c | 2010-07-31
 1 | d | 2010-07-31
 2 | a | 2010-08-01
(3 rows)  


Comment:
1 is returned twice because there are multiple b values. This is acceptable (and advised). Your data should never have this problem, because c is based on b's value.
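The tie behaviour described in this comment can be reproduced with a small sketch that loads the same foo rows and runs Solution 1. SQLite through Python's sqlite3 module is used here as an assumption in place of PostgreSQL, with the dates stored as text; the per_a counter is a hypothetical helper added only to make the duplicates visible.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")  # SQLite stands in for PostgreSQL (assumption)
conn.execute("CREATE TABLE foo (a INTEGER, b TEXT, c TEXT)")
conn.executemany("INSERT INTO foo VALUES (?, ?, ?)", [
    (1, "a", "2010-07-30"), (1, "b", "2010-07-30"),
    (1, "c", "2010-07-31"), (1, "d", "2010-07-31"),
    (1, "a", "2010-07-29"), (1, "c", "2010-07-29"),
    (2, "a", "2010-07-29"), (2, "a", "2010-08-01"),
])

# Solution 1: join each row to the latest date recorded for its value of a.
rows = conn.execute("""
    SELECT foo.a, foo.b, foo.c
    FROM foo
    JOIN (SELECT a, MAX(c) AS c FROM foo GROUP BY a) bar
      ON foo.a = bar.a AND foo.c = bar.c
    ORDER BY foo.a, foo.b, foo.c
""").fetchall()

# Rows kept per value of a; a count above 1 means several rows share MAX(c).
per_a = Counter(a for a, _, _ in rows)
print(rows)
print(per_a)
```

If exactly one row per a is required even when dates tie, an extra tie-breaking rule is needed; the join alone keeps every row that shares the latest date.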

SELECT DISTINCT a,b
FROM tbl t
WHERE b = (SELECT MAX(b) FROM tbl WHERE tbl.a = t.a);

create table user_payments
(
    phone_nr int NOT NULL,
    payed_until_ts datetime NOT NULL
);

insert into user_payments
(phone_nr, payed_until_ts)
values
(1, '2016-01-28'), -- today
(1, '2016-01-27'), -- yesterday
(2, '2016-01-27'), -- yesterday
(2, '2016-01-29'); -- tomorrow

select phone_nr, MAX(payed_until_ts) as latest_payment
from user_payments
group by phone_nr;

-- OUTPUT:
-- phone_nr latest_payment
-- 1        2016-01-28 00:00:00.000
-- 2        2016-01-29 00:00:00.000

In the above example I used a datetime column, but a similar query should work for a timestamp column.

The MAX function effectively does the "ORDER BY" on the payed_until_ts column and picks the latest value for each phone_nr. You also get only one row per phone_nr because of the GROUP BY clause.
