简体   繁体   中英

Select row with most recent date per user

I have a table ("lms_attendance") of users' check-in and out times that looks like this:

id  user    time    io (enum)
1   9   1370931202  out
2   9   1370931664  out
3   6   1370932128  out
4   12  1370932128  out
5   12  1370933037  in

I'm trying to create a view of this table that would output only the most recent record per user id, while giving me the "in" or "out" value, so something like:

id  user    time    io
2   9   1370931664  out
3   6   1370932128  out
5   12  1370933037  in

I'm pretty close so far, but I realized that views won't accept subquerys, which is making it a lot harder. The closest query I got was:

select 
    `lms_attendance`.`id` AS `id`,
    `lms_attendance`.`user` AS `user`,
    max(`lms_attendance`.`time`) AS `time`,
    `lms_attendance`.`io` AS `io` 
from `lms_attendance` 
group by 
    `lms_attendance`.`user`, 
    `lms_attendance`.`io`

But what I get is:

id  user    time    io
3   6   1370932128  out
1   9   1370931664  out
5   12  1370933037  in
4   12  1370932128  out

Which is close, but not perfect. I know that last group by shouldn't be there, but without it, it returns the most recent time, but not with it's relative IO value.

Any ideas? Thanks!

Query:

SQLFIDDLEExample

SELECT t1.*
FROM lms_attendance t1
WHERE t1.time = (SELECT MAX(t2.time)
                 FROM lms_attendance t2
                 WHERE t2.user = t1.user)

Result:

| ID | USER |       TIME |  IO |
--------------------------------
|  2 |    9 | 1370931664 | out |
|  3 |    6 | 1370932128 | out |
|  5 |   12 | 1370933037 |  in |

Solution which gonna work everytime:

SQLFIDDLEExample

SELECT t1.*
FROM lms_attendance t1
WHERE t1.id = (SELECT t2.id
                 FROM lms_attendance t2
                 WHERE t2.user = t1.user            
                 ORDER BY t2.id DESC
                 LIMIT 1)

No need to trying reinvent the wheel, as this is commongreatest-n-per-group problem . Very nice solution is presented .

I prefer the most simplistic solution ( see SQLFiddle, updated Justin's ) without subqueries (thus easy to use in views):

SELECT t1.*
FROM lms_attendance AS t1
LEFT OUTER JOIN lms_attendance AS t2
  ON t1.user = t2.user 
        AND (t1.time < t2.time 
         OR (t1.time = t2.time AND t1.Id < t2.Id))
WHERE t2.user IS NULL

This also works in a case where there are two different records with the same greatest value within the same group - thanks to the trick with (t1.time = t2.time AND t1.Id < t2.Id) . All I am doing here is to assure that in case when two records of the same user have same time only one is chosen. Doesn't actually matter if the criteria is Id or something else - basically any criteria that is guaranteed to be unique would make the job here.

Based in @TMS answer, I like it because there's no need for subqueries but I think ommiting the 'OR' part will be sufficient and much simpler to understand and read.

SELECT t1.*
FROM lms_attendance AS t1
LEFT JOIN lms_attendance AS t2
  ON t1.user = t2.user 
        AND t1.time < t2.time
WHERE t2.user IS NULL

if you are not interested in rows with null times you can filter them in the WHERE clause:

SELECT t1.*
FROM lms_attendance AS t1
LEFT JOIN lms_attendance AS t2
  ON t1.user = t2.user 
        AND t1.time < t2.time
WHERE t2.user IS NULL and t1.time IS NOT NULL

Already solved, but just for the record, another approach would be to create two views...

CREATE TABLE lms_attendance
(id int, user int, time int, io varchar(3));

CREATE VIEW latest_all AS
SELECT la.user, max(la.time) time
FROM lms_attendance la 
GROUP BY la.user;

CREATE VIEW latest_io AS
SELECT la.* 
FROM lms_attendance la
JOIN latest_all lall 
    ON lall.user = la.user
    AND lall.time = la.time;

INSERT INTO lms_attendance 
VALUES
(1, 9, 1370931202, 'out'),
(2, 9, 1370931664, 'out'),
(3, 6, 1370932128, 'out'),
(4, 12, 1370932128, 'out'),
(5, 12, 1370933037, 'in');

SELECT * FROM latest_io;

Click here to see it in action at SQL Fiddle

If your on MySQL 8.0 or higher you can use Window functions :

Query:

DBFiddleExample

SELECT DISTINCT
FIRST_VALUE(ID) OVER (PARTITION BY lms_attendance.USER ORDER BY lms_attendance.TIME DESC) AS ID,
FIRST_VALUE(USER) OVER (PARTITION BY lms_attendance.USER ORDER BY lms_attendance.TIME DESC) AS USER,
FIRST_VALUE(TIME) OVER (PARTITION BY lms_attendance.USER ORDER BY lms_attendance.TIME DESC) AS TIME,
FIRST_VALUE(IO) OVER (PARTITION BY lms_attendance.USER ORDER BY lms_attendance.TIME DESC) AS IO
FROM lms_attendance;

Result:

| ID | USER |       TIME |  IO |
--------------------------------
|  2 |    9 | 1370931664 | out |
|  3 |    6 | 1370932128 | out |
|  5 |   12 | 1370933037 |  in |

The advantage I see over using the solution proposed by Justin is that it enables you to select the row with the most recent data per user (or per id, or per whatever) even from subqueries without the need for an intermediate view or table.

And in case your running a HANA it is also ~7 times faster :D

select b.* from 

    (select 
        `lms_attendance`.`user` AS `user`,
        max(`lms_attendance`.`time`) AS `time`
    from `lms_attendance` 
    group by 
        `lms_attendance`.`user`) a

join

    (select * 
    from `lms_attendance` ) b

on a.user = b.user
and a.time = b.time
 select result from (
     select vorsteuerid as result, count(*) as anzahl from kreditorenrechnung where kundeid = 7148
     group by vorsteuerid
 ) a order by anzahl desc limit 0,1

好吧,这可能是黑客攻击或容易出错,但不知何故这也有效-

SELECT id, MAX(user) as user, MAX(time) as time, MAX(io) as io FROM lms_attendance GROUP BY id;

I have done same thing like below

SELECT t1.* FROM lms_attendance t1 WHERE t1.id in (SELECT max(t2.id) as id FROM lms_attendance t2 group BY t2.user)

This will also reduce memory utilization.

Thanks.

I have tried one solution which works for me

    SELECT user, MAX(TIME) as time
      FROM lms_attendance
      GROUP by user
      HAVING MAX(time)

I have a very large table and all of the other suggestions here were taking a very long time to execute. I came up with this hacky method that was much faster. The downside is, if the max(date) row has a duplicate date for that user, it will return both of them.

SELECT * FROM mb_web.devices_log WHERE CONCAT(dtime, '-', user_id) in (
    SELECT concat(max(dtime), '-', user_id) FROM mb_web.devices_log GROUP BY user_id
)

Possibly you can do group by user and then order by time desc. Something like as below

  SELECT * FROM lms_attendance group by user order by time desc;

Try this query:

  select id,user, max(time), io 
  FROM lms_attendance group by user;

This worked for me:

SELECT user, time FROM 
(
    SELECT user, time FROM lms_attendance --where clause
) AS T 
WHERE (SELECT COUNT(0) FROM table WHERE user = T.user AND time > T.time) = 0
ORDER BY user ASC, time DESC

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM