简体   繁体   中英

SQL request to both group responses and retrieve only the last line of each

I have a table without IDs. it has 3 columns: the name of a computer, its status (on/off) at the moment of the poll, and the timestamp of the insertion.

if I run

select * from computers group by name;

I get a line for each computer (there are 200 different ones), but these lines don't always hold the latest entry for it. I then tried

select computers group by name order by timestamp asc;

But I get incoherent responses (some recent timestamps, some old ones... no idea why).

It's basically the same problem as here : SQL: GROUP BY records and then get last record from each group? , but I don't have ids to help :(

You can write:

SELECT computers.name,
       computers.status,
       computers.timestamp
  FROM ( SELECT name,
                MAX(timestamp) AS max_timestamp
           FROM computers
          GROUP
             BY name
       ) AS t
  JOIN computers
    ON computers.name = t.name
   AND computers.timestamp = t.max_timestamp
;

The above uses this subquery to finds the greatest timestamp for each name :

SELECT name
       MAX(timestamp) AS max_timestamp
  FROM computers
 GROUP
    BY name
;

and then it gathers fields from computers whose name and timestamp match something that the subquery returned.

The reason that your order by clause has no effect is that it comes too "late": it's used to order records that are going to be returned, after it's already determined that they will be returned. To quote from §11.16.3 " GROUP BY and HAVING with Hidden Columns" in the MySQL 5.6 Reference Manual on this subject:

The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Sorting of the result set occurs after values have been chosen, and ORDER BY does not affect which values the server chooses.

Another way is to write a correlated subquery, and dispense with the GROUP BY entirely. This:

SELECT name, status, timestamp
  FROM computers AS c1
 WHERE NOT EXISTS
        ( SELECT 1
            FROM computers
           WHERE name = c1.name
             AND timestamp > c1.timestamp
        )
;

finds all rows in computers that haven't been superseded by more-recent rows with the same name . The same approach can be done with a join:

SELECT c1.name, c1.status, c1.timestamp
  FROM computers AS c1
  LEFT
 OUTER
  JOIN computers AS c2
    ON c2.name = c1.name
   AND c2.timestamp > c1.timestamp
 WHERE c2.name IS NULL
;

which is less clear IMHO, but may perform better.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM