
Selecting the latest values given data with missing records

... where "missing records" are identical to the last recorded value, hence no record.

This may be subjective, but I'm hoping there's a standardised way of doing this.

So, let's say I have a bunch of analytics in a MySQL table. There is some missing information, but as mentioned above, that's because their previous value is the same as the current value.

table "table":

id    value      datetime
1     5          1285891200    // Today
1     4          1285804800    // Yesterday
2     18         1285804800    // Yesterday
2     16         1285771094    // The day before yesterday

As you can see, I don't have a value for today for id 2.

If I wanted to pull the "most recent value" for each id from this table (that is, 1's "today" and 2's "yesterday"), how do I do that? I've achieved it by running the following query:

SELECT id, value FROM (SELECT * FROM `table` ORDER BY datetime DESC) AS bleh GROUP BY id

This uses a subquery to order the data first, and then relies on "GROUP BY" to pick the first value from each id (which, since the rows are ordered, is the most recent). However, I don't know whether shoving a subquery in there is the best way to get the most recent value.

How would you do it?

The desired table:

id    value      datetime
1     5          1285891200    // Today
2     18         1285804800    // Yesterday

Thanks...

Gotta love MySQL for allowing an order by in a subquery. That's not allowed by the SQL standard :)

You could rewrite the query in a standards-compliant way like:

select  *
from    YourTable a
where   not exists
        (
        select  *
        from    YourTable b
        where   a.id = b.id
        and     a.datetime < b.datetime
        )
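
As a rough check of the query above, here is a minimal sketch that runs it against the sample rows from the question. It uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL (an assumption made only so the example is self-contained); the column types and the final ORDER BY are also additions for the demo.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, value INTEGER, datetime INTEGER)")
conn.executemany(
    "INSERT INTO YourTable (id, value, datetime) VALUES (?, ?, ?)",
    [
        (1, 5, 1285891200),
        (1, 4, 1285804800),
        (2, 18, 1285804800),
        (2, 16, 1285771094),
    ],
)

# Keep a row only if no other row with the same id has a later datetime.
latest_rows = conn.execute(
    """
    SELECT a.id, a.value, a.datetime
    FROM   YourTable a
    WHERE  NOT EXISTS (
               SELECT 1
               FROM   YourTable b
               WHERE  a.id = b.id
               AND    a.datetime < b.datetime
           )
    ORDER BY a.id  -- added only to make the demo output deterministic
    """
).fetchall()

print(latest_rows)  # [(1, 5, 1285891200), (2, 18, 1285804800)]
```

The result matches the desired table from the question: the "today" row for id 1 and the "yesterday" row for id 2.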

In case several rows share the latest datetime for an id (duplicates that the subquery can't tell apart), you can group by id and then pick an arbitrary value:

select  a.id
,       max(a.value)
,       max(a.datetime)
from    YourTable a
where   not exists
        (
        select  *
        from    YourTable b
        where   a.id = b.id
        and     a.datetime < b.datetime
        )
group by
        a.id

This chooses the maximum a.value among the rows sharing the latest datetime. The datetime is the same for all of those duplicate rows, but standard SQL doesn't know that, so you have to specify a way to pick among the tied rows. Here, I'm using max, but min or even avg would work just as well.
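
To see the tie-breaking in action, here is a small sketch, again using Python's sqlite3 with an in-memory database in place of MySQL (an assumption for the sake of a runnable demo). The extra row (2, 17, 1285804800) is hypothetical and not part of the question's data; it is added so that id 2 has two rows sharing its latest datetime.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, value INTEGER, datetime INTEGER)")
conn.executemany(
    "INSERT INTO YourTable (id, value, datetime) VALUES (?, ?, ?)",
    [
        (1, 5, 1285891200),
        (1, 4, 1285804800),
        (2, 18, 1285804800),
        (2, 17, 1285804800),  # hypothetical duplicate of id 2's latest datetime
        (2, 16, 1285771094),
    ],
)

# Shared filter: keep rows for which no later row with the same id exists.
not_exists_filter = """
    FROM   YourTable a
    WHERE  NOT EXISTS (
               SELECT 1
               FROM   YourTable b
               WHERE  a.id = b.id
               AND    a.datetime < b.datetime
           )
"""

# Without GROUP BY, both tied rows for id 2 survive the filter.
tied_rows = conn.execute(
    "SELECT a.id, a.value, a.datetime" + not_exists_filter + "ORDER BY a.id, a.value"
).fetchall()

# With GROUP BY and MAX, exactly one row per id remains.
one_per_id = conn.execute(
    "SELECT a.id, MAX(a.value), MAX(a.datetime)"
    + not_exists_filter
    + "GROUP BY a.id ORDER BY a.id"
).fetchall()

print(tied_rows)   # [(1, 5, 1285891200), (2, 17, 1285804800), (2, 18, 1285804800)]
print(one_per_id)  # [(1, 5, 1285891200), (2, 18, 1285804800)]
```

Swapping MAX(a.value) for MIN(a.value) would return 17 for id 2 instead of 18; which aggregate is appropriate depends on what the tied rows mean in your data.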
