简体   繁体   中英

How do I join on “most recent” records?

I've got two tables in a SQL Server 2000 database joined by a parent child relationship. In the child database, the unique key is made up of the parent id and the datestamp.

I'm needing to do a join on these tables such that only the most recent entry for each child is joined.

Can anyone give me any hints how I can go about this?

Here's the most optimized way I've found to do this. I tested it against several structures and this way had the lowest IO compared to other approaches.

This sample would get the last revision to an article

SELECT t.*
FROM ARTICLES AS t
    --Join the the most recent history entries
        INNER JOIN  REVISION lastHis ON t.ID = lastHis.FK_ID
        --limits to the last history in the WHERE statement
            LEFT JOIN REVISION his2 on lastHis.FK_ID = his2.FK_ID and lastHis.CREATED_TIME < his2.CREATED_TIME
WHERE his2.ID is null

If you had a table which just contained the most recent entry for each parent, and the parent's id, then it would be easy, right?

You can make a table like that by joining the child table on itself, taking only the maximum datestamp for each parent id. Something like this (your SQL dialect may vary):

   SELECT t1.*
     FROM child AS t1
LEFT JOIN child AS t2
       ON (t1.parent_id = t2.parent_id and t1.datestamp < t2.datestamp)
    WHERE t2.datestamp IS NULL

That gets you all of the rows in the child table for which no higher timestamp exists, for that parent id. You can use that table in a subquery to join to:

   SELECT *
     FROM parent
     JOIN ( SELECT t1.*
              FROM child AS t1
         LEFT JOIN child AS t2
                ON (t1.parent_id = t2.parent_id and t1.datestamp < t2.datestamp)
             WHERE t2.datestamp IS NULL ) AS most_recent_children
       ON (parent.id = most_recent_children.parent_id

or join the parent table directly into it:

   SELECT parent.*, t1.*
     FROM parent
     JOIN child AS t1
       ON (parent.id = child.parent_id)
LEFT JOIN child AS t2
       ON (t1.parent_id = t2.parent_id and t1.datestamp < t2.datestamp)
    WHERE t2.datestamp IS NULL

Use this query as a basis Note that the CTE definition is not part of query-So the solution is simple

use test;
with parent as (
select 123 pid union all select 567 union all
select 125 union all 
select 789),
child as(
select 123 pid,CAST('1/12/2010' as DATE) stdt union all
select 123 ,CAST('1/15/2010' AS DATE) union all
select 567 ,CAST('5/12/2010' AS DATE) union all
select 567 ,CAST('6/15/2010' AS DATE) union all
select 125 ,CAST('4/15/2010' AS DATE) 
)
select pid,stdt from(
select a.pid,b.stdt,ROW_NUMBER() over(partition by a.pid order by b.stdt desc) selector
from parent as a
left outer join child as b
on a.pid=b.pid) as x
where x.selector=1

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM