
Selecting “latest” row (up to a date) from a table (SQL Server 2008)

I have quite a few stored procedures following the pattern of selecting the row for which a date column is the latest up to a certain date, inclusive. I see two forms in use:

select top 1 x, y, z from sometable where a=b and date <= @date order by date desc

or

select x, y, z from sometable where a=b and date=(select max(date) from sometable where a=b and date <= @date)

I can imagine a derivation of the second form that uses a join instead of a subquery too.
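For illustration, a join version might look something like the sketch below. The table variable, the sample rows, and the @b and @date variables are made up for the example; only the column names come from the two queries above.

```sql
-- Hypothetical stand-in for sometable, with invented sample rows.
DECLARE @sometable TABLE (a int, [date] datetime, x int, y int, z int);
INSERT INTO @sometable (a, [date], x, y, z) VALUES
    (1, '20240105', 10, 11, 12),
    (1, '20240110', 20, 21, 22),
    (1, '20240120', 30, 31, 32);

-- @b plays the role of "b" in the original queries (an assumption).
DECLARE @b int = 1, @date datetime = '20240115';

-- Second form rewritten with a join to a derived table that holds
-- the latest date on or before @date.
SELECT t.x, t.y, t.z
FROM @sometable AS t
JOIN (SELECT MAX([date]) AS max_date
      FROM @sometable
      WHERE a = @b AND [date] <= @date) AS m
    ON t.[date] = m.max_date
WHERE t.a = @b;
```

Like the subquery form, this returns every row that shares the latest date.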

We can ignore the case where the second form may return multiple rows. Assume it never will.

Since this is used in a lot of places, some of which run against large numbers of rows in performance-critical code, I want to standardise on whichever solution performs best (which may be some other suggestion).

Some googling has turned up numerous comparisons of TOP 1 vs MAX, but generally for a single value, and no subquery. In that case MAX is the clear winner, but I'm not sure if the subquery changes that.

I'd appreciate the views of those more knowledgeable than I in this area (which should be most of you!).

Your results may vary depending on table design, but generally speaking, the TOP 1 / ORDER BY technique is about twice as fast when there is no index on date, because the MAX form makes SQL Server scan the table twice: first to find the max date, then to look up the rest of the values based on it. When there is an index on date (whether it covers the query or not), both forms get the same plan.

The most important thing to consider here is indexing. If this query is to be executed a lot, you'll want to make sure you index the date field.
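As a rough sketch of what such an index could look like: the temp table, the index name, the choice to lead with a, and the INCLUDE list are all assumptions for illustration, not something taken from your schema. Whether a composite or covering index pays off depends on your workload.

```sql
-- Hypothetical stand-in for sometable; adjust names and types to your schema.
IF OBJECT_ID('tempdb..#sometable') IS NOT NULL DROP TABLE #sometable;
CREATE TABLE #sometable (
    a      int      NOT NULL,
    [date] datetime NOT NULL,
    x int, y int, z int
);

-- Assumed index: the equality column first, then the date, with the
-- selected columns included so the index can cover the query.
CREATE NONCLUSTERED INDEX IX_sometable_a_date
    ON #sometable (a, [date])
    INCLUDE (x, y, z);

DECLARE @b int = 1, @date datetime = '20240115';

-- The query the index is meant to support.
SELECT TOP 1 x, y, z
FROM #sometable
WHERE a = @b AND [date] <= @date
ORDER BY [date] DESC;

DROP TABLE #sometable;
```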

I definitely prefer the TOP 1 technique, and it is the one I use for all such queries, for two reasons. First, TOP 1 is optimal in certain circumstances. Second, as you already touched on, the MAX form could return more than one row (don't assume it won't someday, by the way, unless there is a unique index on date).
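To make the multiple-row point concrete, here is a small sketch with invented data in which two rows share the latest date. The table variable, the values, and the @b and @date variables are assumptions for the example.

```sql
-- Hypothetical data: two rows share the latest qualifying date.
DECLARE @sometable TABLE (a int, [date] datetime, x int, y int, z int);
INSERT INTO @sometable (a, [date], x, y, z) VALUES
    (1, '20240105', 1, 1, 1),
    (1, '20240110', 2, 2, 2),
    (1, '20240110', 3, 3, 3);

DECLARE @b int = 1, @date datetime = '20240115';

-- TOP 1: always a single row, but which of the two tied rows
-- comes back is not guaranteed without a tie-breaker in ORDER BY.
SELECT TOP 1 x, y, z
FROM @sometable
WHERE a = @b AND [date] <= @date
ORDER BY [date] DESC;

-- MAX subquery: returns both rows dated 2024-01-10.
SELECT x, y, z
FROM @sometable
WHERE a = @b
  AND [date] = (SELECT MAX([date])
                FROM @sometable
                WHERE a = @b AND [date] <= @date);
```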

The query optimizer has a lot of freedom, and it can execute either MAX or TOP 1 in various ways. Exactly what it does depends on the source query, the available indexes, and the statistics for your table, among other things. Tomorrow it might choose a different approach, as the size of your table or its data distribution changes.

So I don't think there's one optimal solution. Wait for actual performance issues and optimize them one by one.
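When a specific query does become a problem, one way to compare the two forms on your own data is to run them with I/O and timing statistics switched on. The sketch below uses a tiny made-up temp table only so that it runs on its own; the table, rows, and variables are assumptions, and numbers that mean anything will only come from your real table.

```sql
-- Hypothetical stand-in for sometable; replace with your real table.
IF OBJECT_ID('tempdb..#sometable') IS NOT NULL DROP TABLE #sometable;
CREATE TABLE #sometable (a int NOT NULL, [date] datetime NOT NULL, x int, y int, z int);
INSERT INTO #sometable (a, [date], x, y, z) VALUES
    (1, '20240105', 1, 1, 1),
    (1, '20240110', 2, 2, 2),
    (2, '20240112', 3, 3, 3);

DECLARE @b int = 1, @date datetime = '20240115';

-- Report logical reads and CPU/elapsed time for each statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Form 1: TOP 1 with ORDER BY.
SELECT TOP 1 x, y, z
FROM #sometable
WHERE a = @b AND [date] <= @date
ORDER BY [date] DESC;

-- Form 2: MAX in a subquery.
SELECT x, y, z
FROM #sometable
WHERE a = @b
  AND [date] = (SELECT MAX([date])
                FROM #sometable
                WHERE a = @b AND [date] <= @date);

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
DROP TABLE #sometable;
```

The figures appear on the Messages tab in Management Studio. Comparing the actual execution plans side by side is another option.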


 