Fastest way to query latest version in large SQL Server table?

What is the fastest way to query for the "latest version" on a large SQL table using an update timestamp in SQL Server?

I'm currently using the inner join approach on a very large SQL Server weather forecast table with the columns Date, City, Hour, Temperature, and UpdateTimestamp. To get the latest temperature forecast, I created a view that inner joins on Date, City, and Hour plus max(UpdateTimestamp), as in this other posting.

However, as the dataset in the original table grows, the view query is getting slower and slower.

I wonder if others have encountered a similar situation, and what the best way to speed up this query is (one alternative I'm considering is having a stored procedure run each day to create a separate table of the "latest version" only, which would then be very quick to access).
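For reference, the daily "latest version" table mentioned above might be rebuilt along these lines. This is only a sketch under assumptions: the procedure name RefreshWeatherForecastLatest and the target table WeatherForecastLatest are hypothetical, the target table is assumed to already exist with matching columns, and the source table and column names are taken from the queries further down in this question.

```sql
-- Hypothetical names: dbo.RefreshWeatherForecastLatest, dbo.WeatherForecastLatest.
-- Source table and columns (WeatherForecastData, CityId, DTUnix, TSUnix,
-- Temperature) follow the queries in the question.
CREATE OR ALTER PROCEDURE dbo.RefreshWeatherForecastLatest
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;  -- roll the whole refresh back if any statement fails

    BEGIN TRANSACTION;

    -- Assumes the snapshot table already exists; empty it before reloading.
    TRUNCATE TABLE dbo.WeatherForecastLatest;

    -- Keep only the row with the highest TSUnix per (CityId, DTUnix).
    INSERT INTO dbo.WeatherForecastLatest (CityId, DTUnix, TSUnix, Temperature)
    SELECT CityId, DTUnix, TSUnix, Temperature
    FROM (
        SELECT CityId, DTUnix, TSUnix, Temperature,
               ROW_NUMBER() OVER (PARTITION BY CityId, DTUnix
                                  ORDER BY TSUnix DESC) AS RowNumber
        FROM dbo.WeatherForecastData
    ) AS ranked
    WHERE ranked.RowNumber = 1;

    COMMIT TRANSACTION;
END;
```

The procedure could be run on a schedule (for example from a SQL Server Agent job), with readers selecting from the snapshot table instead of the view. The trade-off is that results are only as fresh as the last run.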

EDIT 4/4 - The best solution I've found so far (thanks Vikram) was to add a clustered index to my table on the 3 fields "TSUnix", "CityId", "DTUnix", which sped up performance by ~4x (from 25 seconds to 4 seconds).

I've also tried the row_number solution (query sample below), although it appears to be a bit slower than the "inner join" approach. Both queries and the index creation are below:

Index Creation:

USE [<My DB>]
GO
CREATE NONCLUSTERED INDEX [index_WeatherForecastData]
ON [dbo].[<WeatherForecastData>] ([TSUnix], [CityId], [DTUnix])
INCLUDE ([Temperature], [TemperatureMin], [TemperatureMax], [Humidity], [WindSpeed], [Rain], [Snow])
GO

Query:

-- Inner Join Version

SELECT W.TSUnix, W.CityId, W.DTUnix, W.Temperature, W.*

FROM WeatherForecastData W

INNER JOIN (
    SELECT max(TSUnix) Latest, CityId, DTUnix 
    FROM WeatherForecastData 
    GROUP BY CityId, DTUnix
    ) L
    ON L.Latest = W.TSUnix
    AND L.CityID = W.CityID
    AND L.DTUnix = W.DTUnix

-- Row Number Version

SELECT W.TSUnix, W.CityId, W.DTUnix, W.Temperature, W.*

FROM 
    (
    select 
        *, ROW_NUMBER() over (partition by DTUnix, CityId order by TSUnix desc) as RowNumber
    from WeatherForecastData
    ) W

WHERE
    W.RowNumber = 1

Thanks!

Use ROW_NUMBER with an index as shown below.

The specific index that will make this fast is one on Date, City, Hour, and UpdateTimestamp descending. This requires a single pass over the table rather than the multiple passes an INNER JOIN would likely require.
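A sketch of the index described above, assuming the table and column names used in the query below (Test, Date, City, Hour, UpdateTimestamp, Temperature). The index name is made up, and the INCLUDE clause is an optional addition so the query can be answered from the index alone.

```sql
-- Hypothetical index name; table and columns follow the answer's query.
-- Key order matches PARTITION BY (Date, City, Hour), then ORDER BY
-- UpdateTimestamp DESC, so the newest row of each group comes first.
CREATE NONCLUSTERED INDEX IX_Test_LatestVersion
ON Test ([Date], City, [Hour], UpdateTimestamp DESC)
INCLUDE (Temperature);  -- assumption: covers the only other selected column
```

On the table from the question, the equivalent key would presumably be (CityId, DTUnix, TSUnix DESC), which is a different column order from the index shown there.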

Working code: http://sqlfiddle.com/#!18/8c0b4/1

SELECT Date, City, Hour, Temperature 
FROM
    (SELECT 
         Date, City, Hour, Temperature,
         ROW_NUMBER() OVER(PARTITION BY Date, City, Hour
                           ORDER BY UpdateTimestamp DESC) AS RowNumber
     FROM
         Test) AS t  
WHERE 
    t.RowNumber = 1
