Using the MAX aggregation window function correctly in SQL Server 2012

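I have a log table that records the run history of various background jobs.

Now I need to display the most recent run of each job, along with some of its data.

Here is my solution: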

SELECT BackgroundJobId, bjl.LogId, ExecStartTime, ExecEndTime, ErrorDescription, Debug
FROM BackgroundJobLog bjl
JOIN (
    SELECT LogId, ROW_NUMBER() OVER (PARTITION BY BackgroundJobId ORDER BY ExecStartTime DESC) rowNumber
    FROM BackgroundJobLog
    WHERE BackgroundJobStatusId IN (1, 3)
) AS bjl2 ON bjl.LogId = bjl2.LogId AND bjl2.rowNumber = 1

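As expected, it returns 157 rows, each with a distinct BackgroundJobId and the information about that job's most recent run.

However, performance is a problem. Right now about 25,000,000 rows in the log table satisfy the nested SELECT statement. Joining 25,000,000 rows when all I need are the rows with the latest ExecStartTime seems wasteful.

So I figured I could use the MAX aggregate window function. But for the life of me, I cannot work out how. The following statement: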

SELECT BackgroundJobId, LogId, MAX(ExecStartTime) OVER (PARTITION BY BackgroundJobId) ExecStartTime
FROM BackgroundJobLog
WHERE BackgroundJobStatusId IN (1, 3)

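tries to return the same 25,000,000 rows. True, it returns the latest ExecStartTime value for a given BackgroundJobId, but it repeats that value once for every row with the same BackgroundJobId! Of course, each row has its own LogId, whereas I only want the row with the latest ExecStartTime for each BackgroundJobId.

How do I do this efficiently?

EDIT

Guys, a nested select is a nested select. Whether it is joined explicitly, written as a CTE, or selected from directly makes almost no difference. As long as there is a nested select, performance is poor.

EDIT 2

There is an index on BackgroundJobStatusId: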

CREATE NONCLUSTERED INDEX IX_BackgroundJobLog_BackgroundJobStatusId ON [BackgroundJobLog] ([BackgroundJobStatusId]) INCLUDE ([LogId],[BackgroundJobId],[ExecStartTime])

EDIT 3

The table schema is:

CREATE TABLE BackgroundJobLog
(
    LogId uniqueidentifier NOT NULL,
    BackgroundJobId int NOT NULL,
    ExecStartTime datetime NULL,
    ExecEndTime datetime NULL,
    ErrorDescription ntext NULL,
    BackgroundJobStatusId int NOT NULL,
    Debug ntext NULL,
    LogEntryId int IDENTITY(1,1) NOT NULL,
    CONSTRAINT PK_LogEntryId PRIMARY KEY CLUSTERED (LogEntryId),
    CONSTRAINT IX_BackgroundJobLog UNIQUE NONCLUSTERED (LogId)
)

EDIT 4

Please find below the execution plan for Hamlet Hakobyan's answer: (execution plan image)

EDIT 5

Please find below the execution plan for Kirill Zorin's answer: (execution plan image)

To make this query run fast, you need two things:

  • A list of the distinct BackgroundJobId values

  • A composite index on BackgroundJobLog (BackgroundJobId, ExecStartTime) INCLUDE (BackgroundJobStatusId)
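
The composite index from the second point could be created along the following lines. This is only a sketch: the index name is made up for illustration, and the key order and included column are taken directly from the description above.

```sql
-- Hypothetical index name; key columns and INCLUDE list follow the text above.
-- With BackgroundJobId leading and ExecStartTime second, the latest run of a
-- single job can be found by seeking to that job and reading the index backwards.
CREATE NONCLUSTERED INDEX IX_BackgroundJobLog_JobId_ExecStartTime
ON BackgroundJobLog (BackgroundJobId, ExecStartTime)
INCLUDE (BackgroundJobStatusId);
```

Including BackgroundJobStatusId lets the status filter be evaluated from the index alone, so only the winning row per job needs a lookup into the base table.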

If you have a separate table with the jobs, use it:

SELECT  bl.*
FROM    job
CROSS APPLY
        (
        SELECT  TOP 1
                *
        FROM    BackgroundJobLog
        WHERE   BackgroundJobId = job.id
                AND BackgroundJobStatusId IN (1, 3)
        ORDER BY
                ExecStartTime DESC
        ) bl

If not, you can create an indexed view to get such a list:

CREATE VIEW job
WITH SCHEMABINDING
AS
SELECT  backgroundJobId, COUNT_BIG(*) cnt
FROM    dbo.BackgroundJobLog  -- SCHEMABINDING requires a two-part name; the dbo schema is assumed
GROUP BY
        backgroundJobId
GO

CREATE UNIQUE CLUSTERED INDEX
        ux_job
ON      job (backgroundJobId)
GO

Then repeat the previous query, adding NOEXPAND:

SELECT  bl.*
FROM    job WITH (NOEXPAND)
CROSS APPLY
        (
        SELECT  TOP 1
                *
        FROM    BackgroundJobLog
        WHERE   BackgroundJobId = job.backgroundJobId  -- the view exposes backgroundJobId, not id
                AND BackgroundJobStatusId IN (1, 3)
        ORDER BY
                ExecStartTime DESC
        ) bl

Alternatively, you can build such a list in a CTE:

WITH    job (id) AS
        (
        SELECT  MIN(BackgroundJobId)
        FROM    BackgroundJobLog
        UNION ALL
        SELECT  (
                SELECT  backgroundJobId
                FROM    (
                        SELECT  backgroundJobId,
                                ROW_NUMBER() OVER (ORDER BY backgroundJobId) rn
                        FROM    BackgroundJobLog bl
                        WHERE   bl.backgroundJobId > job.id
                        ) q
                WHERE   rn = 1
                )
        FROM    job
        WHERE   id IS NOT NULL
        )
SELECT  bl.*
FROM    job
CROSS APPLY
        (
        SELECT  TOP 1
                *
        FROM    BackgroundJobLog
        WHERE   BackgroundJobId = job.id
                AND BackgroundJobStatusId IN (1, 3)
        ORDER BY
                ExecStartTime DESC
        ) bl
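WHERE   job.id IS NOT NULL
OPTION  (MAXRECURSION 0)  -- one recursion per distinct job; 157 jobs exceed the default limit of 100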

I don't think the join is necessary.

;WITH CTE
AS
(
    SELECT *,
       ROW_NUMBER() OVER (PARTITION BY BackgroundJobId ORDER BY ExecStartTime DESC) rn
    FROM BackgroundJobLog
    WHERE BackgroundJobStatusId IN (1, 3)
)
SELECT BackgroundJobId
      , LogId
      , ExecStartTime
      , ExecEndTime
      , ErrorDescription
      , Debug
FROM CTE
WHERE rn = 1

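It strikes me that you should be able to avoid the join entirely: select everything you need plus the row number in the inner query, select everything except the row number in the outer query, and change the WHERE clause to filter on the row number only. Does the execution plan for this look any better?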

SELECT BackgroundJobId, LogId, ExecStartTime, ExecEndTime, ErrorDescription, Debug
FROM (
    SELECT BackgroundJobId, LogId, ExecStartTime, ExecEndTime, ErrorDescription, Debug
        , ROW_NUMBER() OVER (PARTITION BY BackgroundJobId ORDER BY ExecStartTime DESC) rowNumber
    FROM BackgroundJobLog
    WHERE BackgroundJobStatusId IN (1, 3)
    ) bjl
WHERE rowNumber = 1
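
One way to compare this against the original join, beyond looking at the plan, is to run each variant with I/O and timing statistics switched on and compare the logical reads and CPU time reported in the messages output. A minimal sketch, using only the standard SET STATISTICS options and the query above:

```sql
-- Report logical reads and CPU/elapsed time for every statement that follows.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT BackgroundJobId, LogId, ExecStartTime, ExecEndTime, ErrorDescription, Debug
FROM (
    SELECT BackgroundJobId, LogId, ExecStartTime, ExecEndTime, ErrorDescription, Debug
        , ROW_NUMBER() OVER (PARTITION BY BackgroundJobId ORDER BY ExecStartTime DESC) rowNumber
    FROM BackgroundJobLog
    WHERE BackgroundJobStatusId IN (1, 3)
    ) bjl
WHERE rowNumber = 1;

-- Switch the reporting off again once the comparison is done.
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Running the join version from the question inside the same pair of SET statements gives figures that can be compared directly.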

If you want to use MAX(), you can try the following:

SELECT BackgroundJobId, LogId
FROM
(
SELECT BackgroundJobId, LogId, ExecStartTime, MAX(ExecStartTime) OVER (PARTITION BY BackgroundJobId) MaxExecStartTime
FROM BackgroundJobLog
WHERE BackgroundJobStatusId IN (1, 3)
) AS bjl  -- SQL Server requires an alias on a derived table
WHERE ExecStartTime = MaxExecStartTime
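
One caveat with the MAX() approach: if two runs of the same job share the latest ExecStartTime, both rows come back. A minimal sketch of a variant that returns exactly one row per job, assuming LogEntryId (the identity column from the schema in the question) is an acceptable tie-breaker; the aliases latest and rn are made up for illustration:

```sql
SELECT BackgroundJobId, LogId, ExecStartTime
FROM (
    SELECT BackgroundJobId, LogId, ExecStartTime,
           -- Number the runs of each job from newest to oldest;
           -- LogEntryId breaks ties between runs with the same start time (assumption).
           ROW_NUMBER() OVER (
               PARTITION BY BackgroundJobId
               ORDER BY ExecStartTime DESC, LogEntryId DESC
           ) AS rn
    FROM BackgroundJobLog
    WHERE BackgroundJobStatusId IN (1, 3)
) AS latest
WHERE rn = 1;
```

This swaps MAX() for ROW_NUMBER(), so it is not a pure MAX() solution; it trades the equality comparison for a deterministic pick among tied rows.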
