
Optimizing stored procedure query for a table that contains 75 million records

I have a table AFW_Coverage that contains 75 million rows. There is also another table AFW_BasicPolInfo that contains about 3 million rows.

I have written the following stored procedure to get records from the table:

CREATE PROCEDURE [ams360].[GetPolicyCoverages]
    @PageStart INT = 0,
    @PageSize INT = 50000,
    @RowVersion TIMESTAMP = NULL
AS
    SET NOCOUNT ON;

    ;WITH LatestCoverage AS
    (
        SELECT 
            PolId,
            MAX(EffDate) AS CoverageEffectiveDate 
        FROM 
            ams360.AFW_Coverage 
        GROUP BY 
            PolId
    ),
    Coverages AS
    (
        SELECT 
            cov.PolId,
            cov.LobId,
            cov.CoverageId,
            cov.EffDate, 
            cov.CoverageCode,
            cov.isCoverage,
            cov.FullTermPrem,
            cov.Limit1,
            cov.Limit2,
            cov.Limit3,
            cov.Deduct1,
            cov.Deduct2,
            cov.Deduct3,
            cov.ChangedDate,
            cov.RowVersion
        FROM
            ams360.AFW_Coverage cov
        INNER JOIN
            LatestCoverage mcov ON cov.PolId = mcov.PolId
                                AND cov.EffDate = mcov.CoverageEffectiveDate
        WHERE
            cov.Status IN ('A', 'C')
    )
    SELECT
        BPI.PolId,
        BPI.PolEffDate,
        BPI.PolExpDate,
        BPI.PolTypeLOB,
        cov.LobId,
        cov.CoverageId,
        cov.EffDate,
        cov.CoverageCode,
        cov.isCoverage,
        cov.FullTermPrem,
        cov.Limit1,
        cov.Limit2,
        cov.Limit3,
        cov.Deduct1,
        cov.Deduct2,
        cov.Deduct3,
        cov.ChangedDate,
        cov.RowVersion
    FROM 
        ams360.AFW_BasicPolInfo BPI 
    INNER JOIN 
        Coverages cov ON bpi.PolId = cov.PolId
    WHERE 
        BPI.Status IN ('A','C')
        AND BPI.PolTypeLOB IN ('Homeowners', 'Dwelling Fire')
        AND BPI.PolSubType = 'P'
        AND BPI.RenewalRptFlag IN ('A', 'R', 'I', 'N')
        AND GETDATE() BETWEEN BPI.PolEffDate AND BPI.PolExpDate
        AND (@RowVersion IS NULL OR cov.RowVersion > @RowVersion)
    GROUP BY 
        BPI.PolId,
        BPI.PolEffDate,
        BPI.PolExpDate,
        BPI.PolTypeLOB,
        cov.LobId,
        cov.CoverageId,
        cov.EffDate,
        cov.CoverageCode,
        cov.isCoverage,
        cov.FullTermPrem,
        cov.Limit1, cov.Limit2, cov.Limit3,
        cov.Deduct1, cov.Deduct2, cov.Deduct3,
        cov.ChangedDate,
        cov.RowVersion
    ORDER BY 
        cov.RowVersion
    OFFSET 
        @PageStart ROWS
    FETCH NEXT 
        @PageSize ROWS ONLY
GO

However, I find that the above stored procedure pegs the database at 100%, even though I have added the following indexes, which I can see are used in the execution plan:

CREATE NONCLUSTERED INDEX [IX_AFW_Coverage_PolId_EffDate] 
ON [ams360].[AFW_Coverage] ([PolId] ASC, [EffDate] ASC)
            WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
GO

CREATE NONCLUSTERED INDEX [IX_AFW_Coverage_PolId_EffDate_Status_LobId_CoverageId] 
ON [ams360].[AFW_Coverage] ([PolId] ASC, [EffDate] ASC, [Status] ASC, [LobId] ASC, [CoverageId] ASC)
INCLUDE ([CoverageCode], [IsCoverage], [FullTermPrem], [Limit1], [Limit2],[Limit3], [Deduct1], [Deduct2], [Deduct3], [ChangedDate], [RowVersion]) 
        WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
GO

The execution time of the stored procedure varies anywhere from 6 minutes to 20 or even 50 minutes, depending on server traffic and usage.

My question: how do I optimize the query in this stored procedure, keeping in mind that the coverage table contains 75 million records? I am not a DBA and have no prior experience optimizing slow-running queries. Any insight on how to solve this problem would be helpful. Thanks in advance.

First, chaining common table expressions may lead to a complex execution plan. We want the plans to be simple and easy for the engine to optimize.

So, let's start by removing the first one and materializing it into a temp table instead:

DROP TABLE IF EXISTS #LatestCoverage;

CREATE TABLE #LatestCoverage
(
    PolId BIGINT PRIMARY KEY
   ,CoverageEffectiveDate DATETIME2(0)
);

INSERT INTO #LatestCoverage
SELECT 
    PolId,
    MAX(EffDate) AS CoverageEffectiveDate 
FROM 
    ams360.AFW_Coverage 
GROUP BY 
    PolId;

If there are many columns in the ams360.AFW_Coverage table, an index on just the queried columns may improve performance:

CREATE INDEX IX_AFW_Coverage_EffDate  ON ams360.AFW_Coverage 
(
    polID
    ,EffDate            
)

Then, you are reading a lot of data that is discarded later on. What you can try is to filter the data in advance and only then read the row details. Something like this:

DROP TABLE IF EXISTS #CoveragesFiltered;

CREATE TABLE #CoveragesFiltered
(
     PolId BIGINT PRIMARY KEY
    ,RowVersion BINARY(8) -- assuming the source column is rowversion, whose values fit in BINARY(8)
);

INSERT INTO #CoveragesFiltered
SELECT 
    cov.PolId,       
    cov.RowVersion
FROM ams360.AFW_Coverage cov
INNER JOIN #LatestCoverage mcov 
    ON cov.PolId = mcov.PolId
    AND cov.EffDate = mcov.CoverageEffectiveDate
INNER JOIN ams360.AFW_BasicPolInfo BPI -- BPI must be joined here because the filters below reference it
    ON BPI.PolId = cov.PolId
WHERE
    cov.Status IN ('A', 'C')
    AND BPI.Status IN ('A','C')
    AND BPI.PolTypeLOB IN ('Homeowners', 'Dwelling Fire')
    AND BPI.PolSubType = 'P'
    AND BPI.RenewalRptFlag IN ('A', 'R', 'I', 'N')
    AND GETDATE() BETWEEN BPI.PolEffDate AND BPI.PolExpDate
    AND (@RowVersion IS NULL OR cov.RowVersion > @RowVersion)
ORDER BY 
    cov.RowVersion
OFFSET 
    @PageStart ROWS
FETCH NEXT 
    @PageSize ROWS ONLY;

Here you can debug and optimize the filter query itself, creating indexes only for the columns you need.
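For example, an index covering the AFW_BasicPolInfo filter columns might help. This is only a sketch: the index name is made up, and the best key order should be confirmed against the actual execution plan before creating it on a 3-million-row table.

CREATE NONCLUSTERED INDEX IX_AFW_BasicPolInfo_Filter   -- hypothetical name
ON ams360.AFW_BasicPolInfo (PolSubType, PolTypeLOB, Status, RenewalRptFlag)
INCLUDE (PolId, PolEffDate, PolExpDate);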

Then, having identified the rows that need to be returned, extract their details - as we are using paging, I believe it will perform well and cost less IO.
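A minimal sketch of that last step, assuming the #CoveragesFiltered table above and that a RowVersion value uniquely identifies a row in AFW_Coverage:

SELECT
    BPI.PolId, BPI.PolEffDate, BPI.PolExpDate, BPI.PolTypeLOB,
    cov.LobId, cov.CoverageId, cov.EffDate, cov.CoverageCode, cov.isCoverage,
    cov.FullTermPrem, cov.Limit1, cov.Limit2, cov.Limit3,
    cov.Deduct1, cov.Deduct2, cov.Deduct3, cov.ChangedDate, cov.RowVersion
FROM #CoveragesFiltered f
INNER JOIN ams360.AFW_Coverage cov
    ON cov.PolId = f.PolId
   AND cov.RowVersion = f.RowVersion   -- rowversion is unique per row, so this pins the exact row
INNER JOIN ams360.AFW_BasicPolInfo BPI
    ON BPI.PolId = cov.PolId
ORDER BY cov.RowVersion;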

Based on the execution plans, your query only looks at less than 1% of the rows from the Coverage table, since you are only interested in rows having the latest EffDate. If possible, you can create a separate table that captures only the latest rows based on EffDate and use this table in your query instead of Coverage. You may want to insert into/update this new table whenever rows are inserted into/updated in the Coverage table.
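One way to keep such a table in sync is an AFTER trigger that upserts the latest EffDate per PolId. This is only a sketch with made-up object names and assumed column types; a scheduled ETL job or an indexed view could serve the same purpose.

CREATE TABLE ams360.AFW_CoverageLatest      -- hypothetical "latest coverage per policy" table
(
    PolId   BIGINT       NOT NULL PRIMARY KEY,
    EffDate DATETIME2(0) NOT NULL
);
GO

CREATE TRIGGER ams360.trg_AFW_Coverage_Latest
ON ams360.AFW_Coverage
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Upsert the maximum EffDate for every policy touched by this statement.
    -- Note: deletes and EffDate decreases are not handled in this sketch.
    MERGE ams360.AFW_CoverageLatest AS tgt
    USING (SELECT PolId, MAX(EffDate) AS EffDate
           FROM inserted
           GROUP BY PolId) AS src
        ON tgt.PolId = src.PolId
    WHEN MATCHED AND src.EffDate > tgt.EffDate
        THEN UPDATE SET tgt.EffDate = src.EffDate
    WHEN NOT MATCHED
        THEN INSERT (PolId, EffDate) VALUES (src.PolId, src.EffDate);
END;
GO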

Without seeing the execution plan, it is very difficult to tell what the problem is. Below are my suggestions:

  • I see that you do not have any indexes on the table AFW_BasicPolInfo. You need indexes on it as well. If possible, create a clustered index on PolId, as it seems to be a unique, narrow, increasing, not-null column (see the sketch after this list).

  • I see that you do not have a clustered index on AFW_Coverage. I would suggest creating a clustered index on the PolId, EffDate combination; I think it could be a unique combination. Also, since PolId is used in the JOINs, it could make the joins faster. It would also make the CTE faster.

  • I seriously doubt whether you need the GROUP BY. If you really do need it, try to have CTEs at the level of grouping you need and then JOIN them. GROUP BY can be a very costly operation.
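The two clustered index suggestions above might look roughly like this. It is a sketch only, assuming PolId is unique in AFW_BasicPolInfo and that PolId + EffDate is unique in AFW_Coverage (drop UNIQUE if duplicates exist), and keeping in mind that building a clustered index on a 75-million-row table is a heavy, blocking operation best done in a maintenance window.

CREATE UNIQUE CLUSTERED INDEX CIX_AFW_BasicPolInfo_PolId
ON ams360.AFW_BasicPolInfo (PolId);

CREATE UNIQUE CLUSTERED INDEX CIX_AFW_Coverage_PolId_EffDate
ON ams360.AFW_Coverage (PolId, EffDate);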
