Optimizing stored procedure query for a table that contains 75 million records

Question

I have a table AFW_Coverage that contains 75 million rows. There is also another table AFW_BasicPolInfo that contains about 3 million rows.

I have written the following stored procedure to get records from the table:

CREATE PROCEDURE [ams360].[GetPolicyCoverages]
    @PageStart INT = 0,
    @PageSize INT = 50000,
    @RowVersion TIMESTAMP = NULL
AS
    SET NOCOUNT ON;

    ;WITH LatestCoverage AS
    (
        SELECT 
            PolId,
            MAX(EffDate) AS CoverageEffectiveDate 
        FROM 
            ams360.AFW_Coverage 
        GROUP BY 
            PolId
    ),
    Coverages AS
    (
        SELECT 
            cov.PolId,
            cov.LobId,
            cov.CoverageId,
            cov.EffDate, 
            cov.CoverageCode,
            cov.isCoverage,
            cov.FullTermPrem,
            cov.Limit1,
            cov.Limit2,
            cov.Limit3,
            cov.Deduct1,
            cov.Deduct2,
            cov.Deduct3,
            cov.ChangedDate,
            cov.RowVersion
        FROM
            ams360.AFW_Coverage cov
        INNER JOIN
            LatestCoverage mcov ON cov.PolId = mcov.PolId
                                AND cov.EffDate = mcov.CoverageEffectiveDate
        WHERE
            cov.Status IN ('A', 'C')
    )
    SELECT
        BPI.PolId,
        BPI.PolEffDate,
        BPI.PolExpDate,
        BPI.PolTypeLOB,
        cov.LobId,
        cov.CoverageId,
        cov.EffDate,
        cov.CoverageCode,
        cov.isCoverage,
        cov.FullTermPrem,
        cov.Limit1,
        cov.Limit2,
        cov.Limit3,
        cov.Deduct1,
        cov.Deduct2,
        cov.Deduct3,
        cov.ChangedDate,
        cov.RowVersion
    FROM 
        ams360.AFW_BasicPolInfo BPI 
    INNER JOIN 
        Coverages cov ON bpi.PolId = cov.PolId
    WHERE 
        BPI.Status IN ('A','C')
        AND BPI.PolTypeLOB IN ('Homeowners', 'Dwelling Fire')
        AND BPI.PolSubType = 'P'
        AND BPI.RenewalRptFlag IN ('A', 'R', 'I', 'N')
        AND GETDATE() BETWEEN BPI.PolEffDate AND BPI.PolExpDate
        AND (@RowVersion IS NULL OR cov.RowVersion > @RowVersion)
    GROUP BY 
        BPI.PolId,
        BPI.PolEffDate,
        BPI.PolExpDate,
        BPI.PolTypeLOB,
        cov.LobId,
        cov.CoverageId,
        cov.EffDate,
        cov.CoverageCode,
        cov.isCoverage,
        cov.FullTermPrem,
        cov.Limit1, cov.Limit2, cov.Limit3,
        cov.Deduct1, cov.Deduct2, cov.Deduct3,
        cov.ChangedDate,
        cov.RowVersion
    ORDER BY 
        cov.RowVersion
    OFFSET 
        @PageStart ROWS
    FETCH NEXT 
        @PageSize ROWS ONLY
GO

However, the above stored procedure pegs the database at 100%, even though I have added the following indexes, which I can see are used in the execution plan:

CREATE NONCLUSTERED INDEX [IX_AFW_Coverage_PolId_EffDate] 
ON [ams360].[AFW_Coverage] ([PolId] ASC, [EffDate] ASC)
            WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
GO

CREATE NONCLUSTERED INDEX [IX_AFW_Coverage_PolId_EffDate_Status_LobId_CoverageId] 
ON [ams360].[AFW_Coverage] ([PolId] ASC, [EffDate] ASC, [Status] ASC, [LobId] ASC, [CoverageId] ASC)
INCLUDE ([CoverageCode], [IsCoverage], [FullTermPrem], [Limit1], [Limit2],[Limit3], [Deduct1], [Deduct2], [Deduct3], [ChangedDate], [RowVersion]) 
        WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
GO

The execution time of the stored procedure varies anywhere from 6 minutes to 20 or even 50 minutes, depending on server traffic and usage.

My question: How do I optimize the query in this stored procedure, keeping in mind that the coverage table contains 75 million records? I am not a DBA and I have no prior experience optimizing slow-running queries. Any insight on how to solve this problem would be helpful. Thanks in advance.

Answer 1

First, chaining common table expressions may lead to a complex execution plan. We want plans to be simple and easy for the engine to optimize.

So, let's start with removing the first one:

DROP TABLE IF EXISTS #LatestCoverage;

CREATE TABLE #LatestCoverage
(
    PolId BIGINT PRIMARY KEY
   ,CoverageEffectiveDate DATETIME2(0)
);

INSERT INTO #LatestCoverage
SELECT 
    PolId,
    MAX(EffDate) AS CoverageEffectiveDate 
FROM 
    ams360.AFW_Coverage 
GROUP BY 
    PolId;

If there are many columns in the ams360.AFW_Coverage table, an index on just the queried columns may improve performance:

CREATE INDEX IX_AFW_Coverage_EffDate  ON ams360.AFW_Coverage 
(
    polID
    ,EffDate            
)

Next, you are reading a lot of data that is later discarded by the paging. Instead, you can try to filter the data in advance and only then read the row details. Something like this:

DROP TABLE IF EXISTS #CoveragesFiltered;

CREATE TABLE #CoveragesFiltered
(
     -- Assumes one qualifying coverage row per policy; widen the key if that does not hold.
     PolId BIGINT PRIMARY KEY
     -- A rowversion (timestamp) value can be stored in a BINARY(8) column.
    ,RowVersion BINARY(8)
);

INSERT INTO #CoveragesFiltered
SELECT 
    cov.PolId,       
    cov.RowVersion
FROM ams360.AFW_Coverage cov
INNER JOIN #LatestCoverage mcov 
    ON cov.PolId = mcov.PolId
    AND cov.EffDate = mcov.CoverageEffectiveDate
-- The policy filters below need the policy table joined in.
INNER JOIN ams360.AFW_BasicPolInfo BPI
    ON BPI.PolId = cov.PolId
WHERE
    cov.Status IN ('A', 'C')
    AND BPI.Status IN ('A','C')
    AND BPI.PolTypeLOB IN ('Homeowners', 'Dwelling Fire')
    AND BPI.PolSubType = 'P'
    AND BPI.RenewalRptFlag IN ('A', 'R', 'I', 'N')
    AND GETDATE() BETWEEN BPI.PolEffDate AND BPI.PolExpDate
    AND (@RowVersion IS NULL OR cov.RowVersion > @RowVersion)
ORDER BY 
    cov.RowVersion
OFFSET 
    @PageStart ROWS
FETCH NEXT 
    @PageSize ROWS ONLY;

Here you can debug and optimize the filter query itself, creating indexes only for the columns you need.

Then, once you have the rows that need to be returned, extract their details. Since we are paging, I believe this will perform well and cost less IO.
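That last step could look roughly like the sketch below. It is only an illustration, not a tested query: it assumes #CoveragesFiltered has been filled for the current page, that RowVersion identifies a single coverage row, and that PolId is unique in AFW_BasicPolInfo.

```sql
-- Read the full details only for the rows selected for this page.
SELECT
    BPI.PolId,
    BPI.PolEffDate,
    BPI.PolExpDate,
    BPI.PolTypeLOB,
    cov.LobId,
    cov.CoverageId,
    cov.EffDate,
    cov.CoverageCode,
    cov.isCoverage,
    cov.FullTermPrem,
    cov.Limit1, cov.Limit2, cov.Limit3,
    cov.Deduct1, cov.Deduct2, cov.Deduct3,
    cov.ChangedDate,
    cov.RowVersion
FROM #CoveragesFiltered f
INNER JOIN ams360.AFW_Coverage cov
    ON cov.PolId = f.PolId
   AND cov.RowVersion = f.RowVersion   -- assumption: RowVersion pins down one row
INNER JOIN ams360.AFW_BasicPolInfo BPI
    ON BPI.PolId = f.PolId             -- assumption: one policy row per PolId
ORDER BY
    cov.RowVersion;
```

The filters and the OFFSET/FETCH are not repeated here because they were already applied when the temp table was populated.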

Answer 2

Based on the execution plans, your query looks at less than 1% of the rows in the Coverage table, since you are only interested in rows with the latest EffDate. If possible, create a separate table that holds only those latest rows and use it in your query instead of Coverage. You would need to insert into or update this new table whenever rows are inserted into or updated in the Coverage table.
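As a rough sketch of that idea, the one-off build of such a table might look like the following. The table name ams360.AFW_CoverageLatest and the index name are made up for illustration, and how you keep the table in sync afterwards (trigger, load job, etc.) depends on how Coverage gets written.

```sql
-- Hypothetical table holding only the latest-EffDate coverage rows per policy.
SELECT
    cov.PolId, cov.LobId, cov.CoverageId, cov.EffDate, cov.Status,
    cov.CoverageCode, cov.isCoverage, cov.FullTermPrem,
    cov.Limit1, cov.Limit2, cov.Limit3,
    cov.Deduct1, cov.Deduct2, cov.Deduct3,
    cov.ChangedDate,
    -- Copy the value as plain binary so the new table does not get
    -- its own auto-generated rowversion column.
    CAST(cov.RowVersion AS BINARY(8)) AS RowVersion
INTO ams360.AFW_CoverageLatest
FROM ams360.AFW_Coverage cov
INNER JOIN
(
    SELECT PolId, MAX(EffDate) AS EffDate
    FROM ams360.AFW_Coverage
    GROUP BY PolId
) latest
    ON latest.PolId = cov.PolId
   AND latest.EffDate = cov.EffDate;

-- Assumption: paging orders by RowVersion, so an index on it should help.
CREATE INDEX IX_AFW_CoverageLatest_RowVersion
    ON ams360.AFW_CoverageLatest (RowVersion);
```

The stored procedure would then read from this table directly and would no longer need the MAX(EffDate) step.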

Answer 3

Without seeing the execution plan, it is very difficult to pinpoint the problem. Here are my suggestions:

It looks like you have no indexes on the AFW_BasicPolInfo table. You need indexes there as well. If possible, create a clustered index on PolId, as it appears to be a unique, narrow, increasing, non-null column.
It looks like you have no clustered index on AFW_Coverage. I would suggest creating a clustered index on the (PolId, EffDate) combination, which I think could be unique. Since PolId is also used in the joins, this could make the joins faster, and it would make the CTE faster too.
I seriously doubt that you need the GROUP BY. If you definitely need it, try to have the CTEs at the level of grouping you need and then join them. GROUP BY can be a very costly operation.
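For the first two suggestions, the index definitions could be sketched as below. The index names are made up, and I have not declared them UNIQUE since I cannot confirm uniqueness from the question.

```sql
-- Sketch only: each statement fails if the table already has a clustered index,
-- and building one on a table this size is a heavy operation.
CREATE CLUSTERED INDEX CIX_AFW_BasicPolInfo_PolId
    ON ams360.AFW_BasicPolInfo (PolId);

CREATE CLUSTERED INDEX CIX_AFW_Coverage_PolId_EffDate
    ON ams360.AFW_Coverage (PolId, EffDate);
```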

3 answers

solution1
2 2020-06-23 06:00:51

solution2
1 2020-06-23 13:19:28

solution3
0 2020-06-23 05:01:32
