简体   繁体   中英

How to speed up this TSQL query?

I have a TSQL select query that is running "slow"

SELECT 
    CustomerKey
    ,ProductKey
    ,RepresentativeKey
    ,ReportingDateKey   
    ,SUM(i.InvoiceQuantity) AS InvoiceQuantity
    ,SUM(i.InvoiceQuantityKg) AS InvoiceQuantityKg
    ,SUM(i.BrutoInvoiceLineAmount) AS BrutoInvoiceLineAmount
    ,SUM(i.EndOfYearDiscount) AS EndOfYearDiscount
    ,SUM(i.NettoInvoiceLineAmount) AS NettoInvoiceLineAmount
    ,SUM(i.TotalLineCostPrice) AS CostPrice
    ,SUM(i.MarginAmount) AS MarginAmount

FROM FactInvoices i

WHERE 
    i.DossierKey =2
    AND i.ReportingDate BETWEEN '2016-01-01' AND '2017-12-31'
GROUP BY
    CustomerKey
    ,ProductKey
    ,RepresentativeKey
    ,ReportingDateKey

I'm running the query in SSMS 32bit. Execution time is 17-21s, I have tested adding non clustered indexs on DossierKey and ReportingDate, but this is only slowing down the query.

The table has about 6.04M record and this result set is giving back 1M records. It's running on SQL 2016 Developers edition. Server specs: 8core 16gb ram and HDD => Virual server.

Looking at the execution plan, I can't find any improvements. How do I speed up? More hardware? But I don't think that will help because the server is not fully used when running this query.

Edit: Execution Plan: 在此处输入图片说明

Index:

CREATE NONCLUSTERED INDEX [_dx1]
ON [dbo].[FactInvoices] ([DossierKey],[ReportingDate])
INCLUDE ([CustomerKey],[ProductKey],[ReportingDateKey],[RepresentativeKey],[InvoiceQuantity],[InvoiceQuantityKg],[BrutoInvoiceLineAmount],[NettoInvoiceLineAmount],[MarginAmount],[EndOfYearDiscount],[TotalLineCostPrice])

Thanks.

For this query:

SELECT CustomerKey, ProductKey, RepresentativeKey, ReportingDateKey,
       . . .
FROM FactInvoices i
WHERE i.DossierKey = 2 AND
      i.ReportingDate BETWEEN '2016-01-01' AND '2017-12-31'
GROUP BY CustomerKey, ProductKey, RepresentativeKey, ReportingDateKey;

I would recommend an index on FactInvoices(DossierKey, ReportingDate, CustomerKey, ProductKey, RepresentativeKey) . The first two are the primary elements of the index used for the WHERE clause. The remaining three columns may be useful for the aggregation. You could also include all the additional columns used in the query.

This is an article I wrote on speeding up a query.

If your query is slow, you can check the execution plan for possible areas of speed up.

Well, I have done that and find that it does not always help. The same execution plan can take seconds to run or go off into never never land and be killed after 7 minutes.

I solved this recently using a variety of techniques that I haven't seen mentioned in one place before and wanted to help anyone else in the same situation. The solution usually returned within 2 seconds.

Here is what I did.

Starting Query

This is a fairly basic query. It reports sales orders and allows the user to specify up to 6 optional where criteria.

• If the user does not enter a criteria for a value, for example Country, its criteria string is set to '' and Country is not checked.

• If the user does enter a criteria for a value, its criteria string is bracketed by '%..%'. For example, if the user enters 'Tin', strCountry is set to '%Tin%' and all countries with 'Tin' in its name are selected. (Argentina and Martinique for example.)

SELECT Top 1000
    SalesHeader.InvoiceNumber
    ,SalesHeader.CompanyName
    ,SalesHeader.Street
    ,SalesHeader.City
    ,SalesHeader.Region
    ,SalesHeader.Country
    ,SalesHeader.SalesDate
    ,SalesHeader.InvoiceTotal
    ,SalesLineItem.LineItemNbr
    ,SalesLineItem.PartNumber
    ,SalesLineItem.Quantity
    ,SalesLineItem.UnitPrice
    ,SalesLineItem.Quantity * SalesLineItem.UnitPrice    as ExtPrice
    ,PartMaster.UnitWeight
    ,SalesLineItem.Quantity * PartMaster.UnitWeight      as ExtWeight
FROM dbo.SalesHeader 
left join dbo.SalesLineItem    on SalesHeader.InvoiceNumber    = SalesLineItem.InvoiceNumber
left join dbo.PartMaster       on SalesLineItem.PartNumber        = PartMaster.PartNumber
where
    (@strCountry = ''        or Country like @strCountry)
    and
    (@strCompanyName = ''    or CompanyName like @strCompanyName)
    and
    (@strPartNumber = ''     or SalesLineItem.PartNumber like @strPartNumber)
    and
    (@strInvoiceNumber = ''  or SalesHeader.InvoiceNumber like @strInvoiceNumber)
    and
    (@strRegion = ''         or Region like @strRegion)
    and
    (@mnyExtPrice = 0        or (SalesLineItem.Quantity * SalesLineItem.UnitPrice) > @mnyExtPrice)
Order By
    InvoiceNumber,
    Region,
    ExtPrice

I am taking this from a data warehouse I worked on. There were 260,000 records in the full query. We limited the return to 1,000 records as a user would never want more than that.

Sometimes the query would take 10 seconds or less and sometimes we would have to kill it after over 7 minutes had gone by. A user is not going to wait 7 minutes.

What We Came Up With

There are different techniques to speed up a query. The following is our resulting query. I go over each of the techniques used below.

This new query generally returned results in 2 seconds or less.

SELECT 
    InvoiceNumber
    ,Company
    ,Street
    ,City
    ,Region
    ,Country
    ,SalesDate
    ,InvoiceTotal
    ,LineItemNbr
    ,PartNumber
    ,Quantity
    ,UnitPrice
    ,ExtPrice
    ,UnitWeight
    ,ExtWeight
FROM 
(
    SELECT top 1000
        IdentityID,
        ROW_NUMBER() OVER (ORDER BY [SalesDate], [Country], [Company], [PartNumber]) as RowNbr
    FROM dbo.SalesCombined with(index(NCI_SalesDt))
    where
        (@strCountry = ''        or Country like @strCountry)
        and
        (@strCompany = ''        or Company like @strCompany)
        and
        (@strPartNumber = ''     or PartNumber like @strPartNumber)
        and
        (@strInvoiceNumber = ''  or InvoiceNumber like @strInvoiceNumber)
        and
        (@strRegion = ''         or Region like @strRegion)
        and
        (@mnyExtPrice = 0        or ExtPrice > @mnyExtPrice)
) SubSelect
Inner Join dbo.SalesCombined on SubSelect.IdentityID = SalesCombined.IdentityID
Order By
    RowNbr

Technique 1 - Denormalize the data.

I was fortunate in two ways:

• The data was small enough to create a second copy of it.

• The data did not change very often. This meant I could structure the second copy optimized for querying and allow updating to take a while.

The SalesHeader, SalesLineItem and PartMaster tables were merged into the single SalesCombined table.

The calculated values were stored in the SalesCombined table as well.

Note that I left the original tables in place. All code to update those tables was still valid. I had to create additional code to then propagate the changes to the SalesCombined table.

Technique 2 - Created An Integer Identity Value

The first field of this denormalized table is an integer identity value. This was called IdentityID.

Even if we had not denormalized the data, an integer identity value in SalesHeader could have been used for the join between it and SalesLineItem and speeded the original query up a bit.

Technique 3 - Created A Clustered Index On This Integer Identity Value

I created a clustered index on this IdentityID value. This is the fastest way to find a record.

Technique 4 - Created A Unique, Non-Clustered Index On The Sort Fields

The query's output is sorted on four fields, SalesDate, Country, Company, PartNumber. So I created an index on these fields SalesDate, Country, Company and PartNumber.

Then I added the IdentityID to this index. This index was noted as Unique. This allowed SQL Server to go from the sort fields to the address, essentially, of the actual record as quickly as possible.

Technique 5: Include In the Non-Clustered Index All 'Where Clause' Fields

A SQL Server index can include fields that are not part of the sort. (Who thought of this? It's a great idea.) If you include all where clause fields in the index, SQL Server does not have to look up the actual record to obtain this data.

This is the normal look up process: 1) Read the index from disk. 2) Go to the first entry on the index. 3) Find the address of the first record from that entry. 4) Read that record from disk. 5) Find any fields that are part of the where clause and apply the criteria. 6) Decide if that record is included in the query.

If you include the where clause fields in the index: 1) Read the index from disk. 2) Go to the first entry on the index. 3) Find any fields that are part of the where clause (stored in the index) and apply the criteria. 4) Decide if that record is included in the query.

CREATE UNIQUE NONCLUSTERED INDEX [NCI_InvcNbr] ON [dbo].[SalesCombined]
(
    [SalesDate] ASC,
    [Country] ASC,
    [CompanyName] ASC,
    [PartNumber] ASC,
    [IdentityID] ASC
)
INCLUDE [InvoiceNumber],
    [City],
    [Region],
    [ExtPrice]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
                IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, 
                ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON [PRIMARY]

The execution plan for the original query.

Click Here To See Original Query Execution Plan

The execution plan for our final query is much simpler - to start, it just reads the index.

Click Here To See Final Query Execution Plan

Technique 6: Created A Sub-Query To Find The IdentityID Of Each Record To Output And Their Sort Order

I created a sub-query to find the records to output and the order in which to output them. Note the following:

Technique 7 - It explicitly says to use the NCI_InvcNbr index that has all of the fields needed in it.

Technique 8 - It uses the Row_Number function to generate an integer for each row that will be output. These values are generated 1, 2 and 3 in the order given by the fields in the ORDER BY section of that line.

Technique 9: Create An Enclosing Query With All Of The Values

This query specifies the values to print. It uses the Row_Number values to know the order in which to print. Note that the inner join is done on the IdentityID field which uses the Clustered index to find each record to print.

Techniques That Did Not Help

There were two techniques that we tried that did not speed up the query. These statements are both added to the end of a query.

• OPTION (MAXDOP 1) limits the number of processors to one. This will prevent any parallelism from being done. We tried this when we were experimenting with the query and had parallelism in the execution plan.

• OPTION (RECOMPILE) causes the execution plan to be recreated every time the query is run. This can be useful when different user selections can vary the query results.

Hope this can be of use.

If you already indexed this query and it still with a bad performance, you can try partition your table by DossierKey.

and change

WHERE i.DossierKey = 2

to

WHERE $PARTITION.partition_function_name( 2) 

https://www.cathrinewilhelmsen.net/2015/04/12/table-partitioning-in-sql-server/

https://docs.microsoft.com/en-us/sql/t-sql/functions/partition-transact-sql

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM