How to improve SQL script performance

The following script is very slow when it runs.

I have no idea how to improve the performance of the script. Even as a view it takes many minutes. If you have any ideas, please share them.

SELECT DISTINCT
        ( id )
FROM    ( SELECT DISTINCT
                    ct.id AS id
          FROM      [Customer].[dbo].[Contact] ct
                    LEFT JOIN [Customer].[dbo].[Customer_ids] hnci ON ct.id = hnci.contact_id
          WHERE     hnci.customer_id IN (
                    SELECT DISTINCT
                            ( [Customer_ID] )
                    FROM    [Transactions].[dbo].[Transaction_Header]
                    WHERE   actual_transaction_date > '20120218' )
          UNION
          SELECT DISTINCT
                    contact_id AS id
          FROM      [Customer].[dbo].[Restaurant_Attendance]
          WHERE     ( created > '2012-02-18 00:00:00.000'
                      OR modified > '2012-02-18 00:00:00.000'
                    )
                    AND ( [Fifth_Floor_London] = 1
                          OR [Fourth_Floor_Leeds] = 1
                          OR [Second_Floor_Bristol] = 1
                        )
          UNION
          SELECT DISTINCT
                    ( ct.id )
          FROM      [Customer].[dbo].[Contact] ct
                    INNER JOIN [Customer].[dbo].[Wifinity_Devices] wfd ON ct.wifinity_uniqueID = wfd.[CustomerUniqueID]
                                                              AND startconnection > '2012-02-17'
          UNION
          SELECT DISTINCT
                    comdt.id AS id
          FROM      [Customer].[dbo].[Complete_dataset] comdt
                    LEFT JOIN [Customer].[dbo].[Aggregate_Spend_Counts] agsc ON comdt.id = agsc.contact_id
          WHERE     agsc.contact_id IS NULL
                    AND ( opt_out_Mail <> 1
                          OR opt_out_email <> 1
                          OR opt_out_SMS <> 1
                          OR opt_out_Mail IS NULL
                          OR opt_out_email IS NULL
                          OR opt_out_SMS IS NULL
                        )
                    AND ( address_1 IS NOT NULL
                          OR email IS NOT NULL
                          OR mobile IS NOT NULL
                        )
          UNION
          SELECT DISTINCT
                    ( contact_id ) AS id
          FROM      [Customer].[dbo].[VIP_Card_Holders]
          WHERE     VIP_Card_number IS NOT NULL
        ) AS tbl

Wow, where to start...

--this distinct does nothing.  Union is already distinct
--SELECT DISTINCT
--        ( id )
--FROM    ( 
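-- note: this returns Customer_ID rather than Contact.id, so it only matches the original
-- first query if those ids hold the same values; otherwise keep the join through Customer_ids
SELECT [Customer_ID] as ID
          FROM     [Transactions].[dbo].[Transaction_Header] 
               where actual_transaction_date > '20120218'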
          UNION
          SELECT 
                    contact_id AS id
          FROM      [Customer].[dbo].[Restaurant_Attendance]
-- not sure that you are getting the date range you want.  Should these be >= ?
-- if you want everything that occurred on the 18th or after, you want >= '2012-02-18 00:00:00.000'
-- if you want everything that occurred on the 19th or after, you want >= '2012-02-19 00:00:00.000'
-- the way you have it now, you get everything on the 18th except rows stamped exactly at midnight
          WHERE     ( created > '2012-02-18 00:00:00.000'
                      OR modified > '2012-02-18 00:00:00.000'
                    )
                    AND ( [Fifth_Floor_London] = 1
                          OR [Fourth_Floor_Leeds] = 1
                          OR [Second_Floor_Bristol] = 1
                        )
-- dropped on the assumption that the first query already returns these Contact ids;
-- that only holds if every contact matched here is also matched there, so verify before removing it
--          UNION
--          SELECT 
--                    ( ct.id )
--          FROM      [Customer].[dbo].[Contact] ct
--                    INNER JOIN [Customer].[dbo].[Wifinity_Devices] wfd ON ct.wifinity_uniqueID = wfd.[CustomerUniqueID]
--                                                             AND startconnection > '2012-02-17'
          UNION
-- cleaned this up with isnull function and coalesce
          SELECT 
                    comdt.id AS id
          FROM      [Customer].[dbo].[Complete_dataset] comdt
                    LEFT JOIN [Customer].[dbo].[Aggregate_Spend_Counts] agsc ON comdt.id = agsc.contact_id
          WHERE     agsc.contact_id IS NULL
                    AND ( isnull(opt_out_Mail,0) <> 1
                          OR isnull(opt_out_email,0) <> 1
                          OR isnull(opt_out_SMS,0) <> 1
                        )
                    AND coalesce(address_1 , email, mobile) IS NOT NULL
          UNION
          SELECT 
                    ( contact_id ) AS id
          FROM      [Customer].[dbo].[VIP_Card_Holders]
          WHERE     VIP_Card_number IS NOT NULL
--        ) AS tbl

As stated in a comment, optimize one query at a time. See which one takes the longest and focus on that one.
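One way to find the slow branch, as a rough sketch: run each SELECT of the UNION on its own with timing and I/O statistics switched on, then compare the numbers in the Messages output. The example times only the Restaurant_Attendance branch copied from the question; the same wrapper can be put around each of the other branches.

```sql
-- Report CPU/elapsed time and logical reads for each statement in this session.
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

-- One branch of the UNION, run on its own (copied from the question).
SELECT  contact_id AS id
FROM    [Customer].[dbo].[Restaurant_Attendance]
WHERE   ( created > '2012-02-18 00:00:00.000'
          OR modified > '2012-02-18 00:00:00.000' )
        AND ( [Fifth_Floor_London] = 1
              OR [Fourth_Floor_Leeds] = 1
              OR [Second_Floor_Bristol] = 1 );

-- Switch the reporting off again.
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

The branch with the highest elapsed time or the most logical reads is the one to look at first.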

UNION removes duplicates, so you don't need DISTINCT on the individual queries.
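A small self-contained illustration of that point, using made-up table variables and sample ids rather than the tables from the question (the multi-row VALUES syntax assumes SQL Server 2008 or later):

```sql
-- Two table variables with repeated and overlapping ids (made-up sample data).
DECLARE @a TABLE (id INT);
DECLARE @b TABLE (id INT);

INSERT INTO @a (id) VALUES (1), (1), (2);
INSERT INTO @b (id) VALUES (2), (3), (3);

-- UNION removes duplicates within and across both inputs:
-- the result holds the three ids 1, 2 and 3 (in no guaranteed order).
SELECT id FROM @a
UNION
SELECT id FROM @b;

-- UNION ALL keeps every input row: the result holds all six rows.
SELECT id FROM @a
UNION ALL
SELECT id FROM @b;
```

Since the UNION already deduplicates the combined result, a DISTINCT inside each branch only adds extra work.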

On your first query I would try this:

The left join is effectively cancelled by the WHERE hnci.customer_id IN (...) filter, so you might as well use an inner join.

The sub-query is not efficient because it may not be able to use an index on the IN.
The query optimizer does not always know what in ( select .. ) will return, so it cannot always optimize the use of indexes.

SELECT ct.id AS id
  FROM [Customer].[dbo].[Contact] ct
  JOIN [Customer].[dbo].[Customer_ids] hnci 
    ON ct.id = hnci.contact_id
  JOIN [Transactions].[dbo].[Transaction_Header] th 
    on hnci.customer_id = th.[Customer_ID] 
   and th.actual_transaction_date > '20120218'

On that second join the query optimizer gets to choose which condition to apply first. Let's say [Customer].[dbo].[Customer_ids].[customer_id] and [Transactions].[dbo].[Transaction_Header].[Customer_ID] each have an index. The query optimizer then has the option to apply the ID join before the filter on [Transactions].[dbo].[Transaction_Header].[actual_transaction_date]. If [actual_transaction_date] is not indexed, it will almost certainly do the ID join first.
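If those indexes do not exist yet, something like the following might be a starting point. The index names are made up for illustration; the table and column names are taken from the question. Whether either index actually helps depends on the data and on the indexes already in place, so compare the execution plan before and after.

```sql
-- Hypothetical index names; tables and columns are the ones used in the question.

-- Supports the join onto Transaction_Header by customer id and carries the
-- transaction date, so the date filter can be checked from the index itself.
CREATE NONCLUSTERED INDEX IX_Transaction_Header_Customer_ID
    ON [Transactions].[dbo].[Transaction_Header] ([Customer_ID])
    INCLUDE (actual_transaction_date);

-- Supports the lookup in Customer_ids by customer_id and returns contact_id
-- without going back to the base table.
CREATE NONCLUSTERED INDEX IX_Customer_ids_customer_id
    ON [Customer].[dbo].[Customer_ids] (customer_id)
    INCLUDE (contact_id);
```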

With your in ( select ... ) the query optimizer has no option but to apply actual_transaction_date > '20120218' first. OK, sometimes the query optimizer is smart enough to combine an index inside the in with one outside it, but why make it hard for the query optimizer? I have found that the query optimizer makes better decisions if you make the decisions easier.

A join on a sub-query has the same problem. You take options away from the query optimizer. Give the query optimizer room to breathe.

WHERE EXISTS is generally faster than IN as well.
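As a sketch of what that could look like for the first branch of the question (same tables and columns, not tested against the real data, and assuming id is unique in Contact):

```sql
-- Return each contact that has at least one customer id
-- with a transaction after the cutoff date.
SELECT  ct.id AS id
FROM    [Customer].[dbo].[Contact] ct
WHERE   EXISTS ( SELECT 1
                 FROM   [Customer].[dbo].[Customer_ids] hnci
                        JOIN [Transactions].[dbo].[Transaction_Header] th
                          ON th.[Customer_ID] = hnci.customer_id
                 WHERE  hnci.contact_id = ct.id
                        AND th.actual_transaction_date > '20120218' );
```

EXISTS only tests whether a matching row is present, so each contact comes back at most once and no DISTINCT is needed on this branch.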

OR conditions are generally slower as well; use more UNION statements instead. And learn to use left joins correctly. If you have a WHERE condition (other than WHERE id IS NULL) on the table on the right side of a left join, the join is converted to an inner join. If this is not what you want, then your code is currently giving you an incorrect result set.
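To illustrate the OR-to-UNION idea on the Restaurant_Attendance branch from the question, here is a rough sketch. It splits only the date OR into two branches. Whether this is actually faster depends on created and modified each having a usable index, which is an assumption (the question does not say), so compare the plans before adopting it.

```sql
-- Branch 1: rows created after the cutoff.
SELECT  contact_id AS id
FROM    [Customer].[dbo].[Restaurant_Attendance]
WHERE   created > '2012-02-18 00:00:00.000'
        AND ( [Fifth_Floor_London] = 1
              OR [Fourth_Floor_Leeds] = 1
              OR [Second_Floor_Bristol] = 1 )
UNION
-- Branch 2: rows modified after the cutoff.
-- Rows matching both branches are deduplicated by the UNION.
SELECT  contact_id AS id
FROM    [Customer].[dbo].[Restaurant_Attendance]
WHERE   modified > '2012-02-18 00:00:00.000'
        AND ( [Fifth_Floor_London] = 1
              OR [Fourth_Floor_Leeds] = 1
              OR [Second_Floor_Bristol] = 1 );
```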

See http://wiki.lessthandot.com/index.php/WHERE_conditions_on_a_LEFT_JOIN for an explanation of how to fix WHERE conditions on a left join.
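A minimal self-contained demonstration of the left join behaviour, using made-up table variables (@contact, @orders) and sample rows rather than the tables from the question:

```sql
-- Made-up sample data: three contacts, two of which have an order.
DECLARE @contact TABLE (id INT);
DECLARE @orders  TABLE (contact_id INT, status VARCHAR(10));

INSERT INTO @contact (id) VALUES (1), (2), (3);
INSERT INTO @orders (contact_id, status) VALUES (1, 'open'), (2, 'closed');

-- Filter in WHERE: rows where o.status is NULL or 'closed' are thrown away,
-- so this behaves like an INNER JOIN and returns only (1, 'open').
SELECT  c.id, o.status
FROM    @contact c
        LEFT JOIN @orders o ON o.contact_id = c.id
WHERE   o.status = 'open';

-- Filter in ON: every contact is kept and status is NULL where nothing matched.
-- Returns (1, 'open'), (2, NULL), (3, NULL).
SELECT  c.id, o.status
FROM    @contact c
        LEFT JOIN @orders o ON o.contact_id = c.id
                           AND o.status = 'open';
```

If the first result is what you want, write it as an inner join; if the second is what you want, the condition belongs in the ON clause.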

Try this; a temp table should help you:

    IF OBJECT_ID('Tempdb..#Temp1') IS NOT NULL 
        DROP TABLE #Temp1

    --Low performance because of "WHERE  hnci.customer_id IN ( .... )" - a loop join must be used,
    --and this "where" condition is applied to both tables after the left join,
    --so the result is the same as with two inner joins but with bad performance

    --SELECT DISTINCT
    --        ct.id AS id
    --INTO    #temp1
    --FROM    [Customer].[dbo].[Contact] ct
    --        LEFT JOIN [Customer].[dbo].[Customer_ids] hnci ON ct.id = hnci.contact_id
    --WHERE   hnci.customer_id IN (
    --        SELECT DISTINCT
    --                ( [Customer_ID] )
    --        FROM    [Transactions].[dbo].[Transaction_Header]
    --        WHERE   actual_transaction_date > '20120218' )    
    --------------------------------------------------------------------------------
    --this will give the same result but with better performance than the previous one
    --------------------------------------------------------------------------------
    SELECT DISTINCT
            ct.id AS id
    INTO    #temp1
    FROM    [Customer].[dbo].[Contact] ct
            JOIN [Customer].[dbo].[Customer_ids] hnci ON ct.id = hnci.contact_id
            JOIN ( SELECT DISTINCT
                            ( [Customer_ID] )
                   FROM     [Transactions].[dbo].[Transaction_Header]
                   WHERE    actual_transaction_date > '20120218'
                 ) T ON hnci.customer_id = T.[Customer_ID]
    --------------------------------------------------------------------------------
    --------------------------------------------------------------------------------              
    INSERT  INTO #temp1
            ( id
            )
            SELECT DISTINCT
                    contact_id AS id
            FROM    [Customer].[dbo].[Restaurant_Attendance]
            WHERE   ( created > '2012-02-18 00:00:00.000'
                      OR modified > '2012-02-18 00:00:00.000'
                    )
                    AND ( [Fifth_Floor_London] = 1
                          OR [Fourth_Floor_Leeds] = 1
                          OR [Second_Floor_Bristol] = 1
                        )
    INSERT  INTO #temp1
            ( id
            )
            SELECT DISTINCT
                    ( ct.id )
            FROM    [Customer].[dbo].[Contact] ct
                    INNER JOIN [Customer].[dbo].[Wifinity_Devices] wfd ON ct.wifinity_uniqueID = wfd.[CustomerUniqueID]
                                                                  AND startconnection > '2012-02-17'
    INSERT  INTO #temp1
            ( id
            )
            SELECT DISTINCT
                    comdt.id AS id
            FROM    [Customer].[dbo].[Complete_dataset] comdt
                    LEFT JOIN [Customer].[dbo].[Aggregate_Spend_Counts] agsc ON comdt.id = agsc.contact_id
            WHERE   agsc.contact_id IS NULL
                    AND ( opt_out_Mail <> 1
                          OR opt_out_email <> 1
                          OR opt_out_SMS <> 1
                          OR opt_out_Mail IS NULL
                          OR opt_out_email IS NULL
                          OR opt_out_SMS IS NULL
                        )
                    AND ( address_1 IS NOT NULL
                          OR email IS NOT NULL
                          OR mobile IS NOT NULL
                        )
    INSERT  INTO #temp1
            ( id
            )
            SELECT DISTINCT
                    ( contact_id ) AS id
            FROM    [Customer].[dbo].[VIP_Card_Holders]
            WHERE   VIP_Card_number IS NOT NULL

    SELECT DISTINCT
            id
    FROM    #temp1 AS T
