SQL 2005 Query Optimisation

I have a SQL 2005 table consisting of around 10 million records (dbo.Logs).

I have another table, dbo.Rollup, that matches distinct dbo.Logs.URL values to a FileId column in a third table, dbo.Files. The dbo.Rollup table forms the basis of various aggregate reports we run at a later stage.

Suffice it to say for now that the problem I am having is in populating dbo.Rollup efficiently.

By definition, dbo.Logs has potentially tens of thousands of rows which all share the same URL field value. In our application, one URL can be matched to one dbo.Files.FileId. That is, there is a many-to-one relationship between dbo.Logs.URL and dbo.Files.FileId (we parse the values of dbo.Logs to determine what the appropriate FileId is for a given URL).

My goal is to significantly reduce the time taken by the first of three stored procedures that run in order to create meaningful statistics from our raw log data.

What I need is a specific example of how to refactor this SQL query to be much more efficient:

sp-Rollup-Step1:

INSERT INTO dbo.Rollup ([FileURL], [FileId])

SELECT 
 l.RequestedFile As [URL],
 FileId = dbo.fn_GetFileIdFromURL(l.RequestedFile, l.CleanFileName)

FROM
 dbo.Logs l (readuncommitted) 

WHERE    

NOT EXISTS (
    SELECT
     FileURL
    FROM
     dbo.Rollup
    WHERE
     FileUrl = l.RequestedFile
)

fn_GetFileIdFromURL():

CREATE FUNCTION [dbo].[fn_GetFileIdFromURL] 
(       
    @URL nvarchar(500),
    @CleanFileName nvarchar(255)
)
RETURNS uniqueidentifier
AS
BEGIN

     DECLARE @id uniqueidentifier

     if (exists(select FileURL from dbo.[Rollup] where [FileUrl] = @URL))
     begin
        -- This URL has been seen before in dbo.Rollup.
            -- Retrieve the FileId from the dbo.Rollup table.
        set @id = (select top 1 FileId from dbo.[Rollup] where [FileUrl] = @URL)        
     end
     else
     begin
        -- This is a new URL. Hunt for a matching URL in our list of files,
            -- and return a FileId if a match is found.
        Set @id = (

            SELECT TOP 1
            f.FileId

            FROM
            dbo.[Files] f

            INNER JOIN
            dbo.[Servers] s on s.[ServerId] = f.[ServerId]

            INNER JOIN
            dbo.[URLs] u on 
                   u.[ServerId] = f.[ServerId]

            WHERE
                Left(u.[PrependURLProtocol],4) = left(@URL, 4)
            AND @CleanFileName = f.FileName  
     )

     end

     return @id

END

Key considerations:

  • dbo.Rollup should contain only one entry for each DISTINCT/unique URL found in dbo.Logs.
  • I would like to avoid inserting records into dbo.[Rollup] where the FileId is NULL.

From my own observations, the slowest part of the query by far seems to be in the stored procedure: the "NOT EXISTS" clause (I am not sure at this point whether that continually refreshes the table or not).

I'm looking for a specific solution (with examples using either pseudo-code or by modifying my procedures shown here) - the answer will be awarded to whoever provides it!

Thanks in advance for any assistance you can provide.

/Richard.

The short answer is that you effectively have a CURSOR here: the scalar UDF is run once per row of output.

The UDF could be replaced by two LEFT JOINs onto derived tables. A rough outline:

...
COALESCE (F.xxx, L.xxx) --etc
...
FROM
    dbo.Logs l (readuncommitted)
    LEFT JOIN
    (SELECT DISTINCT --added after comment
        FileId, FileUrl
     FROM dbo.[Rollup]
    ) R ON l.RequestedFile = R.FileUrl
    LEFT JOIN
    (SELECT DISTINCT --added after comment
        f.FileId,
        f.FileName,
        left(u.[PrependURLProtocol], 4) + '%' AS Left4
     FROM
        dbo.[Files] f
        INNER JOIN dbo.[Servers] s ON s.[ServerId] = f.[ServerId]
        INNER JOIN dbo.[URLs] u ON u.[ServerId] = f.[ServerId]
    ) F ON l.CleanFileName = F.FileName AND l.RequestedFile LIKE F.Left4
...
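
To make the outline above more concrete, here is one possible set-based version of the whole INSERT, written as a sketch rather than a tested drop-in. It assumes the column names shown in the question (dbo.Logs.RequestedFile and CleanFileName, dbo.Rollup.FileURL and FileId, dbo.Files.FileName, dbo.URLs.PrependURLProtocol) and that dbo.Files.FileId is never NULL. Because the NOT EXISTS filter already excludes URLs present in dbo.Rollup, the first branch of the UDF is not needed here. ROW_NUMBER() stands in for the UDF's TOP 1 so that each URL yields a single row; the ordering by FileId is an arbitrary choice, not something the question specifies.

```sql
-- Sketch only: set-based replacement for the INSERT plus scalar UDF.
-- Table and column names are taken from the question; anything about
-- data types, keys, or nullability is an assumption.
INSERT INTO dbo.Rollup ([FileURL], [FileId])
SELECT x.RequestedFile, x.FileId
FROM (
    SELECT
        l.RequestedFile,
        f.FileId,
        -- Pick one FileId per URL (the UDF used TOP 1 with no ORDER BY).
        ROW_NUMBER() OVER (PARTITION BY l.RequestedFile
                           ORDER BY f.FileId) AS rn
    FROM (
        -- Collapse the ~10 million log rows to distinct URLs first.
        SELECT DISTINCT RequestedFile, CleanFileName
        FROM dbo.Logs WITH (READUNCOMMITTED)
    ) AS l
    INNER JOIN dbo.Files AS f
        ON f.FileName = l.CleanFileName
    INNER JOIN dbo.Servers AS s
        ON s.ServerId = f.ServerId
    INNER JOIN dbo.URLs AS u
        ON u.ServerId = f.ServerId
       AND u.PrependURLProtocol LIKE LEFT(l.RequestedFile, 4) + '%'
    WHERE NOT EXISTS (
        -- Skip URLs that already have a Rollup row.
        SELECT 1
        FROM dbo.Rollup AS r
        WHERE r.FileURL = l.RequestedFile
    )
) AS x
WHERE x.rn = 1;
```

The inner joins drop URLs with no matching file, which covers the requirement to skip NULL FileIds, and the rn = 1 filter keeps one row per distinct URL. The LIKE form matches the original LEFT(...) = LEFT(...) comparison only if the first four characters of the URL contain no wildcard characters such as % or _.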

I'm also not sure whether you need the NOT EXISTS, because of how the UDF works. If you do, make sure the columns are indexed.

I think your hotspot is located here:

Left(u.[PrependURLProtocol],4) = left(@URL, 4)

This will cause the server to scan the URLs table, because wrapping a column in a function prevents an index seek on that column. You should not apply a function to a column in a join or WHERE clause. Try rewriting it to something like:

... where PrependURLProtocol like left(@URL, 4) + '%'

And make sure you have an index on that column.
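
As an illustration of the indexing advice, the statements below sketch indexes that could support the lookups in the function and in the NOT EXISTS check. The index names are hypothetical, and the column data types are not given in the question, so treat the key and INCLUDE choices as assumptions to check against the real schema.

```sql
-- Hypothetical index names; column types and sizes are assumptions.

-- Supports the prefix match on PrependURLProtocol and the join on ServerId.
CREATE NONCLUSTERED INDEX IX_URLs_PrependURLProtocol
    ON dbo.URLs (PrependURLProtocol)
    INCLUDE (ServerId);

-- Supports the lookup by file name and returns FileId without a key lookup.
CREATE NONCLUSTERED INDEX IX_Files_FileName
    ON dbo.Files (FileName)
    INCLUDE (FileId, ServerId);

-- Supports the "has this URL been seen before?" check against dbo.Rollup.
CREATE NONCLUSTERED INDEX IX_Rollup_FileURL
    ON dbo.Rollup (FileURL)
    INCLUDE (FileId);
```

One caveat: SQL Server 2005 limits an index key to 900 bytes. If FileURL is nvarchar(500), as the function's @URL parameter suggests, the index is created with a warning and any insert whose key exceeds the limit will fail.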

INSERT INTO dbo.Rollup ([FileURL], [FileId])
SELECT  
 l.RequestedFile As [URL], 
 FileId = dbo.fn_GetFileIdFromURL(l.RequestedFile, l.CleanFileName)
FROM dbo.Logs l (readuncommitted) LEFT OUTER JOIN dbo.Rollup
 on FileUrl = RequestedFile
WHERE FileUrl IS NULL

The logic here is that if no dbo.Rollup row exists for the given FileUrl, the left outer join returns NULL for the dbo.Rollup columns. The NOT EXISTS then becomes an IS NULL test, which can be faster; compare the execution plans to confirm that it is on your data.
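
One way to check whether the LEFT OUTER JOIN form actually helps is to run both anti-join forms side by side and compare the reported reads and timings. The sketch below deliberately leaves out the scalar UDF so that only the anti-join is measured; the column names are those from the question.

```sql
-- Sketch: compare the two anti-join forms on their own (no UDF call).
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Form 1: NOT EXISTS
SELECT COUNT(*)
FROM dbo.Logs AS l WITH (READUNCOMMITTED)
WHERE NOT EXISTS (
    SELECT 1
    FROM dbo.Rollup AS r
    WHERE r.FileURL = l.RequestedFile
);

-- Form 2: LEFT OUTER JOIN ... IS NULL
SELECT COUNT(*)
FROM dbo.Logs AS l WITH (READUNCOMMITTED)
LEFT OUTER JOIN dbo.Rollup AS r
    ON r.FileURL = l.RequestedFile
WHERE r.FileURL IS NULL;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

Both queries should return the same count. If the Messages output shows similar logical reads and CPU time for each, the choice between the two forms is unlikely to be the bottleneck.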
