Multi-table join with one-to-many relationships

Question

Using SQL Server 2008.

I have multiple Locations, each of which contains multiple Departments, each of which contains multiple Items, and each Item can have zero to many Scans. Each Scan relates to a specific Operation, which may or may not have a cutoff time. Each Item also belongs to a specific Package, which belongs to a specific Job. Each Job contains one or more Packages, each of which contains one or more Items.

+=============+                         +=============+
|  Locations  |                         |     Jobs    |
+=============+                         +=============+
      ^                                       ^
      |                                       |
+=============+     +=============+     +=============+
| Departments | <-- |    Items    | --> |   Packages  |
+=============+     +=============+     +=============+
                          ^
                          |
                    +=============+     +=============+
                    |    Scans    | --> | Operations  |
                    +=============+     +=============+
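For anyone who wants to reproduce the problem, the relationships in the diagram might be declared roughly as follows. This is only a sketch: the table and column names are the ones used in the query below, while the data types (including `time` for Cutoff), the NULLability, and the inline foreign keys are assumptions.

```sql
-- Hypothetical schema sketch; names come from the question's query,
-- data types and constraints are assumed for illustration only.
CREATE TABLE Locations (
    ID int NOT NULL PRIMARY KEY
);

CREATE TABLE Jobs (
    ID int NOT NULL PRIMARY KEY
);

CREATE TABLE Departments (
    ID         int NOT NULL PRIMARY KEY,
    LocationID int NOT NULL REFERENCES Locations (ID)
);

CREATE TABLE Packages (
    ID    int NOT NULL PRIMARY KEY,
    JobID int NOT NULL REFERENCES Jobs (ID)
);

CREATE TABLE Items (
    ID           int NOT NULL PRIMARY KEY,
    DepartmentID int NOT NULL REFERENCES Departments (ID),
    PackageID    int NOT NULL REFERENCES Packages (ID)
);

CREATE TABLE Operations (
    ID     int  NOT NULL PRIMARY KEY,
    Cutoff time NULL  -- NULL means the operation has no cutoff time
);

CREATE TABLE Scans (
    ID          int      NOT NULL PRIMARY KEY,
    ItemID      int      NOT NULL REFERENCES Items (ID),  -- zero to many scans per item
    OperationID int      NOT NULL REFERENCES Operations (ID),
    [DateTime]  datetime NOT NULL  -- rows are not stored in date/time order
);
```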

What I am attempting to do is get a count of Scans for a Job, grouped by Location and Scan date. The tricky part is that, for each Item, I only want to count the earliest Scan (by date/time) among the Scans whose Operation has a non-null cutoff time. (NOTE: the scans are definitely NOT stored in date/time order in the table.)

The query I have returns the correct results, but it is painfully slow once the number of Items for a Job reaches 75,000 or so. I am pushing for a new server (I know our hardware is lacking), but I am wondering whether something in the query is bogging it down as well.

From what little I can glean from the execution plan, most of the cost of the query seems to be in the subquery that finds the first Scan for each Item. It does an index scan (0%) on an Operations table index (ID, Cutoff), followed by a lazy spool (19%). It does an index seek (61%) on a Scans table index (ItemID, DateTime, OperationID, ID). The subsequent nested loops (inner join) is only 2% and the Top operator is 0%. (Not that I really understand much of what I just typed, but I am trying to provide as much info as possible...)

Here is the query:

SELECT
    Departments.LocationID
    , DATEADD(dd, 0, DATEDIFF(dd, 0, Scans.DateTime))
    , COUNT(Scans.ItemID) AS [COUNT]
FROM
    Items           
    INNER JOIN Scans
        ON Scans.ID = 
    (
        SELECT TOP 1
            Scans.ID 
        FROM
            Scans
        INNER JOIN Operations
            ON Scans.OperationID = Operations.ID
        WHERE
            Operations.Cutoff IS NOT NULL
            AND Scans.ItemID = Items.ID             
        ORDER BY
            Scans.DateTime
    )
    INNER JOIN Operations
        ON Scans.OperationID = Operations.ID
    INNER JOIN Packages
        ON Items.PackageID = Packages.ID
    INNER JOIN Departments
        ON Items.DepartmentID = Departments.ID      
WHERE
    Packages.JobID = @ID        
GROUP BY
    Departments.LocationID 
    , DATEADD(dd, 0, DATEDIFF(dd, 0, Scans.DateTime));

It returns results like this sample:

8   2012-06-08 00:00:00.000 11842
21  2012-06-07 00:00:00.000 502
11  2012-06-12 00:00:00.000 1841
15  2012-06-11 00:00:00.000 4314
16  2012-06-09 00:00:00.000 278
23  2012-06-12 00:00:00.000 1345
6   2012-06-06 00:00:00.000 2005
20  2012-06-08 00:00:00.000 352
14  2012-06-07 00:00:00.000 2408
8   2012-06-11 00:00:00.000 290
19  2012-06-10 00:00:00.000 85
20  2012-06-11 00:00:00.000 5484
7   2012-06-10 00:00:00.000 2389
16  2012-06-06 00:00:00.000 6762
18  2012-06-09 00:00:00.000 4473
14  2012-06-10 00:00:00.000 2364
1   2012-06-11 00:00:00.000 1531
22  2012-06-08 00:00:00.000 14534
5   2012-06-10 00:00:00.000 11908
9   2012-06-12 00:00:00.000 47
19  2012-06-07 00:00:00.000 559
7   2012-06-07 00:00:00.000 2576

Here's the execution plan (I'm not sure what I changed since the original post, but the cost percentages are slightly different. The bottleneck still seems to be in the same area, though):

Answer 1

I am a little leery about marking this as the answer, as I am sure we can still squeeze a little more performance out of the query. But this did cut my test run from 22 seconds down to 6 seconds (with an added index on Scans: OperationID, including DateTime and ItemID):

WITH CTE AS 
(
    SELECT
        Items.ID AS ID
        -- Truncate to the day for grouping; ROW_NUMBER still orders by the full timestamp
        , DATEADD(dd, 0, DATEDIFF(dd, 0, Scans.DateTime)) AS ScanDate
        , ROW_NUMBER() OVER (PARTITION BY Items.ID ORDER BY Scans.DateTime) AS RN
        FROM
            Items
            INNER JOIN Scans            
                ON Items.ID = Scans.ItemID
            INNER JOIN Operations
                ON Scans.OperationID = Operations.ID
            INNER JOIN Packages
                ON Items.PackageID = Packages.ID
        WHERE
            Operations.Cutoff IS NOT NULL
            AND Packages.JobID = @ID                        
)
SELECT
    Departments.LocationID
    , CTE.ScanDate
    , COUNT(Items.ID) AS [COUNT]
FROM
    Items           
    INNER JOIN CTE
        ON Items.ID = CTE.ID
        AND CTE.RN = 1    -- keep only the first qualifying scan per item
    INNER JOIN Packages
        ON Items.PackageID = Packages.ID
    INNER JOIN Departments
        ON Items.DepartmentID = Departments.ID      
WHERE
    Packages.JobID = @ID
GROUP BY
    Departments.LocationID 
    , CTE.ScanDate;
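The supporting index described at the top of this answer (on Scans, keyed on OperationID and including DateTime and ItemID) could be created along these lines. The index name is hypothetical; only the key and included columns come from the answer.

```sql
-- Hypothetical index name; key/included columns as described in this answer.
-- OperationID is the seek key; DateTime and ItemID are carried at the leaf level
-- so the query can read them without a lookup into the base table.
CREATE NONCLUSTERED INDEX IX_Scans_OperationID
    ON Scans (OperationID)
    INCLUDE ([DateTime], ItemID);
```

INCLUDE columns are stored only at the leaf level of the index, so they can cover the query's reads of DateTime and ItemID without widening the index key.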

Answer 2

It's hard to say for sure, but something like this may perform better. I replaced your nested lookup with a ROW_NUMBER call. The problem in your original query is that nested lookup; it's killing you.

Note: I don't have SQL Server in front of me, so I cannot test it, but I think it is logically equivalent.

SELECT
    FirstScans.LocationID
    , FirstScans.ScanDate
    , COUNT(FirstScans.ItemID) AS [COUNT]
FROM
(
    SELECT
        Departments.LocationID
        , Scans.ItemID
        , DATEADD(dd, 0, DATEDIFF(dd, 0, Scans.DateTime)) AS ScanDate
        -- ROW_NUMBER is not allowed in a WHERE clause, so number the rows here
        -- and filter on RN in the outer query instead.
        , ROW_NUMBER() OVER (PARTITION BY Items.ID ORDER BY Scans.DateTime) AS RN
    FROM
        Items           
        INNER JOIN Scans
            ON Scans.ItemID = Items.ID
        INNER JOIN Operations
            ON Scans.OperationID = Operations.ID
        INNER JOIN Packages
            ON Items.PackageID = Packages.ID
        INNER JOIN Departments
            ON Items.DepartmentID = Departments.ID      
    WHERE
        Operations.Cutoff IS NOT NULL
        AND Packages.JobID = @ID
) AS FirstScans
WHERE
    FirstScans.RN = 1    -- first qualifying scan per item
GROUP BY
    FirstScans.LocationID 
    , FirstScans.ScanDate;

Answer 3

I'm curious: could you please run a CROSS APPLY version?

SELECT
    Departments.LocationID
    , DATEADD(dd, 0, DATEDIFF(dd, 0, CA_Scans.DateTime))
    , COUNT(CA_Scans.ItemID) AS [COUNT]
FROM
    Items 
    CROSS APPLY
    (
        SELECT TOP 1
            Scans.ID,
            Scans.OperationID,
            Scans.DateTime
        FROM
            Scans
        INNER JOIN Operations
            ON Scans.OperationID = Operations.ID
        WHERE
            Operations.Cutoff IS NOT NULL
            AND Scans.ItemID = Items.ID             
        ORDER BY
            Scans.DateTime
    ) CA_Scans
    INNER JOIN Operations
        ON CA_Scans.OperationID = Operations.ID
    INNER JOIN Packages
        ON Items.PackageID = Packages.ID
    INNER JOIN Departments
        ON Items.DepartmentID = Departments.ID      
WHERE
    Packages.JobID = @ID        
GROUP BY
    Departments.LocationID 
    , DATEADD(dd, 0, DATEDIFF(dd, 0, CA_Scans.DateTime));

3 answers

Answer 1: score 1, accepted, 2012-06-12 22:52:30
Answer 2: score 0, 2012-06-12 20:15:15
Answer 3: score 0, 2012-06-12 23:23:47
