Multiple Table Joins With One-to-Many Relationships
Using SQL Server 2008.
I have multiple Locations, each of which contains multiple Departments, each of which contains multiple Items, which can have zero to many Scans. Each Scan relates to a specific Operation, which may or may not have a cutoff time. Each Item also belongs to a specific Package, which belongs to a specific Job. Each Job contains one or more Packages, each of which contains one or more Items.
+=============+                         +=============+
|  Locations  |                         |    Jobs     |
+=============+                         +=============+
       ^                                       ^
       |                                       |
+=============+     +=============+     +=============+
| Departments | <-- |    Items    | --> |  Packages   |
+=============+     +=============+     +=============+
                           ^
                           |
                    +=============+     +=============+
                    |    Scans    | --> | Operations  |
                    +=============+     +=============+
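To make the relationships in the diagram concrete, and to give anyone who wants to reproduce the problem a starting point, here is a minimal schema sketch. The table and column names come from the queries in this post. The data types, keys, and the `dbo` schema are assumptions, since the actual DDL was never shown.

```sql
-- Assumed schema reconstructed from the column names used in the queries.
-- Data types and constraints are guesses, not the poster's actual DDL.
CREATE TABLE dbo.Locations (ID INT NOT NULL PRIMARY KEY);
CREATE TABLE dbo.Jobs      (ID INT NOT NULL PRIMARY KEY);

CREATE TABLE dbo.Departments (
    ID         INT NOT NULL PRIMARY KEY,
    LocationID INT NOT NULL REFERENCES dbo.Locations (ID)
);

CREATE TABLE dbo.Packages (
    ID    INT NOT NULL PRIMARY KEY,
    JobID INT NOT NULL REFERENCES dbo.Jobs (ID)
);

CREATE TABLE dbo.Items (
    ID           INT NOT NULL PRIMARY KEY,
    DepartmentID INT NOT NULL REFERENCES dbo.Departments (ID),
    PackageID    INT NOT NULL REFERENCES dbo.Packages (ID)
);

CREATE TABLE dbo.Operations (
    ID     INT  NOT NULL PRIMARY KEY,
    Cutoff TIME NULL  -- assumed type; NULL means the operation has no cutoff time
);

CREATE TABLE dbo.Scans (
    ID          INT      NOT NULL PRIMARY KEY,
    ItemID      INT      NOT NULL REFERENCES dbo.Items (ID),
    OperationID INT      NOT NULL REFERENCES dbo.Operations (ID),
    [DateTime]  DATETIME NOT NULL  -- scan timestamp, not stored in time order
);
```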
What I am attempting to do is get a count of Scans for a Job, grouped by Location and Scan date. The tricky part is that I only want to count the first Scan (by date/time) per Item, considering only Scans whose Operation has a non-null cutoff time. (NOTE: the scans are definitely NOT stored in date/time order in the table.)
The query I have returns the correct results, but it is painfully slow once the number of Items for a Job reaches 75,000 or so. I am pushing for a new server (I know our hardware is lacking), but I am wondering whether something in the query is also bogging it down.
From what little I can glean from the execution plan, most of the query's cost seems to be in the subquery that finds the first Scan for each Item. It does an index scan (0%) on an Operations table index (ID, Cutoff), followed by a lazy spool (19%). It does an index seek (61%) on a Scans table index (ItemID, DateTime, OperationID, ID). The subsequent nested loops (inner join) operator is only 2%, and the Top operator is 0%. (Not that I really understand much of what I just typed, but I am trying to provide as much info as possible...)
Here is the query:
SELECT
Departments.LocationID
, DATEADD(dd, 0, DATEDIFF(dd, 0, Scans.DateTime))
, COUNT(Scans.ItemID) AS [COUNT]
FROM
Items
INNER JOIN Scans
ON Scans.ID =
(
SELECT TOP 1
Scans.ID
FROM
Scans
INNER JOIN Operations
ON Scans.OperationID = Operations.ID
WHERE
Operations.Cutoff IS NOT NULL
AND Scans.ItemID = Items.ID
ORDER BY
Scans.DateTime
)
INNER JOIN Operations
ON Scans.OperationID = Operations.ID
INNER JOIN Packages
ON Items.PackageID = Packages.ID
INNER JOIN Departments
ON Items.DepartmentID = Departments.ID
WHERE
Packages.JobID = @ID
GROUP BY
Departments.LocationID
, DATEADD(dd, 0, DATEDIFF(dd, 0, Scans.DateTime));
Which will return a sampling of results like so (columns: LocationID, scan date, count):
8 2012-06-08 00:00:00.000 11842
21 2012-06-07 00:00:00.000 502
11 2012-06-12 00:00:00.000 1841
15 2012-06-11 00:00:00.000 4314
16 2012-06-09 00:00:00.000 278
23 2012-06-12 00:00:00.000 1345
6 2012-06-06 00:00:00.000 2005
20 2012-06-08 00:00:00.000 352
14 2012-06-07 00:00:00.000 2408
8 2012-06-11 00:00:00.000 290
19 2012-06-10 00:00:00.000 85
20 2012-06-11 00:00:00.000 5484
7 2012-06-10 00:00:00.000 2389
16 2012-06-06 00:00:00.000 6762
18 2012-06-09 00:00:00.000 4473
14 2012-06-10 00:00:00.000 2364
1 2012-06-11 00:00:00.000 1531
22 2012-06-08 00:00:00.000 14534
5 2012-06-10 00:00:00.000 11908
9 2012-06-12 00:00:00.000 47
19 2012-06-07 00:00:00.000 559
7 2012-06-07 00:00:00.000 2576
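The `DATEADD(dd, 0, DATEDIFF(dd, 0, Scans.DateTime))` expression in the query above is what collapses each scan timestamp to midnight, which is why every date in the results ends in 00:00:00.000. `DATEDIFF(dd, 0, x)` counts day boundaries since day 0 (1900-01-01), and `DATEADD` implicitly converts that integer back to a `datetime` at midnight. Since this is SQL Server 2008, `CAST(x AS DATE)` is an alternative that returns a `date` instead. A quick check, using an arbitrary example timestamp:

```sql
DECLARE @ts DATETIME = '2012-06-08T14:37:21.123';  -- arbitrary example value

SELECT
    DATEDIFF(dd, 0, @ts)                 AS DaysSince19000101,  -- whole days since day 0
    DATEADD(dd, 0, DATEDIFF(dd, 0, @ts)) AS MidnightDateTime,   -- 2012-06-08 00:00:00.000
    CAST(@ts AS DATE)                    AS DateOnly;           -- 2012-06-08 (SQL Server 2008+)
```

Either expression can be used in both the SELECT list and the GROUP BY, as long as the same expression appears in both.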
Here's the execution plan (I'm not sure what I changed since the original post, but the cost percentages are slightly different; the bottleneck still seems to be in the same area, though):
I am a little leery about marking this as the answer, as I am sure we can still squeeze a little more juice out of the query. But this did knock my test run down from 22 seconds to 6 seconds (with an added index on Scans: OperationID, including DateTime and ItemID):
WITH CTE AS
(
SELECT
Items.ID AS ID
, Scans.DateTime AS [DateTime]
, Operations.Cutoff AS Cutoff
, ROW_NUMBER() OVER (PARTITION BY Items.ID ORDER BY Scans.DateTime) AS RN
FROM
Items
INNER JOIN Scans
ON Items.ID = Scans.ItemID
INNER JOIN Operations
ON Scans.OperationID = Operations.ID
INNER JOIN Packages
ON Items.PackageID = Packages.ID
WHERE
Operations.Cutoff IS NOT NULL
AND Packages.JobID = @ID
)
SELECT
Departments.LocationID
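, DATEADD(dd, 0, DATEDIFF(dd, 0, CTE.DateTime))
, COUNT(Items.ID) AS [COUNT]
FROM
Items
INNER JOIN CTE
ON Items.ID = CTE.ID
AND CTE.RN = 1
INNER JOIN Packages
ON Items.PackageID = Packages.ID
INNER JOIN Departments
ON Items.DepartmentID = Departments.ID
WHERE
Packages.JobID = @ID
GROUP BY
Departments.LocationID
-- group by scan day (as in the original query), not by the exact scan timestamp
, DATEADD(dd, 0, DATEDIFF(dd, 0, CTE.DateTime))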
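For reference, the extra Scans index mentioned in this answer would look roughly like this. The index name and the `dbo` schema are hypothetical; the key and included columns are the ones named in the post. Because the CTE only reads `OperationID`, `ItemID`, and `DateTime` from Scans, this index can cover the Scans side of the query, which plausibly accounts for part of the speedup.

```sql
-- OperationID is the key column; DateTime and ItemID are stored only at the
-- leaf level (INCLUDE), so they can be read without key lookups.
CREATE NONCLUSTERED INDEX IX_Scans_OperationID  -- hypothetical name
    ON dbo.Scans (OperationID)
    INCLUDE ([DateTime], ItemID);
```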
It's hard to say for sure, but something like this may behave better. I replaced your nested lookup with a ROW_NUMBER call. The problem in your original query is that nested lookup: the correlated TOP 1 subquery effectively runs once per Item, and that is what's killing you.
Note: I don't have SQL Server in front of me, so I cannot test it, but I think it is logically equivalent.
SELECT
FirstScans.LocationID
, DATEADD(dd, 0, DATEDIFF(dd, 0, FirstScans.DateTime))
, COUNT(FirstScans.ItemID) AS [COUNT]
FROM
(
-- ROW_NUMBER() is not allowed in a WHERE clause, so number each Item's
-- qualifying scans in a derived table and filter on RN in the outer query
SELECT
Departments.LocationID
, Scans.ItemID
, Scans.DateTime
, ROW_NUMBER() OVER (PARTITION BY Items.ID ORDER BY Scans.DateTime) AS RN
FROM
Items
INNER JOIN Scans
ON Scans.ItemID = Items.ID
INNER JOIN Operations
ON Scans.OperationID = Operations.ID
INNER JOIN Packages
ON Items.PackageID = Packages.ID
INNER JOIN Departments
ON Items.DepartmentID = Departments.ID
WHERE
Operations.Cutoff IS NOT NULL
AND Packages.JobID = @ID
) AS FirstScans
WHERE
FirstScans.RN = 1
GROUP BY
FirstScans.LocationID
, DATEADD(dd, 0, DATEDIFF(dd, 0, FirstScans.DateTime));
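Since the answer above could not be tested, here is a small self-contained harness for checking the "logically equivalent" claim on toy data. It uses table variables (`@Operations`, `@Scans`, `@Result1`, `@Result2` are hypothetical names; the columns are a subset of the real schema with assumed types) and compares the correlated `TOP 1` approach from the question against a `ROW_NUMBER()` approach. The scan rows are deliberately inserted out of date/time order, as in the real table.

```sql
-- Toy data with hypothetical values; types are assumptions.
-- Operation 1 has no cutoff (its scans must be ignored); Operation 2 has one.
DECLARE @Operations TABLE (ID INT PRIMARY KEY, Cutoff TIME NULL);
DECLARE @Scans TABLE (ID INT PRIMARY KEY, ItemID INT, OperationID INT, [DateTime] DATETIME);
DECLARE @Result1 TABLE (ScanDate DATETIME, Cnt INT);
DECLARE @Result2 TABLE (ScanDate DATETIME, Cnt INT);

INSERT INTO @Operations (ID, Cutoff) VALUES (1, NULL), (2, '17:00');

INSERT INTO @Scans (ID, ItemID, OperationID, [DateTime]) VALUES
    (1, 100, 2, '2012-06-09T10:00:00'),  -- item 100: later qualifying scan
    (2, 100, 1, '2012-06-07T08:00:00'),  -- item 100: earliest, but Operation 1 has no cutoff
    (3, 100, 2, '2012-06-08T09:00:00'),  -- item 100: first qualifying scan
    (4, 200, 2, '2012-06-08T15:00:00'),  -- item 200: only scan, qualifies
    (5, 300, 1, '2012-06-08T11:00:00');  -- item 300: no qualifying scan at all

-- Approach 1: correlated TOP 1 subquery, as in the question.
INSERT INTO @Result1 (ScanDate, Cnt)
SELECT DATEADD(dd, 0, DATEDIFF(dd, 0, S.[DateTime])), COUNT(*)
FROM @Scans AS S
WHERE S.ID = (
    SELECT TOP 1 S2.ID
    FROM @Scans AS S2
    INNER JOIN @Operations AS O ON S2.OperationID = O.ID
    WHERE O.Cutoff IS NOT NULL AND S2.ItemID = S.ItemID
    ORDER BY S2.[DateTime]
)
GROUP BY DATEADD(dd, 0, DATEDIFF(dd, 0, S.[DateTime]));

-- Approach 2: ROW_NUMBER() in a derived table, filtered on RN = 1 outside it.
INSERT INTO @Result2 (ScanDate, Cnt)
SELECT DATEADD(dd, 0, DATEDIFF(dd, 0, F.[DateTime])), COUNT(*)
FROM (
    SELECT S.[DateTime],
           ROW_NUMBER() OVER (PARTITION BY S.ItemID ORDER BY S.[DateTime]) AS RN
    FROM @Scans AS S
    INNER JOIN @Operations AS O ON S.OperationID = O.ID
    WHERE O.Cutoff IS NOT NULL
) AS F
WHERE F.RN = 1
GROUP BY DATEADD(dd, 0, DATEDIFF(dd, 0, F.[DateTime]));

-- Expected from both: one row, 2012-06-08 00:00:00.000 with a count of 2
-- (item 100 via scan 3, item 200 via scan 4; item 300 is not counted).
SELECT * FROM @Result1;
SELECT * FROM @Result2;

-- Raise an error if either result has a row the other lacks.
IF EXISTS (SELECT * FROM @Result1 EXCEPT SELECT * FROM @Result2)
   OR EXISTS (SELECT * FROM @Result2 EXCEPT SELECT * FROM @Result1)
    RAISERROR('The two approaches returned different results', 16, 1);
```

Swapping in real tables and a realistic data volume would also let you compare the two approaches' run times directly.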
I'm curious: could you please run a CROSS APPLY version?
SELECT
Departments.LocationID
, DATEADD(dd, 0, DATEDIFF(dd, 0, CA_Scans.DateTime))
, COUNT(CA_Scans.ItemID) AS [COUNT]
FROM
Items
CROSS APPLY
(
SELECT TOP 1
Scans.ID,
Scans.ItemID,
Scans.OperationID,
Scans.DateTime
FROM
Scans
INNER JOIN Operations
ON Scans.OperationID = Operations.ID
WHERE
Operations.Cutoff IS NOT NULL
AND Scans.ItemID = Items.ID
ORDER BY
Scans.DateTime
) CA_Scans
INNER JOIN Operations
ON CA_Scans.OperationID = Operations.ID
INNER JOIN Packages
ON Items.PackageID = Packages.ID
INNER JOIN Departments
ON Items.DepartmentID = Departments.ID
WHERE
Packages.JobID = @ID
GROUP BY
Departments.LocationID
, DATEADD(dd, 0, DATEDIFF(dd, 0, CA_Scans.DateTime));