
Speeding up cumulative sum calculation in SQL Server

As part of building a solution, I had to implement a view that performs a running total (cumulative sum) calculation. I took the simplest approach of joining the value table to a table holding the list of dates, but the view is still fairly slow. Adding indexes to the table didn't help much, even though the table itself has only about 15K rows. Could someone advise on the right approach to speed it up?

There are several considerations:

  1. I need to calculate the cumulative sum up to a date for a specific ProjectID and ContractorID. For the same date there may be many ProjectID and ContractorID combinations, but the combination of Date, ProjectID and ContractorID is always unique.

  2. There is a master table with dates and project IDs (but no contractor IDs), and I need a cumulative sum for each Date and ProjectID in this master dates table.

  3. I need to calculate the cumulative sum of several columns at the same time, not just one.

To walk you through the situation in a bit more detail, the tables I have are:

  • dbo.Project_Reporting_Schedule holds the master list of ProjectID and Date combinations. For each of these combinations I need to calculate a cumulative sum based on another table. Note that it has no ContractorID.

  • Project_value_delivery is the table that holds the actual value columns to be summed cumulatively. It has its own set of dates, which may or may not match the dates in Project_Reporting_Schedule, so we can't simply join the table to itself. Note that it does have a ContractorID.

Currently I have the following select, which is fairly self-explanatory: it joins the table of values to the table with the master date list and does the summation. The select works fine, but even with just 15K records it takes almost 5 seconds to run, which is fairly slow.

select 
    pv2.ProjectID,
    pv2.ContractorID,
    pv1.Date, 
    sum(pv2.ValuePlanned) as PlannedCumulative, 
    sum(pv2.ValueActual) as ActualCumulative,
    sum(pv2.MobilizationPlanned) as MobilizationPlanned,
    sum(pv2.MobilizationActual) as MobilizationActual,
    sum(pv2.EngineeringPlanned) as EngineeringPlanned,
    sum(pv2.EngineeringActual) as EngineeringActual,
    sum(pv2.ProcurementPlanned) as ProcurementPlanned,
    sum(pv2.ProcurementActual) as ProcurementActual,
    sum(pv2.ConstructionPlanned) as ConstructionPlanned,
    sum(pv2.ConstructionActual) as ConstructionActual,
    sum(pv2.CommisioningTestingPlanned) as CommisioningTestingPlanned,
    sum(pv2.CommisioningTestingActual) as CommisioningTestingActual
from 
    dbo.Project_Reporting_Schedule as pv1
join 
    dbo.Project_value_delivery as pv2 on pv1.Date >= pv2.Date and pv1.ProjectID = pv2.ProjectID
group by 
    pv2.ProjectID, pv2.ContractorID, pv1.Date
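
To compare later rewrites or index changes against the roughly 5 seconds reported, it can help to capture baseline figures for this select first. A minimal sketch using SQL Server's session-level statistics switches; the comment marks where the query goes:

```sql
-- Report logical reads per table and CPU/elapsed time in the Messages output.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Run the cumulative-sum select shown above here.

-- Switch the reporting off again for the rest of the session.
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The logical reads per table are usually the more stable figure to compare between runs, since elapsed time also depends on caching and server load.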

UPDATE

For further clarification, I put the execution plan here: https://www.brentozar.com/pastetheplan/?id=H12t-O1PS

The indexes are the same on both tables: one on the (ProjectID, Date) combination, plus standalone indexes on the ProjectID and Date columns.

All indexes are unique nonclustered where uniqueness applies, and plain nonclustered otherwise.

We can see that it does a 'non-clustered index seek', which accounts for most of the execution cost. Maybe the index needs to be adjusted?
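
One possible adjustment, assuming the seek on Project_value_delivery is followed by lookups to fetch the summed columns (the linked plan would need to confirm this), is a covering index keyed on the join columns. The index name below is hypothetical; the column names are taken from the select above:

```sql
-- Key columns match the join predicate (ProjectID equality, then Date range).
-- INCLUDE carries every column the select reads, so no lookup is needed.
CREATE NONCLUSTERED INDEX IX_Project_value_delivery_ProjectID_Date_covering
    ON dbo.Project_value_delivery (ProjectID, [Date])
    INCLUDE (
        ContractorID,
        ValuePlanned, ValueActual,
        MobilizationPlanned, MobilizationActual,
        EngineeringPlanned, EngineeringActual,
        ProcurementPlanned, ProcurementActual,
        ConstructionPlanned, ConstructionActual,
        CommisioningTestingPlanned, CommisioningTestingActual
    );
```

An index like this can only make each seek cheaper. The join still pairs every schedule date with all earlier delivery rows for the project, so the number of rows being summed stays the same.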

Take the comparison out of the JOIN clause and move it to a WHERE clause:

select 
       pv2.ProjectID,
       pv2.ContractorID,
       pv1.Date, 
       sum(pv2.ValuePlanned) as PlannedCumulative, 
       sum(pv2.ValueActual) as ActualCumulative,
       sum(pv2.MobilizationPlanned) as MobilizationPlanned,
       sum(pv2.MobilizationActual) as MobilizationActual,
       sum(pv2.EngineeringPlanned) as EngineeringPlanned,
       sum(pv2.EngineeringActual) as EngineeringActual,
       sum(pv2.ProcurementPlanned) as ProcurementPlanned,
       sum(pv2.ProcurementActual) as ProcurementActual,
       sum(pv2.ConstructionPlanned) as ConstructionPlanned,
       sum(pv2.ConstructionActual) as ConstructionActual,
       sum(pv2.CommisioningTestingPlanned) as CommisioningTestingPlanned,
       sum(pv2.CommisioningTestingActual) as CommisioningTestingActual
FROM
       dbo.Project_Reporting_Schedule as pv1
       join dbo.Project_value_delivery as pv2 on pv1.ProjectID = pv2.ProjectID
WHERE
       pv1.Date >= pv2.Date
GROUP BY
       pv2.ProjectID, pv2.ContractorID, pv1.Date
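
A different technique worth trying, not taken from the question or the answer above, is a windowed running total, which avoids pairing every schedule date with all earlier delivery rows. The sketch below assumes SQL Server 2012 or later (needed for the ROWS frame) and relies on the stated uniqueness of Date, ProjectID and ContractorID. The CTE names and aliases are made up for illustration, and only two of the value columns are shown; the rest would follow the same pattern.

```sql
WITH running AS (
    -- Running totals per project and contractor, ordered by delivery date.
    SELECT
        ProjectID,
        ContractorID,
        [Date],
        SUM(ValuePlanned) OVER (PARTITION BY ProjectID, ContractorID
                                ORDER BY [Date]
                                ROWS UNBOUNDED PRECEDING) AS PlannedCumulative,
        SUM(ValueActual)  OVER (PARTITION BY ProjectID, ContractorID
                                ORDER BY [Date]
                                ROWS UNBOUNDED PRECEDING) AS ActualCumulative
    FROM dbo.Project_value_delivery
),
contractors AS (
    -- The schedule has no ContractorID, so take the pairs from the value table.
    SELECT DISTINCT ProjectID, ContractorID
    FROM dbo.Project_value_delivery
)
SELECT
    c.ProjectID,
    c.ContractorID,
    s.[Date],
    r.PlannedCumulative,
    r.ActualCumulative
FROM dbo.Project_Reporting_Schedule AS s
JOIN contractors AS c
    ON c.ProjectID = s.ProjectID
CROSS APPLY (
    -- Latest running total on or before the schedule date.
    -- CROSS APPLY drops schedule dates that precede the first delivery row,
    -- matching the inner join in the original select.
    SELECT TOP (1) run.PlannedCumulative, run.ActualCumulative
    FROM running AS run
    WHERE run.ProjectID = c.ProjectID
      AND run.ContractorID = c.ContractorID
      AND run.[Date] <= s.[Date]
    ORDER BY run.[Date] DESC
) AS r;
```

Whether this is actually faster depends on the plan the optimizer picks for the CROSS APPLY, so it should be measured against the original select on the real data before it replaces the view.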
