SQL Server Rewrite Hierarchical CTE Function to a regular Select
I've been tasked with migrating a script that traverses a hierarchy and expands it. Firstly, the script runs extremely slowly, and secondly, we are moving to a far more controlled server, so I need to eliminate functions.
I was wondering if someone could help integrate what the function does (the second statement below) into the select statement of the first script, so that the whole thing runs as a single query.
I understand that keeping the two split may be far better performance-wise. However, this is the only function that exists and the only select statement that uses it, so I would much rather integrate the two than go through the process of getting the function approved and added.
Secondly, if anyone can see a more optimal way to achieve this, I am open to suggestions, keeping in mind that the hierarchy goes about 11 levels deep.
The first part of the script is the select statement where the function is called and the result is stored in a table:
DECLARE @RootNode INT = 1
DECLARE @Level1 INT = 2
DECLARE @Level2 INT = 3
DECLARE @Level3 INT = 4
DECLARE @Level4 INT = 5
TRUNCATE TABLE [...].[Hierarchy]
--
INSERT INTO [...].[Hierarchy]
SELECT Nodes.NodeId,
NodeTypeValues.Value AS HierarchyValue,
(select NodeTypeValue from [...].[Function_GetTheParentNodesForTheSelectedNodeType] (Nodes.NodeId, @RootNode)) AS RootLevel,
(select NodeTypeValue from [...].[Function_GetTheParentNodesForTheSelectedNodeType] (Nodes.NodeId, @Level1)) AS Level1,
(select NodeTypeValue from [...].[Function_GetTheParentNodesForTheSelectedNodeType] (Nodes.NodeId, @Level2)) AS Level2,
(select NodeTypeValue from [...].[Function_GetTheParentNodesForTheSelectedNodeType] (Nodes.NodeId, @Level3)) AS Level3,
(select NodeTypeValue from [...].[Function_GetTheParentNodesForTheSelectedNodeType] (Nodes.NodeId, @Level4)) AS Level4
--Level 5...
--Level 6...
--Level 7...
FROM [...].[Nodes] Nodes
INNER JOIN [...].NodeTypes NodeTypes ON NodeTypes.NodeTypeId = Nodes.NodeTypeId
INNER JOIN [...].NodeTypeValues NodeTypeValues ON NodeTypeValues.NodeTypeValueId = Nodes.NodeTypeValueId
WHERE NodeTypes.HierarchyTypeId = 1
The second part is the actual function being called. It is meant to traverse the hierarchy and return a table result to the main query for storage:
CREATE FUNCTION [...].[Function_GetTheParentNodesForTheSelectedNodeType]
( @NodeId int,
@NodeTypeId int
)
RETURNS
@ReturnData TABLE
(
NodeTypeValue NVARCHAR(100),
NodeId INT
)
AS
BEGIN
WITH NodeSubTreesUpwards AS
(
SELECT SubRootNode.NodeId AS SubRootNodeId,
SubRootNode.*,
NULL AS ChildNodeId,
0 AS HierarchyLevel
FROM [...].[Nodes] AS SubRootNode
WHERE SubRootNode.NodeId = @NodeId
UNION ALL
SELECT NodeSubTreesUpwards.SubRootNodeId,
ParentNode.*,
Parent.ChildNodeId, (NodeSubTreesUpwards.HierarchyLevel) - 1 AS HierarchyLevel
FROM NodeSubTreesUpwards
INNER JOIN [...].[ParentChildNodes] AS Parent ON Parent.ChildNodeId = NodeSubTreesUpwards.NodeId
INNER JOIN [...].[Nodes] AS ParentNode ON ParentNode.NodeId = Parent.ParentNodeId
)
INSERT INTO @ReturnData
SELECT TOP 1 NodeTypeValues.Value, NodeSubTreesUpwards.NodeId
FROM NodeSubTreesUpwards NodeSubTreesUpwards
INNER JOIN [...].[NodeTypes] NodeType ON NodeType.NodeTypeId = NodeSubTreesUpwards.NodeTypeId
INNER JOIN [...].[NodeTypeValues] NodeTypeValues ON NodeTypeValues.NodeTypeValueId = NodeSubTreesUpwards.NodeTypeValueId
WHERE NodeType.NodeTypeId = @NodeTypeId
RETURN
END
I have really tried to split this out but have been struggling. I'm most likely missing something stupid, or I simply don't understand the process of building the hierarchy; I've sat on this for a day or two now. I would be more than happy to keep the same logic, just without calling the function, and run it in the main select statement instead, but I'm not sure whether the recursion makes that a problem.
Try to use an inline table-valued function (ITVF), as they tend to get better execution plans. There is a great article on MSDN about query performance issues with multi-statement table-valued functions:
- A multi-statement TVF, in general, gives a very low cardinality estimate.
- If you use a multi-statement TVF, it is treated just like another table. Because there are no statistics available, SQL Server has to make some assumptions and in general provides a low estimate. If your TVF returns only a few rows, it will be fine. But if you intend to populate the TVF with thousands of rows and this TVF is joined with other tables, an inefficient plan can result from the low cardinality estimate.
So just make two inline table-valued functions from your multi-statement function Function_GetTheParentNodesForTheSelectedNodeType:
CREATE FUNCTION dbo.ufn_NodeSubTreesUpwards
( @NodeId int )
RETURNS table
AS
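RETURN (
-- The recursive self-reference is only valid inside a CTE, so the query is wrapped in one
WITH NodeSubTreesUpwards AS
(
-- Anchor: the starting node at level 0
SELECT SubRootNode.NodeId AS SubRootNodeId,
SubRootNode.*,
NULL AS ChildNodeId,
0 AS HierarchyLevel
FROM [...].[Nodes] AS SubRootNode
WHERE SubRootNode.NodeId = @NodeId
UNION ALL
-- Recursive step: move from the current node to its parent, one level up
SELECT NodeSubTreesUpwards.SubRootNodeId,
ParentNode.*,
Parent.ChildNodeId, (NodeSubTreesUpwards.HierarchyLevel) - 1 AS HierarchyLevel
FROM NodeSubTreesUpwards
INNER JOIN [...].[ParentChildNodes] AS Parent
ON Parent.ChildNodeId = NodeSubTreesUpwards.NodeId
INNER JOIN [...].[Nodes] AS ParentNode ON ParentNode.NodeId = Parent.ParentNodeId
)
SELECT * FROM NodeSubTreesUpwards
)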
and another function, which will be used in your INSERT query:
CREATE FUNCTION dbo.ufn_GetTheParentNodesForTheSelectedNodeType
( @NodeId int,
@NodeTypeId int )
RETURNS table
AS
RETURN (
SELECT
TOP 1
NodeTypeValues.Value
, NodeSubTreesUpwards.NodeId
FROM dbo.ufn_NodeSubTreesUpwards(@NodeId) NodeSubTreesUpwards
INNER JOIN [...].[NodeTypes] NodeType ON NodeType.NodeTypeId = NodeSubTreesUpwards.NodeTypeId
INNER JOIN [...].[NodeTypeValues] NodeTypeValues
ON NodeTypeValues.NodeTypeValueId = NodeSubTreesUpwards.NodeTypeValueId
WHERE NodeType.NodeTypeId = @NodeTypeId
)
UPDATE: an example of using a recursive CTE in an inline table-valued function:
create function SequenceList ( @variable int )
returns table
as
return (
with cte as
(
select id = 1
union all
select id = cte.id+1
from cte
where id < @variable
)
select id from cte
--option ( maxrecursion 0 )
)
SELECT * FROM dbo.SequenceList(5)
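One caveat about the commented-out hint: as far as I know, SQL Server does not accept an OPTION clause inside an inline function body, so a recursion limit has to be set on the statement that calls the function. A minimal sketch, assuming the SequenceList function above has been created in the dbo schema:

```sql
-- The default recursion limit is 100 levels, so asking for 500 rows
-- fails unless the calling statement raises the limit.
SELECT id
FROM dbo.SequenceList(500)
OPTION (MAXRECURSION 500);  -- MAXRECURSION 0 would remove the limit entirely
```

For the hierarchy in the question, which is about 11 levels deep, the default limit of 100 should be enough and no hint is needed.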
The whole script is in fact very poorly written performance-wise. Each function call generates all parent relationships from a particular node but only returns 1 row corresponding to the node type filter (it uses a TOP 1 and doesn't have an ORDER BY, so they are assuming that the variable filter will produce the wanted row).
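To illustrate why the missing ORDER BY matters, here is a small self-contained sketch. The table variable and its rows are hypothetical sample data, and treating the nearest ancestor as the wanted row is my assumption, not something the original script states:

```sql
-- Hypothetical ancestors of one node: two of them share node type 2.
DECLARE @Ancestors TABLE
(
    NodeId INT,
    NodeTypeId INT,
    HierarchyLevel INT,       -- 0 = the node itself, -1 = parent, -2 = grandparent, ...
    Value NVARCHAR(100)
);

INSERT INTO @Ancestors (NodeId, NodeTypeId, HierarchyLevel, Value)
VALUES (10, 2, -1, N'Nearest match'),
       (20, 2, -3, N'Farther match');

-- Without ORDER BY, SQL Server is free to return either row.
SELECT TOP 1 Value
FROM @Ancestors
WHERE NodeTypeId = 2;

-- With ORDER BY, the ancestor closest to the starting node is always chosen.
SELECT TOP 1 Value
FROM @Ancestors
WHERE NodeTypeId = 2
ORDER BY HierarchyLevel DESC;
```

If each node type can appear only once on the path to the root, the filter alone already yields a single row and the ORDER BY is just a safeguard.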
The script that does the insert is just "pivoting" the parent levels of a node; this is why there are N calls to the function, each one retrieving a higher level.
In the following SQL, I merged the first SELECT (without the INSERT or the variables) with the implementation of the function, so that it works set-based and in one go for all the appropriate records. A brief description of each CTE is below.
For any further corrections I'll need fully reproducible DML + DDL; I did what I could without having the proper schema.
;WITH RecursionInputNodes AS
(
SELECT DISTINCT
Nodes.NodeId,
Nodes.NodeTypeId, -- needed by the joins in the final SELECT
Nodes.NodeTypeValueId
FROM
[...].[Nodes] Nodes
INNER JOIN [...].NodeTypes NodeTypes ON NodeTypes.NodeTypeId = Nodes.NodeTypeId
INNER JOIN [...].NodeTypeValues NodeTypeValues ON NodeTypeValues.NodeTypeValueId = Nodes.NodeTypeValueId
WHERE
NodeTypes.HierarchyTypeId = 1
),
RecursiveCTE AS
(
-- CTE Anchor: Start with all input nodes at lvl 0
SELECT
SubRootNode.NodeId AS NodeId, -- the input node, carried unchanged through the recursion
SubRootNode.NodeId AS CurrentNodeId, -- the node reached so far while walking upwards
NULL AS ChildNodeId,
0 AS HierarchyLevel,
SubRootNode.NodeTypeId AS NodeTypeId,
NodeTypeValues.Value AS NodeTypeValue
FROM
RecursionInputNodes AS RI
INNER JOIN [...].[Nodes] AS SubRootNode ON SubRootNode.NodeId = RI.NodeId
INNER JOIN [...].[NodeTypes] NodeType ON NodeType.NodeTypeId = SubRootNode.NodeTypeId
INNER JOIN [...].[NodeTypeValues] NodeTypeValues ON NodeTypeValues.NodeTypeValueId = SubRootNode.NodeTypeValueId
UNION ALL
-- CTE Recursion: Add each node's parent and decrease lvl by 1 each time
SELECT
R.NodeId,
ParentNode.NodeId AS CurrentNodeId,
Parent.ChildNodeId,
R.HierarchyLevel - 1 AS HierarchyLevel,
ParentNode.NodeTypeId AS NodeTypeId,
NodeTypeValues.Value AS NodeTypeValue
FROM
RecursiveCTE AS R
-- Join on the node reached so far; joining on the input node would find the same parent forever
INNER JOIN [...].[ParentChildNodes] AS Parent ON Parent.ChildNodeId = R.CurrentNodeId
INNER JOIN [...].[Nodes] AS ParentNode ON ParentNode.NodeId = Parent.ParentNodeId
INNER JOIN [...].[NodeTypes] NodeType ON NodeType.NodeTypeId = ParentNode.NodeTypeId
INNER JOIN [...].[NodeTypeValues] NodeTypeValues ON NodeTypeValues.NodeTypeValueId = ParentNode.NodeTypeValueId
),
Just1RowByNodeTypeByNode AS
(
SELECT
R.NodeId,
R.NodeTypeId,
NodeTypeValue = MAX(R.NodeTypeValue) -- I'm "imitating" the TOP 1 from the function here
FROM
RecursiveCTE AS R
GROUP BY
R.NodeId,
R.NodeTypeId
)
SELECT
Nodes.NodeId,
NodeTypeValues.Value AS HierarchyValue,
L1.NodeTypeValue AS RootLevel,
L2.NodeTypeValue AS Level1, -- Note that the alias Level 1 here actually corresponds to the value 2 for NodeTypeId
L3.NodeTypeValue AS Level2,
L4.NodeTypeValue AS Level3,
L5.NodeTypeValue AS Level4
--Level 5...
--Level 6...
--Level 7...
FROM
RecursionInputNodes Nodes
INNER JOIN [...].NodeTypes NodeTypes ON NodeTypes.NodeTypeId = Nodes.NodeTypeId
INNER JOIN [...].NodeTypeValues NodeTypeValues ON NodeTypeValues.NodeTypeValueId = Nodes.NodeTypeValueId
LEFT JOIN Just1RowByNodeTypeByNode AS L1 ON Nodes.NodeId = L1.NodeId AND L1.NodeTypeId = 1
LEFT JOIN Just1RowByNodeTypeByNode AS L2 ON Nodes.NodeId = L2.NodeId AND L2.NodeTypeId = 2
LEFT JOIN Just1RowByNodeTypeByNode AS L3 ON Nodes.NodeId = L3.NodeId AND L3.NodeTypeId = 3
LEFT JOIN Just1RowByNodeTypeByNode AS L4 ON Nodes.NodeId = L4.NodeId AND L4.NodeTypeId = 4
LEFT JOIN Just1RowByNodeTypeByNode AS L5 ON Nodes.NodeId = L5.NodeId AND L5.NodeTypeId = 5
- RecursionInputNodes holds the input node list for the recursion.
- RecursiveCTE is the set of all the input nodes together with their parent relationships, followed upwards until there are no more parents. The parent relationship is given through the join on Parent.ChildNodeId. I also added NodeTypeId and NodeTypeValue because we need them in the next CTE and in the final SELECT.
- Just1RowByNodeTypeByNode determines, for each NodeId and NodeTypeId, the wanted value of NodeTypeValue, which is what the caller wants from the function. The NodeTypeId gets filtered afterwards (it's the parameter of the original function). This step "mimics" the TOP 1 from the original function.
I'd recommend executing each CTE one by one, in order (each together with the previous ones it references), to understand how the last SELECT puts it all together.
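To try the idea without the real schema, here is a self-contained sketch on a tiny hierarchy. Everything in it is hypothetical: the table variables @Nodes and @ParentChildNodes, the sample rows, and the simplification of storing Value directly on the node. It also pivots with conditional aggregation (MAX over CASE) instead of one LEFT JOIN per level, which is a swapped-in technique and not what the query above does:

```sql
-- Hypothetical hierarchy: Root (type 1) > North (type 2) > Store A / Store B (type 3)
DECLARE @Nodes TABLE (NodeId INT PRIMARY KEY, NodeTypeId INT, Value NVARCHAR(100));
DECLARE @ParentChildNodes TABLE (ParentNodeId INT, ChildNodeId INT);

INSERT INTO @Nodes (NodeId, NodeTypeId, Value)
VALUES (1, 1, N'Root'), (2, 2, N'North'), (3, 3, N'Store A'), (4, 3, N'Store B');

INSERT INTO @ParentChildNodes (ParentNodeId, ChildNodeId)
VALUES (1, 2), (2, 3), (2, 4);

WITH Ancestors AS
(
    -- Anchor: every node starts as its own level-0 entry
    SELECT N.NodeId AS StartNodeId,
           N.NodeId AS CurrentNodeId,
           N.NodeTypeId,
           N.Value,
           0 AS HierarchyLevel
    FROM @Nodes AS N
    UNION ALL
    -- Recursive step: climb from the node reached so far to its parent
    SELECT A.StartNodeId,
           P.NodeId,
           P.NodeTypeId,
           P.Value,
           A.HierarchyLevel - 1
    FROM Ancestors AS A
    INNER JOIN @ParentChildNodes AS PC ON PC.ChildNodeId = A.CurrentNodeId
    INNER JOIN @Nodes AS P ON P.NodeId = PC.ParentNodeId
)
-- Pivot: one row per starting node, one column per node type
SELECT StartNodeId,
       MAX(CASE WHEN NodeTypeId = 1 THEN Value END) AS RootLevel,
       MAX(CASE WHEN NodeTypeId = 2 THEN Value END) AS Level1,
       MAX(CASE WHEN NodeTypeId = 3 THEN Value END) AS Level2
FROM Ancestors
GROUP BY StartNodeId
ORDER BY StartNodeId;
```

For Store A (node 3) the result row holds Root, North and Store A, while the root node itself only fills RootLevel and leaves the deeper columns NULL. Running just the SELECT * FROM Ancestors part first shows the intermediate rows that the pivot collapses.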