I've been tasked with migrating a script that traverses a hierarchy and expands it. First, the script runs extremely slowly; second, we are moving to a far more tightly controlled server, so I need to eliminate functions. Could someone help me fold what the function in the second statement does into the SELECT statement of the first script, so that the whole thing runs as a single script?
I understand that keeping the two separate may be better performance-wise. However, this is the only function that exists and this is the only SELECT statement that uses it, so I would much rather integrate the two than go through the process of getting the function approved and added. Also, if anyone can see a more efficient way to achieve this, I am open to suggestions, keeping in mind that the hierarchy goes about 11 levels deep.
The first part of the script is the SELECT statement where the function is called and its result is stored in a table:
DECLARE @RootNode INT = 1
DECLARE @Level1 INT = 2
DECLARE @Level2 INT = 3
DECLARE @Level3 INT = 4
DECLARE @Level4 INT = 5
TRUNCATE TABLE [...].[Hierarchy]
--
INSERT INTO [...].[Hierarchy]
SELECT Nodes.NodeId,
NodeTypeValues.Value AS HierarchyValue,
(select NodeTypeValue from [...].[Function_GetTheParentNodesForTheSelectedNodeType] (Nodes.NodeId, @RootNode)) AS RootLevel,
(select NodeTypeValue from [...].[Function_GetTheParentNodesForTheSelectedNodeType] (Nodes.NodeId, @Level1)) AS Level1,
(select NodeTypeValue from [...].[Function_GetTheParentNodesForTheSelectedNodeType] (Nodes.NodeId, @Level2)) AS Level2,
(select NodeTypeValue from [...].[Function_GetTheParentNodesForTheSelectedNodeType] (Nodes.NodeId, @Level3)) AS Level3,
(select NodeTypeValue from [...].[Function_GetTheParentNodesForTheSelectedNodeType] (Nodes.NodeId, @Level4)) AS Level4
--Level 5...
--Level 6...
--Level 7...
FROM [...].[Nodes] Nodes
INNER JOIN [...].NodeTypes NodeTypes ON NodeTypes.NodeTypeId = Nodes.NodeTypeId
INNER JOIN [...].NodeTypeValues NodeTypeValues ON NodeTypeValues.NodeTypeValueId = Nodes.NodeTypeValueId
WHERE NodeTypes.HierarchyTypeId = 1
The second part is the function being called. It is meant to traverse the hierarchy upwards and return a table result to the main query for storage:
CREATE FUNCTION [...].[Function_GetTheParentNodesForTheSelectedNodeType]
( @NodeId int,
@NodeTypeId int
)
RETURNS
@ReturnData TABLE
(
NodeTypeValue NVARCHAR(100),
NodeId INT
)
AS
BEGIN
WITH NodeSubTreesUpwards AS
(
SELECT SubRootNode.NodeId AS SubRootNodeId,
SubRootNode.*,
NULL AS ChildNodeId,
0 AS HierarchyLevel
FROM [...].[Nodes] AS SubRootNode
WHERE SubRootNode.NodeId = @NodeId
UNION ALL
SELECT NodeSubTreesUpwards.SubRootNodeId,
ParentNode.*,
Parent.ChildNodeId, (NodeSubTreesUpwards.HierarchyLevel) - 1 AS HierarchyLevel
FROM NodeSubTreesUpwards
INNER JOIN [...].[ParentChildNodes] AS Parent ON Parent.ChildNodeId = NodeSubTreesUpwards.NodeId
INNER JOIN [...].[Nodes] AS ParentNode ON ParentNode.NodeId = Parent.ParentNodeId
)
INSERT INTO @ReturnData
SELECT TOP 1 NodeTypeValues.Value, NodeSubTreesUpwards.NodeId
FROM NodeSubTreesUpwards NodeSubTreesUpwards
INNER JOIN [...].[NodeTypes] NodeType ON NodeType.NodeTypeId = NodeSubTreesUpwards.NodeTypeId
INNER JOIN [...].[NodeTypeValues] NodeTypeValues ON NodeTypeValues.NodeTypeValueId = NodeSubTreesUpwards.NodeTypeValueId
WHERE NodeType.NodeTypeId = @NodeTypeId
RETURN
END
I have really tried to split this out but have been struggling to do so. I'm most likely missing something obvious, or I simply don't understand the process of building a hierarchy; I've sat on this for a day or two now. I would be more than happy to reuse the function's logic directly in the main SELECT statement instead of calling the function, but I'm not sure whether the recursion makes that a problem.
Try using an inline table-valued function (ITVF), as these usually get better execution plans. There is a great article on MSDN about the query performance issues of multi-statement table-valued functions:
- Multi-statement TVF, in general, gives a very low cardinality estimate.
- if you use multi-statement TVF, it's treated as just like another table. Because there are no statistics available, SQL Server has to make some assumptions and in general provide a low estimate. If your TVF returns only a few rows, it will be fine. But if you intend to populate the TVF with thousands of rows and if this TVF is joined with other tables, the inefficient plan can result from low cardinality estimate.
So just make two inline table-valued functions from your multi-statement function Function_GetTheParentNodesForTheSelectedNodeType:
CREATE FUNCTION dbo.ufn_NodeSubTreesUpwards
( @NodeId int )
RETURNS table
AS
RETURN (
-- The recursive CTE has to be declared inside the function before it can reference itself
WITH NodeSubTreesUpwards AS
(
SELECT SubRootNode.NodeId AS SubRootNodeId,
SubRootNode.*,
NULL AS ChildNodeId,
0 AS HierarchyLevel
FROM [...].[Nodes] AS SubRootNode
WHERE SubRootNode.NodeId = @NodeId
UNION ALL
SELECT NodeSubTreesUpwards.SubRootNodeId,
ParentNode.*,
Parent.ChildNodeId, (NodeSubTreesUpwards.HierarchyLevel) - 1 AS HierarchyLevel
FROM NodeSubTreesUpwards
INNER JOIN [...].[ParentChildNodes] AS Parent
ON Parent.ChildNodeId = NodeSubTreesUpwards.NodeId
INNER JOIN [...].[Nodes] AS ParentNode ON ParentNode.NodeId = Parent.ParentNodeId
)
SELECT * FROM NodeSubTreesUpwards
)
and another function, which will be used in your INSERT query:
CREATE FUNCTION dbo.ufn_GetTheParentNodesForTheSelectedNodeType
( @NodeId int,
@NodeTypeId int )
RETURNS table
AS
RETURN (
SELECT
TOP 1
NodeTypeValues.Value
, NodeSubTreesUpwards.NodeId
FROM dbo.ufn_NodeSubTreesUpwards(@NodeId) NodeSubTreesUpwards
INNER JOIN [...].[NodeTypes] NodeType ON NodeType.NodeTypeId = NodeSubTreesUpwards.NodeTypeId
INNER JOIN [...].[NodeTypeValues] NodeTypeValues
ON NodeTypeValues.NodeTypeValueId = NodeSubTreesUpwards.NodeTypeValueId
WHERE NodeType.NodeTypeId = @NodeTypeId
)
UPDATE - an example of using a recursive CTE in an inline table-valued function:
create function SequenceList ( @variable int )
returns table
as
return (
with cte as
(
select id = 1
union all
select id = cte.id+1
from cte
where id < @variable
)
select id from cte
--option ( maxrecursion 0 )
)
SELECT * FROM dbo.SequenceList(5)
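One detail worth knowing about the commented-out hint: an OPTION clause is not accepted inside an inline function body, so if the recursion ever needs more than the default limit of 100 levels, the hint goes on the statement that calls the function. A minimal sketch, assuming dbo.SequenceList from the example above has already been created (an 11-level hierarchy stays well under the default and needs no hint):

```sql
-- Assumes dbo.SequenceList (defined above) already exists in the database.
-- Producing 500 rows takes 499 recursive steps, which exceeds the default limit of 100,
-- so the hint is attached to the calling statement rather than to the function body.
SELECT id
FROM dbo.SequenceList(500)
OPTION (MAXRECURSION 0); -- 0 removes the limit entirely; a finite cap is safer on real data
```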
The whole script is in fact very poorly written performance-wise. Each function call generates all parent relationships from a particular node but only returns 1 row corresponding to the node type filter (it uses a TOP 1 and doesn't have an ORDER BY, so they are assuming that the variable filter will produce the wanted row).
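To illustrate why the missing ORDER BY matters, here is a small sketch with made-up data: if a node ever had two ancestors of the same node type, TOP 1 alone could return either of them. The table variable, its columns and its values are hypothetical stand-ins for the CTE output, not the real schema.

```sql
-- Hypothetical ancestor rows for node 30: two ancestors share NodeTypeId = 2.
DECLARE @Ancestors TABLE
(
    NodeId         INT,
    NodeTypeId     INT,
    HierarchyLevel INT,           -- 0 = the node itself, -1 = parent, -2 = grandparent
    Value          NVARCHAR(100)
);

INSERT INTO @Ancestors (NodeId, NodeTypeId, HierarchyLevel, Value)
VALUES (30, 2, -1, N'Division B'),
       (30, 2, -2, N'Division A');

-- An explicit ORDER BY makes the choice deterministic: the nearest ancestor wins.
SELECT TOP (1) Value
FROM @Ancestors
WHERE NodeId = 30
  AND NodeTypeId = 2
ORDER BY HierarchyLevel DESC;
```

With this data the query picks 'Division B'. Whether the nearest or the farthest ancestor is the right one depends on the business rule, which I can't tell from the question.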
The script that does the insert is just "pivoting" the parent levels of a node, this is why there are N calls to the function, each to retrieve a higher level.
I merged the first SELECT (without the INSERT and the variables) with the implementation of the function so that it works set-based, in one go for all the relevant records, in the following SQL. A brief description of each CTE follows the code.
For any further corrections I'll need fully reproducible DDL + DML; I did what I could without having the proper schema.
;WITH RecursionInputNodes AS
(
SELECT DISTINCT
Nodes.NodeId
FROM
[...].[Nodes] Nodes
INNER JOIN [...].NodeTypes NodeTypes ON NodeTypes.NodeTypeId = Nodes.NodeTypeId
INNER JOIN [...].NodeTypeValues NodeTypeValues ON NodeTypeValues.NodeTypeValueId = Nodes.NodeTypeValueId
WHERE
NodeTypes.HierarchyTypeId = 1
),
RecursiveCTE AS
(
-- CTE Anchor: Start with all input nodes at lvl 0
SELECT
SubRootNode.NodeId AS SubRootNodeId, -- the input node this chain started from
SubRootNode.NodeId AS NodeId, -- the node currently being visited
NULL AS ChildNodeId,
0 AS HierarchyLevel,
SubRootNode.NodeTypeId AS NodeTypeId,
NodeTypeValues.Value AS NodeTypeValue
FROM
RecursionInputNodes AS RI
INNER JOIN [...].[Nodes] AS SubRootNode ON SubRootNode.NodeId = RI.NodeId
INNER JOIN [...].[NodeTypes] NodeType ON NodeType.NodeTypeId = SubRootNode.NodeTypeId
INNER JOIN [...].[NodeTypeValues] NodeTypeValues ON NodeTypeValues.NodeTypeValueId = SubRootNode.NodeTypeValueId
UNION ALL
-- CTE Recursion: Add each node's parent and decrease lvl by 1 each time
SELECT
R.SubRootNodeId,
ParentNode.NodeId AS NodeId, -- move up to the parent so the next step continues from there
Parent.ChildNodeId,
R.HierarchyLevel - 1 AS HierarchyLevel,
ParentNode.NodeTypeId AS NodeTypeId,
NodeTypeValues.Value AS NodeTypeValue
FROM
RecursiveCTE AS R
INNER JOIN [...].[ParentChildNodes] AS Parent ON Parent.ChildNodeId = R.NodeId
INNER JOIN [...].[Nodes] AS ParentNode ON ParentNode.NodeId = Parent.ParentNodeId
INNER JOIN [...].[NodeTypes] NodeType ON NodeType.NodeTypeId = ParentNode.NodeTypeId
INNER JOIN [...].[NodeTypeValues] NodeTypeValues ON NodeTypeValues.NodeTypeValueId = ParentNode.NodeTypeValueId
),
Just1RowByNodeTypeByNode AS
(
SELECT
R.SubRootNodeId AS NodeId, -- report results per input node, not per visited ancestor
R.NodeTypeId,
NodeTypeValue = MAX(R.NodeTypeValue) -- I'm "imitating" the TOP 1 from the function here
FROM
RecursiveCTE AS R
GROUP BY
R.SubRootNodeId,
R.NodeTypeId
)
SELECT
Nodes.NodeId,
NodeTypeValues.Value AS HierarchyValue,
L1.NodeTypeValue AS RootLevel,
L2.NodeTypeValue AS Level1, -- Note that the alias Level 1 here actually corresponds to the value 2 for NodeTypeId
L3.NodeTypeValue AS Level2,
L4.NodeTypeValue AS Level3,
L5.NodeTypeValue AS Level4
--Level 5...
--Level 6...
--Level 7...
FROM
RecursionInputNodes AS RI
INNER JOIN [...].[Nodes] Nodes ON Nodes.NodeId = RI.NodeId -- RecursionInputNodes only exposes NodeId
INNER JOIN [...].NodeTypeValues NodeTypeValues ON NodeTypeValues.NodeTypeValueId = Nodes.NodeTypeValueId
LEFT JOIN Just1RowByNodeTypeByNode AS L1 ON Nodes.NodeId = L1.NodeId AND L1.NodeTypeId = 1
LEFT JOIN Just1RowByNodeTypeByNode AS L2 ON Nodes.NodeId = L2.NodeId AND L2.NodeTypeId = 2
LEFT JOIN Just1RowByNodeTypeByNode AS L3 ON Nodes.NodeId = L3.NodeId AND L3.NodeTypeId = 3
LEFT JOIN Just1RowByNodeTypeByNode AS L4 ON Nodes.NodeId = L4.NodeId AND L4.NodeTypeId = 4
LEFT JOIN Just1RowByNodeTypeByNode AS L5 ON Nodes.NodeId = L5.NodeId AND L5.NodeTypeId = 5
RecursionInputNodes holds the input node list for the recursion.
RecursiveCTE is the set of all the input nodes with their parent relationships, followed upwards until there are no more parents. The parent relationship is given through Parent.ChildNodeId = R.NodeId. I also added NodeTypeId and NodeTypeValue because we need to filter on them in the next CTE.
Just1RowByNodeTypeByNode determines, for each NodeId and NodeTypeId, the wanted value of NodeTypeValue, which is what the caller wants from the function. The NodeTypeId is filtered in the final joins (it's the parameter of the original function). This step "mimics" the TOP 1 from the original function.
I'd recommend executing each CTE one by one, in order (each together with the previous ones, since they reference each other), to understand how the last SELECT puts it all together.
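If it helps to experiment without the real tables, the same walk-up-and-pivot idea can be tried on a throwaway data set. This is only a sketch under loud assumptions: the table variables, their columns and the three sample rows are invented, the value is stored directly on the node instead of in NodeTypeValues, and the pivot uses conditional aggregation (MAX over CASE) rather than one LEFT JOIN per level as in the query above.

```sql
-- Hypothetical miniature schema and data, for illustration only.
DECLARE @Nodes TABLE (NodeId INT PRIMARY KEY, NodeTypeId INT, Value NVARCHAR(100));
DECLARE @ParentChildNodes TABLE (ParentNodeId INT, ChildNodeId INT);

INSERT INTO @Nodes (NodeId, NodeTypeId, Value)
VALUES (10, 1, N'Company'), (20, 2, N'Division'), (30, 3, N'Team');

INSERT INTO @ParentChildNodes (ParentNodeId, ChildNodeId)
VALUES (10, 20), (20, 30);

WITH Ancestors AS
(
    -- Anchor: every node starts its own chain
    SELECT N.NodeId AS SubRootNodeId, N.NodeId, N.NodeTypeId, N.Value
    FROM @Nodes AS N
    UNION ALL
    -- Recursion: step from the node currently visited to its parent
    SELECT A.SubRootNodeId, P.NodeId, P.NodeTypeId, P.Value
    FROM Ancestors AS A
    INNER JOIN @ParentChildNodes AS PC ON PC.ChildNodeId = A.NodeId
    INNER JOIN @Nodes AS P ON P.NodeId = PC.ParentNodeId
)
-- Pivot: one row per starting node, one column per node type
SELECT
    SubRootNodeId AS NodeId,
    MAX(CASE WHEN NodeTypeId = 1 THEN Value END) AS RootLevel,
    MAX(CASE WHEN NodeTypeId = 2 THEN Value END) AS Level1,
    MAX(CASE WHEN NodeTypeId = 3 THEN Value END) AS Level2
FROM Ancestors
GROUP BY SubRootNodeId
ORDER BY SubRootNodeId;
```

With these three rows, node 10 gets only RootLevel filled, node 20 gets RootLevel and Level1, and node 30 gets all three columns. Whether conditional aggregation or the LEFT JOIN version performs better on the real data is something only the actual execution plans can tell.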