SQL Server Rewrite Hierarchical CTE Function to a regular Select

I've been tasked with migrating a script that traverses a hierarchy and expands it. Firstly, the script runs extremely slowly, and secondly, we are moving to a far more tightly controlled server, so I need to eliminate functions. Could someone help me fold what the function in the second statement does into the SELECT statement of the first script, so that the whole thing runs as a single query?

I understand that keeping the two separate may perform better. However, this is the only function that exists, and this is the only SELECT statement that uses it, so I would much rather integrate the two than go through the process of getting the function approved and added. If anyone can see a more efficient way to achieve this, I am open to suggestions; keep in mind that the hierarchy goes about 11 levels deep.

The first part of the script is the select statement where the function is called and obviously returned to a table:

DECLARE @RootNode INT = 1
DECLARE @Level1 INT = 2
DECLARE @Level2 INT = 3
DECLARE @Level3 INT = 4
DECLARE @Level4 INT = 5


TRUNCATE TABLE [...].[Hierarchy]
--
INSERT INTO [...].[Hierarchy]
SELECT Nodes.NodeId, 
       NodeTypeValues.Value AS HierarchyValue, 
       -- The alias abc was not defined anywhere in this query; the node comes from the Nodes alias
       (select NodeTypeValue from [...].[Function_GetTheParentNodesForTheSelectedNodeType] (Nodes.NodeId, @RootNode)) AS RootLevel,
       (select NodeTypeValue from [...].[Function_GetTheParentNodesForTheSelectedNodeType] (Nodes.NodeId, @Level1)) AS Level1,
       (select NodeTypeValue from [...].[Function_GetTheParentNodesForTheSelectedNodeType] (Nodes.NodeId, @Level2)) AS Level2,
       (select NodeTypeValue from [...].[Function_GetTheParentNodesForTheSelectedNodeType] (Nodes.NodeId, @Level3)) AS Level3,
       (select NodeTypeValue from [...].[Function_GetTheParentNodesForTheSelectedNodeType] (Nodes.NodeId, @Level4)) AS Level4
       --Level 5...
       --Level 6...
       --Level 7...
  FROM [...].[Nodes] Nodes
       INNER JOIN [...].NodeTypes NodeTypes ON NodeTypes.NodeTypeId = Nodes.NodeTypeId
       INNER JOIN [...].NodeTypeValues NodeTypeValues ON NodeTypeValues.NodeTypeValueId = Nodes.NodeTypeValueId
WHERE NodeTypes.HierarchyTypeId = 1

The second part is the function being called. It is meant to traverse the hierarchy and return a table result to the main query for storage:

CREATE FUNCTION [...].[Function_GetTheParentNodesForTheSelectedNodeType]

    ( @NodeId int,
      @NodeTypeId int
    )
    RETURNS 
      @ReturnData TABLE 
    (
      NodeTypeValue NVARCHAR(100),
      NodeId INT
    )

AS
BEGIN

    WITH NodeSubTreesUpwards AS 
    (
       SELECT SubRootNode.NodeId AS SubRootNodeId, 
              SubRootNode.*,
              NULL AS ChildNodeId, 
              0 AS HierarchyLevel
        FROM [...].[Nodes] AS SubRootNode
        WHERE SubRootNode.NodeId = @NodeId

      UNION ALL

       SELECT NodeSubTreesUpwards.SubRootNodeId, 
              ParentNode.*,
              Parent.ChildNodeId, (NodeSubTreesUpwards.HierarchyLevel) - 1 AS HierarchyLevel
        FROM NodeSubTreesUpwards
        INNER JOIN [...].[ParentChildNodes] AS Parent ON Parent.ChildNodeId = NodeSubTreesUpwards.NodeId
        INNER JOIN [...].[Nodes] AS ParentNode ON ParentNode.NodeId = Parent.ParentNodeId
    )

    INSERT INTO @ReturnData
    SELECT TOP 1 NodeTypeValues.Value,  NodeSubTreesUpwards.NodeId
          FROM NodeSubTreesUpwards NodeSubTreesUpwards
                   -- the joins must reference the CTE alias; an alias n was never declared
                   INNER JOIN [...].[NodeTypes] NodeType ON NodeType.NodeTypeId = NodeSubTreesUpwards.NodeTypeId
                   INNER JOIN [...].[NodeTypeValues] NodeTypeValues ON NodeTypeValues.NodeTypeValueId = NodeSubTreesUpwards.NodeTypeValueId
     WHERE NodeType.NodeTypeId = @NodeTypeId

   RETURN 
END

I have tried to split this out but have been struggling. I'm most likely missing something simple, or I just don't understand the process of building a hierarchy; I've been stuck on this for a day or two now. I would be happy to keep the same logic, just placed directly in the main SELECT statement instead of in a function call, but I'm not sure whether the recursion will make that a problem.

Try to use an inline table-valued function (ITVF), as these usually get better execution plans. There is a great article on MSDN about the query performance issues of multi-statement table-valued functions:

  1. A multi-statement TVF, in general, gives a very low cardinality estimate.
  2. If you use a multi-statement TVF, it is treated just like another table. Because there are no statistics available, SQL Server has to make some assumptions and in general provides a low estimate. If your TVF returns only a few rows, it will be fine. But if you intend to populate the TVF with thousands of rows and the TVF is joined with other tables, the low cardinality estimate can result in an inefficient plan.

So just make two inline table-valued functions from your multi-statement function Function_GetTheParentNodesForTheSelectedNodeType :

CREATE FUNCTION dbo.ufn_NodeSubTreesUpwards
     ( @NodeId int )
RETURNS table
AS
RETURN (
    -- The recursive member refers to NodeSubTreesUpwards by name,
    -- so the query has to be declared as a CTE inside the function
    WITH NodeSubTreesUpwards AS
    (
        SELECT SubRootNode.NodeId AS SubRootNodeId, 
              SubRootNode.*,
              NULL AS ChildNodeId, 
              0 AS HierarchyLevel
        FROM [...].[Nodes] AS SubRootNode
        WHERE SubRootNode.NodeId = @NodeId

      UNION ALL

       SELECT NodeSubTreesUpwards.SubRootNodeId, 
              ParentNode.*,
              Parent.ChildNodeId, (NodeSubTreesUpwards.HierarchyLevel) - 1 AS HierarchyLevel
        FROM NodeSubTreesUpwards
        INNER JOIN [...].[ParentChildNodes] AS Parent 
            ON Parent.ChildNodeId = NodeSubTreesUpwards.NodeId
        INNER JOIN [...].[Nodes] AS ParentNode ON ParentNode.NodeId = Parent.ParentNodeId
    )
    SELECT * FROM NodeSubTreesUpwards
       )

and another function which will be used in your INSERT query:

CREATE FUNCTION dbo.ufn_GetTheParentNodesForTheSelectedNodeType
     ( @NodeId int,
       @NodeTypeId int )
RETURNS table
AS
RETURN (
    SELECT 
     TOP 1 
     NodeTypeValues.Value
    , NodeSubTreesUpwards.NodeId
    FROM dbo.ufn_NodeSubTreesUpwards(@NodeId) NodeSubTreesUpwards
    -- join on the function's alias; an alias n was never declared
    INNER JOIN [...].[NodeTypes] NodeType ON NodeType.NodeTypeId = NodeSubTreesUpwards.NodeTypeId
    INNER JOIN [...].[NodeTypeValues] NodeTypeValues 
        ON NodeTypeValues.NodeTypeValueId = NodeSubTreesUpwards.NodeTypeValueId
        WHERE NodeType.NodeTypeId = @NodeTypeId
       )
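
One way to call the inline function from the INSERT query is OUTER APPLY, sketched below. This is an assumption about usage, not something the functions above prescribe. The [...] schema placeholders are kept exactly as elided in the question, so the statement will not run until they are filled in, and the apply aliases RootLevel and Level1 are names made up for the illustration.

```sql
DECLARE @RootNode INT = 1;
DECLARE @Level1 INT = 2;

SELECT Nodes.NodeId,
       NodeTypeValues.Value AS HierarchyValue,
       RootLevel.Value AS RootLevel,   -- Value is the column returned by the inline function
       Level1.Value AS Level1
       -- further levels follow the same pattern
  FROM [...].[Nodes] Nodes
       INNER JOIN [...].NodeTypes NodeTypes ON NodeTypes.NodeTypeId = Nodes.NodeTypeId
       INNER JOIN [...].NodeTypeValues NodeTypeValues ON NodeTypeValues.NodeTypeValueId = Nodes.NodeTypeValueId
       -- OUTER APPLY keeps the node even when no ancestor of that type exists,
       -- which matches the NULL a scalar subquery would return in that case
       OUTER APPLY dbo.ufn_GetTheParentNodesForTheSelectedNodeType(Nodes.NodeId, @RootNode) AS RootLevel
       OUTER APPLY dbo.ufn_GetTheParentNodesForTheSelectedNodeType(Nodes.NodeId, @Level1) AS Level1
 WHERE NodeTypes.HierarchyTypeId = 1;
```

Because both functions are inline, the optimizer can expand them into the calling query instead of treating each call as an opaque table.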

UPDATE - an example of using a recursive CTE in an inline table-valued function:

create function SequenceList ( @variable int )
returns table
as
return (
with cte as
(
select id = 1
union all
select id = cte.id+1
from cte
where id < @variable
)
select id from cte
--option ( maxrecursion 0 )
)

SELECT * FROM dbo.SequenceList(5)
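
A note on the commented-out hint: an OPTION clause cannot be part of an inline function's definition, so if the recursion needs more than the default 100 levels, the hint goes on the statement that calls the function. A minimal sketch, assuming dbo.SequenceList from the example above has already been created:

```sql
-- 150 rows need 149 recursive steps, which exceeds the default limit of 100,
-- so this call fails unless the caller raises the limit.
SELECT id
FROM dbo.SequenceList(150)
OPTION (MAXRECURSION 200);  -- MAXRECURSION 0 would remove the limit entirely
```

For a hierarchy about 11 levels deep the default limit is not a concern; the hint only matters for much deeper recursion.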

The whole script is in fact very poorly written performance-wise. Each function call generates all parent relationships from a particular node but returns only 1 row, the one matching the node type filter (it uses a TOP 1 without an ORDER BY , so the authors are assuming that the variable filter will produce the wanted row).
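
To illustrate the TOP 1 point, here is a small self-contained sketch. The table variable and its rows are invented sample data, and the choice of "nearest ancestor wins" is an assumption about what the original authors intended.

```sql
-- Hypothetical sample: the rows the recursive CTE might return for one starting node
DECLARE @Ancestors TABLE
(
    NodeId INT,
    NodeTypeId INT,
    NodeTypeValue NVARCHAR(100),
    HierarchyLevel INT   -- 0 for the starting node, decreasing by 1 per level climbed
);

INSERT INTO @Ancestors (NodeId, NodeTypeId, NodeTypeValue, HierarchyLevel)
VALUES (10, 3, N'Team A',   0),
       (7,  2, N'Dept X',  -1),
       (8,  2, N'Dept Y',  -2),  -- a second ancestor with the same node type
       (1,  1, N'Company', -3);

-- Without ORDER BY, SQL Server is free to return either 'Dept X' or 'Dept Y'.
-- Ordering by HierarchyLevel DESC makes the nearest ancestor win every time.
SELECT TOP (1) NodeTypeValue, NodeId
FROM @Ancestors
WHERE NodeTypeId = 2
ORDER BY HierarchyLevel DESC;   -- returns 'Dept X', 7
```

If each node type can occur only once on the path to the root, the filter alone is enough and the ordering changes nothing.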

The script that does the insert is just "pivoting" the parent levels of a node; this is why there are N calls to the function, each retrieving a higher level.
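
As a side note, the same pivot can also be written with conditional aggregation instead of one lookup per level. The sketch below is self-contained and uses invented sample data; the column names mirror the question, but the table variable is hypothetical.

```sql
-- Hypothetical sample: one row per (starting node, ancestor node type)
DECLARE @Ancestors TABLE
(
    NodeId INT,
    NodeTypeId INT,
    NodeTypeValue NVARCHAR(100)
);

INSERT INTO @Ancestors (NodeId, NodeTypeId, NodeTypeValue)
VALUES (10, 1, N'Company'), (10, 2, N'Dept X'), (10, 3, N'Team A'),
       (11, 1, N'Company'), (11, 2, N'Dept Y');

-- Each CASE picks the value for one node type; MAX collapses the group to a single row
SELECT NodeId,
       MAX(CASE WHEN NodeTypeId = 1 THEN NodeTypeValue END) AS RootLevel,
       MAX(CASE WHEN NodeTypeId = 2 THEN NodeTypeValue END) AS Level1,
       MAX(CASE WHEN NodeTypeId = 3 THEN NodeTypeValue END) AS Level2
FROM @Ancestors
GROUP BY NodeId;
```

Node 11 has no ancestor of type 3 in this sample, so its Level2 comes back as NULL, just as an unmatched lookup would.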

In the following SQL I merged the first SELECT (without the INSERT and the variables) with the implementation of the function, so that it works set-based, in one go, for all the relevant records. A brief description of each CTE is below.

For any further corrections I'll need a full, reproducible DDL + DML script; I did what I could without having the actual schema.

;WITH RecursionInputNodes AS
(
    SELECT DISTINCT
        Nodes.NodeId
    FROM 
        [...].[Nodes] Nodes
        INNER JOIN [...].NodeTypes NodeTypes ON NodeTypes.NodeTypeId = Nodes.NodeTypeId
        INNER JOIN [...].NodeTypeValues NodeTypeValues ON NodeTypeValues.NodeTypeValueId = Nodes.NodeTypeValueId
    WHERE 
        NodeTypes.HierarchyTypeId = 1
),
RecursiveCTE AS
(
    -- CTE Anchor: Start with all input nodes at lvl 0
    SELECT 
        SubRootNode.NodeId AS NodeId, 
        SubRootNode.NodeId AS CurrentNodeId, -- node reached so far while climbing; starts at the input node
        NULL AS ChildNodeId,
        0 AS HierarchyLevel,
        SubRootNode.NodeTypeId AS NodeTypeId,
        NodeTypeValues.Value AS NodeTypeValue
    FROM
        RecursionInputNodes AS RI
        INNER JOIN [...].[Nodes] AS SubRootNode ON SubRootNode.NodeId = RI.NodeId
        INNER JOIN [...].[NodeTypes] NodeType ON NodeType.NodeTypeId = SubRootNode.NodeTypeId
        INNER JOIN [...].[NodeTypeValues] NodeTypeValues ON NodeTypeValues.NodeTypeValueId = SubRootNode.NodeTypeValueId

    UNION ALL

    -- CTE Recursion: Add each node's parent and decrease lvl by 1 each time
    SELECT 
        R.NodeId,                            -- keeps the original input node
        ParentNode.NodeId AS CurrentNodeId,  -- moves one level up on every step
        Parent.ChildNodeId,
        R.HierarchyLevel - 1 AS HierarchyLevel,
        ParentNode.NodeTypeId AS NodeTypeId,
        NodeTypeValues.Value AS NodeTypeValue
    FROM 
        RecursiveCTE AS R
        -- join on the node reached so far; joining on R.NodeId would find the same parent forever
        INNER JOIN [...].[ParentChildNodes] AS Parent ON Parent.ChildNodeId = R.CurrentNodeId
        INNER JOIN [...].[Nodes] AS ParentNode ON ParentNode.NodeId = Parent.ParentNodeId
        INNER JOIN [...].[NodeTypes] NodeType ON NodeType.NodeTypeId = ParentNode.NodeTypeId
        INNER JOIN [...].[NodeTypeValues] NodeTypeValues ON NodeTypeValues.NodeTypeValueId = ParentNode.NodeTypeValueId
),
Just1RowByNodeTypeByNode AS
(
    SELECT
        R.NodeId,
        R.NodeTypeId,
        NodeTypeValue = MAX(R.NodeTypeValue) -- I'm "imitating" the TOP 1 from the function here
    FROM
        RecursiveCTE AS R
    GROUP BY
        R.NodeId,
        R.NodeTypeId
)
SELECT 
    Nodes.NodeId, 
    NodeTypeValues.Value AS HierarchyValue,
    L1.NodeTypeValue AS RootLevel,
    L2.NodeTypeValue AS Level1, -- Note that the alias Level 1 here actually corresponds to the value 2 for NodeTypeId
    L3.NodeTypeValue AS Level2,
    L4.NodeTypeValue AS Level3,
    L5.NodeTypeValue AS Level4
    --Level 5...
    --Level 6...
    --Level 7...
FROM 
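    RecursionInputNodes RI
    -- RecursionInputNodes only exposes NodeId, so the other columns come from the Nodes table
    INNER JOIN [...].[Nodes] Nodes ON Nodes.NodeId = RI.NodeId
    INNER JOIN [...].NodeTypes NodeTypes ON NodeTypes.NodeTypeId = Nodes.NodeTypeId
    INNER JOIN [...].NodeTypeValues NodeTypeValues ON NodeTypeValues.NodeTypeValueId = Nodes.NodeTypeValueId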

    LEFT JOIN Just1RowByNodeTypeByNode AS L1 ON Nodes.NodeId = L1.NodeId AND L1.NodeTypeId = 1
    LEFT JOIN Just1RowByNodeTypeByNode AS L2 ON Nodes.NodeId = L2.NodeId AND L2.NodeTypeId = 2
    LEFT JOIN Just1RowByNodeTypeByNode AS L3 ON Nodes.NodeId = L3.NodeId AND L3.NodeTypeId = 3
    LEFT JOIN Just1RowByNodeTypeByNode AS L4 ON Nodes.NodeId = L4.NodeId AND L4.NodeTypeId = 4
    LEFT JOIN Just1RowByNodeTypeByNode AS L5 ON Nodes.NodeId = L5.NodeId AND L5.NodeTypeId = 5

  • RecursionInputNodes holds the input node list for the recursion.
  • RecursiveCTE is the set of all input nodes together with their chain of parents, followed until there are no more. The parent relationship comes from the join between ParentChildNodes.ChildNodeId and the node reached so far. I also added NodeTypeId and NodeTypeValue because we need them in the next CTE.
  • Just1RowByNodeTypeByNode determines, for each NodeId and NodeTypeId , the wanted value of NodeTypeValue , which is what the caller wants from the function. The NodeTypeId is filtered later, in the final joins (it's the parameter of the original function). This step "mimics" the TOP 1 from the original function.

I'd recommend executing each CTE one by one in order (each together with the previous ones, as they are referenced) to understand how the last SELECT puts it all together.
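
To see what the recursive step produces before any pivoting, here is a tiny self-contained sketch you can run as-is. The table variables, the three sample nodes and the column Name are all invented for illustration; only the ParentNodeId / ChildNodeId relationship mirrors the question.

```sql
-- Hypothetical three-level hierarchy: Company > Dept X > Team A
DECLARE @Nodes TABLE (NodeId INT PRIMARY KEY, NodeTypeId INT, Name NVARCHAR(100));
DECLARE @ParentChildNodes TABLE (ParentNodeId INT, ChildNodeId INT);

INSERT INTO @Nodes (NodeId, NodeTypeId, Name)
VALUES (1, 1, N'Company'), (2, 2, N'Dept X'), (3, 3, N'Team A');

INSERT INTO @ParentChildNodes (ParentNodeId, ChildNodeId)
VALUES (1, 2), (2, 3);

WITH Climb AS
(
    -- Anchor: every node starts at itself, level 0
    SELECT N.NodeId AS StartNodeId,
           N.NodeId AS CurrentNodeId,
           N.NodeTypeId,
           N.Name,
           0 AS HierarchyLevel
    FROM @Nodes AS N

    UNION ALL

    -- Recursion: step from the node reached so far to its parent
    SELECT C.StartNodeId,
           P.NodeId,
           P.NodeTypeId,
           P.Name,
           C.HierarchyLevel - 1
    FROM Climb AS C
    INNER JOIN @ParentChildNodes AS PC ON PC.ChildNodeId = C.CurrentNodeId
    INNER JOIN @Nodes AS P ON P.NodeId = PC.ParentNodeId
)
SELECT StartNodeId, CurrentNodeId, NodeTypeId, Name, HierarchyLevel
FROM Climb
ORDER BY StartNodeId, HierarchyLevel DESC;
```

With this sample data the query returns six rows: one for node 1, two for node 2 and three for node 3, i.e. each starting node followed by all of its ancestors. That is the shape the later grouping and level joins work on.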
