SQL Server Rewrite Hierarchical CTE Function to a regular Select

I've been tasked with migrating a script that traverses a hierarchy and expands it. First, the script runs extremely slowly; second, we are moving to a far more controlled server, so I need to eliminate functions. I was wondering whether someone could help me integrate what the function in the second statement does, so that the whole thing runs inside the SELECT statement of the first script.

I understand that keeping the two separate may be far better performance-wise. However, this is the only function that exists and the only SELECT statement that uses it, so I would much rather integrate the two than go through the process of getting the function approved and added. Also, if anyone can see a more optimal way to achieve this, I am open to suggestions, keeping in mind that the hierarchy goes about 11 levels deep.

The first part of the script is the SELECT statement that calls the function and inserts the result into a table:

DECLARE @RootNode INT = 1
DECLARE @Level1 INT = 2
DECLARE @Level2 INT = 3
DECLARE @Level3 INT = 4
DECLARE @Level4 INT = 5


TRUNCATE TABLE [...].[Hierarchy]
--
INSERT INTO [...].[Hierarchy]
SELECT Nodes.NodeId, 
       NodeTypeValues.Value AS HierarchyValue, 
       -- The outer table alias is Nodes, so the function is called with Nodes.NodeId
       (select NodeTypeValue from [...].[Function_GetTheParentNodesForTheSelectedNodeType] (Nodes.NodeId, @RootNode)) AS RootLevel,
       (select NodeTypeValue from [...].[Function_GetTheParentNodesForTheSelectedNodeType] (Nodes.NodeId, @Level1)) AS Level1,
       (select NodeTypeValue from [...].[Function_GetTheParentNodesForTheSelectedNodeType] (Nodes.NodeId, @Level2)) AS Level2,
       (select NodeTypeValue from [...].[Function_GetTheParentNodesForTheSelectedNodeType] (Nodes.NodeId, @Level3)) AS Level3,
       (select NodeTypeValue from [...].[Function_GetTheParentNodesForTheSelectedNodeType] (Nodes.NodeId, @Level4)) AS Level4
       --Level 5...
       --Level 6...
       --Level 7...
  FROM [...].[Nodes] Nodes
       INNER JOIN [...].NodeTypes NodeTypes ON NodeTypes.NodeTypeId = Nodes.NodeTypeId
       INNER JOIN [...].NodeTypeValues NodeTypeValues ON NodeTypeValues.NodeTypeValueId = Nodes.NodeTypeValueId
WHERE NodeTypes.HierarchyTypeId = 1

The second part is the function being called. It is meant to traverse the hierarchy and return a table result to the main query for storage:

CREATE FUNCTION [...].[Function_GetTheParentNodesForTheSelectedNodeType]

    ( @NodeId int,
      @NodeTypeId int
    )
    RETURNS 
      @ReturnData TABLE 
    (
      NodeTypeValue NVARCHAR(100),
      NodeId INT
    )

AS
BEGIN

    WITH NodeSubTreesUpwards AS 
    (
       SELECT SubRootNode.NodeId AS SubRootNodeId, 
              SubRootNode.*,
              NULL AS ChildNodeId, 
              0 AS HierarchyLevel
        FROM [...].[Nodes] AS SubRootNode
        WHERE SubRootNode.NodeId = @NodeId

      UNION ALL

       SELECT NodeSubTreesUpwards.SubRootNodeId, 
              ParentNode.*,
              Parent.ChildNodeId, (NodeSubTreesUpwards.HierarchyLevel) - 1 AS HierarchyLevel
        FROM NodeSubTreesUpwards
        INNER JOIN [...].[ParentChildNodes] AS Parent ON Parent.ChildNodeId = NodeSubTreesUpwards.NodeId
        INNER JOIN [...].[Nodes] AS ParentNode ON ParentNode.NodeId = Parent.ParentNodeId
    )

    INSERT INTO @ReturnData
    SELECT TOP 1 NodeTypeValues.Value,  NodeSubTreesUpwards.NodeId
          FROM NodeSubTreesUpwards NodeSubTreesUpwards
                   INNER JOIN [...].[NodeTypes] NodeType ON NodeType.NodeTypeId = NodeSubTreesUpwards.NodeTypeId
                   INNER JOIN [...].[NodeTypeValues] NodeTypeValues ON NodeTypeValues.NodeTypeValueId = NodeSubTreesUpwards.NodeTypeValueId
     WHERE NodeType.NodeTypeId = @NodeTypeId

   RETURN
END

I have really tried to split this out but have been struggling. I'm most likely missing something simple, or I just don't understand the process of building a hierarchy; I've been stuck on this for a day or two now. I would be more than happy to use the same logic without calling the function, doing it in the main SELECT statement instead, but I'm not sure whether the recursion makes that a problem.

Try to use an inline table-valued function (ITVF), as they get better execution plans. There is a great article on MSDN about query performance issues with multi-statement table-valued functions:

  1. A multi-statement TVF, in general, gives a very low cardinality estimate.
  2. If you use a multi-statement TVF, it is treated just like another table. Because there are no statistics available, SQL Server has to make some assumptions and in general provides a low estimate. If your TVF returns only a few rows, it will be fine. But if you intend to populate the TVF with thousands of rows and this TVF is joined with other tables, an inefficient plan can result from the low cardinality estimate.
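
To make the distinction concrete, here is a minimal sketch of the same lookup written both ways. The table dbo.DemoNodes and both function names are hypothetical and exist only for illustration; they are not part of the original schema.

```sql
-- Hypothetical demo table; all names in this sketch are illustrative only.
CREATE TABLE dbo.DemoNodes (NodeId INT PRIMARY KEY, NodeName NVARCHAR(100));
GO

-- Multi-statement TVF: rows are first materialized into a table variable.
CREATE FUNCTION dbo.DemoNodes_MultiStatement (@NodeId INT)
RETURNS @Result TABLE (NodeId INT, NodeName NVARCHAR(100))
AS
BEGIN
    INSERT INTO @Result (NodeId, NodeName)
    SELECT NodeId, NodeName
    FROM dbo.DemoNodes
    WHERE NodeId = @NodeId;

    RETURN;
END;
GO

-- Inline TVF: a single SELECT with no BEGIN/END body and no table variable.
CREATE FUNCTION dbo.DemoNodes_Inline (@NodeId INT)
RETURNS TABLE
AS
RETURN
(
    SELECT NodeId, NodeName
    FROM dbo.DemoNodes
    WHERE NodeId = @NodeId
);
GO
```

Both return the same rows. The difference is that the inline form is a single query the optimizer can work with directly, while the multi-statement form goes through the intermediate table variable described in the list above.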

So just make two inline table-valued functions from your multi-statement function Function_GetTheParentNodesForTheSelectedNodeType:

CREATE FUNCTION dbo.ufn_NodeSubTreesUpwards
     ( @NodeId int )
RETURNS table
AS
RETURN (
    -- The recursive reference below only resolves if the query is wrapped in a named CTE
    WITH NodeSubTreesUpwards AS
    (
        -- Anchor: the starting node at level 0
        SELECT SubRootNode.NodeId AS SubRootNodeId, 
               SubRootNode.*,
               NULL AS ChildNodeId, 
               0 AS HierarchyLevel
        FROM [...].[Nodes] AS SubRootNode
        WHERE SubRootNode.NodeId = @NodeId

        UNION ALL

        -- Recursive member: step up to each parent, decreasing the level by 1
        SELECT NodeSubTreesUpwards.SubRootNodeId, 
               ParentNode.*,
               Parent.ChildNodeId,
               NodeSubTreesUpwards.HierarchyLevel - 1 AS HierarchyLevel
        FROM NodeSubTreesUpwards
        INNER JOIN [...].[ParentChildNodes] AS Parent 
            ON Parent.ChildNodeId = NodeSubTreesUpwards.NodeId
        INNER JOIN [...].[Nodes] AS ParentNode ON ParentNode.NodeId = Parent.ParentNodeId
    )
    SELECT * FROM NodeSubTreesUpwards
       )

and another function, which will be used in your INSERT query:

CREATE FUNCTION dbo.ufn_GetTheParentNodesForTheSelectedNodeType
     ( @NodeId int,
       @NodeTypeId int )
RETURNS table
AS
RETURN (
    SELECT 
     TOP 1 
     NodeTypeValues.Value
    , NodeSubTreesUpwards.NodeId
    FROM dbo.ufn_NodeSubTreesUpwards(@NodeId) NodeSubTreesUpwards
    INNER JOIN [...].[NodeTypes] NodeType ON NodeType.NodeTypeId = NodeSubTreesUpwards.NodeTypeId
    INNER JOIN [...].[NodeTypeValues] NodeTypeValues 
        ON NodeTypeValues.NodeTypeValueId = NodeSubTreesUpwards.NodeTypeValueId
        WHERE NodeType.NodeTypeId = @NodeTypeId
       )

UPDATE - an example of using a recursive CTE in an inline table-valued function:

create function SequenceList ( @variable int )
returns table
as
return (
with cte as
(
select id = 1
union all
select id = cte.id+1
from cte
where id < @variable
)
select id from cte
--option ( maxrecursion 0 )
)

SELECT * FROM dbo.SequenceList(5)
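
The MAXRECURSION hint is commented out in the function because an OPTION clause cannot be part of a function definition. As a sketch, assuming SequenceList was created in the dbo schema, the hint can instead be supplied by the statement that calls the function:

```sql
-- Assumes the SequenceList function above exists in the dbo schema.
-- The default recursion limit is 100, so 500 rows need a higher limit.
-- The hint belongs to the calling statement, not to the function body.
SELECT id
FROM dbo.SequenceList(500)
OPTION (MAXRECURSION 1000);
```

For a hierarchy about 11 levels deep, as in the question, the default limit of 100 should already be sufficient, so the hint matters only for deeper recursions.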

The whole script is in fact very poorly written performance-wise. Each function call generates all parent relationships from a particular node but returns only 1 row, the one matching the node type filter (it uses TOP 1 without an ORDER BY, so it assumes that the node type filter alone will produce the wanted row).
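
To illustrate the TOP 1 concern, here is a small self-contained sketch. The table variable and its sample rows are made up for illustration; the point is only that when two ancestors share a node type, an explicit ORDER BY decides which one is returned.

```sql
-- Hypothetical ancestor rows for one starting node (sample data, not the real schema).
DECLARE @Ancestors TABLE
(
    NodeId INT,
    NodeTypeId INT,
    HierarchyLevel INT,          -- 0 = starting node, -1 = parent, -2 = grandparent, ...
    NodeTypeValue NVARCHAR(100)
);

INSERT INTO @Ancestors (NodeId, NodeTypeId, HierarchyLevel, NodeTypeValue)
VALUES (10, 3,  0, N'Leaf'),
       (7,  2, -1, N'Branch A'),
       (8,  2, -2, N'Branch B'),
       (1,  1, -3, N'Root');

-- Two rows match NodeTypeId = 2. Without ORDER BY, TOP 1 is free to return either.
-- Ordering by HierarchyLevel DESC picks the nearest ancestor of that type.
SELECT TOP (1) NodeTypeValue, NodeId
FROM @Ancestors
WHERE NodeTypeId = 2
ORDER BY HierarchyLevel DESC;
```

If each node type occurs at most once on the path to the root, the ordering makes no difference, which is presumably the assumption behind the original function.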

The script that does the insert is just "pivoting" the parent levels of a node. This is why there are N calls to the function, each retrieving a higher level.
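
As a sketch of what "pivoting" means here, the following uses made-up sample rows (one row per node and node type) and turns them into one row per node with conditional aggregation. The table variable and column aliases are illustrative assumptions, and this is only one of several ways to pivot.

```sql
-- Hypothetical input: one value per (node, node type), as the function calls would return.
DECLARE @PerType TABLE (NodeId INT, NodeTypeId INT, NodeTypeValue NVARCHAR(100));

INSERT INTO @PerType (NodeId, NodeTypeId, NodeTypeValue)
VALUES (10, 1, N'Root'), (10, 2, N'Region'), (10, 3, N'Branch'),
       (11, 1, N'Root'), (11, 2, N'Region');

-- One output row per node; each level column picks the value of one node type.
-- A node without a given type gets NULL in that column.
SELECT NodeId,
       MAX(CASE WHEN NodeTypeId = 1 THEN NodeTypeValue END) AS RootLevel,
       MAX(CASE WHEN NodeTypeId = 2 THEN NodeTypeValue END) AS Level1,
       MAX(CASE WHEN NodeTypeId = 3 THEN NodeTypeValue END) AS Level2
FROM @PerType
GROUP BY NodeId;
```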

I merged the first SELECT (without the INSERT or the variables) with the implementation of the function so that it works set-based, in one go for all the appropriate records, in the following SQL. A brief description of each CTE is below.

For any further corrections I'll need a fully reproducible DML + DDL; I did what I could without having the proper schema.

;WITH RecursionInputNodes AS
(
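    SELECT DISTINCT
        Nodes.NodeId,
        Nodes.NodeTypeId,       -- exposed because later joins reference these two columns
        Nodes.NodeTypeValueId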
    FROM 
        [...].[Nodes] Nodes
        INNER JOIN [...].NodeTypes NodeTypes ON NodeTypes.NodeTypeId = Nodes.NodeTypeId
        INNER JOIN [...].NodeTypeValues NodeTypeValues ON NodeTypeValues.NodeTypeValueId = Nodes.NodeTypeValueId
    WHERE 
        NodeTypes.HierarchyTypeId = 1
),
RecursiveCTE AS
(
    -- CTE Anchor: Start with all input nodes at lvl 0
    SELECT 
        SubRootNode.NodeId AS NodeId,          -- the input node; stays fixed during the walk
        SubRootNode.NodeId AS CurrentNodeId,   -- the node currently being visited
        NULL AS ChildNodeId,
        0 AS HierarchyLevel,
        SubRootNode.NodeTypeId AS NodeTypeId,
        NodeTypeValues.Value AS NodeTypeValue
    FROM
        RecursionInputNodes AS RI
        INNER JOIN [...].[Nodes] AS SubRootNode ON SubRootNode.NodeId = RI.NodeId
        INNER JOIN [...].[NodeTypes] NodeType ON NodeType.NodeTypeId = SubRootNode.NodeTypeId
        INNER JOIN [...].[NodeTypeValues] NodeTypeValues ON NodeTypeValues.NodeTypeValueId = SubRootNode.NodeTypeValueId

    UNION ALL

    -- CTE Recursion: Add each node's parent and decrease lvl by 1 each time.
    -- The join uses the node currently being visited; joining on the fixed input
    -- node would return the same parent on every iteration and never terminate.
    SELECT 
        R.NodeId,
        ParentNode.NodeId AS CurrentNodeId,
        Parent.ChildNodeId,
        R.HierarchyLevel - 1 AS HierarchyLevel,
        ParentNode.NodeTypeId AS NodeTypeId,
        NodeTypeValues.Value AS NodeTypeValue
    FROM 
        RecursiveCTE AS R
        INNER JOIN [...].[ParentChildNodes] AS Parent ON Parent.ChildNodeId = R.CurrentNodeId
        INNER JOIN [...].[Nodes] AS ParentNode ON ParentNode.NodeId = Parent.ParentNodeId
        INNER JOIN [...].[NodeTypes] NodeType ON NodeType.NodeTypeId = ParentNode.NodeTypeId
        INNER JOIN [...].[NodeTypeValues] NodeTypeValues ON NodeTypeValues.NodeTypeValueId = ParentNode.NodeTypeValueId
),
Just1RowByNodeTypeByNode AS
(
    SELECT
        R.NodeId,
        R.NodeTypeId,
        NodeTypeValue = MAX(R.NodeTypeValue) -- I'm "imitating" the TOP 1 from the function here
    FROM
        RecursiveCTE AS R
    GROUP BY
        R.NodeId,
        R.NodeTypeId
)
SELECT 
    Nodes.NodeId, 
    NodeTypeValues.Value AS HierarchyValue,
    L1.NodeTypeValue AS RootLevel,
    L2.NodeTypeValue AS Level1, -- Note that the alias Level 1 here actually corresponds to the value 2 for NodeTypeId
    L3.NodeTypeValue AS Level2,
    L4.NodeTypeValue AS Level3,
    L5.NodeTypeValue AS Level4
    --Level 5...
    --Level 6...
    --Level 7...
FROM 
    RecursionInputNodes Nodes
    INNER JOIN [...].NodeTypes NodeTypes ON NodeTypes.NodeTypeId = Nodes.NodeTypeId
    INNER JOIN [...].NodeTypeValues NodeTypeValues ON NodeTypeValues.NodeTypeValueId = Nodes.NodeTypeValueId

    LEFT JOIN Just1RowByNodeTypeByNode AS L1 ON Nodes.NodeId = L1.NodeId AND L1.NodeTypeId = 1
    LEFT JOIN Just1RowByNodeTypeByNode AS L2 ON Nodes.NodeId = L2.NodeId AND L2.NodeTypeId = 2
    LEFT JOIN Just1RowByNodeTypeByNode AS L3 ON Nodes.NodeId = L3.NodeId AND L3.NodeTypeId = 3
    LEFT JOIN Just1RowByNodeTypeByNode AS L4 ON Nodes.NodeId = L4.NodeId AND L4.NodeTypeId = 4
    LEFT JOIN Just1RowByNodeTypeByNode AS L5 ON Nodes.NodeId = L5.NodeId AND L5.NodeTypeId = 5

  • RecursionInputNodes holds the input node list for the recursion.
  • RecursiveCTE is the set of all the input nodes together with their parent relationships, followed upwards until there are no more parents. The parent relationship is resolved through the join on Parent.ChildNodeId in the recursive member. I also added NodeTypeId and NodeTypeValue because we need to filter on them in the next step.
  • Just1RowByNodeTypeByNode determines, for each NodeId and NodeTypeId, the wanted NodeTypeValue, which is what the caller wants from the function. The NodeTypeId is then filtered in the final joins (it is the parameter of the original function). This step "mimics" the TOP 1 from the original function.

I'd recommend executing each CTE one by one, in order (each together with the ones it references), to understand how the final SELECT puts it all together.
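
For stepping through the recursion in isolation, here is a minimal self-contained sketch on a three-node chain. The table variables, sample rows, and the StartNodeId/CurrentNodeId column names are assumptions made for this example, since the real schema is not available.

```sql
-- Hypothetical sample data: Root (1) -> Region (2) -> Branch (3).
DECLARE @Nodes TABLE (NodeId INT PRIMARY KEY, NodeTypeId INT, Value NVARCHAR(100));
DECLARE @ParentChildNodes TABLE (ParentNodeId INT, ChildNodeId INT);

INSERT INTO @Nodes (NodeId, NodeTypeId, Value)
VALUES (1, 1, N'Root'), (2, 2, N'Region'), (3, 3, N'Branch');

INSERT INTO @ParentChildNodes (ParentNodeId, ChildNodeId)
VALUES (1, 2), (2, 3);

WITH Walk AS
(
    -- Anchor: every node starts a walk at level 0.
    SELECT N.NodeId AS StartNodeId,
           N.NodeId AS CurrentNodeId,
           0 AS HierarchyLevel,
           N.NodeTypeId,
           N.Value
    FROM @Nodes AS N

    UNION ALL

    -- Recursive member: move from the current node to its parent.
    SELECT W.StartNodeId,
           P.NodeId,
           W.HierarchyLevel - 1,
           P.NodeTypeId,
           P.Value
    FROM Walk AS W
    INNER JOIN @ParentChildNodes AS PC ON PC.ChildNodeId = W.CurrentNodeId
    INNER JOIN @Nodes AS P ON P.NodeId = PC.ParentNodeId
)
-- Inspect the raw recursion output before any grouping or pivoting.
SELECT StartNodeId, CurrentNodeId, HierarchyLevel, NodeTypeId, Value
FROM Walk
ORDER BY StartNodeId, HierarchyLevel DESC;
```

Each starting node appears once at level 0 and then once per ancestor, so node 3 yields three rows (itself, Region, Root). Inspecting this intermediate result makes it easier to see what the grouping and the per-level joins operate on.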
