SQL: xml.nodes from a CTE is very slow

I have a table with an XML column whose content looks like this:

<block>
    <blockIn>
        <G>1</G>            
    </blockIn>
    .....
    <blockIn>
        <G>12</G>
    </blockIn>
    ......
</block>
.....
<block>
......
</block>

Within each <block>, I need to find the MAX of <G> across its <blockIn> elements, and then sum all of those MAX values:

(sum (Max (<block> …<blockIn> ...<G></G>); Max (<block> …<blockIn> ...<G></G>) ...))

So I wrote this:

WITH ds AS 
(
    SELECT 
        fieldXML
    FROM 
        table
    WHERE 
        ID = 1
)
SELECT 
    (SELECT SUM(node_a.value('max(blockIn/G)' , 'int' )) 
     FROM ds.fieldXML.nodes('/block')  AS node_refs(node_a)) AS [ArticulNum] -- XQuery paths are case-sensitive
FROM
    ds

But it runs very slowly.

If I use a variable instead, it runs very fast:

DECLARE @xml AS [XML];

SELECT 
    @xml = fieldXML
FROM 
    table
WHERE 
    ID = 1;

SELECT SUM(node_a.value('max(blockIn/G)' , 'INT' )) 
FROM @xml.nodes('/block') AS node_refs(node_a)

What do I need to do to make the first solution run fast as well?

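A user-defined function (UDF) will help, but it has to be the right type of UDF and, if performance matters, it has to be an inline function. Here is a cleaned-up version of the original (note that the final SUM is not needed):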

-- Original
CREATE OR ALTER FUNCTION [dbo].[ArticulNumFromXML_original](@xml XML)  
RETURNS INT
AS  
BEGIN  
RETURN 
(
  SELECT node_a.value('max(blockIn/G)' , 'int' )
  FROM   @xml.nodes('/block') AS node_refs(node_a)
    ); 
END;  
GO

Here is an improved scalar UDF that performs better. Note the different context node (block/blockIn) and the use of the text() node.

-- Improved scalar UDF:
CREATE OR ALTER FUNCTION [dbo].[ArticulNumFromXML_V2](@xml XML)  
RETURNS INT  
AS  
BEGIN  
RETURN 
(
  SELECT Mx = MAX(node_a.value('(G/text())[1]','int'))
  FROM   @xml.nodes('/block/blockIn') AS node_refs(node_a)
    ); 
END;  
GO
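
To make the difference between the two XQuery styles concrete, here is a minimal sketch that runs both expressions against a small XML variable. The sample document is made up for illustration and is not from the original post. For this sample both queries should return 12.

```sql
-- Hypothetical sample document, not taken from the original post.
DECLARE @xml XML = N'<block>
  <blockIn><G>1</G></blockIn>
  <blockIn><G>12</G></blockIn>
  <blockIn><G>7</G></blockIn>
</block>';

-- Original style: one row per <block>, with max() evaluated inside XQuery.
SELECT MxOriginal = node_a.value('max(blockIn/G)', 'int')
FROM   @xml.nodes('/block') AS node_refs(node_a);

-- V2 style: one row per <blockIn>, reading the text() node of the first <G>,
-- with the MAX aggregate evaluated by the relational engine.
SELECT MxV2 = MAX(node_a.value('(G/text())[1]', 'int'))
FROM   @xml.nodes('/block/blockIn') AS node_refs(node_a);
```

The second form gives the optimizer a simpler path expression per row, which is presumably where the gain measured below comes from.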

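This performs better, but there is still a fundamental problem: the function is not inline. Let's use the logic above to create an inline table-valued function (iTVF):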

-- INLINE UDF    
CREATE OR ALTER FUNCTION [dbo].[ArticulNumFromXML_itvf](@xml XML)  
RETURNS TABLE AS RETURN 
  SELECT Mx = MAX(node_a.value('(G/text())[1]','int'))
  FROM   @xml.nodes('/block/blockIn') AS node_refs(node_a);
GO
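
The question asked for the sum of the per-block maximums, and its sample XML holds several <block> elements. The function above returns a single MAX across all of them. As a rough sketch of how the same inline pattern could cover the multi-block case, the following uses a nested nodes() call. The function name dbo.SumOfBlockMax_itvf and the sample XML are assumptions for illustration, not part of the original answer.

```sql
-- Hypothetical function name; a sketch, not part of the original answer.
CREATE OR ALTER FUNCTION dbo.SumOfBlockMax_itvf (@xml XML)
RETURNS TABLE AS RETURN
  SELECT SumMx = SUM(b.Mx)
  FROM   @xml.nodes('/block') AS blk(node_b)
  CROSS APPLY
  (
    -- MAX of <G> within the current <block> only
    SELECT Mx = MAX(node_i.value('(G/text())[1]', 'int'))
    FROM   blk.node_b.nodes('blockIn') AS bi(node_i)
  ) AS b;
GO

-- Made-up two-block fragment: the per-block maximums are 12 and 5.
DECLARE @xml XML = N'<block><blockIn><G>1</G></blockIn><blockIn><G>12</G></blockIn></block>
<block><blockIn><G>5</G></blockIn><blockIn><G>3</G></blockIn></block>';

SELECT f.SumMx                       -- 12 + 5 = 17
FROM   dbo.SumOfBlockMax_itvf(@xml) AS f;
```

Against a table, the call would go through CROSS APPLY in the same way as dbo.ArticulNumFromXML_itvf, passing the XML column as the argument.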

Next is a sample XML data generator for performance testing. This code creates a table with 20K random XML values:

IF OBJECT_ID('tempdb..#yourtable') IS NOT NULL DROP TABLE #yourtable;
SELECT TOP (20000) 
  SomeId  = IDENTITY(INT,1,1),
  xmldata = CAST(f.X AS XML),
  blob    = CAST(CAST(f.X AS VARBINARY(MAX)) AS image) 
INTO #yourtable
FROM       (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS a(X) -- 10
CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS b(X) -- 100
CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS c(X) -- 1K
CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS d(X) -- 10K
CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS e(X) -- 100K
CROSS JOIN (VALUES (NEWID())) AS n(Id)
CROSS APPLY
(
  SELECT TOP(ABS(CHECKSUM(NEWID())%5)+b.X) 
     G = ABS(CHECKSUM(n.Id)%30)+c.X+ROW_NUMBER() OVER (ORDER BY (SELECT 1))
  FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS a(x)
  ORDER BY NEWID()
  FOR XML PATH('blockIn'), ROOT('block')
) AS f(x);

Next, a quick sanity check. The three queries below return the same results:

-- Sanity Check (all 3 return the same results)
SELECT TOP (10) t.SomeId, Mx = dbo.ArticulNumFromXML_original(xmldata)
FROM     #yourtable AS t
ORDER BY t.SomeId;

SELECT TOP (10) t.SomeId, Mx = dbo.ArticulNumFromXML_V2(xmldata)
FROM   #yourtable AS t
ORDER BY t.SomeId;

SELECT TOP (10) t.SomeId, f.Mx
FROM        #yourtable AS t
CROSS APPLY dbo.ArticulNumFromXML_itvf(xmldata) AS f
ORDER BY t.SomeId;

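Now that we know we are getting the correct result set, let's run a few performance tests. I noticed that, in your answer, you convert the data to XML first. That is expensive. In the first test we do the same kind of conversion: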

-- Test #1: Blob data
PRINT CHAR(13)+'Scalar Version (original):'+CHAR(13)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = getdate(), @Mx INT;
  SELECT @Mx = dbo.ArticulNumFromXML_original(CAST(CAST(t.blob AS VARBINARY(MAX)) AS XML))
  FROM   #yourtable AS t;
PRINT DATEDIFF(MS,@st,getdate());
GO 3

PRINT CHAR(13)+'Scalar Version (V2 - leveraging the text() node):'+CHAR(13)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = getdate(), @Mx INT;
  SELECT @Mx = dbo.ArticulNumFromXML_V2(CAST(CAST(t.blob AS VARBINARY(MAX)) AS XML))
  FROM   #yourtable AS t;
PRINT DATEDIFF(MS,@st,getdate());
GO 3

PRINT CHAR(13)+'Inline Version:'+CHAR(13)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = getdate(), @Mx INT;
  SELECT      @Mx = f.Mx
  FROM        #yourtable AS t
  CROSS APPLY dbo.ArticulNumFromXML_itvf(CAST(CAST(t.blob AS VARBINARY(MAX)) AS XML)) AS f;
PRINT DATEDIFF(MS,@st,getdate());
GO 3

Results:

Scalar Version (original):
------------------------------------------------------------------------------------------
Beginning execution loop
4560
4000
4346
Batch execution completed 3 times.

Scalar Version (V2 - leveraging the text() node):
------------------------------------------------------------------------------------------
Beginning execution loop
2503
2840
2796
Batch execution completed 3 times.

Inline Version:
------------------------------------------------------------------------------------------
Beginning execution loop
586
670
630
Batch execution completed 3 times.

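As you can see, the first improvement cut the runtime by more than a third (from about 4.3 s to about 2.7 s). Changing the function to an inline table-valued function brought it down to about 0.63 s, which is roughly 4 times faster than the improved scalar version and roughly 7 times faster than the original.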

Now let's skip the expensive XML conversion (this can be handled by preprocessing with a computed column or an indexed view). Here is the second test:

-- Test #2: No XML Conversion
PRINT CHAR(13)+'Scalar Version (original):'+CHAR(13)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = getdate(), @Mx INT;
  SELECT @Mx = dbo.ArticulNumFromXML_original(xmldata)
  FROM   #yourtable AS t;
PRINT DATEDIFF(MS,@st,getdate());
GO 3

PRINT CHAR(13)+'Scalar Version (V2 - leveraging the text() node):'+CHAR(13)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = getdate(), @Mx INT;
  SELECT @Mx = dbo.ArticulNumFromXML_V2(xmldata)
  FROM   #yourtable AS t;
PRINT DATEDIFF(MS,@st,getdate());
GO 3

PRINT CHAR(13)+'Inline Version (No hints - Parallel):'+CHAR(13)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = getdate(), @Mx INT;
  SELECT      @Mx = f.Mx
  FROM        #yourtable AS t
  CROSS APPLY dbo.ArticulNumFromXML_itvf(xmldata) AS f;
PRINT DATEDIFF(MS,@st,getdate());
GO 3

Results:

Scalar Version (original):
------------------------------------------------------------------------------------------
Beginning execution loop
2933
2633
2953
Batch execution completed 3 times.

Scalar Version (V2 - leveraging the text() node):
------------------------------------------------------------------------------------------
Beginning execution loop
826
876
970
Batch execution completed 3 times.

Inline Version (No hints - Parallel):
------------------------------------------------------------------------------------------
Beginning execution loop
63
70
63
Batch execution completed 3 times.

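Wow! Reading pre-converted XML significantly reduces the time for all three, and even more so for the iTVF, which is now 40-50 times faster than the original function.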
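
One simple way to avoid repeating the conversion is to materialize it once into a real XML column and query that afterwards. The sketch below assumes that approach. The temp table names #source and #converted and the single sample row are hypothetical, and a computed column or other preprocessing step could serve the same purpose.

```sql
-- Hypothetical table names and sample row, for illustration only.
IF OBJECT_ID('tempdb..#source') IS NOT NULL DROP TABLE #source;
CREATE TABLE #source (SomeId INT PRIMARY KEY, blob IMAGE NOT NULL);

INSERT #source (SomeId, blob)
VALUES (1, CAST(N'<block><blockIn><G>4</G></blockIn><blockIn><G>9</G></blockIn></block>' AS VARBINARY(MAX)));

-- Pay the conversion cost once and keep the result as a real XML column.
IF OBJECT_ID('tempdb..#converted') IS NOT NULL DROP TABLE #converted;
SELECT s.SomeId,
       xmldata = CAST(CAST(s.blob AS VARBINARY(MAX)) AS XML)
INTO   #converted
FROM   #source AS s;

-- Later queries shred the stored XML directly, with no per-query CAST.
SELECT c.SomeId,
       Mx = MAX(n.node_a.value('(G/text())[1]', 'int'))
FROM   #converted AS c
CROSS APPLY c.xmldata.nodes('/block/blockIn') AS n(node_a)
GROUP BY c.SomeId;
```

For the sample row this should return SomeId 1 with Mx 9. The trade-off is extra storage and the need to keep the converted copy in sync with the source column.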

My solution was to create a function:

CREATE OR ALTER FUNCTION [dbo].[ArticulNumFromXML](@xml XML)  
RETURNS INT  
AS  
BEGIN  
RETURN (SELECT 
      SUM(node_a.value('max(blockIn/G)' , 'int' )) 
    FROM 
      @xml.nodes('/block') AS node_refs(node_a) -- XQuery paths are case-sensitive
    ); 
END;  
GO

With it, this query runs fine:

SELECT 
  [dbo].[ArticulNumFromXML](CAST(CAST(blob AS VARBINARY(max)) AS XML)) 
FROM 
  table
WHERE 
  ID = 1
