SQL: xml.nodes from a CTE is very slow

I have a table with a column that contains XML like this:

<block>
    <blockIn>
        <G>1</G>            
    </blockIn>
    .....
    <blockIn>
        <G>12</G>
    </blockIn>
    ......
</block>
.....
<block>
......
</block>

I need to find the MAX of <blockIn><G> in each <block>, and then sum all of these MAX values:

(sum (Max (<block> …<blockIn> ...<G></G>); Max (<block> …<blockIn> ...<G></G>) ...))

So I did this:

WITH ds AS 
(
    SELECT 
        fieldXML
    FROM 
        table
    WHERE 
        ID = 1
)
SELECT 
    (SELECT SUM(node_a.value('max(blockIn/G)' , 'int' )) 
     FROM ds.fieldXML.nodes('/block')  AS node_refs(node_a)) AS [ArticulNum]
FROM
    ds

But it runs very slowly.

If I use a variable, it runs very fast:

DECLARE @xml AS [XML];

SELECT 
    @xml = fieldXML
FROM 
    table
WHERE 
    ID = 1;

SELECT SUM(node_a.value('max(blockIn/G)' , 'INT' )) 
FROM @xml.nodes('/block') AS node_refs(node_a)

What do I need to do so that the first solution runs fast, too?

A User Defined Function (UDF) will help, but it needs to be the right kind of UDF: if performance is important, it must be an inline function. Here's a cleaned-up version of your original (note that the final SUM is not required):

-- Original
CREATE OR ALTER FUNCTION [dbo].[ArticulNumFromXML_original](@xml XML)  
RETURNS INT
AS  
BEGIN  
RETURN 
(
  SELECT node_a.value('max(blockIn/G)' , 'int' )
  FROM   @xml.nodes('/block') AS node_refs(node_a)
    ); 
END;  
GO

Here's an improved scalar UDF that will perform better. Note the different context (block/blockIn) and the use of the text() node.

-- Improved scalar UDF:
CREATE OR ALTER FUNCTION [dbo].[ArticulNumFromXML_V2](@xml XML)  
RETURNS INT  
AS  
BEGIN  
RETURN 
(
  SELECT Mx = MAX(node_a.value('(G/text())[1]','int'))
  FROM   @xml.nodes('/block/blockIn') AS node_refs(node_a)
    ); 
END;  
GO

This performs much better but still has a fundamental problem: the function is not inline. Let's take the logic above and create an inline table-valued function (iTVF):

-- INLINE UDF    
CREATE OR ALTER FUNCTION [dbo].[ArticulNumFromXML_itvf](@xml XML)  
RETURNS TABLE AS RETURN 
  SELECT Mx = MAX(node_a.value('(G/text())[1]','int'))
  FROM   @xml.nodes('/block/blockIn') AS node_refs(node_a);
GO
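
The functions above return a single maximum, which matches the question as long as each XML value holds one <block>. If a value can hold several <block> elements and the sum of the per-block maximums is wanted, a minimal sketch (not part of the original answer; the sample XML and the aliases blk, bi and m are made up for illustration) could shred each block separately:

```sql
-- Hypothetical sample: two <block> elements in one XML value.
DECLARE @xml XML = N'
<block><blockIn><G>1</G></blockIn><blockIn><G>12</G></blockIn></block>
<block><blockIn><G>7</G></blockIn><blockIn><G>3</G></blockIn></block>';

-- Outer nodes() yields one row per <block>; the applied subquery
-- computes the MAX of G inside that block; SUM adds the maximums.
SELECT ArticulNum = SUM(m.Mx)
FROM @xml.nodes('/block') AS blk(b)
CROSS APPLY
(
  SELECT Mx = MAX(bi.n.value('(G/text())[1]', 'int'))
  FROM   blk.b.nodes('blockIn') AS bi(n)
) AS m;
-- Per-block maximums here are 12 and 7, so the sum is 19.
```

The same SELECT could serve as the body of an inline function in place of the one above; whether it keeps the same performance profile on real data would need to be measured.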

Next, a sample XML data generator for performance testing. This code creates a table with 20K random XML values:

IF OBJECT_ID('tempdb..#yourtable') IS NOT NULL DROP TABLE #yourtable;
SELECT TOP (20000) 
  SomeId  = IDENTITY(INT,1,1),
  xmldata = CAST(f.X AS XML),
  blob    = CAST(CAST(f.X AS VARBINARY(MAX)) AS image) 
INTO #yourtable
FROM       (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS a(X) -- 10
CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS b(X) -- 100
CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS c(X) -- 1K
CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS d(X) -- 10K
CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS e(X) -- 100K
CROSS JOIN (VALUES (NEWID())) AS n(Id)
CROSS APPLY
(
  SELECT TOP(ABS(CHECKSUM(NEWID())%5)+b.X) 
     G = ABS(CHECKSUM(n.Id)%30)+c.X+ROW_NUMBER() OVER (ORDER BY (SELECT 1))
  FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS a(x)
  ORDER BY NEWID()
  FOR XML PATH('blockIn'), ROOT('block')
) AS f(x);

Next, a quick sanity check. The queries below return the same results:

-- Sanity Check (all 3 return the same results)
SELECT TOP (10) t.SomeId, Mx = dbo.ArticulNumFromXML_original(xmldata)
FROM     #yourtable AS t
ORDER BY t.SomeId;

SELECT TOP (10) t.SomeId, Mx = dbo.ArticulNumFromXML_V2(xmldata)
FROM   #yourtable AS t
ORDER BY t.SomeId;

SELECT TOP (10) t.SomeId, f.Mx
FROM        #yourtable AS t
CROSS APPLY dbo.ArticulNumFromXML_itvf(xmldata) AS f
ORDER BY t.SomeId;

Now that we know we're getting the right result set, let's run a couple of performance tests. I noticed that, in your answer, you're converting the XML data first. This is expensive. In this first test we're doing the same type of conversion:

-- Test #1: Blob data
PRINT CHAR(13)+'Scalar Version (original):'+CHAR(13)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = getdate(), @Mx INT;
  SELECT @Mx = dbo.ArticulNumFromXML_original(CAST(CAST(t.blob AS VARBINARY(MAX)) AS XML))
  FROM   #yourtable AS t;
PRINT DATEDIFF(MS,@st,getdate());
GO 3

PRINT CHAR(13)+'Scalar Version (V2 - leveraging the text() node):'+CHAR(13)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = getdate(), @Mx INT;
  SELECT @Mx = dbo.ArticulNumFromXML_V2(CAST(CAST(t.blob AS VARBINARY(MAX)) AS XML))
  FROM   #yourtable AS t;
PRINT DATEDIFF(MS,@st,getdate());
GO 3

PRINT CHAR(13)+'Inline Version:'+CHAR(13)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = getdate(), @Mx INT;
  SELECT      @Mx = f.Mx
  FROM        #yourtable AS t
  CROSS APPLY dbo.ArticulNumFromXML_itvf(CAST(CAST(t.blob AS VARBINARY(MAX)) AS XML)) AS f;
PRINT DATEDIFF(MS,@st,getdate());
GO 3

Results (elapsed milliseconds per run):

Scalar Version (original):
------------------------------------------------------------------------------------------
Beginning execution loop
4560
4000
4346
Batch execution completed 3 times.

Scalar Version (V2 - leveraging the text() node):
------------------------------------------------------------------------------------------
Beginning execution loop
2503
2840
2796
Batch execution completed 3 times.

Inline Version:
------------------------------------------------------------------------------------------
Beginning execution loop
586
670
630
Batch execution completed 3 times.

As you can see, the first improvement sped things up by better than 50%, and changing the function to an inline table-valued function made the improved query about 4 times faster again, which is roughly 6-8 times faster than your original function.

Now let's skip the costly XML conversion (this can be handled via pre-processing using a computed column or indexed view). Here's the second test:

-- Test #2: No XML Conversion
PRINT CHAR(13)+'Scalar Version (original):'+CHAR(13)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = getdate(), @Mx INT;
  SELECT @Mx = dbo.ArticulNumFromXML_original(xmldata)
  FROM   #yourtable AS t;
PRINT DATEDIFF(MS,@st,getdate());
GO 3

PRINT CHAR(13)+'Scalar Version (V2 - leveraging the text() node):'+CHAR(13)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = getdate(), @Mx INT;
  SELECT @Mx = dbo.ArticulNumFromXML_V2(xmldata)
  FROM   #yourtable AS t;
PRINT DATEDIFF(MS,@st,getdate());
GO 3

PRINT CHAR(13)+'Inline Version (No hints - Parallel):'+CHAR(13)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = getdate(), @Mx INT;
  SELECT      @Mx = f.Mx
  FROM        #yourtable AS t
  CROSS APPLY dbo.ArticulNumFromXML_itvf(xmldata) AS f;
PRINT DATEDIFF(MS,@st,getdate());
GO 3

Results (elapsed milliseconds per run):

Scalar Version (original):
------------------------------------------------------------------------------------------
Beginning execution loop
2933
2633
2953
Batch execution completed 3 times.

Scalar Version (V2 - leveraging the text() node):
------------------------------------------------------------------------------------------
Beginning execution loop
826
876
970
Batch execution completed 3 times.

Inline Version (No hints - Parallel):
------------------------------------------------------------------------------------------
Beginning execution loop
63
70
63
Batch execution completed 3 times.

Blam! Reading pre-converted XML reduced the time of all three dramatically, more so for the iTVF, which is now 40-50 times faster than your original function.
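
To tie this back to the query in the question: the inline logic can also be applied directly to an XML column with CROSS APPLY, without any function. This is only a sketch under assumptions: the table variable @demo and its rows are hypothetical stand-ins for the question's table, and the column is assumed to already be of type XML.

```sql
-- Hypothetical stand-in for the question's table (names assumed).
DECLARE @demo TABLE (ID INT PRIMARY KEY, fieldXML XML);

INSERT INTO @demo (ID, fieldXML)
VALUES (1, N'<block><blockIn><G>1</G></blockIn><blockIn><G>12</G></blockIn></block>'),
       (2, N'<block><blockIn><G>5</G></blockIn></block>');

-- Shred /block/blockIn once per row and take the MAX of G,
-- reading the value through the text() node as in the iTVF.
SELECT t.ID, ArticulNum = f.Mx
FROM   @demo AS t
CROSS APPLY
(
  SELECT Mx = MAX(n.a.value('(G/text())[1]', 'int'))
  FROM   t.fieldXML.nodes('/block/blockIn') AS n(a)
) AS f
WHERE  t.ID = 1;
-- For the sample row with ID = 1 this returns 12.
```

This is the same shape as the CROSS APPLY against the iTVF in the tests above, so it should behave similarly, but it has not been timed here.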

My solution was to create a function:

CREATE OR ALTER FUNCTION [dbo].[ArticulNumFromXML](@xml XML)  
RETURNS INT  
AS  
BEGIN  
RETURN (SELECT 
      SUM(node_a.value('max(blockIn/G)' , 'int' )) 
    FROM 
      @xml.nodes('/block') AS node_refs(node_a)
    ); 
END;  
GO

and with it, performance is normal:

SELECT 
  [dbo].[ArticulNumFromXML](CAST(CAST(blob AS VARBINARY(max)) AS XML)) 
FROM 
  table
WHERE 
  ID = 1
