SQL: xml.nodes from a CTE is very slow
I have a table with an XML column that looks like this:
<block>
<blockIn>
<G>1</G>
</blockIn>
.....
<blockIn>
<G>12</G>
</blockIn>
......
</block>
.....
<block>
......
</block>
I need to find the MAX of <G> across the <blockIn> elements of each <block>, and then sum all of those MAX values:
(sum (Max (<block> …<blockIn> ...<G></G>); Max (<block> …<blockIn> ...<G></G>) ...))
So I did this:
WITH ds AS
(
SELECT
fieldXML
FROM
table
WHERE
ID = 1
)
SELECT
(SELECT SUM(node_a.value('max(blockIn/G)' , 'int' ))
FROM ds.fieldXML.nodes('/Block') AS node_refs(node_a)) AS [ArticulNum]
FROM
ds
But it runs very slowly.
If I use a variable instead, it runs very fast:
DECLARE @xml AS [XML];
SELECT
@xml = fieldXML
FROM
table
WHERE
ID = 1;
SELECT SUM(node_a.value('max(blockIn/G)' , 'INT' ))
FROM @xml.nodes('/Block') AS node_refs(node_a)
What do I need to do to make the first solution run fast as well?
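A user-defined function (UDF) will help, but it has to be the right type of UDF and, if performance matters, it must be an inline function. Here is a cleaned-up version of the original (note that the final SUM is not needed when each XML value holds a single <block>):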
-- Original
CREATE OR ALTER FUNCTION [dbo].[ArticulNumFromXML_original](@xml XML)
RETURNS INT
AS
BEGIN
RETURN
(
SELECT node_a.value('max(blockIn/G)' , 'int' )
FROM @xml.nodes('/block') AS node_refs(node_a)
);
END;
GO
Here is an improved scalar UDF that performs better. Note the different context node, block/blockIn, and the use of the text() node.
-- Improved scalar UDF:
CREATE OR ALTER FUNCTION [dbo].[ArticulNumFromXML_V2](@xml XML)
RETURNS INT
AS
BEGIN
RETURN
(
SELECT Mx = MAX(node_a.value('(G/text())[1]','int'))
FROM @xml.nodes('/block/blockIn') AS node_refs(node_a)
);
END;
GO
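This performs better, but there is still a fundamental problem: the function is not inline. Let's use the logic above to create an inline table-valued function (iTVF):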
-- INLINE UDF
CREATE OR ALTER FUNCTION [dbo].[ArticulNumFromXML_itvf](@xml XML)
RETURNS TABLE AS RETURN
SELECT Mx = MAX(node_a.value('(G/text())[1]','int'))
FROM @xml.nodes('/block/blockIn') AS node_refs(node_a);
GO
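The functions above return one maximum per XML value. If a value can hold several <block> elements and the sum of per-block maxima from the question is wanted, a variant along the following lines might work. This is only a sketch: the function name ArticulNumSumFromXML_itvf and the column names SumMx and BlockMx are made up for illustration, and it has not been timed against the other versions.

```sql
-- Hypothetical variant: sum of the per-<block> maxima, kept inline
CREATE OR ALTER FUNCTION [dbo].[ArticulNumSumFromXML_itvf](@xml XML)
RETURNS TABLE AS RETURN
SELECT SumMx = SUM(b.BlockMx)
FROM @xml.nodes('/block') AS blk(node_b)          -- one row per <block>
CROSS APPLY
(
    -- maximum <G> among the <blockIn> children of the current <block>
    SELECT BlockMx = MAX(node_a.value('(G/text())[1]','int'))
    FROM blk.node_b.nodes('blockIn') AS node_refs(node_a)
) AS b;
GO
```

It would be called the same way as the other inline function, with CROSS APPLY against the table that holds the XML.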
Next is a sample XML data generator for performance testing. This code creates a table containing 20K random XML values:
IF OBJECT_ID('tempdb..#yourtable') IS NOT NULL DROP TABLE #yourtable;
SELECT TOP (20000)
SomeId = IDENTITY(INT,1,1),
xmldata = CAST(f.X AS XML),
blob = CAST(CAST(f.X AS VARBINARY(MAX)) AS image)
INTO #yourtable
FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS a(X) -- 10
CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS b(X) -- 100
CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS c(X) -- 1K
CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS d(X) -- 10K
CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS e(X) -- 100K
CROSS JOIN (VALUES (NEWID())) AS n(Id)
CROSS APPLY
(
SELECT TOP(ABS(CHECKSUM(NEWID())%5)+b.X)
G = ABS(CHECKSUM(n.Id)%30)+c.X+ROW_NUMBER() OVER (ORDER BY (SELECT 1))
FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS a(x)
ORDER BY NEWID()
FOR XML PATH('blockIn'), ROOT('block')
) AS f(x);
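To see what the generator produced, a query like the one below can be run against the temp table. It is only an inspection aid; the column aliases Blocks and BlockIns are made up here.

```sql
-- Look at a few generated rows and count their <block> and <blockIn> elements
SELECT TOP (5)
       t.SomeId,
       Blocks   = t.xmldata.value('count(/block)', 'int'),
       BlockIns = t.xmldata.value('count(/block/blockIn)', 'int'),
       t.xmldata
FROM #yourtable AS t
ORDER BY t.SomeId;
```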
Next, a quick sanity check. The queries below all return the same results:
-- Sanity Check (all 3 return the same results)
SELECT TOP (10) t.SomeId, Mx = dbo.ArticulNumFromXML_original(xmldata)
FROM #yourtable AS t
ORDER BY t.SomeId;
SELECT TOP (10) t.SomeId, Mx = dbo.ArticulNumFromXML_V2(xmldata)
FROM #yourtable AS t
ORDER BY t.SomeId;
SELECT TOP (10) t.SomeId, f.Mx
FROM #yourtable AS t
CROSS APPLY dbo.ArticulNumFromXML_itvf(xmldata) AS f
ORDER BY t.SomeId;
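Now that we know we are getting the correct result set, let's run a few performance tests. I noticed that in your answer you convert the data to XML first. That is expensive. In the first test we do the same kind of conversion: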
-- Test #1: Blob data
PRINT CHAR(13)+'Scalar Version (original):'+CHAR(13)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = getdate(), @Mx INT;
SELECT @Mx = dbo.ArticulNumFromXML_original(CAST(CAST(t.blob AS VARBINARY(MAX)) AS XML))
FROM #yourtable AS t;
PRINT DATEDIFF(MS,@st,getdate());
GO 3
PRINT CHAR(13)+'Scalar Version (V2 - leveraging the text() node):'+CHAR(13)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = getdate(), @Mx INT;
SELECT @Mx = dbo.ArticulNumFromXML_V2(CAST(CAST(t.blob AS VARBINARY(MAX)) AS XML))
FROM #yourtable AS t;
PRINT DATEDIFF(MS,@st,getdate());
GO 3
PRINT CHAR(13)+'Inline Version:'+CHAR(13)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = getdate(), @Mx INT;
SELECT @Mx = f.Mx
FROM #yourtable AS t
CROSS APPLY dbo.ArticulNumFromXML_itvf(CAST(CAST(t.blob AS VARBINARY(MAX)) AS XML)) AS f;
PRINT DATEDIFF(MS,@st,getdate());
GO 3
Results:
Scalar Version (original):
------------------------------------------------------------------------------------------
Beginning execution loop
4560
4000
4346
Batch execution completed 3 times.
Scalar Version (V2 - leveraging the text() node):
------------------------------------------------------------------------------------------
Beginning execution loop
2503
2840
2796
Batch execution completed 3 times.
Inline Version:
------------------------------------------------------------------------------------------
Beginning execution loop
586
670
630
Batch execution completed 3 times.
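As you can see, the first improvement makes the query more than 50% faster, but changing the function to an inline table-valued function makes the query roughly 4 times faster than the improved scalar function and about 6-7 times faster than the original.
Now let's skip the expensive XML conversion (this can be handled by preprocessing with a computed column or an indexed view). Here is the second test: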
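The speedups can be recomputed from the nine timings printed above. The query below is a small sketch that averages the three runs of each version and divides the original's average by each of them. The labels and column names are chosen here for illustration.

```sql
-- Average the Test #1 timings (ms) and compare each version with the original
WITH timings AS
(
    SELECT v.Version, v.Ms
    FROM (VALUES
        ('original', 4560), ('original', 4000), ('original', 4346),
        ('V2',       2503), ('V2',       2840), ('V2',       2796),
        ('inline',    586), ('inline',    670), ('inline',    630)
    ) AS v(Version, Ms)
),
averages AS
(
    SELECT Version, AvgMs = AVG(1.0 * Ms)
    FROM timings
    GROUP BY Version
)
SELECT a.Version,
       a.AvgMs,
       SpeedupVsOriginal = o.AvgMs / a.AvgMs
FROM averages AS a
CROSS JOIN (SELECT AvgMs FROM averages WHERE Version = 'original') AS o
ORDER BY a.AvgMs DESC;
```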
-- Test #2: No XML Conversion
PRINT CHAR(13)+'Scalar Version (original):'+CHAR(13)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = getdate(), @Mx INT;
SELECT @Mx = dbo.ArticulNumFromXML_original(xmldata)
FROM #yourtable AS t;
PRINT DATEDIFF(MS,@st,getdate());
GO 3
PRINT CHAR(13)+'Scalar Version (V2 - leveraging the text() node):'+CHAR(13)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = getdate(), @Mx INT;
SELECT @Mx = dbo.ArticulNumFromXML_V2(xmldata)
FROM #yourtable AS t;
PRINT DATEDIFF(MS,@st,getdate());
GO 3
PRINT CHAR(13)+'Inline Version (No hints - Parallel):'+CHAR(13)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = getdate(), @Mx INT;
SELECT @Mx = f.Mx
FROM #yourtable AS t
CROSS APPLY dbo.ArticulNumFromXML_itvf(xmldata) AS f;
PRINT DATEDIFF(MS,@st,getdate());
GO 3
Results:
Scalar Version (original):
------------------------------------------------------------------------------------------
Beginning execution loop
2933
2633
2953
Batch execution completed 3 times.
Scalar Version (V2 - leveraging the text() node):
------------------------------------------------------------------------------------------
Beginning execution loop
826
876
970
Batch execution completed 3 times.
Inline Version (No hints - Parallel):
------------------------------------------------------------------------------------------
Beginning execution loop
63
70
63
Batch execution completed 3 times.
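Wow! Reading pre-converted XML significantly reduces the time for all three, and especially for the iTVF, which is now 40-50 times faster than the original function.
Solution: I created a function: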
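One simple way to pay for the conversion only once is to store the converted value in a real XML column and query that column instead of the blob. The sketch below does this on the test table. The column name xmlconverted is hypothetical, and this is a plain one-time UPDATE rather than the computed column or indexed view mentioned earlier.

```sql
-- Add a column to hold the pre-converted XML (hypothetical name)
ALTER TABLE #yourtable ADD xmlconverted XML NULL;
GO
-- Do the image -> varbinary -> XML conversion once
UPDATE t
SET    xmlconverted = CAST(CAST(t.blob AS VARBINARY(MAX)) AS XML)
FROM   #yourtable AS t;
GO
-- Later queries read the XML column directly
SELECT t.SomeId, f.Mx
FROM   #yourtable AS t
CROSS APPLY dbo.ArticulNumFromXML_itvf(t.xmlconverted) AS f;
```

New or changed rows would need the same conversion applied, which is where a computed column or a trigger could come in.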
CREATE OR ALTER FUNCTION [dbo].[ArticulNumFromXML](@xml XML)
RETURNS INT
AS
BEGIN
RETURN (SELECT
SUM(node_a.value('max(blockIn/G)' , 'int' ))
FROM
@xml.nodes('/BLOCK') AS node_refs(node_a)
);
END;
GO
With it, the query runs fine:
SELECT
[dbo].[ArticulNumFromXML](CAST(CAST(blob AS VARBINARY(max)) AS XML))
FROM
table
WHERE
ID = 1
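For comparison, the inline function ArticulNumFromXML_itvf defined earlier could be applied to the same row as sketched below. The table name table and the columns blob and ID are the placeholders from the query above. The name is bracketed because TABLE is a reserved word.

```sql
-- Same lookup, using the inline table-valued function via CROSS APPLY
SELECT f.Mx
FROM [table] AS t
CROSS APPLY dbo.ArticulNumFromXML_itvf(CAST(CAST(t.blob AS VARBINARY(MAX)) AS XML)) AS f
WHERE t.ID = 1;
```

Note that this returns the overall maximum of <G>, not a sum of per-block maxima, so the two results only agree when the XML value holds a single block.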