Improving XQuery performance in SQL Server

Question

I have an Azure SQL Database with 1 table and a lot of records (more than 75,000). The table contains a column of the XML datatype. This column looks like this:

<error application="application" host="host" type="exception" message="message" ...>
  <serverVariables>
    <item name="name1">
      <value string="text" />
    </item>
    <item name="name2">
      <value string="text2" />
    </item>
    <item name="name3">
      <value string="text3" />
    </item>
    <item name="name4">
      <value string="text4" />
    </item>
    <item name="name5">
      <value string="text5" />
    </item>
    <item name="name6">
      <value string="text6" />
    </item>
    <item name="name7">
      <value string="text7" />
    </item>
  </serverVariables>
</error>

If I want to get all records where the item's name attribute is name5 and the value's string attribute is text5, I would write a query like this:

SELECT *
FROM Table
WHERE XmlColumn.exist('//item[@name[. = "name5"] and value/@string[. = "text5"]]') = 1

This uses XQuery and has to search the whole document. It is also very slow.

My question is: how can I make this query execute faster? Would it be possible to declare an XML index on that column? Are there other ways to make XQuery expressions execute faster?

Answer 1

I just did a little test. With .nodes() you can gain some 3%, which is not really much. On my test machine (just a simple laptop) I got a result out of 100,000 rows within ~5 seconds. Not that bad, in fact. If you want it fast, you'll have to either get the search values out of the XML or use an XML index.
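To illustrate the first option (getting the search value out of the XML), here is a minimal sketch of a promoted, indexed computed column. It is not part of my test below. The names dbo.ErrorLog, dbo.ufn_GetName5, Name5Value and IX_ErrorLog_Name5Value are made up for illustration, and I assume a permanent table with the same XmlColumn as in the question. Whether the column can be persisted and indexed depends on SQL Server treating the function as deterministic, which is why it is created WITH SCHEMABINDING.

```sql
-- Hypothetical scalar function: extracts the "string" attribute of the
-- item named "name5". The [1] makes the expression a singleton, which
-- .value() requires.
CREATE FUNCTION dbo.ufn_GetName5 (@x XML)
RETURNS VARCHAR(100)
WITH SCHEMABINDING
AS
BEGIN
    RETURN @x.value('(/error/serverVariables/item[@name="name5"]/value/@string)[1]', 'VARCHAR(100)');
END;
GO

-- Hypothetical table dbo.ErrorLog: promote the value into a computed column.
ALTER TABLE dbo.ErrorLog
    ADD Name5Value AS dbo.ufn_GetName5(XmlColumn) PERSISTED;
GO

-- An ordinary index on the promoted value.
CREATE INDEX IX_ErrorLog_Name5Value ON dbo.ErrorLog (Name5Value);
GO

-- The search becomes a plain relational predicate.
SELECT *
FROM dbo.ErrorLog
WHERE Name5Value = 'text5';
```

This only helps when the item name you search for is known in advance; for arbitrary name/value pairs the XML index is the more general route.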

Test scenario

First I create a test table and fill it with 100,000 rows. A random number (0 to 999, because CAST(RAND()*1000 AS INT) truncates) should lead to ~100 rows for each value. This number is put into a varchar column and, as a value, into the XML.

Then I run the query the way you'd need it, once with .exist() and once with .nodes(). The second has a small advantage, but both take 5 to 6 seconds. In fact I do the calls twice: the second time in swapped order, with slightly changed search parameters, and with "//item" instead of the full path, to avoid misleading timings caused by cached results or plans.
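The timings in the script below are taken with GETDATE(), which is only accurate to a few milliseconds. If you want finer resolution, a variant like the following sketch could be used instead. It assumes the #testTbl from the script below already exists; the alias ExistFullPath_elapsed_ms is made up for illustration.

```sql
-- SYSDATETIME() returns DATETIME2(7), which has a finer resolution than GETDATE().
DECLARE @start DATETIME2(7) = SYSDATETIME();

SELECT *
FROM #testTbl
WHERE XmlColumn.exist('/error/serverVariables/item[@name="name5" and value/@string="My test 600"]') = 1;

-- Elapsed wall-clock time of the query above, in milliseconds.
SELECT DATEDIFF(MILLISECOND, @start, SYSDATETIME()) AS ExistFullPath_elapsed_ms;
```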

Then I create an XML index and do the same calls.

Now, what really surprised me: .nodes() with the full path is much slower than before (9 secs), but .exist() is down to half a second, and with the full path even down to about 0.10 sec.

So my advice: use an index and do it with `.exist()`.
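Applied to a real table, this advice might look roughly like the sketch below. The table name dbo.ErrorLog and the index names are assumptions for illustration, and the table needs a clustered primary key before a primary XML index can be created. sql:variable() passes the search values in as T-SQL variables, so the XQuery string does not have to be built by concatenation.

```sql
-- Hypothetical table dbo.ErrorLog with a clustered primary key and an XML column XmlColumn.
CREATE PRIMARY XML INDEX PXML_ErrorLog_XmlColumn ON dbo.ErrorLog (XmlColumn);
CREATE XML INDEX IXML_ErrorLog_XmlColumn_Path ON dbo.ErrorLog (XmlColumn)
    USING XML INDEX PXML_ErrorLog_XmlColumn FOR PATH;
GO

-- Search values as variables instead of literals inside the XQuery.
DECLARE @name VARCHAR(100) = 'name5';
DECLARE @text VARCHAR(100) = 'text5';

SELECT *
FROM dbo.ErrorLog
WHERE XmlColumn.exist('/error/serverVariables/item[@name = sql:variable("@name") and value/@string = sql:variable("@text")]') = 1;
```

I have not timed this variant; the measurements in this answer are for the literal-value queries in the script below.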

And here's the code for copy'n'paste and self-testing:

CREATE TABLE #testTbl(ID INT IDENTITY PRIMARY KEY, SomeData VARCHAR(100),XmlColumn XML);
GO

DECLARE @RndNumber VARCHAR(100)=(SELECT CAST(CAST(RAND()*1000 AS INT) AS VARCHAR(100)));

INSERT INTO #testTbl VALUES('Data_' + @RndNumber,
'<error application="application" host="host" type="exception" message="message" >
  <serverVariables>
    <item name="name1">
      <value string="text" />
    </item>
    <item name="name2">
      <value string="text2" />
    </item>
    <item name="name3">
      <value string="text3" />
    </item>
    <item name="name4">
      <value string="text4" />
    </item>
    <item name="name5">
      <value string="My test ' +  @RndNumber + '" />
    </item>
    <item name="name6">
      <value string="text6" />
    </item>
    <item name="name7">
      <value string="text7" />
    </item>
  </serverVariables>
</error>');

GO 100000

DECLARE @d DATETIME=GETDATE()
SELECT #testTbl.*
FROM #testTbl
CROSS APPLY XmlColumn.nodes('/error/serverVariables/item[@name="name5" and value/@string="My test 600"]') AS a(b);
SELECT CAST(GETDATE()-@d AS TIME) AS NodesFullPath_no_index;
GO

DECLARE @d DATETIME=GETDATE();
SELECT * 
FROM #testTbl
--WHERE XmlColumn.exist('//item[@name[. = "name5"] and value/@string[. = "My test 600"]]') = 1
--The same, just a bit shorter...
WHERE XmlColumn.exist('/error/serverVariables/item[@name="name5" and value/@string="My test 600"]') = 1;
SELECT CAST(GETDATE()-@d AS TIME) AS ExistFullPath_no_index;
GO

DECLARE @d DATETIME=GETDATE();
SELECT * 
FROM #testTbl
--WHERE XmlColumn.exist('//item[@name[. = "name5"] and value/@string[. = "My test 600"]]') = 1
--The same, just a bit shorter...
WHERE XmlColumn.exist('//item[@name="name5" and value/@string="My test 500"]') = 1;
SELECT CAST(GETDATE()-@d AS TIME) AS ExistShortPath_no_index;
GO

DECLARE @d DATETIME=GETDATE()
SELECT #testTbl.*
FROM #testTbl
CROSS APPLY XmlColumn.nodes('//item[@name="name5" and value/@string="My test 500"]') AS a(b);
SELECT CAST(GETDATE()-@d AS TIME) AS NodesShortPath_no_index;
GO

CREATE PRIMARY XML INDEX PXML_test_XmlColum1 ON #testTbl(XmlColumn);
CREATE XML INDEX IXML_test_XmlColumn2 ON #testTbl(XmlColumn) USING XML INDEX PXML_test_XmlColum1 FOR PATH;
GO

DECLARE @d DATETIME=GETDATE()
SELECT #testTbl.*
FROM #testTbl
CROSS APPLY XmlColumn.nodes('/error/serverVariables/item[@name="name5" and value/@string="My test 600"]') AS a(b);
SELECT CAST(GETDATE()-@d AS TIME) AS NodesFullPath_with_index;
GO

DECLARE @d DATETIME=GETDATE();
SELECT * 
FROM #testTbl
--WHERE XmlColumn.exist('//item[@name[. = "name5"] and value/@string[. = "My test 600"]]') = 1
--The same, just a bit shorter...
WHERE XmlColumn.exist('/error/serverVariables/item[@name="name5" and value/@string="My test 600"]') = 1;
SELECT CAST(GETDATE()-@d AS TIME) AS ExistFullPath_with_index;
GO

DECLARE @d DATETIME=GETDATE();
SELECT * 
FROM #testTbl
--WHERE XmlColumn.exist('//item[@name[. = "name5"] and value/@string[. = "My test 600"]]') = 1
--The same, just a bit shorter...
WHERE XmlColumn.exist('//item[@name="name5" and value/@string="My test 500"]') = 1;
SELECT CAST(GETDATE()-@d AS TIME) AS ExistShortPath_with_index;
GO

DECLARE @d DATETIME=GETDATE()
SELECT #testTbl.*
FROM #testTbl
CROSS APPLY XmlColumn.nodes('//item[@name="name5" and value/@string="My test 500"]') AS a(b);
SELECT CAST(GETDATE()-@d AS TIME) AS NodesShortPath_with_index;
GO

DROP TABLE #testTbl;

Answer 1
2 · accepted · 2016-02-24 08:40:54
