SQL Server: Selective XML Index not being efficiently used

I'm exploring ways to improve the performance of an application that I can only influence at the database level, and only to a limited degree. The SQL Server version is 2012 SP2, and the table and view structure in question is as follows (I can't really change it; note that the XML document may have several hundred elements in total):

CREATE TABLE Orders(
    id nvarchar(64) NOT NULL,
    xmldoc xml NULL,
    CONSTRAINT PK_Order_id PRIMARY KEY CLUSTERED (id)
);

CREATE VIEW V_Orders as
SELECT 
    a.id, a.xmldoc
    ,a.xmldoc.value('data(/row/c1)[1]', 'nvarchar(max)') "Stuff"
    ,a.xmldoc.value('data(/row/c2)[1]', 'nvarchar(max)') "OrderType"
    -- ... many more columns
from Orders a;

A typical query (and the one used for testing below):

SELECT id FROM V_Orders WHERE OrderType = '30791'

All queries run against the view, and I can change neither the queries nor the table/view structure.

I thought adding a selective XML index to the table would be my saviour:

CREATE SELECTIVE XML INDEX I_Orders_OrderType ON Orders(xmldoc)
FOR(
    pathOrderType = '/row/c2' as SQL [nvarchar](20)
)

But even after updating the statistics, the execution plan looks odd. I couldn't post a picture from a new account, so here are the relevant details as text:

  • Clustered Index Seek from selectiveXml (cost: 2% of total). Expected number of rows: 1, but expected number of executions: 1269 (the number of rows in the table)
  • -> Top N Sort (cost: 95% of total)
  • -> Compute Scalar (cost: 0)

  • Separate branch: Clustered Index Scan on PK_Order_id (cost: 3% of total). Expected number of rows: 1269

  • -> Merged with the Compute Scalar results by Nested Loops (Left Outer Join)
  • -> Filter
  • -> Final result (expected number of rows: 1269)

In actuality, with my test data the query returns no rows at all, but whether it returns none, one or a few makes no difference. The measured execution times confirm that the query really takes as long as the plan suggests, with read counts in the thousands.

So my question is: why is the optimizer not using the selective XML index properly? Or have I got something wrong? How would I optimize this specific query's performance with selective XML indexing (or perhaps a persisted column)?

Edit: I did additional testing with larger sample data (~274k rows in the table, with XML documents close to average production size) and compared the selective XML index to a promoted column. The results are from a Profiler trace, concentrating on CPU usage and read counts. The execution plan for selective XML indexing is basically identical to what is described above.

Selective XML index and 274k rows (executing the query above): CPU: 6454, reads: 938521

After I updated the values in the searched field to be unique (total records still 274k), I got the following results:

Selective XML index and 274k rows (executing the query above): CPU: 10077, reads: 1006466

Then using a promoted (i.e. persisted), separately indexed column and referencing it directly in the view: CPU: 0, reads: 23

Selective XML index performance seems closer to a full table scan than to a proper lookup on an indexed SQL column. I read somewhere that using a schema for the table might help drop the Top N step from the execution plan (assuming we're searching for a non-repeating field), but I'm not sure whether that's a realistic possibility in this case.

The selective XML index you create is stored in an internal table. The primary key from Orders is the leading column of the internal table's clustered key, and the specified paths are stored as sparse columns.
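If you want to see that internal table for yourself, a catalog query along these lines should list it. This is only a sketch: it assumes the table is reachable as `Orders` in the current database and default schema, and the name and type description returned are whatever SQL Server generated on your instance.

```sql
-- List the internal tables that belong to Orders.
-- The selective XML index should appear here as one of the rows;
-- internal_type_desc tells you what kind of internal table each row is.
SELECT it.name,
       it.internal_type_desc,
       it.parent_id,
       it.parent_minor_id
FROM sys.internal_tables AS it
WHERE it.parent_id = OBJECT_ID(N'Orders');
```

Internal tables cannot be queried directly over a normal connection, so this only confirms that the table exists and which object it belongs to.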

The query plan you get probably looks something like this:

[image: query plan]

You have a scan over the entire Orders table, with a seek in the internal table on the primary key for each row in Orders. The final Filter operator is responsible for checking the value of OrderType, returning only the matching rows.

Not really what you would expect from something called an index.

To the rescue comes the secondary selective XML index. One is created for a single path specified in the primary selective XML index, and it creates a non-clustered key on the values extracted by that path expression.

It is, however, not all that easy. SQL Server will not use the secondary index for predicates on values extracted by the value() function. You have to use exist() instead. Also, exist() requires XQuery data types in the path expressions, whereas value() uses SQL data types.

Your primary selective XML index could look like this:

CREATE SELECTIVE XML INDEX I_Orders_OrderType ON Orders(xmldoc)
FOR 
(
  pathOrderType = '/row/c2' as sql nvarchar(20), 
  pathOrderTypeX = '/row/c2/text()' as xquery 'xs:string' maxlength (20)
)

With a secondary index on pathOrderTypeX:

CREATE XML INDEX I_Orders_OrderType2 ON Orders(xmldoc)
  USING XML INDEX I_Orders_OrderType FOR (pathOrderTypeX) 

And with a query that uses exist() you will get this plan.

select id
from V_Orders
where xmldoc.exist('/row/c2/text()[. = "30791"]') = 1

[image: query plan]

The first seek looks up the value you are searching for in the non-clustered index of the internal table. The key lookup is done on the clustered key of the internal table (I don't know why that is necessary). The last seek is on the primary key of the Orders table, followed by a filter that checks for null values in the xmldoc column.
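To check whether the rewritten predicate actually pays off on your data, one option is to run both forms of the query with I/O statistics switched on and compare the logical reads reported for each. A minimal sketch, assuming both selective XML indexes above have been created:

```sql
-- Report logical/physical reads per statement in the Messages output.
SET STATISTICS IO ON;

-- Original form: filters on the view column computed with value().
SELECT id
FROM V_Orders
WHERE OrderType = '30791';

-- Rewritten form: filters with exist(), which can use the secondary index.
SELECT id
FROM V_Orders
WHERE xmldoc.exist('/row/c2/text()[. = "30791"]') = 1;

SET STATISTICS IO OFF;
```

The read counts should be comparable in kind to the Profiler figures quoted in the question, which makes it easy to see how far the exist() version is from the scan-like behaviour.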

If you can get away with using property promotion, that is, creating indexed computed columns in the Orders table from the XML, I guess you would still get better performance than with secondary selective XML indexes.
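Property promotion could be sketched roughly as follows. XML methods cannot be used directly in a computed column definition, so the extraction is wrapped in a schema-bound scalar function. The names `dbo.fn_Orders_OrderType`, `OrderTypePromoted` and `IX_Orders_OrderTypePromoted` are made up for this illustration, and the `dbo` schema and the `nvarchar(20)` length (taken from the index definition above) are assumptions.

```sql
-- Hypothetical helper: extracts the first /row/c2 text value from a document.
-- SCHEMABINDING lets SQL Server treat the function as deterministic,
-- which is required to persist and index the computed column.
CREATE FUNCTION dbo.fn_Orders_OrderType (@doc xml)
RETURNS nvarchar(20)
WITH SCHEMABINDING
AS
BEGIN
    RETURN @doc.value('(/row/c2/text())[1]', 'nvarchar(20)');
END;
GO

-- Promote the value into a persisted computed column (hypothetical name).
ALTER TABLE dbo.Orders
    ADD OrderTypePromoted AS dbo.fn_Orders_OrderType(xmldoc) PERSISTED;
GO

-- Ordinary non-clustered index on the promoted column.
CREATE NONCLUSTERED INDEX IX_Orders_OrderTypePromoted
    ON dbo.Orders (OrderTypePromoted);
GO
```

The catch is that the view still computes OrderType with value(), so the index only helps once the view selects the promoted column instead, which is what "using it directly in the view" in the question amounts to.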
