SQL Server XQuery nodes/value performance

Question

I have a table with 25,000 rows.

Table Audit (Id int identity(1,1), AdditionalInfo xml)

The sample data in the AdditionalInfo column for a row looks like this:

<Audit version="1">
  <Context name="Event">
    <Action name="OrganizationEventReceived">
      <Input>
        <Source type="SourceOrganizationId">77d2678b-ea4a-43ad-816b-c63edf206b08</Source>
        <Target type="TargetOrganizationId">b98fd3ae-dbcb-4826-9d92-7e445ad61273,b98fd3ae-dbcb-4826-9d92-7e445ad61273,b98fd3ae-dbcb-4826-9d92-7e445ad61273</Target>
      </Input>
    </Action>
  </Context>
</Audit>

I would like to shred the XML and collect the data in an output dataset with the following query.

SELECT   Id,
         p.value('(@name)[1]', 'nvarchar (100)') AS TargetAction, 
         p.value('(Input/Source/text())[1]', 'nvarchar (500)') AS Source, 
         p.value('(Input/Target/text())[1]', 'nvarchar (max)') AS Target
FROM dbo.Audit  CROSS APPLY AdditionalInfo.nodes('/Audit/Context/Action') AS AdditionalInfo(p)

The query performs badly: it takes 15 seconds to return the result set for just 25,000 rows. Is there a better way of doing it? I even tried putting primary and secondary XML indexes on the AdditionalInfo column. Please help and let me know how to use better SQL Server XQuery techniques.

Thanks,

Answer 1

Great question.

My recent task required parsing about 35,000 XML documents, a valid document being ~20 kB.

More and larger XML files tend to fill the memory, and the run time grows much faster than linearly:

100 documents: 0:33
1000 documents: 25:00

Try to distribute your work:

The Target variable stores unstructured data, which consumes most of the computing power because of its data type and the varying length of its values.
Depth of nodes in CROSS APPLY matters: avoid triple nodes in nodes(); consider two nodes and recursion (see below on split).
Batch mode: process several documents at a time, WHERE id IN (1,2,3).
Loop over a list of documents, FOR;
Parse using local variables, such as DECLARE @xml_doc XML; SET @xml_doc = (SELECT xmldata FROM xmlsource WHERE id=1);
Avoid exporting XML node content; only write result values.
Parse all elements separately: preserve the order of elements using the function ROW_NUMBER(), then LEFT JOIN all parts to the list of XML documents using some identifier, such as xml_id.
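
As a rough illustration of the local-variable suggestion above, here is a minimal sketch against the dbo.Audit table from the question. The names @Id and @xml_doc and the alias a(p) are only illustrative, and whether this is faster than the CROSS APPLY version depends on your data and server, so measure before adopting it.

```sql
-- Sketch only: shred one document at a time from a local XML variable.
-- Assumes the dbo.Audit table from the question; @Id and @xml_doc are
-- illustrative names, not part of the original query.
DECLARE @Id int = 1;
DECLARE @xml_doc xml;

-- Copy a single document out of the table into the variable.
SELECT @xml_doc = AdditionalInfo
FROM dbo.Audit
WHERE Id = @Id;

-- Shred the variable instead of the table column and return only scalar values.
SELECT @Id AS Id,
       p.value('(@name)[1]', 'nvarchar(100)') AS TargetAction,
       p.value('(Input/Source/text())[1]', 'nvarchar(500)') AS Source,
       p.value('(Input/Target/text())[1]', 'nvarchar(max)') AS Target
FROM @xml_doc.nodes('/Audit/Context/Action') AS a(p);
```

To cover the whole table, the same statements could be run in a loop over the Id values, inserting each result into a temporary table.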
