
Is there a faster way than this to extract data from XML nodes in T-SQL?

I am currently trying to create a stored procedure in T-SQL which takes an XML table as its input, and then inserts the data in it into a temporary table.

The XML that I am using has the following format:

<Table>
    <row MyFirstColumn="foo" MySecondColumn="bar" ... />
</Table>

The SQL that I am using to insert this XML data into a temporary table is of the following format:

INSERT INTO
    #TempTable
SELECT
    T.c.value('@MyFirstColumn', 'varchar(50)')
   ,T.c.value('@MySecondColumn', 'varchar(50)')
   ,...
FROM
    @x.nodes('//Table/row') T(c)
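
For reference, #TempTable is created along something like these lines, with one varchar(50) column per XML attribute (only the two sample columns are shown here; the real table has ~150 of them):

CREATE TABLE #TempTable
(
    MyFirstColumn  varchar(50),
    MySecondColumn varchar(50)
    -- ...remaining columns follow the same pattern
);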

However, I am doing this with XML tables containing 150 columns and upwards of 200,000 rows. At present, executing this SQL on 10,000 rows takes ~142 seconds, so this is completely inappropriate for dealing with XML tables containing large numbers of rows.

Can anyone suggest a way to speed up this process?

Shredding XML with nodes()/value() in SQL Server has performance issues when you query a lot of columns: there is one nested loop join with a call to an XML function for each column.

Query plan with 3 columns:

[query plan screenshot]

Query plan with 5 columns:

[query plan screenshot]

Just imagine what it would look like with more than 150 columns.

Another option for you is to use OPENXML. It does not have the same problems with many columns.

Your query would look something like this:

declare @H int;
declare @X xml;

exec sys.sp_xml_preparedocument @H output,
                                @X;

select C1,
       C2,
       C3
from
       openxml(@H, 'Table/row', 0)
       with (
              C1 int,
              C2 int,
              C3 int
            );

exec sys.sp_xml_removedocument @H;
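
Adapted to the question's schema, it might look roughly like this (the attribute names are taken from the sample XML, @x is assumed to already hold the document as in the question, and only two of the ~150 columns are shown):

declare @H int;

exec sys.sp_xml_preparedocument @H output,
                                @x;

insert into #TempTable
select MyFirstColumn,
       MySecondColumn
from
       openxml(@H, 'Table/row', 0)
       with (
              MyFirstColumn  varchar(50) '@MyFirstColumn',
              MySecondColumn varchar(50) '@MySecondColumn'
            );

exec sys.sp_xml_removedocument @H;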

For me, using 150 columns and 1000 rows took about 14 seconds with nodes()/value() and 3 seconds with OPENXML.

Vote for a change.

Code used for testing:

drop table T;

go

declare @C int = 150;
declare @S nvarchar(max);
declare @X xml;
declare @N int = 1000;
declare @D datetime;

set @S = 'create table T('+
stuff((
      select top(@C) ', '+N'C'+cast(row_number() over(order by 1/0) as nvarchar(3)) + N' int'
      from sys.columns
      for xml path('')
      ), 1, 2, '') + ')'

exec sp_executesql @S;

set @S = 'insert into T select top(@N) '+
stuff((
      select top(@C) ',1'
      from sys.columns as c1
      for xml path('')
      ), 1, 1, '') + ' from sys.columns as c1, sys.columns as c2';

exec sp_executesql @S, N'@N int', @N;

set @X = (
         select *
         from dbo.T
         for xml raw, root('Table')
         );

set @S = 'select '+
stuff((
      select top(@C) ', '+N'T.X.value(''@C'+cast(row_number() over(order by 1/0) as nvarchar(3)) + N''', ''int'')'
      from sys.columns
      for xml path('')
      ), 1, 2, '') + ' from @X.nodes(''Table/row'') as T(X)'

set @D = getdate();
exec sp_executesql @S, N'@X xml', @X;
select datediff(second, @D, getdate());

set @S = 'declare @H int;
exec sp_xml_preparedocument @H output, @X;

select *
from openxml(@H, ''Table/row'', 0)
  with (' +
stuff((
      select top(@C) ', C'+cast(row_number() over(order by 1/0) as nvarchar(3))+ ' int'
      from sys.columns
      for xml path('')
      ), 1, 2, '') + ');
exec sys.sp_xml_removedocument @H';

set @D = getdate();
exec sp_executesql @S, N'@X xml', @X
select datediff(second, @D, getdate());

Your options depend on how much control you have over the server and what preparation you're willing and able to do.

If you have the ability to clean your data before calling the procedure (running an executable, for example)...

You could deserialize your data into an entity and use your ORM tool of choice (nHibernate, EntityFramework, etc.) to store the entity.

You could parse the XML into an object that a bulk importer could handle, store it to a file, and make use of SQL Server's bulk import functionality: https://docs.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql?view=sql-server-2017
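
For example, if the XML were flattened into a delimited file first, the load itself could be a plain BULK INSERT. A minimal sketch, in which the file path, delimiters, and target table are placeholder assumptions:

BULK INSERT dbo.MyTargetTable
FROM 'C:\import\rows.csv'
WITH (
      FIELDTERMINATOR = ',',   -- column delimiter used when the file was written
      ROWTERMINATOR   = '\n',  -- row delimiter
      FIRSTROW        = 2      -- skip the header row, if one exists
     );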

If you are able to make use of custom functionality on the server, you can use CLR user-defined functions to do this work instead of running it in a separate executable: https://docs.microsoft.com/en-us/sql/relational-databases/clr-integration-database-objects-user-defined-functions/clr-user-defined-functions?view=sql-server-2017

If I think of anything else I'll edit this post.

SQL-Server is pretty fast in dealing with XML, but you did not tell us the most important thing: Where is @x coming from?

Within SQL-Server the XML is not stored as the string you see, but as a hierarchically organised tree in physical tables. If you receive this XML as a string and assign it to a variable of type XML, the engine has to parse the whole lot and transfer all of its content into those internal structures. The rest should be rather fast.

At first sight there are two places to tune it a bit:

  • FROM @x.nodes('//Table/row') T(c)
    The // will use a deep search: the engine will look into each <row> to see whether there might be another <Table> nested below it. Rather use FROM @x.nodes('/Table/row') T(c).

  • And use 'nvarchar(50)' instead of 'varchar(50)'. Internally, XML stores its strings as NVARCHAR, so you can avoid all these casts (both changes are applied in the sketch after this list).
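
Applied to the query from the question, both adjustments would look roughly like this (only the two sample columns are shown; the remaining ones follow the same pattern):

INSERT INTO
    #TempTable
SELECT
    T.c.value('@MyFirstColumn', 'nvarchar(50)')
   ,T.c.value('@MySecondColumn', 'nvarchar(50)')
FROM
    @x.nodes('/Table/row') T(c);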

If you have SQL-Server 2016+ and you have control over the sender, you might give JSON a try. It is better for one-time actions because it does not have to transfer your data into internal structures before it can work with it.
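
A minimal sketch of the JSON route, assuming the sender could deliver the same rows as a JSON array whose property names mirror the sample XML attributes (OPENJSON requires database compatibility level 130 or higher):

-- Sample payload with the two attributes from the question's XML
DECLARE @j nvarchar(max) = N'[{"MyFirstColumn":"foo","MySecondColumn":"bar"}]';

INSERT INTO #TempTable
SELECT MyFirstColumn,
       MySecondColumn
FROM OPENJSON(@j)
     WITH (
           MyFirstColumn  nvarchar(50) '$.MyFirstColumn',
           MySecondColumn nvarchar(50) '$.MySecondColumn'
          );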

I really liked and voted for Mikael Eriksson's answer, but there's one more aspect to it:

His test generates a 909 KB XML document with 1000 rows and 150 columns, and sp_xml_preparedocument takes only 226 milliseconds in his case (which is really fast), but...

I tried applying it to my XML document, which is 521 MB. It contains 2,045,156 rows with 11 different columns, all read as nvarchar(255).

When I selected all 11 columns via *:

  • select * via .value() took 297 sec
  • select * via openxml took 231 sec in total (sp_xml_preparedocument took 107 sec, select * from openxml took 123 sec)

openxml works better in this case!

When I selected only 2 columns:

  • select 2 columns via .value() took 57 sec
  • select 2 columns via openxml took 189 sec in total (sp_xml_preparedocument - 86 sec, select * from openxml - 103 sec)

.value() works better in this case!

So it looks like which method is faster actually depends on the XML size, the number of rows, and the number of columns that you query from the XML!
