
Why use XML type to store XML data in SQL Server?

Original post 2011-08-11 16:40:24 9 6 sql/ sql-server/ xml

I'm playing around and learning to use Microsoft SQL Server. I want to store XML documents in a table. Parts of an XML document won't be modified within the table (i.e. any changes will be made by updating the whole XML document in that cell).

From what I can see, I can store the XML documents in a column of type xml or in a varchar(MAX).

What are the pros and cons of each?

6 answers

The XML data type supports:

XML schema validation
XML indexing
XML data methods to query and manipulate XML via XPath/XQuery

Besides, with the XML type it is harder to make the typical mistakes junior developers make when handling XML: treating it as a string, mixing or ignoring encodings like UTF-8 and UTF-16, ignoring namespaces, confusing or ignoring processing instructions, etc.
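To make the "treat it as a string" mistake concrete, here is a minimal sketch in Python rather than T-SQL. The two sample documents, the `urn:example:rec` namespace, and both helper functions are made up for illustration. It only shows why a namespace-aware parser behaves differently from a text search; it does not reproduce what SQL Server does internally.

```python
import xml.etree.ElementTree as ET

# Two serializations of the same logical document (hypothetical sample data):
# one uses a prefix, the other a default namespace.
doc_a = '<r:record xmlns:r="urn:example:rec"><r:title>SQL</r:title></r:record>'
doc_b = '<record xmlns="urn:example:rec"><title>SQL</title></record>'


def title_by_string_search(xml_text):
    """Naive approach: look for a literal tag in the raw text."""
    start = xml_text.find("<title>")
    if start == -1:
        return None
    end = xml_text.find("</title>", start)
    return xml_text[start + len("<title>"):end]


def title_by_parsing(xml_text):
    """Namespace-aware approach: parse, then look up the qualified name."""
    root = ET.fromstring(xml_text)
    node = root.find("{urn:example:rec}title")
    return None if node is None else node.text


string_results = [title_by_string_search(d) for d in (doc_a, doc_b)]
parsed_results = [title_by_parsing(d) for d in (doc_a, doc_b)]

print(string_results)  # the prefixed document is missed by the text search
print(parsed_results)  # both documents yield the title
```

The text search depends on how the document happens to be serialized, while the parsed lookup depends only on the element's namespace and local name.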

Please read XML Best Practices for Microsoft SQL Server 2005.

Yes, you can.

Now, go on reading the documentation further. The part about better search for XML: you can put an index on an XML field, and it will allow you a lot more XML-specific query syntax than a text field, because XML fields internally parse the XML.

Quoted from the following SO post: Microsoft SQL Server 2005/2008: XML vs text/varchar data type

If you store XML in an xml-typed column, the data will not be stored as simple text, as in the nvarchar case. It will be stored in some sort of parsed data tree, which in turn will be smaller than the unparsed XML version. This not only decreases the database size, but also gives you other advantages, like validation, easy manipulation, etc. (even if you're not using any of these, they are still there for future use).

On the other hand, the server will have to parse the data upon insertion, which will probably slow your database down - you have to make a decision between speed and size.

Personally, I think that data in the database should be stored as XML only when it has a structure that is hard to implement in a relational model, e.g. layouts, style descriptions, etc. Usually that means there won't be much data and speed is not an issue, so the added XML features, like data validation and manipulation ability (and, last but not least, the ability to click on the value in Management Studio and see formatted XML - I really love that feature), outweigh the costs.

I don't have direct experience in storing large amounts of XML in the database, and I wouldn't do that if I had the option, since it is almost always slower than a relational model. But if that were the case, I'd recommend profiling both options and choosing the balance of size and speed that best suits your needs.

I did some tests to compare insert performance between untyped XML, typed XML, and NVARCHAR(MAX). I found that untyped XML was the fastest and used the least storage on disk. The test that I did inserted 7,936,510 rows. It used the XSD at https://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd .

I ran the typed XML test twice. The first time took 01:23:26.1355961. The second time took 01:15:15.5957446. The size on disk was 57,520,685,056.

The untyped XML test took 00:48:48.6290364 and was 36,515,610,624 on disk.

The NVARCHAR(MAX) test took 00:50:22.1841067 and was 72,620,179,456 on disk.

Note, I dropped and recreated the database for each test.
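For a quick side-by-side reading of the figures above, the following Python sketch expresses each result relative to the NVARCHAR(MAX) run. The timings and sizes are copied from the text. Treating the sizes as bytes, and applying the single reported typed XML size to both typed runs, are assumptions made here for illustration.

```python
from datetime import timedelta

# (elapsed time, size on disk) as reported above.
# Assumptions: sizes are in bytes, and the one reported typed XML size
# applies to both typed runs.
reported = {
    "typed XML, run 1": (timedelta(hours=1, minutes=23, seconds=26.1355961), 57_520_685_056),
    "typed XML, run 2": (timedelta(hours=1, minutes=15, seconds=15.5957446), 57_520_685_056),
    "untyped XML": (timedelta(minutes=48, seconds=48.6290364), 36_515_610_624),
    "NVARCHAR(MAX)": (timedelta(minutes=50, seconds=22.1841067), 72_620_179_456),
}

baseline_time, baseline_size = reported["NVARCHAR(MAX)"]

# Dividing one timedelta by another yields a plain float ratio.
relative = {
    name: (elapsed / baseline_time, size / baseline_size)
    for name, (elapsed, size) in reported.items()
}

for name, (time_ratio, size_ratio) in relative.items():
    print(f"{name:18s} time x{time_ratio:.2f}  size x{size_ratio:.2f}")
```

On these numbers, untyped XML takes roughly half the disk space of NVARCHAR(MAX) at a slightly shorter insert time, while typed XML sits in between on size and is noticeably slower to insert.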

My takeaway from this is that it's best to use untyped XML instead of NVARCHAR(MAX) because it uses a lot less disk. Maybe if you just used non-Unicode VARCHAR the difference would be smaller. I'm thinking NVARCHAR is probably using two bytes to store each character. Also, there is a lot of whitespace in the files, which means a lot of wasted storage, so that might have had something to do with it.
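The two guesses above (two bytes per character, plus wasted whitespace) can be illustrated with a rough Python sketch. The sample document is made up, and the byte counts come from Python's own encoders, not from SQL Server's storage engine, so this only shows the direction of the effect.

```python
import re

# Hypothetical, pretty-printed sample; real MARCXML records are larger.
sample = """<record>
    <leader>00000nam a2200000 a 4500</leader>
    <controlfield tag="001">12345</controlfield>
</record>"""

# For ASCII-only text, UTF-16 needs two bytes per character and UTF-8 one.
utf16_bytes = len(sample.encode("utf-16-le"))
utf8_bytes = len(sample.encode("utf-8"))

# Drop whitespace that sits purely between tags (indentation, newlines).
# Whitespace inside element text, such as the leader value, is untouched.
compact = re.sub(r">\s+<", "><", sample)

print(utf16_bytes, utf8_bytes)    # the first number is twice the second here
print(len(sample), len(compact))  # the compact form is shorter
```

Removing whitespace between tags is only safe when it is insignificant for the document at hand, so treat that step as an illustration, not a recommendation.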

I'm not sure how much of the extra slowness of typed XML compared with untyped XML is due to validation, or whether there are other differences. If I remember correctly, I once read that the data is stored relationally in hidden tables. I'm not sure whether this applies to both typed and untyped XML.

I haven't yet tested query performance. I'm assuming it would be faster for typed XML.

Also, I specified that the typed XML was DOCUMENT, not the default CONTENT.

1. It is based on a standard: SQLXML, so you can expect other major databases to have similar capabilities.

2. Queries may use standards such as XPath.

3. You can index the data.

4. If you have a schema, data storage is reduced and query optimization is performed based on type information.

Cons: If you are storing structured XML data in an xml data field, then replication currently will NOT sync changes between publisher and subscriber.

E.g. if the subscriber changes one XML element and the publisher changes a different element of the same xml data column, there will be a conflict - one change will lose, and you have to manually find a solution for the missing data.

Pros: Many web/desktop applications store their data as XML data types - this can be easily mapped to a SQL xml data type.