Why use XML type to store XML data in SQL Server?

I'm playing around and learning to use Microsoft SQL Server. I want to store XML documents in a table, parts of the XML document won't be modified within the table (ie any changes will be done by updating the whole XML document in that cell).

From what I can see, I can store the XML documents in a column of type Xml or in a varchar(MAX).

What are the pros and cons of each?

XML datatype supports:

Besides, with an XML type it is harder to make the typical mistakes junior developers make when handling XML: treating it as a string, mixing up or ignoring encodings such as UTF-8 and UTF-16, ignoring namespaces, confusing or ignoring processing instructions, and so on.
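To make the string-versus-parser point concrete, here is a small sketch in Python rather than T-SQL. Python's `xml.etree.ElementTree` only stands in for what a real XML parser (such as the one behind SQL Server's xml column) does; the sample document is hypothetical and borrows the MARCXML namespace `http://www.loc.gov/MARC21/slim` purely for illustration.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
NS = {"marc": MARC_NS}  # prefix -> namespace URI mapping for path queries

# Hypothetical sample: a default namespace applies to every element.
doc = (
    f'<collection xmlns="{MARC_NS}">'
    "<record><leader>00925njm</leader></record>"
    "</collection>"
)

# Mistake 1: treating XML as a string. A substring test happens to work here...
naive_hit = "<record>" in doc
# ...but the same document serialized as UTF-16 no longer contains those bytes.
utf16_bytes = doc.encode("utf-16")  # byte-order mark + UTF-16 code units
naive_bytes_hit = b"<record>" in utf16_bytes

# A parser handles both: the UTF-16 encoding is detected from the byte-order mark.
root = ET.fromstring(doc)
root16 = ET.fromstring(utf16_bytes)

# Mistake 2: ignoring namespaces. The element's real name is {MARC_NS}record.
unqualified = root.findall("record")
qualified = root.findall("marc:record", NS)
qualified16 = root16.findall("marc:record", NS)

print(naive_hit, naive_bytes_hit, len(unqualified), len(qualified), len(qualified16))
```

The substring test succeeds or fails depending on the encoding, and the unqualified lookup finds nothing; the namespace-aware query finds the record in both encodings.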

Please read XML Best Practices for Microsoft SQL Server 2005

Yes, you can.

Now, keep reading the documentation, in particular the part about searching XML: you can put an index on an XML column, and it supports much more XML-specific query syntax than a text column, because XML columns are parsed internally.
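A rough illustration of why a parsed column searches better than a text column: a substring search (which is what `LIKE '%price%'` does on a varchar column) also hits comments and unrelated text, while a path query addresses the element itself. In SQL Server you would use the xml type's `value()` or `exist()` methods; here Python's ElementTree, which supports only a limited XPath subset, stands in, and the order document is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "price" appears in a comment, in free text, and as an element.
doc = (
    "<order>"
    "<!-- old price was 10 -->"
    "<note>ask about price</note>"
    "<price>20</price>"
    "</order>"
)

# Text-column style: every occurrence of the substring counts, wherever it is.
substring_hits = doc.count("price")

# Parsed style: ask for the <price> element's value. The default parser drops comments.
root = ET.fromstring(doc)
prices = [el.text for el in root.findall("price")]

print(substring_hits, prices)
```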

Quoted from the following Stack Overflow post: Microsoft SQL Server 2005/2008: XML vs text/varchar data type

If you store XML in an xml-typed column, the data is not stored as plain text, as it is in the nvarchar case; it is stored as a kind of parsed data tree, which in turn is smaller than the unparsed XML. This not only decreases the database size but also gives you other advantages, such as validation and easy manipulation (even if you're not using these yet, they are there for future use).

On the other hand, the server has to parse the data on insertion, which will probably slow your database down; you have to trade off speed against size.

Personally, I think data should be stored as XML in the database only when it has a structure that is hard to implement in a relational model, e.g. layouts or style descriptions. Usually that means there won't be much data and speed is not an issue, so the added XML features (data validation, the ability to manipulate the data, and, last but not least, the ability to click on the value in Management Studio and see formatted XML, a feature I really love) outweigh the costs.

I don't have direct experience storing large amounts of XML in the database, and I wouldn't do it if I had the choice, since it is almost always slower than a relational model. But if that were the case, I'd recommend profiling both options and choosing the balance between size and speed that best suits your needs.

I ran some tests to compare insert performance between untyped XML, typed XML, and NVARCHAR(MAX). Untyped XML was the fastest and used the least disk storage. Each test inserted 7,936,510 rows; the typed XML column used the XSD at https://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd .

I ran the typed XML test twice: the first run took 01:23:26.1355961 and the second took 01:15:15.5957446. The size on disk was 57,520,685,056.

The untyped XML test took 00:48:48.6290364 and was 36,515,610,624 on disk.

The NVARCHAR(MAX) test took 00:50:22.1841067 and was 72,620,179,456 on disk.

Note, I dropped and recreated the database for each test.
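The reported figures are easier to compare as ratios. This small sketch converts the `hh:mm:ss.fffffff` durations to seconds and compares everything against untyped XML. It assumes the sizes are in bytes and, since they include all database overhead, the per-row numbers are only rough.

```python
def to_seconds(hms: str) -> float:
    """Convert an 'hh:mm:ss.fffffff' duration (as reported above) to seconds."""
    hours, minutes, seconds = hms.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


ROWS = 7_936_510

# Reported insert times.
durations = {
    "typed XML, run 1": to_seconds("01:23:26.1355961"),
    "typed XML, run 2": to_seconds("01:15:15.5957446"),
    "untyped XML": to_seconds("00:48:48.6290364"),
    "NVARCHAR(MAX)": to_seconds("00:50:22.1841067"),
}

# Reported sizes on disk (assumed to be bytes; they include all database overhead).
sizes = {
    "typed XML": 57_520_685_056,
    "untyped XML": 36_515_610_624,
    "NVARCHAR(MAX)": 72_620_179_456,
}

baseline_time = durations["untyped XML"]
baseline_size = sizes["untyped XML"]

for name, seconds in durations.items():
    print(f"{name:18} {seconds:8.0f} s   x{seconds / baseline_time:.2f} vs untyped")

for name, size in sizes.items():
    print(f"{name:18} {size / ROWS:8.0f} B/row   x{size / baseline_size:.2f} vs untyped")
```

Going by these numbers, NVARCHAR(MAX) took about twice the space of untyped XML (x1.99) while loading only slightly slower (x1.03), and typed XML was roughly 1.5 to 1.7 times slower than untyped.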

My takeaway is that untyped XML is a better choice than NVARCHAR(MAX) because it uses much less disk. With non-Unicode VARCHAR the difference might be smaller: NVARCHAR stores text as UTF-16, which takes at least two bytes per character, and I suspect that is part of it. The files also contain a lot of whitespace, which is wasted storage in a text column, so that may have contributed as well.
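Both suspected causes can be checked on a small sample. This sketch uses a hypothetical, indented MARCXML-like fragment: UTF-16 (what NVARCHAR stores) doubles the size of ASCII-only text compared with a single-byte encoding, and the indentation between tags adds more. The regular expression is only an approximation of "insignificant whitespace" (it would also strip whitespace-only text in mixed content). As far as I understand, SQL Server's xml type discards insignificant whitespace by default unless it is converted with style 1, which would fit the observed difference.

```python
import re

# Hypothetical sample record, ASCII only, indented like a pretty-printed file.
pretty = """<record>
    <leader>00925njm  22002777a 4500</leader>
    <controlfield tag="001">12345</controlfield>
</record>"""

# One byte per character for ASCII (like VARCHAR with a single-byte code page).
single_byte_size = len(pretty.encode("utf-8"))
# NVARCHAR stores UTF-16 code units: two bytes per ASCII character, no BOM.
utf16_size = len(pretty.encode("utf-16-le"))

# Approximate insignificant-whitespace removal: drop whitespace-only runs between tags.
compact = re.sub(r">\s+<", "><", pretty)

print(single_byte_size, utf16_size, len(pretty), len(compact))
```

Note that the two spaces inside the `<leader>` text survive, because they are character data rather than whitespace between tags.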

I'm not sure how much of the extra slowness of typed XML compared with untyped XML is due to validation, or whether there are other differences. If I remember correctly, I once read that the data is stored relationally in hidden tables, but I'm not sure whether that applies to both typed and untyped XML.

I haven't yet tested query performance. I'm assuming it would be faster for typed XML.

Also, I specified that the typed XML was DOCUMENT, not the default CONTENT.
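The DOCUMENT/CONTENT distinction is easy to mix up: DOCUMENT requires exactly one top-level element, while CONTENT (the default) also accepts fragments with several top-level elements or top-level text. The two helper functions below are hypothetical Python analogues of those checks, using ElementTree and a synthetic wrapper element; they are not how SQL Server implements the facets.

```python
import xml.etree.ElementTree as ET


def is_xml_document(text: str) -> bool:
    """Hypothetical DOCUMENT-style check: exactly one well-formed top-level element."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def is_xml_content(text: str) -> bool:
    """Hypothetical CONTENT-style check: a well-formed fragment of elements and text."""
    try:
        # Wrapping in a synthetic root lets the parser accept several top-level nodes.
        ET.fromstring(f"<_wrapper>{text}</_wrapper>")
        return True
    except ET.ParseError:
        return False


for sample in ("<a/>", "<a/><b/>", "text <b/>", "<a>"):
    print(repr(sample), is_xml_document(sample), is_xml_content(sample))
```

One limitation of the wrapper trick: a fragment that starts with an XML declaration would be rejected, since a declaration cannot appear inside an element.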

1. It is based on a standard, SQL/XML, so you can expect other major databases to have similar capabilities.

2. Queries can use standards such as XPath (through XQuery).

3. You can index the data.

4. If you associate a schema, storage can be smaller and queries are optimized based on the type information.
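One reason type information helps queries: without a schema, element values are just text, so comparisons and sorting are character by character. This Python sketch with a made-up document shows the difference; the `int` cast plays the role that a schema declaring `<price>` as, say, `xs:int` would play for a typed column.

```python
import xml.etree.ElementTree as ET

# Hypothetical untyped document: nothing says <price> holds numbers.
doc = "<items><item><price>9</price></item><item><price>10</price></item></items>"
root = ET.fromstring(doc)

# Without type information the values are strings, compared character by character.
as_text = sorted(p.text for p in root.iter("price"))
# Once the type is known (here we simply assume an integer and cast), comparisons are numeric.
as_int = sorted(int(p.text) for p in root.iter("price"))

print(as_text)  # ['10', '9']
print(as_int)   # [9, 10]
```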

Cons: if you store structured XML data in an xml column, replication currently will NOT merge changes made to different parts of the same value between publisher and subscriber.

For example, if the subscriber changes one XML element and the publisher changes a different element of the same XML column, there will be a conflict: one change loses, and you have to recover the missing data manually.
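A toy model of that conflict, in Python: if the whole XML value is treated as one unit, two edits to different elements collide and one is lost, whereas an element-by-element merge could keep both. Both merge functions and the "publisher wins" policy are hypothetical illustrations, not how SQL Server replication is implemented; the element-level merge also assumes flat, uniquely named child elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical values of one xml column: the common ancestor and two independent edits.
BASE = "<cfg><a>1</a><b>1</b></cfg>"
PUBLISHER = "<cfg><a>2</a><b>1</b></cfg>"   # publisher changed <a>
SUBSCRIBER = "<cfg><a>1</a><b>3</b></cfg>"  # subscriber changed <b>


def column_level_merge(base: str, pub: str, sub: str) -> str:
    """Treat the whole XML value as one unit (hypothetical 'publisher wins' policy)."""
    if pub != base and sub != base:
        return pub  # conflict: the subscriber's edit is discarded
    return pub if pub != base else sub


def element_level_merge(base: str, pub: str, sub: str) -> str:
    """Merge child by child; assumes flat, uniquely named children (toy model only)."""
    b, p, s = (ET.fromstring(x) for x in (base, pub, sub))
    merged = ET.Element(b.tag)
    for child in b:
        pub_text = p.find(child.tag).text
        sub_text = s.find(child.tag).text
        out = ET.SubElement(merged, child.tag)
        # Take whichever side changed this element; the publisher wins if both did.
        out.text = pub_text if pub_text != child.text else sub_text
    return ET.tostring(merged, encoding="unicode")


print(column_level_merge(BASE, PUBLISHER, SUBSCRIBER))
print(element_level_merge(BASE, PUBLISHER, SUBSCRIBER))
```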

Pros: many web and desktop applications store their data as XML, which maps easily onto SQL Server's xml data type.
