SQL Server 2008: shred the XML manually, or use the built-in XML indexes on the XML column?

Question

I'm working on a logging database in SQL Server 2008. It will consist mainly of one table, something like this:

StepLog 
----------------
  StepLogID
  ClientID
  LogContent   XML
  CreateDate

Various clients will log certain activities to this table. The LogContent field will be XML, untyped because we don't know what clients will want to log.

To make the LogContent field searchable, the current plan is to shred it programmatically. The metadata for shredding would live in a table something like the following:

XPathAttribute
----------------
  XPathAttributeID
  AttributeName
  AttributeDescription
  XPath

Upon insert of a record into StepLog, a stored procedure would take all the XPaths defined in XPathAttribute, evaluate them against the new record, and write the results out to another table, XPathAttributeValue:

XPathAttributeValue
----------------
  XPathAttributeValueID
  StepLogID
  AttributeID
  AttributeValue
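
For illustration only, here is a minimal sketch of what such a shredding procedure might look like. It is not the poster's actual code. The procedure name, the column types, and the assumption that XPathAttributeValueID is an IDENTITY column are all hypothetical. Note that the XQuery argument of nodes() and value() must be a string literal, so XPaths read from a table have to be spliced in through dynamic SQL.

```sql
-- Hypothetical procedure; names and types are assumptions, not from the original post.
CREATE PROCEDURE dbo.ShredStepLog
    @StepLogID int
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @AttributeID int,
            @XPath       nvarchar(4000),
            @sql         nvarchar(max);

    -- Walk every XPath registered in the metadata table.
    DECLARE attr_cursor CURSOR LOCAL FAST_FORWARD FOR
        SELECT XPathAttributeID, XPath
        FROM dbo.XPathAttribute;

    OPEN attr_cursor;
    FETCH NEXT FROM attr_cursor INTO @AttributeID, @XPath;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- nodes() only accepts a literal XQuery string, hence dynamic SQL.
        -- The XPath text is assumed to come from a trusted source, because it
        -- is concatenated into the statement (only single quotes are escaped).
        SET @sql = N'
            INSERT INTO dbo.XPathAttributeValue (StepLogID, AttributeID, AttributeValue)
            SELECT s.StepLogID, @AttributeID, t.n.value(''.'', ''nvarchar(4000)'')
            FROM dbo.StepLog AS s
            CROSS APPLY s.LogContent.nodes(''' + REPLACE(@XPath, N'''', N'''''') + N''') AS t(n)
            WHERE s.StepLogID = @StepLogID;';

        EXEC sys.sp_executesql
             @sql,
             N'@StepLogID int, @AttributeID int',
             @StepLogID   = @StepLogID,
             @AttributeID = @AttributeID;

        FETCH NEXT FROM attr_cursor INTO @AttributeID, @XPath;
    END

    CLOSE attr_cursor;
    DEALLOCATE attr_cursor;
END;
```

Each XPath that matches several nodes produces several rows in XPathAttributeValue, one per matched node.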

My first thought, when looking at this design, was: "Why not just use the XML indexes, both primary and secondary?" That would avoid a lot of work on our side and use built-in functionality.

I don't have a lot of experience with XML indexes, and the original designer had some bad experiences with them (poor performance) in SQL Server 2005. That is how this design originated.

Feedback would be very much appreciated!

Thanks, Sylvia

Answer 1

XML indexes help in particular scenarios, as described in Secondary XML Indexes:

Following are some guidelines for creating one or more secondary indexes:

If your workload uses path expressions significantly on XML columns, the PATH secondary XML index is likely to speed up your workload. The most common case is the use of the exist() method on XML columns in the WHERE clause of Transact-SQL.

If your workload retrieves multiple values from individual XML instances by using path expressions, clustering paths within each XML instance in the PROPERTY index may be helpful. This scenario typically occurs in a property bag scenario when properties of an object are fetched and its primary key value is known.

If your workload involves querying for values within XML instances without knowing the element or attribute names that contain those values, you may want to create the VALUE index. This typically occurs with descendant axes lookups, such as //author[last-name="Howard"], where <author> elements can occur at any level of the hierarchy. It also occurs in wildcard queries, such as /book [@* = "novel"], where the query looks for <book> elements that have some attribute having the value "novel".
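
For reference, the index types in the guidelines above could be created roughly as follows. This is a sketch under assumptions: the column types, index names, and the /log/step[@status] path are hypothetical, since the question does not specify them. The one hard requirement is that the table has a clustered primary key before a primary XML index can be created, and that the primary XML index exists before any secondary one.

```sql
-- Assumed table definition; column types are guesses based on the question.
CREATE TABLE dbo.StepLog (
    StepLogID  int IDENTITY(1,1) NOT NULL,
    ClientID   int NOT NULL,
    LogContent xml NOT NULL,                    -- untyped XML
    CreateDate datetime NOT NULL DEFAULT (GETDATE()),
    CONSTRAINT PK_StepLog PRIMARY KEY CLUSTERED (StepLogID)  -- needed for XML indexes
);

-- The primary XML index must exist before any secondary index.
CREATE PRIMARY XML INDEX PXML_StepLog_LogContent
    ON dbo.StepLog (LogContent);

-- One secondary index per access pattern; create only those that measurements justify.
CREATE XML INDEX IXML_StepLog_LogContent_Path
    ON dbo.StepLog (LogContent)
    USING XML INDEX PXML_StepLog_LogContent FOR PATH;

CREATE XML INDEX IXML_StepLog_LogContent_Property
    ON dbo.StepLog (LogContent)
    USING XML INDEX PXML_StepLog_LogContent FOR PROPERTY;

CREATE XML INDEX IXML_StepLog_LogContent_Value
    ON dbo.StepLog (LogContent)
    USING XML INDEX PXML_StepLog_LogContent FOR VALUE;

-- Typical query shape that the PATH index targets: exist() in the WHERE clause.
-- The element and attribute names here are made up for illustration.
SELECT StepLogID, ClientID, CreateDate
FROM dbo.StepLog
WHERE LogContent.exist('/log/step[@status = "failed"]') = 1;
```

Each secondary index adds storage and slows down inserts, which matters for a logging table, so it is worth comparing execution plans and insert timings with and without each one.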

As you can see, each type of index is appropriate for a particular scenario. With an open-ended approach like your project's, it is hard to tell which indexes are going to be helpful and which are not.

Another thing to consider is that XML performs much better if you can declare an XML schema for the column, but the nature of your project does not allow this.

So overall I'd say: measure and see. Shredding the XML and storing the values in relational tables is very likely to boost performance over raw XML access. But that applies when you know the schema and shred out a specific set of information, which you then index properly. Right now, even though you shred out some information, you shred it into what is basically an EAV (entity-attribute-value) structure, which is difficult both to query and to optimize. I also recommend you read Best Practices for Semantic Data Modeling for Performance and Scalability for some discussion of the EAV shortcomings and how to avoid some of the problems.
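
To make the EAV concern concrete, here is a hedged sketch of what a search on just two shredded attributes might look like against the tables from the question. The attribute names 'Status' and 'StepName' and their values are hypothetical. The point is only that every additional search criterion costs another pair of joins against the same value table.

```sql
-- Find log entries where Status = 'failed' AND StepName = 'import'.
-- Attribute names and values are made up for illustration.
SELECT DISTINCT s.StepLogID, s.ClientID, s.CreateDate
FROM dbo.StepLog AS s
JOIN dbo.XPathAttributeValue AS v1
    ON v1.StepLogID = s.StepLogID
JOIN dbo.XPathAttribute AS a1
    ON a1.XPathAttributeID = v1.AttributeID
JOIN dbo.XPathAttributeValue AS v2
    ON v2.StepLogID = s.StepLogID
JOIN dbo.XPathAttribute AS a2
    ON a2.XPathAttributeID = v2.AttributeID
WHERE a1.AttributeName  = N'Status'
  AND v1.AttributeValue = N'failed'
  AND a2.AttributeName  = N'StepName'
  AND v2.AttributeValue = N'import';
```

Because AttributeValue holds every kind of value in a single column, the optimizer has little useful statistical information about any one attribute, and typed comparisons (dates, numbers) need a cast on every row.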

Answer 2

I basically agree with what @Remus has said.

Which is to say, by all means use the built-in XML indexes. SQL Server handles huge XML collections remarkably well (IMHO). The time saved over rolling your own will be immeasurable.

One thing I would mention: adding a schema hurt performance in my case. I'd hoped it would help the query optimizer, but it didn't, so I just left it out. (You said your XML is untyped, so this shouldn't come up.)

Answer 1: score 3, accepted, 2010-11-05 19:35:38

Answer 2: score 2, 2010-11-05 20:09:40