
Represent an XML Schema and its Data in a SQL Server Graph Schema

I have a problem where I need to represent an XML Schema and its data inside a SQL Server database. I need to be able to access the data in a way that lets me create either an XML or a JSON file.

I have looked at a couple of solutions to this problem. The first was a traditional relational database, storing the XML data in a table that represents the hierarchical structure through a self-referencing parentId. This structure seems OK, but because the XML is large, accessing the data is slow: I must use a lot of recursion to obtain the hierarchical data I need. The performance of this design will tend to worsen as the amount of data increases.

Secondly, I looked at creating a graph schema inside SQL Server, assigning each XML element to a node table with the XML element's attributes as columns in that table. I then created a simple 'isParentOf' edge table and inserted the relationships between the different XML elements into it. However, because each element is a separate node, queries become cumbersome.

I know there isn't a direct correlation between XML Schema structure and databases, and I have read articles on the complexity of such problems. But I wanted to reach out to the community to see whether it is possible to achieve my goal using SQL Server graph databases, as this seems to be the best fit: I can define my elements and then create the different relationships.

I have provided some sample XML data below, which contains the different permutations of the XML that I am currently working with in terms of element hierarchies, attributes, and data.

<?xml version="1.0" encoding="utf-8"?>
<Document xmlns='http://mydocument.com/schema/1'>
  <BankStatement frequency='monthly'>
    <Customer>
      <AcctNo>012-3456789</AcctNo>
      <Name type="full">John Doe</Name>
      <Street>123 Street Road</Street>
      <City>London</City>
    </Customer>
    <BeginDate>18/10/2022</BeginDate>
    <EndDate>18/11/2022</EndDate>
  </BankStatement>
</Document>

"First creating a traditional relational database and storing the XML data in a table representing the hierarchical structure by use of a self referencing parentId... The performance of this design will tend to worsen as the amount of data increases."

No, it likely won't. When properly architected and indexed, the search time of your tables will be O(log(n)), because indexes use B-Tree data structures.

Let's say your table had 1 million rows in it. In the worst-case scenario, log2(1 million) ≈ 20. That's only about 20 nodes of the B-Tree that would need to be seeked through to find your data. If your table grew to 1 billion rows, log2(1 billion) ≈ 30. These are extremely small numbers for a computer to search through. (It's actually usually less than this because of something called the fan-out factor.)
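As a rough check on this arithmetic, the sketch below counts how many levels a balanced tree needs to cover a given number of rows. The `levels` helper is hypothetical, and the fan-out of 100 is only an illustrative assumption; real index pages hold a number of keys that depends on key size.

```python
def levels(rows, fan_out):
    """Smallest depth whose capacity (fan_out ** depth) covers `rows`."""
    depth, capacity = 0, 1
    while capacity < rows:
        capacity *= fan_out
        depth += 1
    return depth

for rows in (1_000_000, 1_000_000_000):
    # fan_out=2 models a plain binary tree; fan_out=100 is an assumed page fan-out
    print(rows, "binary:", levels(rows, 2), "fan-out 100:", levels(rows, 100))
```

With a binary tree this gives 20 and 30 levels for 1 million and 1 billion rows; with the assumed fan-out of 100 it drops to 3 and 5, which is why the real number of page reads is usually much smaller than the base-2 estimate.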

Typically a recursive CTE can easily be used to crawl a hierarchical structure efficiently too. Hundreds of thousands of hierarchical rows can be crawled and related appropriately in under a second.
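To make the recursive CTE idea concrete, here is a minimal sketch that shreds the sample XML from the question into a self-referencing table and walks it with one recursive query. It uses Python's built-in SQLite only so that it runs anywhere; it is not SQL Server code. The table name `element`, its columns, and the helper functions are all assumptions for illustration, and attributes are left out for brevity. In T-SQL the CTE would be written without the `RECURSIVE` keyword and would concatenate strings with `+` or `CONCAT` rather than `||`.

```python
import sqlite3
import xml.etree.ElementTree as ET

XML = """<Document xmlns='http://mydocument.com/schema/1'>
  <BankStatement frequency='monthly'>
    <Customer>
      <AcctNo>012-3456789</AcctNo>
      <Name type="full">John Doe</Name>
      <Street>123 Street Road</Street>
      <City>London</City>
    </Customer>
    <BeginDate>18/10/2022</BeginDate>
    <EndDate>18/11/2022</EndDate>
  </BankStatement>
</Document>"""

def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]

def shred(conn, xml_text):
    """Store each XML element as one row with a self-referencing parent_id."""
    conn.execute(
        "CREATE TABLE element ("
        " id INTEGER PRIMARY KEY,"
        " parent_id INTEGER REFERENCES element(id),"
        " tag TEXT NOT NULL,"
        " value TEXT)"
    )
    # Index the column the recursive join uses, so each step is an index lookup.
    conn.execute("CREATE INDEX ix_element_parent ON element(parent_id)")
    stack = [(ET.fromstring(xml_text), None)]
    while stack:
        node, parent_id = stack.pop()
        text = (node.text or "").strip() or None
        cur = conn.execute(
            "INSERT INTO element (parent_id, tag, value) VALUES (?, ?, ?)",
            (parent_id, local_name(node.tag), text),
        )
        # Push children reversed so they are inserted in document order.
        for child in reversed(list(node)):
            stack.append((child, cur.lastrowid))

def walk(conn):
    """Return (depth, path, value) for every element via a recursive CTE."""
    return conn.execute(
        """
        WITH RECURSIVE tree(id, depth, path, value) AS (
            SELECT id, 0, tag, value FROM element WHERE parent_id IS NULL
            UNION ALL
            SELECT e.id, t.depth + 1, t.path || '/' || e.tag, e.value
            FROM element AS e
            JOIN tree AS t ON e.parent_id = t.id
        )
        SELECT depth, path, value FROM tree ORDER BY id
        """
    ).fetchall()

conn = sqlite3.connect(":memory:")
shred(conn, XML)
rows = walk(conn)
for depth, path, value in rows:
    print("  " * depth + path, "=", value)
```

The anchor member selects the root (the row with no parent), and each recursive step joins children to rows already found, so the whole tree comes back from a single statement instead of one query per level.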


Aside from all of that, I don't even see a hierarchical relationship in your example data. Rather, it appears to be a traditional data modeling problem that a relational structure would be well suited for.

Tables that I would recommend defining would be Customers, Accounts, maybe Addresses if a Customer can have more than one Address, and BankStatements. I'm sure your data probably has other relevant entities too.
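A minimal sketch of what such a relational model might look like, again using Python's built-in SQLite rather than SQL Server so it can be run as-is. The table names come from the suggestion above; every column name, the key layout, the choice to keep a single address on Customers, and the reading of the sample dates as dd/mm/yyyy are assumptions. It also shows one way to get back to the JSON output the question asks for.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (
    CustomerId INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL,
    Street     TEXT,   -- move to an Addresses table if a customer can have several
    City       TEXT
);
CREATE TABLE Accounts (
    AccountId  INTEGER PRIMARY KEY,
    CustomerId INTEGER NOT NULL REFERENCES Customers(CustomerId),
    AcctNo     TEXT NOT NULL UNIQUE
);
CREATE TABLE BankStatements (
    StatementId INTEGER PRIMARY KEY,
    AccountId   INTEGER NOT NULL REFERENCES Accounts(AccountId),
    Frequency   TEXT,
    BeginDate   TEXT,  -- stored as ISO 8601 (assumes the sample is dd/mm/yyyy)
    EndDate     TEXT
);
INSERT INTO Customers VALUES (1, 'John Doe', '123 Street Road', 'London');
INSERT INTO Accounts VALUES (1, 1, '012-3456789');
INSERT INTO BankStatements VALUES (1, 1, 'monthly', '2022-10-18', '2022-11-18');
""")

# Return rows that can be read by column name.
conn.row_factory = sqlite3.Row
row = conn.execute(
    """
    SELECT s.Frequency, s.BeginDate, s.EndDate,
           a.AcctNo, c.Name, c.Street, c.City
    FROM BankStatements AS s
    JOIN Accounts  AS a ON a.AccountId  = s.AccountId
    JOIN Customers AS c ON c.CustomerId = a.CustomerId
    WHERE s.StatementId = ?
    """,
    (1,),
).fetchone()

# Rebuild the nested document shape from the flat joined row.
statement = {
    "frequency": row["Frequency"],
    "customer": {
        "acctNo": row["AcctNo"],
        "name": row["Name"],
        "street": row["Street"],
        "city": row["City"],
    },
    "beginDate": row["BeginDate"],
    "endDate": row["EndDate"],
}
document = json.dumps({"BankStatement": statement}, indent=2)
print(document)
```

Two joins on primary and foreign keys replace the recursion entirely here, because the nesting in the sample XML reflects fixed relationships between entities rather than an arbitrarily deep hierarchy.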
