
Storing Relational Data in XML

I'm wondering what the best practices are for storing a relational data structure in XML. Particularly, I am wondering about best practices for enforcing node order. For example, say I have three objects: School, Course, and Student, which are defined as follows:

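using System.Collections.Generic;

// A school owns the master lists of courses and students.
class School
{
    public List<Course> Courses { get; } = new List<Course>();
    public List<Student> Students { get; } = new List<Student>();
}

// A course is identified by its Number (e.g. "MATH 103").
class Course
{
    public string Number { get; set; }
    public string Description { get; set; }
}

// A student holds references to Course objects owned by the School.
class Student
{
    public string Name { get; set; }
    public List<Course> EnrolledIn { get; } = new List<Course>();
}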

I would store such a data structure in XML like so:

<School>
    <Courses>
        <Course Number="ENGL 101" Description="English I" />
        <Course Number="CHEM 102" Description="General Inorganic Chemistry" />
        <Course Number="MATH 103" Description="Trigonometry" />
    </Courses>
    <Students>
        <Student Name="Jack">
            <EnrolledIn>
                <Course Number="CHEM 102" />
                <Course Number="MATH 103" />
            </EnrolledIn>
        </Student>
        <Student Name="Jill">
            <EnrolledIn>
                <Course Number="ENGL 101" />
                <Course Number="MATH 103" />
            </EnrolledIn>
        </Student>
    </Students>
</School>

With the XML ordered this way, I can parse Courses first. Then, when I parse Students, I can look up each Course listed in EnrolledIn (by its Number) in the School.Courses list. This will give me an object reference to add to the EnrolledIn list in Student. If Students, however, comes before Courses, such a lookup to get an object reference is not possible (since School.Courses has not yet been populated).

So what are the best practices for storing relational data in XML?

- Should I enforce that Courses must always come before Students?
- Should I tolerate any ordering and create a stub Course object whenever I encounter one I have not yet seen? (To be expanded when the definition of the Course is eventually reached later.)
- Is there some other way I should be persisting/loading my objects to/from XML? (I am currently implementing Save and Load methods on all my business objects and doing all this manually using System.Xml.XmlDocument and its associated classes.)

I am used to working with relational data out of SQL, but this is my first experience trying to store a non-trivial relational data structure in XML. Any advice you can provide as to how I should proceed would be greatly appreciated.

While you can specify the order of child elements using an <xsd:sequence>, requiring child objects to come in a specific order makes your system less flexible (i.e., harder to update using Notepad).

The best thing to do is to parse all of your data first, then perform whatever actions you need. Don't act during the parse.
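A minimal sketch of that "parse first, act afterwards" idea. It is written in Python with the standard-library xml.etree.ElementTree rather than the System.Xml classes from the question, and the names load_school, Course and Student are illustrative assumptions, not anything from the original post. Students are deliberately placed before Courses to show that document order stops mattering once the whole tree is in memory.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Students come BEFORE Courses here on purpose.
XML = """<School>
    <Students>
        <Student Name="Jack">
            <EnrolledIn>
                <Course Number="CHEM 102" />
                <Course Number="MATH 103" />
            </EnrolledIn>
        </Student>
    </Students>
    <Courses>
        <Course Number="CHEM 102" Description="General Inorganic Chemistry" />
        <Course Number="MATH 103" Description="Trigonometry" />
    </Courses>
</School>"""


@dataclass
class Course:  # hypothetical stand-in for the question's Course class
    number: str
    description: str


@dataclass
class Student:  # hypothetical stand-in for the question's Student class
    name: str
    enrolled_in: list = field(default_factory=list)


def load_school(xml_text):
    # Phase 1: parse the entire document into a tree; build no objects yet.
    root = ET.fromstring(xml_text)

    # Phase 2a: build every Course, indexed by its Number.
    courses = {
        el.get("Number"): Course(el.get("Number"), el.get("Description"))
        for el in root.findall("./Courses/Course")
    }

    # Phase 2b: build students and resolve each reference through the index.
    students = []
    for el in root.findall("./Students/Student"):
        student = Student(el.get("Name"))
        for ref in el.findall("./EnrolledIn/Course"):
            student.enrolled_in.append(courses[ref.get("Number")])
        students.append(student)
    return courses, students


courses, students = load_school(XML)
```

Because the lookups happen only after the whole tree has been read, a reference to an undefined course surfaces as a KeyError in phase 2b rather than as an ordering problem.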


Obviously, the design of the XML and the data behind it precludes serializing a single POCO to XML. You need to control the serialization and deserialization logic in order to unhook and re-hook objects together.

I'd suggest creating a custom serializer that builds the XML representation of this object graph. It can thereby not only control the order of serialization, but also handle situations where nodes aren't in the expected order. You could do other things as well, such as adding custom attributes for linking objects together; these attributes need not exist as public properties on the objects being serialized.

Creating the XML would be as simple as iterating over your objects a few times, building up collections of XElements with the expected representation of the objects as XML. When you're done, you can stitch them together into an XDocument and grab the XML from it. On the reverse side, you can make multiple passes over the XML to re-create your object graph and restore all references.
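The round trip described here might look roughly like the following. This is a sketch in Python's xml.etree.ElementTree, not the XElement/XDocument API named in the answer, and to_xml, from_xml, Course and Student are hypothetical names. Serialization writes each enrollment as a key only ("unhooking" the reference); deserialization makes one pass for courses and a second pass to re-hook the references.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Course:  # hypothetical
    number: str
    description: str


@dataclass
class Student:  # hypothetical
    name: str
    enrolled_in: list = field(default_factory=list)


def to_xml(courses, students):
    school = ET.Element("School")
    courses_el = ET.SubElement(school, "Courses")
    for c in courses:
        ET.SubElement(courses_el, "Course", Number=c.number, Description=c.description)
    students_el = ET.SubElement(school, "Students")
    for s in students:
        student_el = ET.SubElement(students_el, "Student", Name=s.name)
        enrolled_el = ET.SubElement(student_el, "EnrolledIn")
        for c in s.enrolled_in:
            # Write only the key; the full definition lives under <Courses>.
            ET.SubElement(enrolled_el, "Course", Number=c.number)
    return ET.tostring(school, encoding="unicode")


def from_xml(text):
    root = ET.fromstring(text)
    # Pass 1: re-create every Course, indexed by Number.
    by_number = {
        el.get("Number"): Course(el.get("Number"), el.get("Description"))
        for el in root.findall("Courses/Course")
    }
    # Pass 2: re-create students and restore references via the index.
    students = [
        Student(
            el.get("Name"),
            [by_number[ref.get("Number")] for ref in el.findall("EnrolledIn/Course")],
        )
        for el in root.findall("Students/Student")
    ]
    return list(by_number.values()), students


math = Course("MATH 103", "Trigonometry")
chem = Course("CHEM 102", "General Inorganic Chemistry")
jack = Student("Jack", [chem, math])
jill = Student("Jill", [math])

xml_text = to_xml([chem, math], [jack, jill])
courses2, students2 = from_xml(xml_text)
```

After the round trip, Jack and Jill again share a single MATH 103 object rather than holding two equal copies, which is the "restore all references" part of the answer.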

Don't think in SQL or relational terms when working with XML, because there are no order constraints.

You can, however, use XPath to query any portion of the XML document at any time. You want the courses first? Then query "//Courses/Course". You want the student enrollments next? Then query "//Students/Student/EnrolledIn/Course".

The bottom line: just because XML is stored in a file, don't get caught thinking all your accesses have to be serial.
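As a rough illustration of querying by path instead of by position, the sketch below uses the XPath subset supported by Python's xml.etree.ElementTree (an assumption of this example; the thread itself is about .NET, whose XPath support is fuller). The helper name describe is hypothetical. The document lists Students before Courses, yet either section can be queried first.

```python
import xml.etree.ElementTree as ET

XML = """<School>
    <Students>
        <Student Name="Jill">
            <EnrolledIn>
                <Course Number="ENGL 101" />
                <Course Number="MATH 103" />
            </EnrolledIn>
        </Student>
    </Students>
    <Courses>
        <Course Number="ENGL 101" Description="English I" />
        <Course Number="MATH 103" Description="Trigonometry" />
    </Courses>
</School>"""

root = ET.fromstring(XML)

# Ask for the courses first, wherever they sit in the document.
course_numbers = [c.get("Number") for c in root.findall(".//Courses/Course")]


def describe(number):
    # "Foreign key" lookup: find the course definition whose Number matches.
    # Assumes the number contains no single quote.
    match = root.find(f".//Courses/Course[@Number='{number}']")
    return None if match is None else match.get("Description")


# Then ask for one student's enrollments and resolve each reference.
jill_courses = [
    describe(ref.get("Number"))
    for ref in root.findall(".//Students/Student[@Name='Jill']/EnrolledIn/Course")
]
```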


I posted a separate question, "Can XPath do a foreign key lookup across two subtrees of an XML?", in order to clarify my position. The solution shows how you can use XPath to make relational queries against XML data.

Node ordering is only important if you need to do forward-only processing of the data, e.g. using an XmlReader or a SAX parser. If you're going to read the XML into a DOM before processing it (which you are if you're using XmlDocument), node order doesn't really matter. What matters more is that the XML be structured so that you can query it with XPath efficiently, i.e. without having to use "//".

If you take a look at the schema that the DataSetGenerator produces, you'll see that there's no ordering associated with the DataTable-level elements. It may be that ADO processes elements in some sequence not represented in the schema (e.g. one DataTable at a time), or it may be that ADO does forward-only processing and doesn't enforce relational constraints until the DataSet is fully read. I don't know. But it's clear that ADO doesn't couple the processing order to the document order.

(And yes, you can specify the order of child elements in an XML schema; that's what xs:sequence does. If you don't want node order to be enforced, you use an unbounded xs:choice.)

From experience, XML isn't the best format for storing relational data. Have you investigated YAML? Do you have the option?

If you don't, a safe way would be to have a strict DTD for the XML and enforce the order that way. You could also, as you suggest, keep a hash of objects created. That way, if a Student creates a Course, you keep that Course around so it can be filled in later when its defining tag is hit.
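The "hash of objects created" idea can be sketched as a registry keyed by course Number that hands out stub objects on first sight. The example below is an assumption-laden illustration in Python: it reads the document forward-only with xml.etree.ElementTree.iterparse (standing in for something like XmlReader), and the names get_or_create and load_forward_only are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Students come first, so every enrollment refers to a course not yet defined.
XML = """<School>
    <Students>
        <Student Name="Jack">
            <EnrolledIn>
                <Course Number="CHEM 102" />
                <Course Number="MATH 103" />
            </EnrolledIn>
        </Student>
    </Students>
    <Courses>
        <Course Number="CHEM 102" Description="General Inorganic Chemistry" />
        <Course Number="MATH 103" Description="Trigonometry" />
    </Courses>
</School>"""


class Course:  # hypothetical
    def __init__(self, number):
        self.number = number
        self.description = None  # stays None until the definition is seen


def get_or_create(registry, number):
    # Return the existing Course for this number, or register a new stub.
    course = registry.get(number)
    if course is None:
        course = registry[number] = Course(number)
    return course


def load_forward_only(xml_text):
    registry = {}   # Number -> Course (stub or fully defined)
    students = {}   # Name -> list of Course references
    current = None  # enrollment list of the student being read
    stack = []      # tags of the currently open elements
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, el in ET.iterparse(source, events=("start", "end")):
        if event == "end":
            stack.pop()
            continue
        stack.append(el.tag)
        if el.tag == "Student":
            current = students.setdefault(el.get("Name"), [])
        elif el.tag == "Course":
            course = get_or_create(registry, el.get("Number"))
            if stack[-2] == "EnrolledIn":
                current.append(course)  # a reference: keep the (stub) object
            else:
                course.description = el.get("Description")  # the definition
    return registry, students


registry, students = load_forward_only(XML)

# Any course that was referenced but never defined is still a stub.
unresolved = [c.number for c in registry.values() if c.description is None]
```

Because students hold references to the same objects that are later filled in, nothing has to be patched up afterwards; the only post-processing is checking that no stub was left undefined.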

Also remember that you can use XPath queries to access specific nodes directly, so you can enforce parsing of courses first regardless of their position in the XML document. (making a more complete answer, thanks to dacracot)

Order is not usually important in XML. In this case, the Courses could come after Students. You parse the XML and then you make your queries on the entire data.

XML is definitely not a friendly place for relational data.

If you absolutely need to do this, then I'd recommend a funky inverted kind of logic.

In your example, you've got a School, which offers many courses, taken by many students.

Your XML might look like this:

<School>
    <Students>
        <Student Name="Jack">
            <EnrolledIn>
                <Course Number="CHEM 102" Description="General Inorganic Chemistry" />
                <Course Number="MATH 103" Description="Trigonometry" />
            </EnrolledIn>
        </Student>
        <Student Name="Jill">
            <EnrolledIn>
                <Course Number="ENGL 101" Description="English I" />
                <Course Number="MATH 103" Description="Trigonometry" />
            </EnrolledIn>
        </Student>
    </Students>
</School>

This obviously isn't the least repetitive way to do this (it's relational data!), but it's easily parseable.
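To give a sense of what parsing this inverted layout involves, here is a small sketch in Python's xml.etree.ElementTree (chosen for illustration; the variable names are assumptions). Since each course definition is repeated under every student who takes it, the loader keeps the first definition it sees for each Number and treats later copies as duplicates.

```python
import xml.etree.ElementTree as ET

XML = """<School>
    <Students>
        <Student Name="Jack">
            <EnrolledIn>
                <Course Number="CHEM 102" Description="General Inorganic Chemistry" />
                <Course Number="MATH 103" Description="Trigonometry" />
            </EnrolledIn>
        </Student>
        <Student Name="Jill">
            <EnrolledIn>
                <Course Number="ENGL 101" Description="English I" />
                <Course Number="MATH 103" Description="Trigonometry" />
            </EnrolledIn>
        </Student>
    </Students>
</School>"""

root = ET.fromstring(XML)

courses = {}     # Number -> Description, deduplicated across students
enrollment = {}  # student Name -> list of course Numbers

for student in root.findall("Students/Student"):
    taken = enrollment.setdefault(student.get("Name"), [])
    for course in student.findall("EnrolledIn/Course"):
        # setdefault keeps the first definition and ignores repeats.
        courses.setdefault(course.get("Number"), course.get("Description"))
        taken.append(course.get("Number"))
```

The trade-off is that a single pass suffices, but nothing stops two copies of the same course from disagreeing; a stricter loader would compare repeated descriptions instead of silently keeping the first.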

You could also use two XML files, one for courses and a second for students. Open and parse the first before you do the second.

It's been a while, but I seem to remember doing a base collection of 'things' in one part of an XML file, and referring to them in another using the schema features keyref and refer. I found a few examples here. My apologies if this is not what you're looking for.


 