Correct XML serialization and deserialization of “mixed” types in .NET

My current task involves writing a class library for processing HL7 CDA files.
These HL7 CDA files are XML files with a defined XML schema, so I used xsd.exe to generate .NET classes for XML serialization and deserialization.

The XML Schema contains various types that carry the mixed="true" attribute, specifying that an XML node of this type may contain normal text mixed with other XML nodes.
The relevant part of the XML schema for one of these types looks like this:

<xs:complexType name="StrucDoc.Paragraph" mixed="true">
    <xs:sequence>
        <xs:element name="caption" type="StrucDoc.Caption" minOccurs="0"/>
        <xs:choice minOccurs="0" maxOccurs="unbounded">
            <xs:element name="br" type="StrucDoc.Br"/>
            <xs:element name="sub" type="StrucDoc.Sub"/>
            <xs:element name="sup" type="StrucDoc.Sup"/>
            <!-- ...other possible nodes... -->
        </xs:choice>
    </xs:sequence>
    <xs:attribute name="ID" type="xs:ID"/>
    <!-- ...other attributes... -->
</xs:complexType>

The generated code for this type looks like this:

/// <remarks/>
[System.CodeDom.Compiler.GeneratedCodeAttribute("xsd", "2.0.50727.3038")]
[System.SerializableAttribute()]
[System.Diagnostics.DebuggerStepThroughAttribute()]
[System.ComponentModel.DesignerCategoryAttribute("code")]
[System.Xml.Serialization.XmlTypeAttribute(TypeName="StrucDoc.Paragraph", Namespace="urn:hl7-org:v3")]
public partial class StrucDocParagraph {

    private StrucDocCaption captionField;

    private object[] itemsField;

    private string[] textField;

    private string idField;

    // ...fields for other attributes...

    /// <remarks/>
    public StrucDocCaption caption {
        get {
            return this.captionField;
        }
        set {
            this.captionField = value;
        }
    }

    /// <remarks/>
    [System.Xml.Serialization.XmlElementAttribute("br", typeof(StrucDocBr))]
    [System.Xml.Serialization.XmlElementAttribute("sub", typeof(StrucDocSub))]
    [System.Xml.Serialization.XmlElementAttribute("sup", typeof(StrucDocSup))]
    // ...other possible nodes...
    public object[] Items {
        get {
            return this.itemsField;
        }
        set {
            this.itemsField = value;
        }
    }

    /// <remarks/>
    [System.Xml.Serialization.XmlTextAttribute()]
    public string[] Text {
        get {
            return this.textField;
        }
        set {
            this.textField = value;
        }
    }

    /// <remarks/>
    [System.Xml.Serialization.XmlAttributeAttribute(DataType="ID")]
    public string ID {
        get {
            return this.idField;
        }
        set {
            this.idField = value;
        }
    }

    // ...properties for other attributes...
}

If I deserialize an XML element where the paragraph node looks like this:

<paragraph>first line<br /><br />third line</paragraph>

The result is that the items and text arrays are read like this:

itemsField = new object[]
{
    new StrucDocBr(),
    new StrucDocBr(),
};
textField = new string[]
{
    "first line",
    "third line",
};

From this there is no way to determine the original order of the text and the other nodes.
If I serialize this again, the result looks exactly like this:

<paragraph>
    <br />
    <br />first linethird line
</paragraph>

The default serializer just serializes the items first and then the text.

I tried implementing IXmlSerializable on the StrucDocParagraph class so that I could control the deserialization and serialization of the content. It is rather complex because so many classes are involved, and I have not arrived at a solution yet because I don't know whether the effort will pay off.

Is there some kind of easy workaround for this problem, or is it even possible to solve it with custom serialization via IXmlSerializable? Or should I just use XmlDocument or XmlReader/XmlWriter to process these documents?

To solve this problem I had to modify the generated classes:

  1. Move the XmlTextAttribute from the Text property to the Items property and add the parameter Type = typeof(string)
  2. Remove the Text property
  3. Remove the textField field

As a result, the generated code (modified) looks like this:

/// <remarks/>
[System.CodeDom.Compiler.GeneratedCodeAttribute("xsd", "2.0.50727.3038")]
[System.SerializableAttribute()]
[System.Diagnostics.DebuggerStepThroughAttribute()]
[System.ComponentModel.DesignerCategoryAttribute("code")]
[System.Xml.Serialization.XmlTypeAttribute(TypeName="StrucDoc.Paragraph", Namespace="urn:hl7-org:v3")]
public partial class StrucDocParagraph {

    private StrucDocCaption captionField;

    private object[] itemsField;

    private string idField;

    // ...fields for other attributes...

    /// <remarks/>
    public StrucDocCaption caption {
        get {
            return this.captionField;
        }
        set {
            this.captionField = value;
        }
    }

    /// <remarks/>
    [System.Xml.Serialization.XmlElementAttribute("br", typeof(StrucDocBr))]
    [System.Xml.Serialization.XmlElementAttribute("sub", typeof(StrucDocSub))]
    [System.Xml.Serialization.XmlElementAttribute("sup", typeof(StrucDocSup))]
    // ...other possible nodes...
    [System.Xml.Serialization.XmlTextAttribute(typeof(string))]
    public object[] Items {
        get {
            return this.itemsField;
        }
        set {
            this.itemsField = value;
        }
    }

    /// <remarks/>
    [System.Xml.Serialization.XmlAttributeAttribute(DataType="ID")]
    public string ID {
        get {
            return this.idField;
        }
        set {
            this.idField = value;
        }
    }

    // ...properties for other attributes...
}

Now if I deserialize an XML element where the paragraph node looks like this:

<paragraph>first line<br /><br />third line</paragraph>

The result is that the items array is read like this:

itemsField = new object[]
{
    "first line",
    new StrucDocBr(),
    new StrucDocBr(),
    "third line",
};

This is exactly what I need: the order of the items and their content is correct.
And if I serialize this again, the result is again correct:

<paragraph>first line<br /><br />third line</paragraph>

What pointed me in the right direction was the answer by Guillaume; I had also thought that something like this must be possible. And then there was this in the MSDN documentation for XmlTextAttribute:

You can apply the XmlTextAttribute to a field or property that returns an array of strings. You can also apply the attribute to an array of type Object but you must set the Type property to string. In that case, any strings inserted into the array are serialized as XML text.

So serialization and deserialization work correctly now, but I don't know whether there are any other side effects. Maybe it is no longer possible to generate a schema from these classes with xsd.exe, but I don't need that anyway.
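
For readers who want to try the attribute change in isolation, here is a minimal round-trip sketch. The Paragraph and Br classes are hypothetical, trimmed-down stand-ins for the generated StrucDocParagraph and StrucDocBr (no XML namespace, no other elements or attributes), and RoundTrip is an illustrative helper name, not part of the original code.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical stand-in for StrucDocBr (assumption: an empty element type).
public class Br { }

// Hypothetical stand-in for StrucDocParagraph, without the HL7 namespace.
[XmlRoot("paragraph")]
public class Paragraph
{
    // Text nodes and <br/> elements share one array, so document order is kept.
    [XmlElement("br", typeof(Br))]
    [XmlText(typeof(string))]
    public object[] Items { get; set; }
}

public static class Program
{
    // Deserializes the XML, lists the items in order, and serializes them again.
    public static string RoundTrip(string xml)
    {
        var serializer = new XmlSerializer(typeof(Paragraph));
        Paragraph p;
        using (var reader = new StringReader(xml))
        {
            p = (Paragraph)serializer.Deserialize(reader);
        }

        // Strings are text nodes; everything else is a mapped element type.
        foreach (object item in p.Items)
        {
            Console.WriteLine((item is string s) ? "text: " + s : "element: " + item.GetType().Name);
        }

        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        var ns = new XmlSerializerNamespaces();
        ns.Add("", ""); // suppress the default xmlns:xsi / xmlns:xsd declarations

        using (var sw = new StringWriter())
        using (var writer = XmlWriter.Create(sw, settings))
        {
            serializer.Serialize(writer, p, ns);
            writer.Flush();
            return sw.ToString();
        }
    }

    public static void Main()
    {
        Console.WriteLine(RoundTrip("<paragraph>first line<br /><br />third line</paragraph>"));
    }
}
```

Code that consumes Items then has to test each entry for string versus element type, as the loop above does.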

I had the same problem as this, and came across this solution of altering the .cs generated by xsd.exe. Although it did work, I wasn't comfortable with altering the generated code, as I would need to remember to do it any time I regenerated the classes. It also led to some awkward code which had to test for and cast to XmlNode[] for the mailto elements.

My solution was to rethink the xsd. I ditched the use of the mixed type, and essentially defined my own mixed type.

I had this

XML: <text>some text <mailto>me@email.com</mailto>some more text</text>

<xs:complexType name="text" mixed="true">
    <xs:sequence>
      <xs:element minOccurs="0" maxOccurs="unbounded" name="mailto" type="xs:string" />
    </xs:sequence>
  </xs:complexType>

and changed to

XML: <mytext><text>some text </text><mailto>me@email.com</mailto><text>some more text</text></mytext>

<xs:complexType name="mytext">
    <xs:sequence>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element name="text">
          <xs:complexType>
            <xs:simpleContent>
              <xs:extension base="xs:string" />
            </xs:simpleContent>
          </xs:complexType>
        </xs:element>
        <xs:element name="mailto">
          <xs:complexType>
            <xs:simpleContent>
              <xs:extension base="xs:string" />
            </xs:simpleContent>
          </xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:sequence>
  </xs:complexType>

My generated code now gives me a class myText:

public partial class myText{

    private object[] itemsField;

    /// <remarks/>
    [System.Xml.Serialization.XmlElementAttribute("mailto", typeof(myTextTextMailto))]
    [System.Xml.Serialization.XmlElementAttribute("text", typeof(myTextText))]
    public object[] Items {
        get {
            return this.itemsField;
        }
        set {
            this.itemsField = value;
        }
    }
}

The order of the elements is now preserved in serialization/deserialization, but I do have to test for, cast to, and program against the types myTextTextMailto and myTextText.
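
A rough sketch of what that type testing can look like follows. The two classes are hypothetical stand-ins for the generated ones; the Value property is an assumption about how the simpleContent string is exposed, and Render is an illustrative helper name.

```csharp
using System;
using System.Text;

// Hypothetical stand-ins for the generated element classes.
// Assumption: the simpleContent string is exposed as a property named Value.
public class myTextText { public string Value { get; set; } }
public class myTextTextMailto { public string Value { get; set; } }

public static class MyTextRenderer
{
    // Flattens the ordered Items array into one string,
    // wrapping mail addresses in angle brackets.
    public static string Render(object[] items)
    {
        var sb = new StringBuilder();
        foreach (object item in items ?? new object[0])
        {
            switch (item)
            {
                case myTextText text:
                    sb.Append(text.Value);
                    break;
                case myTextTextMailto mail:
                    sb.Append("<").Append(mail.Value).Append(">");
                    break;
                default:
                    // Null entries and unmapped types end up here.
                    throw new InvalidOperationException(
                        "Unexpected item type: " + item?.GetType().Name);
            }
        }
        return sb.ToString();
    }
}
```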

Just thought I'd throw that in as an alternative approach which worked for me.
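
If neither editing the generated file nor changing the schema is an option, the same attribute change can probably be applied at run time with XmlAttributeOverrides. This is only a sketch and was not part of the answers above: Paragraph and Br are hypothetical stand-ins shaped like the unmodified xsd.exe output, and MixedSerializerFactory is an illustrative name.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-in for StrucDocBr.
public class Br { }

// Hypothetical stand-in shaped like the unmodified generated class:
// elements in Items, text in a separate Text array.
[XmlRoot("paragraph")]
public class Paragraph
{
    [XmlElement("br", typeof(Br))]
    public object[] Items { get; set; }

    [XmlText]
    public string[] Text { get; set; }
}

public static class MixedSerializerFactory
{
    public static XmlSerializer Create()
    {
        // Replacement attributes for Items: the element mapping plus text as string.
        var itemsAttrs = new XmlAttributes();
        itemsAttrs.XmlElements.Add(new XmlElementAttribute("br", typeof(Br)));
        itemsAttrs.XmlText = new XmlTextAttribute(typeof(string));

        // Text must no longer claim the text nodes, so it is ignored.
        var textAttrs = new XmlAttributes { XmlIgnore = true };

        var overrides = new XmlAttributeOverrides();
        overrides.Add(typeof(Paragraph), "Items", itemsAttrs);
        overrides.Add(typeof(Paragraph), "Text", textAttrs);

        return new XmlSerializer(typeof(Paragraph), overrides);
    }
}
```

A serializer built this way is best created once and reused, since serializers constructed with overrides are not cached by the framework. For a real CDA schema every mixed type would need its own set of overrides, listing all of its element mappings.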

What about

itemsField = new object[] 
{ 
    "first line", 
    new StrucDocBr(), 
    new StrucDocBr(), 
    "third line", 
};

?

