XML Deserialization of many types

I am deserializing a large XML document into a C# object.

I've run into an issue where multiple XML elements are mixed with text on the same line, and I'm having trouble reconstructing them properly in code.

An example snippet:

<parent> 
    <ce:para view="all">
     Text <ce:cross-ref refid="123">[1]</ce:cross-ref> More Text <ce:italic>Italicized text</ce:italic> and more text here
    </ce:para>
    <ce:para>...</ce:para>
</parent>

The generated C# class looks like this:

[XmlRoot(ElementName = "para", Namespace = "namespace")]
public class Para
{
    [XmlElement(ElementName = "cross-ref", Namespace = "namespace")]
    public List<Crossref> Crossref { get; set; }

    [XmlText]
    public List<string> Text { get; set; }

    [XmlElement(ElementName = "italic", Namespace = "namespace")]
    public List<Italic> Italic { get; set; }
}

I want to be able to loop over this object and reconstruct the sentence as a plain string:

Text [1] More Text Italicized Text and more text here

The only problem is that when the deserialization happens, the order is lost, because each piece is put into its respective collection. This means I have no way of knowing how to reconstruct the string the way it is supposed to be.

Text: {"Text", "More Text", "and more text here"}
Crossref: {"[1]"}
Italic: {"Italicized Text"}

I've thought about bringing the whole element in as a string and then scrubbing the tags out of it, but I'm not sure how to get it deserialized properly, or whether there is a better way to go about it.

Disclaimer: I am not able to alter the XML document, as it comes in from a 3rd party.

Thanks

Once you have deserialized the 3rd-party XML into an object that directly matches the XML's schema (as you have already done in your example above), you should be able to use the XmlNode.InnerText property on the <ce:para> node to extract what you're looking for without having to write any parsing code. Note that InnerText is a property, not a method, and that it returns the concatenated text of the node and all its children in document order.

At that point, you could translate the object you deserialized from the raw 3rd-party XML into an object that flattens the <ce:para> node into a simple string.
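
As a rough illustration of the InnerText idea, here is a minimal sketch that loads the XML with XmlDocument instead of going through the generated classes. The class name ParaFlattener, the method name Flatten, and the namespace URI "urn:example:ce" are assumptions made for this example; the real URI bound to the ce prefix has to come from the 3rd-party document.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml;

public static class ParaFlattener
{
    // Hypothetical namespace URI: replace it with the one the real
    // document binds to the "ce" prefix.
    private const string CeNamespace = "urn:example:ce";

    // Returns the text of the first ce:para element with all inline
    // markup (cross-ref, italic, ...) flattened in document order.
    public static string Flatten(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // XPath needs the prefix registered explicitly.
        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("ce", CeNamespace);

        XmlNode para = doc.SelectSingleNode("//ce:para", ns);
        if (para == null)
        {
            return string.Empty;
        }

        // InnerText concatenates the text of the node and its children;
        // runs of whitespace are collapsed so indentation does not leak in.
        return Regex.Replace(para.InnerText, @"\s+", " ").Trim();
    }
}
```

Calling ParaFlattener.Flatten on the snippet from the question (with the ce prefix declared) should yield the sentence with the tags stripped and the original order preserved.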

As per Chris' request, I'm posting my solution. It could probably use some refining, as I'm not very experienced with LINQ queries.

XDocument xdoc = xmlAdapter.GetAsXDoc(xmlstring);

IEnumerable<XElement> body = from b in xdoc.Descendants()
                                     where b.Name.LocalName == "body"
                                     select b;

IEnumerable<XElement> sections = from s in body.Descendants()
                                         where s.Name.LocalName == "sections"
                                         select s;

IEnumerable<XElement> paragraphs = from p in sections.Descendants()
                                           where p.Name.LocalName == "para"
                                           select p;

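// Declared outside the loop so it is still in scope when the result is read.
StringBuilder text = new StringBuilder();
foreach (XElement p in paragraphs)
{
    // XElement.Value concatenates all descendant text in document order,
    // so inline elements such as cross-ref and italic are flattened in place.
    text.AppendFormat("{0} ", p.Value);
}

string bodytext = text.ToString();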
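
For readers who want to try this without the xmlAdapter helper (which is not shown in the post), the following is a self-contained sketch of the same idea using XDocument.Parse. The class name BodyText, the method name Extract, and the ceNamespaceUri parameter are hypothetical; the sketch also skips the body/sections filtering and selects para elements directly by their namespace-qualified name.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class BodyText
{
    // ceNamespaceUri is whatever URI the document binds to the "ce" prefix
    // (an assumption here; the original post only shows a placeholder).
    public static string Extract(string xml, string ceNamespaceUri)
    {
        XDocument xdoc = XDocument.Parse(xml);
        XNamespace ce = ceNamespaceUri;

        // For each para, XElement.Value gives the text of all descendants
        // in document order; splitting on whitespace and re-joining
        // collapses indentation and line breaks into single spaces.
        var paragraphs = xdoc
            .Descendants(ce + "para")
            .Select(p => string.Join(" ",
                p.Value.Split((char[])null, StringSplitOptions.RemoveEmptyEntries)));

        return string.Join(" ", paragraphs);
    }
}
```

Matching on ce + "para" rather than Name.LocalName avoids picking up identically named elements from other namespaces, at the cost of needing the namespace URI up front.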
