
XML Deserialization with special characters in C# XmlSerializer

I have an XML file that contains a special character (& is the character causing the issue), and I use the code below to deserialize it:

    XMLDATAMODEL imported_data;

    // Create an instance of the XmlSerializer specifying type and namespace.
    XmlSerializer serializer = new XmlSerializer(typeof(XMLDATAMODEL));

    // A FileStream is needed to read the XML document.
    FileStream fs = new FileStream(path, FileMode.Open);
    XmlReader reader = XmlReader.Create(fs);

    // Use the Deserialize method to restore the object's state.
    imported_data = (XMLDATAMODEL)serializer.Deserialize(reader);
    fs.Close();

The structure of my XML model looks like this:

    [XmlRoot(ElementName = "XMLDATAMODEL")]
    public class XMLDATAMODEL
    {
        [XmlElement(ElementName = "EventName")]
        public string EventName { get; set; }
        [XmlElement(ElementName = "Location")]
        public string Location { get; set; }
    }

I also tried this code, with the encoding specified explicitly, but without success:

    // Read the file through a StreamReader with an explicit encoding.
    StreamReader streamReader = new StreamReader(path, System.Text.Encoding.UTF8, true);
    XmlSerializer serializer = new XmlSerializer(typeof(XMLDATAMODEL));
    imported_data = (XMLDATAMODEL)serializer.Deserialize(streamReader);
    streamReader.Close();

Both approaches failed. If I put the special character inside CDATA, it seems to work. How can I make it work for XML data without CDATA as well?

Here is my XML file content:

http://pastebin.com/Cy7icrgS

The error I am getting is: There is an error in XML document (2, 17).

The best answer I could find after looking around is that, unless you serialize the data yourself, it is quite troublesome to deserialize XML that contains unescaped special characters.

In your case the special character is &, so before you can deserialize the file you need to convert it to &amp;. Unless & is converted to &amp;, we cannot really deserialize it with XmlSerializer. Yes, we can still read it by using:

    XmlReaderSettings settings = new XmlReaderSettings();
    settings.CheckCharacters = false; // turn off the reader's character checking
    FileStream fs = new FileStream(xmlfolder + "\\xmltest.xml", FileMode.Open);
    XmlReader reader = XmlReader.Create(fs, settings);

But we cannot deserialize it.

As for how to convert & to &amp;, there are several ways, each with pros and cons. The bottom line for all of them is: do not use the stream directly. Read the data from the file into a string (for example with File.ReadAllText) and do the string processing first. After that, wrap the result in a MemoryStream and start the deserialization.
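The read, process, then deserialize flow described above could be sketched roughly as follows. This is only an illustration: XmlLoader, Load and the fixUp parameter are hypothetical names introduced here, and the sketch assumes the file is UTF-8 and does not declare a different encoding in its XML declaration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

[XmlRoot(ElementName = "XMLDATAMODEL")]
public class XMLDATAMODEL
{
    [XmlElement(ElementName = "EventName")]
    public string EventName { get; set; }
    [XmlElement(ElementName = "Location")]
    public string Location { get; set; }
}

public static class XmlLoader
{
    // Hypothetical helper: 'fixUp' is whatever string processing you choose
    // to apply before the XML parser sees the text.
    public static XMLDATAMODEL Load(string path, Func<string, string> fixUp)
    {
        // 1. Read the whole file as text instead of handing the stream to the parser.
        string raw = File.ReadAllText(path, Encoding.UTF8);

        // 2. Repair the text (for example, escape the ampersands).
        string cleaned = fixUp(raw);

        // 3. Wrap the repaired text in a MemoryStream and deserialize from it.
        var serializer = new XmlSerializer(typeof(XMLDATAMODEL));
        using (var ms = new MemoryStream(Encoding.UTF8.GetBytes(cleaned)))
        {
            return (XMLDATAMODEL)serializer.Deserialize(ms);
        }
    }
}
```

A call might look like XmlLoader.Load(path, s => s.Replace("&", "&amp;")). Passing the repair step in as a delegate keeps the file handling separate from whichever conversion strategy you end up choosing.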

Now, for the string processing before deserialization, there are a couple of ways to do it.

The easiest, and most of the time probably the least safe, is string.Replace("&", "&amp;"). It also rewrites ampersands that are already part of an entity such as &amp; or &lt;.

The other way, harder but safer, is to use Regex. Since your case is something inside CData, this could be a good way too.

Another way, harder still but safer, is to write your own parsing that goes through the file line by line.
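As a rough sketch of the Regex option, the pattern below escapes only ampersands that do not already start one of the five predefined entities or a numeric character reference. AmpersandFixer and EscapeBareAmpersands are hypothetical names, and the pattern is an assumption about what counts as "already escaped", not an established recipe.

```csharp
using System.Text.RegularExpressions;

public static class AmpersandFixer
{
    // Matches '&' unless it is followed by amp; lt; gt; quot; apos;
    // or a numeric reference such as #37; or #x25;
    private static readonly Regex BareAmpersand = new Regex(
        @"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)",
        RegexOptions.Compiled);

    public static string EscapeBareAmpersands(string xml)
    {
        // Replace each bare ampersand with its entity form.
        return BareAmpersand.Replace(xml, "&amp;");
    }
}
```

Unlike a plain string.Replace, this leaves existing entities alone, so running it twice does not double-escape. It does not know about CDATA sections, though: an ampersand inside CDATA is literal text, and this sketch would rewrite it as well.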

I have yet to find a common, safe way to do this conversion.

But for your example, string.Replace would work. Also, you could potentially exploit the pattern (something inside CData) to use Regex. This could be a good way too.

Edit:

As for which characters are considered special in XML and how to process them beforehand: according to this, non-Roman characters are included.

Apart from the non-Roman characters, there are 5 special characters listed here:

<   ->  &lt;
>   ->  &gt;
"   ->  &quot;
'   ->  &apos;
&   ->  &amp;

And from here, we get one more:

%   -> &#37;
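If you are the one producing the XML text, the five predefined entities listed above do not need to be replaced by hand. One option, sketched here as an assumption about your setup rather than a recommendation, is the framework's System.Security.SecurityElement.Escape method; EscapeDemo and EscapeText are hypothetical names.

```csharp
using System.Security;

public static class EscapeDemo
{
    // Escapes <, >, ", ' and & in a text value before it is placed
    // between XML tags or inside an attribute.
    public static string EscapeText(string value)
    {
        return SecurityElement.Escape(value);
    }
}
```

This is meant for individual text values, not for a whole document: applied to a complete XML string it would also escape the angle brackets of the tags themselves.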

Hope this helps!
