
How to save XML without characters being escaped?

In my C# app, XML data may contain arbitrary element text that's already been pre-processed, so that (among other things) illegal characters have been converted to their escaped (XML character entity encoded) form.

Example: <myElement>this & that</myElement> has been converted to <myElement>this &amp; that</myElement>.

The problem is that when I use XmlTextWriter to save the file, the '&' is getting re-escaped into <myElement>this &amp;amp; that</myElement>. I don't want that extra &amp; in the string.

Another example: <myElement>• bullet</myElement>, my processing changes it to <myElement>&#8226; bullet</myElement>, which gets saved as <myElement>&amp;#8226; bullet</myElement>. All I want output to the file is the <myElement>&#8226; bullet</myElement> form.

I've tried various options on the various XmlWriters, etc., but can't seem to get the raw strings output correctly. And why can't the XML parser recognize escapes that are already valid and leave them alone?
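To illustrate the double escaping described above, here is a minimal sketch. It assumes the pre-escaped text reaches the document through XmlDocument's InnerText property, which may not be how the data is loaded in the asker's app. The class and method names (DoubleEscapeDemo, AsText, AsMarkup) are made up for illustration.

```csharp
using System;
using System.Xml;

static class DoubleEscapeDemo {
    // Assigns the string as literal text: any '&' in it is escaped again on output.
    public static string AsText(string value) {
        var doc = new XmlDocument();
        doc.LoadXml("<myElement/>");
        doc.DocumentElement.InnerText = value;
        return doc.OuterXml;
    }

    // Assigns the string as markup: entity references in it are parsed first.
    public static string AsMarkup(string value) {
        var doc = new XmlDocument();
        doc.LoadXml("<myElement/>");
        doc.DocumentElement.InnerXml = value;
        return doc.OuterXml;
    }

    static void Main() {
        Console.WriteLine(AsText("this &amp; that"));   // <myElement>this &amp;amp; that</myElement>
        Console.WriteLine(AsMarkup("this &amp; that")); // <myElement>this &amp; that</myElement>
    }
}
```

Note that with InnerXml a numeric reference such as &#8226; is also parsed into the bullet character itself, so whether it is written back as a reference or as the literal character depends on the writer and its encoding.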

Update: after more debugging, I found that element text strings (actually all strings, including element tags, names, attributes, etc.) get encoded whenever they are copied into the .NET XML object data (CDATA being an exception) by an internal class called XmlCharType under System.Xml. So the problem has nothing to do with the XmlWriters. It looks like the best way to solve the problem is to un-escape the data when it's output, by using something like:

string output = System.Net.WebUtility.HtmlDecode(xmlDoc.OuterXml);

Which will probably evolve into a custom XmlWriter in order to preserve formatting, etc.

Thanks all for the helpful suggestions.

Call XmlWriter.WriteRaw instead. It is not smart enough to check whether the characters are valid, though, so you have to check them yourself; otherwise invalid XML will be generated.
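A minimal sketch of that suggestion, under some assumptions: the class RawFragment and its two methods are hypothetical names, and parsing the text as an XML fragment is just one possible way of doing the "check it yourself" step, not something the answer prescribes.

```csharp
using System;
using System.IO;
using System.Xml;

static class RawFragment {
    // Returns true if the text parses as an XML fragment, i.e. every '&'
    // starts a valid entity or character reference and there is no stray '<'.
    public static bool IsWellFormedFragment(string fragment) {
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
        try {
            using (var reader = XmlReader.Create(new StringReader(fragment), settings)) {
                while (reader.Read()) { }
            }
            return true;
        } catch (XmlException) {
            return false;
        }
    }

    // Writes <name>rawText</name>, passing rawText through WriteRaw unchanged.
    public static string WriteElementRaw(string name, string rawText) {
        if (!IsWellFormedFragment(rawText))
            throw new ArgumentException("Text is not valid pre-escaped XML.", nameof(rawText));

        var sw = new StringWriter();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlWriter writer = XmlWriter.Create(sw, settings)) {
            writer.WriteStartElement(name);
            writer.WriteRaw(rawText);   // no escaping is applied here
            writer.WriteEndElement();
        }
        return sw.ToString();
    }

    static void Main() {
        Console.WriteLine(WriteElementRaw("myElement", "&#8226; bullet"));
        Console.WriteLine(IsWellFormedFragment("this & that")); // False: bare '&'
    }
}
```

The check rejects a bare '&' or '<' before it can reach the output, which is the failure WriteRaw would otherwise let through silently.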

Ok, here's the solution I came up with:

using System;
using System.IO;
using System.Text;

namespace YourName {

    // Represents a writer that makes it possible to pre-process 
    // XML character entity escapes without them being rewritten.
    class XmlRawTextWriter : System.Xml.XmlTextWriter {
        public XmlRawTextWriter(Stream w, Encoding encoding)
            : base(w, encoding) {
        }

        public XmlRawTextWriter(String filename, Encoding encoding)
            : base(filename, encoding) {
        }

        public override void WriteString(string text) {
            base.WriteRaw(text);
        }
    }
}

Then use it as you would XmlTextWriter:

        XmlRawTextWriter rawWriter = new XmlRawTextWriter(thisFilespec, Encoding.UTF8);
        rawWriter.Formatting = Formatting.Indented;
        rawWriter.Indentation = 1;
        rawWriter.IndentChar = '\t';
        xmlDoc.Save(rawWriter);

This works without having to un-encode or hack around the encoding functionality.
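One thing to watch for with this approach: because WriteString no longer escapes anything, text that has not been pre-escaped goes out as-is. The sketch below is a hedged illustration of that trade-off. It repeats a trimmed copy of the writer so that it compiles on its own, and the RawWriterDemo class with its SaveRaw and Reloads helpers is hypothetical.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Trimmed copy of the writer above, so this example is self-contained.
class XmlRawTextWriter : XmlTextWriter {
    public XmlRawTextWriter(Stream w, Encoding encoding) : base(w, encoding) { }
    public override void WriteString(string text) { base.WriteRaw(text); }
}

static class RawWriterDemo {
    // Saves <myElement>elementText</myElement> through the raw writer.
    public static string SaveRaw(string elementText) {
        var doc = new XmlDocument();
        doc.LoadXml("<myElement/>");
        doc.DocumentElement.InnerText = elementText;
        using (var ms = new MemoryStream()) {
            var writer = new XmlRawTextWriter(ms, new UTF8Encoding(false));
            doc.Save(writer);
            writer.Flush();
            return Encoding.UTF8.GetString(ms.ToArray());
        }
    }

    // Returns true if the saved output can be parsed again.
    public static bool Reloads(string xml) {
        try { new XmlDocument().LoadXml(xml); return true; }
        catch (XmlException) { return false; }
    }

    static void Main() {
        Console.WriteLine(Reloads(SaveRaw("&#8226; bullet"))); // True: already escaped
        Console.WriteLine(Reloads(SaveRaw("a < b")));          // False: '<' written raw
    }
}
```

As far as I can tell, XmlDocument also writes attribute values through WriteString, so they would need to be pre-escaped in the same way.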
