XmlDocument.Load Replacing “>”

When running the following code:

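    static void Main(string[] args)
    {
        var xmlDoc = new XmlDocument();

        // Read the whole file into memory, then parse it from the in-memory stream.
        // The using blocks make sure the file handle is released afterwards.
        using (var fileReader = new BinaryReader(File.Open(@"C:\Users\username\Desktop\doc.xlf", FileMode.Open, FileAccess.Read, FileShare.Read)))
        using (var sourceStream = new MemoryStream(fileReader.ReadBytes((int)fileReader.BaseStream.Length)))
        {
            xmlDoc.Load(sourceStream);
        }
    }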

On a file with a node that looks like this:

<source xml:lang="en-us">
        &lt;b>This text is displayed in Bold.&lt;/b>&lt;br>
        &lt;i>This text is displayed in italics.&lt;/i>
</source>

The node gets converted to the following when it is read in:

<source xml:lang="en-us">
        &lt;b&gt;This text is displayed in Bold.&lt;/b&gt;&lt;br&gt;
        &lt;i&gt;This text is displayed in italics.&lt;/i&gt;
</source>

In other words, every > is being replaced with &gt;.

Normally that would be OK (and I am even under the impression that it would be technically legal, even if bad practice), but in this case it is absolutely imperative that the node not change when it is read in. Any thoughts on either (1) how to read in the XML so that > is allowed, or (2) how to work around this issue? Thanks!

Although the right angle bracket is legal in XML text, there is no option on XmlDocument to avoid changing it to the corresponding entity.

You could use a CDATA section instead:

<source xml:lang="en-us">
    <![CDATA[<b>This text is displayed in Bold.</b><br>
<i>This text is displayed in italics.</i>]]>
</source>
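
To see how XmlDocument treats a CDATA section, here is a minimal sketch. The class and method names (CDataDemo, RoundTrip) and the shortened sample markup are made up for illustration; only the XmlDocument calls are real. Note that inside CDATA nothing is entity-escaped, so the markup is written literally.

```csharp
using System;
using System.Xml;

static class CDataDemo
{
    // Hypothetical helper: parse the XML and return how XmlDocument writes it back.
    public static string RoundTrip(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.OuterXml;
    }

    static void Main()
    {
        // Inside CDATA the characters '<' and '>' are written literally.
        string xml = "<source xml:lang=\"en-us\"><![CDATA[<b>Bold</b><br>]]></source>";

        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // The content is kept as a dedicated CDATA node, not as a plain text node.
        Console.WriteLine(doc.DocumentElement.FirstChild.NodeType);

        // The text value is the unescaped markup: <b>Bold</b><br>
        Console.WriteLine(doc.DocumentElement.InnerText);

        // On output the CDATA section is written back as CDATA, so '>' survives.
        Console.WriteLine(RoundTrip(xml));
    }
}
```

The catch is that this changes the source file: whoever produces the .xlf has to emit CDATA in the first place.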

There is no difference in XML between &gt; and > in the values of text nodes, so an XML parser/DOM is free to represent the value in either form as it sees fit.
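
A quick way to convince yourself of this is the sketch below. GtEquivalenceDemo, TextOf and Reserialize are illustrative names, not anything from the question. Both spellings parse to the same text value, and the &gt; only shows up when XmlDocument writes the node back out.

```csharp
using System;
using System.Xml;

static class GtEquivalenceDemo
{
    // Hypothetical helper: parse the XML and return the root element's text value.
    public static string TextOf(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.InnerText;
    }

    // Hypothetical helper: parse the XML and return XmlDocument's serialization of it.
    public static string Reserialize(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.OuterXml;
    }

    static void Main()
    {
        string raw = "<source>&lt;b>Bold&lt;/b></source>";
        string escaped = "<source>&lt;b&gt;Bold&lt;/b&gt;</source>";

        // Both forms carry the same in-memory value.
        Console.WriteLine(TextOf(raw));                    // <b>Bold</b>
        Console.WriteLine(TextOf(raw) == TextOf(escaped)); // True

        // The escaping appears on output, not on load.
        Console.WriteLine(Reserialize(raw)); // <source>&lt;b&gt;Bold&lt;/b&gt;</source>
    }
}
```

So if you only consume the values (InnerText, Value), nothing has actually changed. The difference matters only if something downstream compares the serialized bytes.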

Note that there are other normalizations in XML (whitespace and line endings) which almost guarantee that the saved XML will not be byte-for-byte identical to the source XML.
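
As one concrete case of this, here is a small sketch of what XmlDocument does with whitespace between elements. WhitespaceDemo and Reserialize are made-up names; PreserveWhitespace is the real XmlDocument property. This covers only one kind of normalization and is not meant as a complete list.

```csharp
using System;
using System.Xml;

static class WhitespaceDemo
{
    // Hypothetical helper: parse with the given PreserveWhitespace setting
    // and return XmlDocument's serialization.
    public static string Reserialize(string xml, bool preserveWhitespace)
    {
        var doc = new XmlDocument { PreserveWhitespace = preserveWhitespace };
        doc.LoadXml(xml);
        return doc.OuterXml;
    }

    static void Main()
    {
        string xml = "<root>\n  <a>1</a>\n</root>";

        // Default (false): whitespace-only nodes between elements are dropped.
        Console.WriteLine(Reserialize(xml, false)); // <root><a>1</a></root>

        // true: the indentation between elements is kept as whitespace nodes.
        Console.WriteLine(Reserialize(xml, true));
    }
}
```

Setting PreserveWhitespace before loading gets you closer to the original layout, but it does nothing about the > escaping.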

If you really need to keep > intact, try using CDATA, which should leave the text untouched. Some parsers may also let you keep > instead of converting it to &gt; on save.
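
If changing the source to CDATA is not possible, one possible workaround in .NET is to control the output side yourself. The sketch below is an assumption of mine, not something XmlDocument offers: KeepGtXmlWriter and KeepGtDemo are hypothetical names for an XmlTextWriter subclass that escapes only & and < in element text and writes > as-is. It falls back to the default escaping for attribute values and for text containing "]]>", where > must stay escaped.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper: an XmlTextWriter that leaves '>' unescaped in element text.
sealed class KeepGtXmlWriter : XmlTextWriter
{
    public KeepGtXmlWriter(TextWriter output) : base(output) { }

    public override void WriteString(string text)
    {
        // Use the default escaping for attribute values and for text that
        // contains "]]>", because a raw '>' is not allowed in that sequence.
        if (string.IsNullOrEmpty(text)
            || WriteState == WriteState.Attribute
            || text.Contains("]]>"))
        {
            base.WriteString(text);
            return;
        }

        // Escape only the characters that can never appear raw in text.
        WriteRaw(text.Replace("&", "&amp;").Replace("<", "&lt;"));
    }
}

static class KeepGtDemo
{
    // Hypothetical helper: serialize a document through the custom writer.
    public static string Serialize(XmlDocument doc)
    {
        var output = new StringWriter();
        using (var writer = new KeepGtXmlWriter(output))
        {
            doc.WriteTo(writer);
        }
        return output.ToString();
    }

    static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<source xml:lang=\"en-us\">&lt;b>Bold &amp; more&lt;/b></source>");

        // The text is written with '<' and '&' escaped but '>' left alone.
        Console.WriteLine(Serialize(doc));
    }
}
```

Because WriteRaw bypasses the writer's own checks, this is only as safe as the escaping in WriteString, so treat it as a starting point and test it against your real files.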

The real solution is to accept that some characters will be encoded on save in order to produce valid XML (including non-ASCII characters, if the document is saved with an encoding that does not directly support them).
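
The non-ASCII case can be seen with a short sketch like the one below. EncodingDemo and SaveAs are illustrative names; the XmlWriter and XmlWriterSettings calls are the standard .NET ones. Saving the same document as ASCII forces the writer to fall back to a numeric character reference for é, while UTF-8 can store the character directly.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class EncodingDemo
{
    // Hypothetical helper: save the document with the given encoding
    // and return the bytes decoded back to a string.
    public static string SaveAs(string xml, Encoding encoding)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var settings = new XmlWriterSettings
        {
            Encoding = encoding,
            OmitXmlDeclaration = true
        };

        using (var stream = new MemoryStream())
        {
            using (var writer = XmlWriter.Create(stream, settings))
            {
                doc.Save(writer);
            }
            return encoding.GetString(stream.ToArray());
        }
    }

    static void Main()
    {
        string xml = "<t>caf\u00E9</t>";

        // ASCII cannot represent the accented character, so it is written
        // as a numeric character reference instead.
        Console.WriteLine(SaveAs(xml, Encoding.ASCII));

        // UTF-8 (without a byte order mark here) keeps the character as-is.
        Console.WriteLine(SaveAs(xml, new UTF8Encoding(false)));
    }
}
```

Either output parses back to the same value, which is the point: the serialized form may differ, the content does not.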
