MSXML removes the encoding="UTF-8" attribute

I'm reading the following XML text from a file:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<SampleXML>...</SampleXML>

and I load the UTF-8 text using IXMLDomDocument::loadXML. Then I manipulate the XML and call IXMLDomDocument::Getxml() to get a _bstr_t of the modified XML. This _bstr_t looks as follows:

<?xml version="1.0" standalone="yes"?>
<ModifiedSampleXML>...</ModifiedSampleXML>

The encoding="UTF-8" attribute in the header is gone. However, if I call IXMLDomDocument::save(FileName) to save the XML to a file, the encoding="UTF-8" attribute is preserved when I open that file.

Why is the encoding="UTF-8" attribute missing when I call Getxml()? How do I tell MSXML to always preserve this attribute, not only upon save?

The encoding="UTF-8" attribute is removed because Getxml() returns the loaded XML as a wide-character (16-bit) string. You can verify this by inspecting the memory held by the returned _bstr_t.

It would be incorrect for MSXML to keep an attribute that declares an 8-bit encoding when the string it returns is in fact 16-bit.

If, however, you load a Unicode XML file that has the encoding="UTF-16" attribute, you will see that Getxml() does not remove the attribute.
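If the goal is a UTF-8 byte string that still carries the declaration (without going through save()), one possible workaround is to convert the wide string yourself and put the attribute back once the bytes really are UTF-8. The following is only a sketch in portable C++: it assumes the text returned by Getxml() has already been copied into a std::u16string, and Utf16ToUtf8 and AddUtf8EncodingDeclaration are hypothetical helper names, not MSXML APIs.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: convert UTF-16 code units to UTF-8 bytes.
// Surrogate pairs are combined; unpaired surrogates become U+FFFD.
std::string Utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;  // consume the low surrogate as well
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: insert encoding="UTF-8" right after the version
// attribute of the XML declaration. The input is returned unchanged if it
// has no declaration or the declaration already names an encoding.
std::string AddUtf8EncodingDeclaration(std::string xml) {
    const std::string open = "<?xml";
    if (xml.compare(0, open.size(), open) != 0) return xml;
    const std::size_t end = xml.find("?>");
    if (end == std::string::npos) return xml;
    const std::string decl = xml.substr(0, end);
    if (decl.find("encoding") != std::string::npos) return xml;
    const std::size_t version = decl.find("version");
    if (version == std::string::npos) return xml;
    const std::size_t quoteOpen = decl.find_first_of("\"'", version);
    if (quoteOpen == std::string::npos) return xml;
    const std::size_t quoteClose = decl.find(decl[quoteOpen], quoteOpen + 1);
    if (quoteClose == std::string::npos) return xml;
    xml.insert(quoteClose + 1, " encoding=\"UTF-8\"");
    return xml;
}
```

Calling AddUtf8EncodingDeclaration(Utf16ToUtf8(wideXml)) on the Getxml() output shown in the question would give a UTF-8 string whose declaration again contains encoding="UTF-8", which is now accurate because the bytes have actually been converted.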
