
MSXML removes the encoding="UTF-8" attribute

I'm reading the following XML text from a file:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<SampleXML>...</SampleXML>

and I load the UTF-8 text using IXMLDomDocument::loadXML. Then I manipulate the XML and call IXMLDomDocument::Getxml() to get a _bstr_t of the modified XML. This _bstr_t looks as follows:

<?xml version="1.0" standalone="yes"?>
<ModifiedSampleXML>...</ModifiedSampleXML>

The encoding="UTF-8" attribute in the header is gone. However, if I call IXMLDomDocument::save(FileName) to save the XML to a file, when I open the file I see that the encoding="UTF-8" attribute is preserved.

Why is the encoding="UTF-8" attribute missing when I call Getxml()? How do I tell MSXML to always preserve this attribute, not only on save?

The encoding="UTF-8" attribute is removed because Getxml() returns the loaded XML as a wide-character (16-bit) string. You can verify this by inspecting the memory held by the returned _bstr_t.

It would be incorrect for MSXML to keep an attribute that says the encoding is 8-bit when the string is in fact 16-bit.
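To make the 8-bit versus 16-bit point concrete, here is a minimal, portable sketch. It does not use MSXML at all: `char16_t` stands in for the 16-bit characters of a `_bstr_t`, and `byteLength`, `sampleUtf8` and `sampleUtf16` are names invented for this illustration.

```cpp
#include <cstddef>
#include <string>

// UTF-8 uses 8-bit code units; UTF-16 uses 16-bit code units.
static_assert(sizeof(char) == 1, "UTF-8 code units are one byte");
static_assert(sizeof(char16_t) >= 2, "UTF-16 code units are at least two bytes");

// Number of bytes occupied by the character data of a string.
template <typename CharT>
std::size_t byteLength(const std::basic_string<CharT>& s) {
    return s.size() * sizeof(CharT);
}

// The same text, <a>é</a>, in both encodings (illustrative sample only).
inline std::string sampleUtf8() {
    return "<a>\xC3\xA9</a>";      // é is the two bytes C3 A9
}

inline std::u16string sampleUtf16() {
    return u"<a>\u00E9</a>";       // é is the single code unit 0x00E9
}
```

The two samples hold the same characters but different bytes, which is why a declaration that was accurate for the file on disk is no longer accurate for the in-memory string.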

By contrast, if you load a Unicode XML file that has the encoding="UTF-16" attribute, you will see that Getxml() does not remove it.
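If you need the attribute in the string itself, for example because you convert the result of Getxml() to UTF-8 yourself before sending it somewhere, one possible workaround is to put the attribute back by hand. The sketch below is an assumption-laden illustration, not an MSXML feature: `withEncoding` is a hypothetical helper, it works on a plain `std::wstring`, and it assumes the declaration starts at the very beginning of the string with `<?xml ` followed by a `version` attribute.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of MSXML): returns a copy of `xml` whose XML
// declaration carries encoding="<enc>". The input is returned unchanged if it
// has no declaration, if the declaration already names an encoding, or if the
// declaration does not look as expected.
std::wstring withEncoding(const std::wstring& xml, const std::wstring& enc) {
    const std::wstring open = L"<?xml ";
    if (xml.compare(0, open.size(), open) != 0) return xml;   // no declaration

    const std::size_t end = xml.find(L"?>");
    if (end == std::wstring::npos) return xml;                // unterminated

    const std::wstring decl = xml.substr(0, end);
    if (decl.find(L"encoding") != std::wstring::npos) return xml;

    // The encoding attribute has to follow the version attribute, so locate
    // the closing quote of the version value and insert right after it.
    const std::size_t ver = decl.find(L"version");
    if (ver == std::wstring::npos) return xml;
    const std::size_t q1 = decl.find_first_of(L"\"'", ver);
    if (q1 == std::wstring::npos) return xml;
    const std::size_t q2 = decl.find(decl[q1], q1 + 1);
    if (q2 == std::wstring::npos) return xml;

    std::wstring out = xml;
    out.insert(q2 + 1, L" encoding=\"" + enc + L"\"");
    return out;
}
```

Only do this when the bytes you finally emit really are in the named encoding (for instance after converting the wide string with WideCharToMultiByte and CP_UTF8). Otherwise the declaration would be wrong in exactly the way MSXML avoids by dropping it.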
