
Default encoding for VARIANT BSTR to std::string conversion

I have a VARIANT holding a BSTR that was pulled from the MSXML DOM, so it is in UTF-16. I'm trying to figure out which encoding this conversion uses by default:

VARIANT vtNodeValue;
pNode->get_nodeValue(&vtNodeValue);
string strValue = (char*)_bstr_t(vtNodeValue);

From testing, I believe the default encoding is either Windows-1252 or ASCII, but I am not sure.

For context, this is the chunk of code I am fixing: I am changing it to convert the variant to a wstring and then to a multi-byte encoding with a call to WideCharToMultiByte.

Thanks!

The _bstr_t operator char* calls _com_util::ConvertBSTRToString(). The documentation is pretty unhelpful, but I assume it uses the current locale settings to do the conversion.

Update:

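Internally, _com_util::ConvertBSTRToString() calls WideCharToMultiByte, passing zero (NULL) for the code-page and default-character parameters. A code page of zero is CP_ACP, which means the system's current ANSI code page is used (not the calling thread's code page, which would be CP_THREAD_ACP). Characters that code page cannot represent are replaced, either by a best-fit character or by the code page's default character (typically '?'), so data can be lost.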
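To make the data loss concrete, here is a portable sketch, not the actual Windows implementation, of converting UTF-16 to Windows-1252 with '?' for every character that has no exact mapping. It assumes the system ANSI code page is Windows-1252, uses char16_t/std::u16string as a stand-in for Windows' 16-bit wchar_t, and ignores the "best fit" substitutions WideCharToMultiByte may apply by default. The names to_cp1252 and utf16_to_cp1252_lossy are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Unicode code points for bytes 0x80..0x9F in Windows-1252; 0 marks the five
// byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) that this sketch treats as unmapped.
static const char16_t kCp1252High[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178};

// Returns the Windows-1252 byte for one UTF-16 code unit, or '?' if none exists.
char to_cp1252(char16_t u) {
    if (u < 0x80 || (u >= 0xA0 && u <= 0xFF))
        return static_cast<char>(u);  // ASCII and U+00A0..U+00FF map 1:1
    for (int i = 0; i < 32; ++i)
        if (kCp1252High[i] == u)      // u >= 0x80 here, so the 0 slots never match
            return static_cast<char>(0x80 + i);
    return '?';  // no exact mapping: the original character is lost
}

// Lossy UTF-16 -> Windows-1252 conversion; one output byte per character.
std::string utf16_to_cp1252_lossy(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char16_t u = in[i];
        bool pair = u >= 0xD800 && u <= 0xDBFF && i + 1 < in.size() &&
                    in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF;
        if (pair) {
            out += '?';  // characters above U+FFFF never exist in Windows-1252
            ++i;         // consume the low surrogate as well
        } else {
            out += to_cp1252(u);
        }
    }
    return out;
}
```

For example, u"caf\u00e9" survives as the four bytes "caf\xE9", but Japanese text such as u"\u65E5\u672C" comes back as "??", which is why the answer recommends avoiding the ANSI code page.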

If you want to avoid losing data, you should probably call WideCharToMultiByte directly with CP_UTF8. You can still store the result in a std::string and treat it as a null-terminated string of single bytes; you just can't treat each byte as a character.
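On Windows the conversion itself is WideCharToMultiByte(CP_UTF8, 0, ...). To show what that encoding does to the bytes, here is a hand-written, portable sketch of UTF-16 to UTF-8 (utf16_to_utf8 is a hypothetical name, and std::u16string stands in for std::wstring). It assumes unpaired surrogates become U+FFFD, which is what recent Windows versions do when WC_ERR_INVALID_CHARS is not passed.

```cpp
#include <cstddef>
#include <string>

// Encodes UTF-16 code units as UTF-8 bytes stored in a std::string.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // High + low surrogate: combine into one code point above U+FFFF.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate: substitute the replacement character
        }
        if (cp < 0x80) {                 // 1 byte: ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {         // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {       // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                         // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Here utf16_to_utf8(u"caf\u00e9") yields "caf\xC3\xA9": four characters but size() == 5, which is exactly the "bytes are not characters" caveat. With the real API, the usual pattern is to call WideCharToMultiByte once with a zero output size to get the required byte count, resize the std::string, and call it again to fill it.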

std::string by itself doesn't specify or carry any encoding; it is merely a sequence of bytes. The same holds for std::wstring, which is merely a sequence of wchar_t values (16-bit units on Win32).
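A short illustration of that point: the same word stored as two different byte sequences, plus a UTF-16 string whose length is counted in code units. char16_t/std::u16string is used as a portable stand-in for wchar_t, which is 16 bits on Windows but typically 32 bits on Linux and macOS; the constant names are hypothetical.

```cpp
#include <string>

// "café" in two encodings. std::string stores whatever bytes it is given and
// records nothing about which encoding produced them.
const std::string kCafeCp1252 = "caf\xE9";      // Windows-1252: one byte per character
const std::string kCafeUtf8   = "caf\xC3\xA9";  // UTF-8: 'é' takes two bytes

// UTF-16 strings count 16-bit code units, so a character outside the Basic
// Multilingual Plane (here U+1F600) occupies two of them.
const std::u16string kGrinning = u"\U0001F600";
```

kCafeCp1252.size() is 4, kCafeUtf8.size() is 5, and kGrinning.size() is 2; only the code that reads the data knows how to interpret it.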

Converting _bstr_t to a char* through its operator char* does not give you a pointer to the raw data, though. The _bstr_t stores wide characters, that is, wchar_t values, which according to MSDN represent UTF-16; operator char* returns a separately converted narrow copy of them.

That is why constructing a std::string from it works at all: if you were handed the raw UTF-16 buffer, you would not get past the first zero byte (which occurs almost immediately if your original string is English).
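To see why a raw UTF-16 buffer read as char* stops at the first zero byte, this sketch lays out UTF-16 code units as little-endian bytes (the in-memory layout of a BSTR on x86/x64 Windows) and measures them with std::strlen. The helper names are hypothetical, and the bytes are built explicitly so the result does not depend on the host's endianness.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Serializes UTF-16 code units as little-endian bytes plus a 2-byte terminator.
std::vector<char> utf16le_bytes(const std::u16string& s) {
    std::vector<char> bytes;
    for (char16_t u : s) {
        bytes.push_back(static_cast<char>(u & 0xFF));  // low byte first
        bytes.push_back(static_cast<char>(u >> 8));    // then high byte
    }
    bytes.push_back('\0');
    bytes.push_back('\0');
    return bytes;
}

// The length a char*-based API (e.g. std::string(const char*)) would see if it
// were handed the raw UTF-16 buffer: everything up to the first zero byte.
std::size_t narrow_length_of_raw_utf16(const std::u16string& s) {
    std::vector<char> bytes = utf16le_bytes(s);
    return std::strlen(bytes.data());
}
```

For u"Hello" the bytes begin 'H', 0x00, so narrow_length_of_raw_utf16 returns 1: a std::string built from that pointer would contain just "H".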

But since wstring is a string of wchar_t , you should be able to construct one directly from the _bstr_t , as follows:

_bstr_t tmp(vtNodeValue);
wstring strValue((wchar_t*)tmp, tmp.length());

(Here length() is the number of characters, not bytes: _bstr_t::length() reports the character count of the wrapped BSTR, as SysStringLen does.) Then you'll have a wstring encoded in UTF-16 on which you can call WideCharToMultiByte.
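The byte/character distinction comes from the BSTR layout: a 4-byte prefix holding the data length in bytes sits just before the first character, and the characters are followed by a 2-byte null terminator. The sketch below imitates that layout with a hypothetical FakeBstr class and hypothetical fake_byte_len/fake_char_len helpers standing in for SysStringByteLen and SysStringLen; real code would use SysAllocStringLen and friends from oleauto.h instead. char16_t again stands in for Windows' 16-bit wchar_t.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

static_assert(sizeof(char16_t) == 2, "sketch assumes 16-bit code units");

// Hypothetical BSTR look-alike: [4-byte byte count][characters][2-byte null].
class FakeBstr {
public:
    explicit FakeBstr(const std::u16string& text)
        : buf_(2 + text.size() + 1, u'\0') {  // prefix (2 units) + data + terminator
        std::uint32_t byte_len =
            static_cast<std::uint32_t>(text.size() * sizeof(char16_t));
        std::memcpy(buf_.data(), &byte_len, sizeof byte_len);
        std::memcpy(buf_.data() + 2, text.data(), text.size() * sizeof(char16_t));
    }
    // Pointer to the first character, which is what a BSTR value points to.
    const char16_t* str() const { return buf_.data() + 2; }

private:
    std::vector<char16_t> buf_;
};

// Mirrors SysStringByteLen: reads the prefix stored just before the data.
std::uint32_t fake_byte_len(const char16_t* bstr) {
    std::uint32_t byte_len;
    std::memcpy(&byte_len, bstr - 2, sizeof byte_len);
    return byte_len;
}

// Mirrors SysStringLen (the count _bstr_t::length() reports): 16-bit units, not bytes.
std::size_t fake_char_len(const char16_t* bstr) {
    return fake_byte_len(bstr) / sizeof(char16_t);
}
```

For FakeBstr b(u"caf\u00e9"), fake_byte_len(b.str()) is 8 while fake_char_len(b.str()) is 4, so a length taken from SysStringLen (or _bstr_t::length()) is the right count to pass to the wstring constructor in the snippet above. Note that a surrogate pair counts as two units.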

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. Any question please contact: yoyou2525@163.com.
