Default encoding for variant bstr to std::string conversion

Question

I have a variant bstr that was pulled from MSXML DOM, so it is in UTF-16. I'm trying to figure out what default encoding occurs with this conversion:

VARIANT vtNodeValue;
pNode->get_nodeValue(&vtNodeValue);
string strValue = (char*)_bstr_t(vtNodeValue);

From testing, I believe that the default encoding is either Windows-1252 or ASCII, but I am not sure.

Btw, this is the chunk of code that I am fixing: I am converting the variant to a wstring and then going to a multi-byte encoding with a call to WideCharToMultiByte.

Thanks!

Answer 1

The operator char* method calls _com_util::ConvertBSTRToString(). The documentation is pretty unhelpful, but I assume it uses the current locale settings to do the conversion.

Update:

Internally, _com_util::ConvertBSTRToString() calls WideCharToMultiByte, passing zero for all the code-page and default character parameters. This is the same as passing CP_ACP, which means to use the system's current ANSI code-page setting (not the current thread setting).

If you want to avoid losing data, you should probably call WideCharToMultiByte directly and use CP_UTF8. You can still treat the result as a null-terminated single-byte string and store it in a std::string; you just can't treat bytes as characters.
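For reference, here is a minimal sketch of the direct CP_UTF8 call, using the usual two-call pattern (first ask for the required size, then convert). The function name WideToUtf8 is made up for illustration. The #else branch is not part of the Win32 approach at all; it is a hand-written stand-in so the sketch also compiles on other platforms.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

#ifdef _WIN32
#include <windows.h>

// Hypothetical helper: convert a UTF-16 wide string to UTF-8 via the Win32 API.
std::string WideToUtf8(const std::wstring& wide) {
    if (wide.empty()) return std::string();
    const int wideLen = static_cast<int>(wide.size());
    // First call: ask how many bytes the UTF-8 output needs.
    const int bytes = WideCharToMultiByte(CP_UTF8, 0, wide.data(), wideLen,
                                          nullptr, 0, nullptr, nullptr);
    if (bytes <= 0) return std::string();  // conversion failed
    std::string utf8(static_cast<std::size_t>(bytes), '\0');
    // Second call: write the UTF-8 bytes into the buffer.
    WideCharToMultiByte(CP_UTF8, 0, wide.data(), wideLen,
                        &utf8[0], bytes, nullptr, nullptr);
    return utf8;
}
#else
// Stand-in for non-Windows builds (an assumption for portability, not the
// Win32 API): encodes each code point by hand, joining UTF-16 surrogate
// pairs when they appear.
std::string WideToUtf8(const std::wstring& wide) {
    std::string utf8;
    for (std::size_t i = 0; i < wide.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(wide[i]);
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < wide.size()) {
            const unsigned long lo = static_cast<unsigned long>(wide[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;  // consumed the low surrogate as well
            }
        }
        if (cp < 0x80) {
            utf8 += static_cast<char>(cp);
        } else if (cp < 0x800) {
            utf8 += static_cast<char>(0xC0 | (cp >> 6));
            utf8 += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            utf8 += static_cast<char>(0xE0 | (cp >> 12));
            utf8 += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            utf8 += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            utf8 += static_cast<char>(0xF0 | (cp >> 18));
            utf8 += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            utf8 += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            utf8 += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return utf8;
}
#endif
```

Passing an explicit length (rather than -1) means the output does not include a terminating null, which suits std::string. With CP_UTF8 the last two arguments have to be null.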

Answer 2

std::string by itself doesn't specify or contain any encoding. It is merely a sequence of bytes. The same holds for std::wstring, which is merely a sequence of wchar_ts (two-byte words on Win32).

By converting _bstr_t to a char* through its operator char*, you'll simply get a pointer to the raw data. According to MSDN, this data consists of wide characters, that is, wchar_ts, which represent UTF-16.

I'm surprised that it actually works to construct a std::string from this; you should not get past the first zero byte (which occurs soon if your original string is English).

But since wstring is a string of wchar_t, you should be able to construct one directly from the _bstr_t, as follows:
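A small sketch of what "merely a sequence of bytes" means in practice, assuming the std::string happens to hold valid UTF-8. CountCodePoints is a made-up helper name; it counts every byte that is not a UTF-8 continuation byte.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: count code points in a string assumed to be valid
// UTF-8 by skipping continuation bytes (bit pattern 10xxxxxx).
std::size_t CountCodePoints(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) ++count;
    }
    return count;
}
```

For "café" encoded as UTF-8 ("caf\xc3\xa9"), size() reports 5 bytes while CountCodePoints reports 4, because std::string itself knows nothing about the encoding.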
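To illustrate the zero-byte point, here is a sketch that assumes the char* really did point at raw UTF-16 data. Answer 1 reports that operator char* converts the string instead, which would explain why the original code works. The function name and the char16_t literal are stand-ins for illustration only.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: view raw UTF-16 data through a char* and build a
// std::string from it, as this answer assumes operator char* behaves.
std::string StringFromRawUtf16Bytes() {
    static const char16_t wide[] = u"Hi";  // stand-in for the BSTR contents
    const char* raw = reinterpret_cast<const char*>(wide);
    // The const char* constructor stops at the first zero byte.
    return std::string(raw);
}
```

On a little-endian machine 'H' is stored as the bytes 0x48 0x00, so the resulting string holds at most the single byte "H" and the rest of the text is lost.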

_bstr_t tmp(vtNodeValue);
wstring strValue((wchar_t*)tmp, tmp.length());

(I'm not sure about length; is it the number of bytes or the number of characters?) Then you'll have a wstring that's encoded in UTF-16, on which you can call WideCharToMultiByte.


Answer 1
10, accepted, 2009-12-01 17:29:14

Answer 2
0, 2009-12-01 17:22:58
