
Assigning a "const char*" to std::string is allowed, but assigning to std::wstring doesn't compile. Why?

I assumed that std::wstring and std::string both provide more or less the same interface.

So I tried to enable Unicode capabilities for our application:

# ifdef APP_USE_UNICODE
    typedef std::wstring AppStringType;
# else
    typedef std::string  AppStringType;
# endif

However, that gives me a lot of compile errors when -DAPP_USE_UNICODE is used.

It turned out that the compiler chokes when a const char[] is assigned to std::wstring.

EDIT: improved the example by removing the usage of the literal "hello".

#include <string>

void myfunc(const char h[]) {
   std::string  s = h; // compiles OK
   std::wstring w = h; // compile error
}

Why does it make such a difference?

Assigning a const char* to std::string is allowed, but assigning to std::wstring gives compile errors.

Shouldn't std::wstring provide the same interface as std::string? At least for such a basic operation as assignment?

(Environment: gcc-4.4.1 on Ubuntu Karmic, 32-bit)

You should do:

#include <string>

int main() {
  const wchar_t h[] = L"hello";
  std::wstring w = h;
  return 0;
}

std::string is a typedef of std::basic_string<char>, while std::wstring is a typedef of std::basic_string<wchar_t>. As such, the 'equivalent' C-string of a wstring is an array of wchar_t.

The 'L' in front of the string literal indicates that you are using a wide-character string constant.

The relevant part of the string API is this constructor:

basic_string(const charT*);

For std::string, charT is char. For std::wstring it's wchar_t. So the reason it doesn't compile is that wstring doesn't have a char* constructor. Why doesn't wstring have a char* constructor?
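The mismatch can be checked at compile time. This is only a minimal sketch, assuming a C++11 compiler; the helper name accepts_narrow_cstring is made up for illustration and is not part of the standard library.

```cpp
#include <string>
#include <type_traits>

// std::string is basic_string<char>, so it accepts const char*.
static_assert(std::is_constructible<std::string, const char*>::value,
              "std::string can be built from const char*");

// std::wstring is basic_string<wchar_t>, so it accepts const wchar_t* ...
static_assert(std::is_constructible<std::wstring, const wchar_t*>::value,
              "std::wstring can be built from const wchar_t*");

// ... but it has no constructor that takes const char*.
static_assert(!std::is_constructible<std::wstring, const char*>::value,
              "std::wstring cannot be built from const char*");

// Hypothetical helper: true when StringT can be built from a narrow C string.
template <typename StringT>
constexpr bool accepts_narrow_cstring() {
    return std::is_constructible<StringT, const char*>::value;
}
```

If the file compiles, all three statements about the constructors hold for the standard library in use.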

There is no single way to convert a string of char to a string of wchar_t. What's the encoding used with the char string? Is it just 7-bit ASCII? Is it UTF-8? Is it UTF-7? Is it SHIFT-JIS? So I don't think it would entirely make sense for std::wstring to have an automatic conversion from char*, even though you could cover most cases. You can use:

w = std::wstring(h, h + std::strlen(h)); // needs <cstring>; sizeof(h) - 1 only works when h is a real array, not a pointer parameter

which will convert each char in turn to wchar_t (except the NUL terminator), and in this example that's probably what you want. As int3 says though, if that's what you mean, it's most likely better to use a wide string literal in the first place.
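Wrapped up as a function, the element-by-element widening might look like the sketch below. The name widen_ascii is hypothetical, and the approach is only meaningful under the assumption that the input is plain 7-bit ASCII.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: widens each char to wchar_t, one element at a time.
// No decoding takes place, so this is only correct for 7-bit ASCII input.
std::wstring widen_ascii(const char* s) {
    // The iterator-range constructor converts every char to wchar_t.
    // std::strlen stops before the NUL terminator, so it is not copied.
    return std::wstring(s, s + std::strlen(s));
}
```

For example, widen_ascii("hello") yields the same string as L"hello".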

To convert from a multibyte encoding to a wide character encoding, take a look at the header <locale> and the type std::codecvt. The Dinkumware library has a class Dinkum::wstring_convert that makes performing such multibyte-to-wide conversions easier.

The class template std::codecvt_byname allows one to obtain a codecvt facet for a particular named locale. Unfortunately, discovering the names of the encodings (or locales) on your system is implementation-specific.
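As a rough sketch of the <locale> route, the function below pulls the std::codecvt<wchar_t, char, std::mbstate_t> facet out of a given locale and runs its in() member. The name to_wide and the error handling are assumptions made for illustration; which encodings are actually decoded depends on the locale passed in.

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes `narrow` into wide characters using the
// codecvt facet installed in `loc`.
std::wstring to_wide(const std::string& narrow, const std::locale& loc) {
    typedef std::codecvt<wchar_t, char, std::mbstate_t> Facet;
    const Facet& facet = std::use_facet<Facet>(loc);

    if (narrow.empty()) return std::wstring();

    // A multibyte string never decodes to more wide chars than it has bytes.
    std::wstring wide(narrow.size(), L'\0');

    std::mbstate_t state = std::mbstate_t();
    const char* from = narrow.data();
    const char* from_next = from;
    wchar_t* to = &wide[0];
    wchar_t* to_next = to;

    const std::codecvt_base::result r =
        facet.in(state, from, from + narrow.size(), from_next,
                 to, to + wide.size(), to_next);
    if (r != std::codecvt_base::ok)
        throw std::runtime_error("conversion failed");

    // Trim the buffer to the number of wide chars actually produced.
    wide.resize(static_cast<std::size_t>(to_next - to));
    return wide;
}
```

With std::locale::classic() this handles ASCII input; decoding another encoding requires a locale that uses it, and the available locale names are system-specific.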

Small suggestion... Do not use "Unicode" strings (aka wide strings) under Linux. std::string is perfectly fine and holds Unicode very well (as UTF-8).

Most Linux APIs work with char * strings, and the most popular encoding is UTF-8.

So... just don't bother yourself with wstring.
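To illustrate what holding UTF-8 in a std::string means in practice, here is a small sketch: the string stores bytes, and code points can be counted by skipping continuation bytes. The helper name utf8_code_points is hypothetical, and the function assumes its input is valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-8 encoded
// std::string by skipping continuation bytes (bit pattern 10xxxxxx).
// Assumes the input is valid UTF-8; no validation is performed.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t count = 0;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        const unsigned char byte = static_cast<unsigned char>(s[i]);
        if ((byte & 0xC0) != 0x80) ++count;  // not a continuation byte
    }
    return count;
}
```

For "caf\xC3\xA9" (the word "café" in UTF-8), size() reports 5 bytes while utf8_code_points reports 4 code points.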

In addition to the other answers, you could use a trick from Microsoft's book (specifically, tchar.h), and write something like this:

# ifdef APP_USE_UNICODE
    typedef std::wstring AppStringType;
    #define _T(s) (L##s)
# else
    typedef std::string  AppStringType;
    #define _T(s) (s)
# endif

AppStringType foo = _T("hello world!");

(Note: my macro-fu is weak, and this is untested, but you get the idea.)
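One caveat with a single-level L##s macro: the ## operator pastes its operand before that operand is macro-expanded, so passing another macro instead of a literal fails. A common workaround is an extra level of indirection. The sketch below uses the hypothetical names APP_T, APP_T_IMPL and APP_GREETING, and defines APP_USE_UNICODE inline only so that it is self-contained.

```cpp
#include <string>

// Assumption: normally supplied on the command line as -DAPP_USE_UNICODE.
#define APP_USE_UNICODE 1

#ifdef APP_USE_UNICODE
    typedef std::wstring AppStringType;
    #define APP_T_IMPL(s) L##s
    // Extra level: s is fully expanded before APP_T_IMPL pastes the L prefix.
    #define APP_T(s) APP_T_IMPL(s)
#else
    typedef std::string AppStringType;
    #define APP_T(s) s
#endif

// A macro that expands to a string literal.
#define APP_GREETING "hello world!"

// Works with a literal argument.
AppStringType greeting_from_literal() { return APP_T("hello world!"); }

// Also works when the argument is itself a macro.
AppStringType greeting_from_macro() { return APP_T(APP_GREETING); }
```

With a single-level macro, APP_T(APP_GREETING) would expand to the undefined token LAPP_GREETING.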

Looks like you can do something like this:

    #include <sstream>
    #include <string>
    // ...
    std::wstringstream tmp;
    tmp << "hello world";   // each char is widened as it is inserted
    std::wstring our_string = tmp.str();

Although for a more complex situation, you may want to break down and use mbstowcs.
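A minimal sketch of the mbstowcs route follows. The helper name mb_to_wide is hypothetical; the conversion follows the current C locale, so a program that needs something other than the default "C" locale would call std::setlocale first.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts a multibyte string to a wide string
// according to the current C locale.
std::wstring mb_to_wide(const char* s) {
    // Upper bound: a multibyte string never yields more wide chars than bytes.
    const std::size_t max_len = std::strlen(s);
    if (max_len == 0) return std::wstring();

    std::wstring wide(max_len, L'\0');
    const std::size_t written = std::mbstowcs(&wide[0], s, max_len);
    if (written == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multibyte sequence");

    // Trim to the number of wide chars actually produced.
    wide.resize(written);
    return wide;
}
```

In the default "C" locale this handles ASCII text; input in another encoding needs a matching locale to be selected beforehand.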

You should use

#include <tchar.h>

and then tstring instead of wstring/string, TCHAR* instead of char*, and _T("hello") instead of "hello" or L"hello".

This selects the appropriate string and character types depending on whether _UNICODE is defined.


 