
Portable wchar_t in C++

Is there a portable wchar_t in C++? On Windows, it's 2 bytes. On everything else, it's 4 bytes. I would like to use wstring in my application, but this will cause problems if I decide down the line to port it.

If the strings are only used inside the program, don't worry about it; a wchar_t in class A is the same as in class B.

If you're planning to transfer data between Windows and Linux/Mac OS X versions, you've got more than wchar_t to worry about, and you need to come up with a way to handle all the details.

You could define a character type that is four bytes everywhere and build your own strings, etc. on top of it (since most text handling in C++ is templated), but I don't know how well that would work for your needs.

Something like typedef int my_char; typedef std::basic_string<my_char> my_string;

What do you mean by "portable wchar_t"? There is a uint16_t type that is 16 bits wide everywhere, and it is often available. But that of course doesn't make a string yet. A string has to know its encoding for functions like length(), substring() and so on to make sense (so that it doesn't cut in the middle of a code point when using UTF-8 or UTF-16). There are some Unicode-compatible string classes I know of that you can use. All can be used in commercial programs for free (the Qt one will be usable in commercial programs for free in a couple of months, when Qt 4.5 is released).

ustring from the gtkmm project: if you program with gtkmm or use glibmm, that should be the first choice; it uses UTF-8 internally. Qt also has a string class, called QString, which is encoded in UTF-16. ICU is another project that provides portable Unicode string classes; its UnicodeString class seems to be encoded in UTF-16 internally, like Qt's. I haven't used that one, though.
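To illustrate why a string has to know its encoding even for something as simple as a length, here is a minimal sketch. The helper name utf8_code_points is made up for this example, and it assumes the input is already valid UTF-8:

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-8 string by skipping continuation
// bytes (bytes of the form 10xxxxxx). Assumes the input is valid UTF-8.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t count = 0;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        unsigned char byte = static_cast<unsigned char>(s[i]);
        if ((byte & 0xC0) != 0x80) {
            ++count;  // lead byte or plain ASCII: starts a new code point
        }
    }
    return count;
}
```

For the UTF-8 bytes of "naïve" ("na\xC3\xAFve"), std::string::size() reports 6 while this helper reports 5. Note that code points are still not the same thing as user-perceived characters, which is one reason the dedicated string classes above exist.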

The proposed C++0x standard will have char16_t and char32_t types. Until then, you'll have to fall back on using integers for the non-wchar_t character type.

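#include <stdint.h>  // uint16_t, uint32_t

// Work out which Unicode encoding wchar_t holds on this platform.
#if defined(__STDC_ISO_10646__)
    #define WCHAR_IS_UTF32
#elif defined(_WIN32) || defined(_WIN64)
    #define WCHAR_IS_UTF16
#endif

// 16-bit code unit: use the native char16_t when the compiler provides it
// (the __cplusplus check guards against pre-C++11 modes that lack the keyword).
#if defined(__STDC_UTF_16__) && __cplusplus >= 201103L
    typedef char16_t CHAR16;
#elif defined(WCHAR_IS_UTF16)
    typedef wchar_t CHAR16;
#else
    typedef uint16_t CHAR16;
#endif

// 32-bit code unit: same idea with char32_t.
#if defined(__STDC_UTF_32__) && __cplusplus >= 201103L
    typedef char32_t CHAR32;
#elif defined(WCHAR_IS_UTF32)
    typedef wchar_t CHAR32;
#else
    typedef uint32_t CHAR32;
#endif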

According to the standard, you'll need to specialize char_traits for the integer types. But on Visual Studio 2005, I've gotten away with std::basic_string<CHAR32> with no special handling.
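Once a C++11 compiler is available, the char16_t and char32_t types mentioned above come with ready-made std::u16string and std::u32string typedefs, so no char_traits specialization is needed. A small sketch of the difference between the two (the function names are invented for this example):

```cpp
#include <string>

// U+1F600 lies outside the Basic Multilingual Plane.
// In UTF-16 it needs a surrogate pair; in UTF-32 it is a single code unit.
inline std::u16string grinning_utf16() { return u"\U0001F600"; }
inline std::u32string grinning_utf32() { return U"\U0001F600"; }
```

Here grinning_utf16().size() is 2 and grinning_utf32().size() is 1, on Windows and elsewhere alike, which is the kind of predictability wchar_t does not give you.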

I plan to use a SQLite database.

Then you'll need to use UTF-16, not wchar_t.

The SQLite API also has a UTF-8 version. You may want to use that instead of dealing with the wchar_t differences.
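If existing code already holds std::wstring values, one option is to convert them to UTF-8 at the boundary. The following is only a sketch: the names append_utf8 and wide_to_utf8 are made up here, it assumes wchar_t holds UTF-16 when it is 2 bytes and UTF-32 otherwise, and it does no validation of the input.

```cpp
#include <string>

// Appends one Unicode code point to out, encoded as UTF-8.
void append_utf8(std::string& out, unsigned long cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Converts a wide string to UTF-8 whether wchar_t is 2 or 4 bytes.
// Assumption: 2-byte wchar_t means UTF-16, anything wider means UTF-32.
std::string wide_to_utf8(const std::wstring& in) {
    std::string out;
    for (std::wstring::size_type i = 0; i < in.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(in[i]);
        // With 16-bit wchar_t, combine a surrogate pair into one code point.
        if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF &&
            i + 1 < in.size()) {
            unsigned long low = static_cast<unsigned long>(in[i + 1]);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;  // the low surrogate has been consumed
            }
        }
        append_utf8(out, cp);
    }
    return out;
}
```

The resulting std::string could then be handed to the UTF-8 flavour of the API (for example sqlite3_bind_text), so the stored data looks the same no matter which platform wrote it.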

My suggestion: use UTF-8 and std::string. Wide strings would not bring you much added value, because you can't interpret a wide character as a letter anyway: some characters are composed of several Unicode code points.

So use UTF-8 everywhere, and use a good library to deal with natural languages, for example Boost.Locale.

Bad idea: defining something like typedef uint32_t mychar;. You can't use iostreams with it; for example, you can't usefully create a stringstream based on this character type, because you would not be able to write to it.

For example, this would not work:

#include <sstream>
std::basic_ostringstream<unsigned> ss;
ss << 10;

It would not produce a string for you.
