Use signed or unsigned char in constructing CString?

I am checking the documentation for CString. It contains the following statements:

  • CStringT( LPCSTR lpsz ): Constructs a Unicode CStringT from an ANSI string. You can also use this constructor to load a string resource as shown in the example below.

  • CStringT( LPCWSTR lpsz ): Constructs a CStringT from a Unicode string.

  • CStringT( const unsigned char* psz ): Allows you to construct a CStringT from a pointer to unsigned char.

I have some questions:

  1. Why are there two versions, one for const char* (LPCSTR) and one for unsigned char*? Which version should I use in which case? For example, does CStringT("Hello") use the first or the second version? When getting a null-terminated string from a third party, such as sqlite3_column_text() (see here), should I convert it to char* or to unsigned char*? That is, should I use CString((LPCSTR)sqlite3_column_text(...)) or CString(sqlite3_column_text(...))? It seems that both will work; is that right?

  2. Why does the char* version construct a "Unicode" CStringT, while the unsigned char* version just constructs a CStringT? CStringT is a class template covering all 3 instances, i.e. CString, CStringA, and CStringW, so why the emphasis on a "Unicode" CStringT when constructing from LPCSTR (const char*)?

LPCSTR is just const char*, not const signed char*. Whether char is signed or unsigned depends on the compiler implementation, but char, signed char, and unsigned char are 3 distinct types for the purposes of overloading. String literals in C++ are of type const char[N], so CStringT("Hello") will always use the LPCSTR constructor, never the unsigned char* constructor.
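To see that overload resolution on the pointer type, not the signedness of plain char, decides which constructor runs, here is a minimal sketch. `NarrowCtorDemo` is a hypothetical stand-in, not ATL code; it only mirrors the two narrow constructor signatures quoted in the question, and the unsigned char array merely simulates what sqlite3_column_text() hands back.

```cpp
#include <type_traits>

// Records which of the two overloads was selected.
enum class Picked { PlainChar, UnsignedChar };

// Hypothetical stand-in for the two narrow CStringT constructors (not ATL code).
struct NarrowCtorDemo {
    Picked picked;
    constexpr explicit NarrowCtorDemo(const char*) : picked(Picked::PlainChar) {}
    constexpr explicit NarrowCtorDemo(const unsigned char*) : picked(Picked::UnsignedChar) {}
};

// char, signed char and unsigned char are three distinct types,
// regardless of whether plain char happens to be signed on this compiler.
static_assert(!std::is_same<char, signed char>::value, "distinct types");
static_assert(!std::is_same<char, unsigned char>::value, "distinct types");

// A pointer to signed char converts to neither parameter type, so no overload matches.
static_assert(!std::is_constructible<NarrowCtorDemo, const signed char*>::value,
              "signed char* matches neither overload");

// Simulates the const unsigned char* that sqlite3_column_text() returns.
constexpr unsigned char kSqliteStyleBytes[] = {'H', 'i', 0};

constexpr Picked pick_for_literal() { return NarrowCtorDemo("Hello").picked; }
constexpr Picked pick_for_unsigned_pointer() { return NarrowCtorDemo(kSqliteStyleBytes).picked; }
```

A string literal always lands on the const char* overload, and only an actual unsigned char pointer selects the other one.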

sqlite3_column_text(...) returns unsigned char* because it returns UTF-8 encoded text. I don't know what the unsigned char* constructor of CStringT actually does (it has something to do with MBCS strings), but when constructing a wide CStringT, the LPCSTR constructor performs a conversion from ANSI to Unicode using the user's default locale. That would destroy UTF-8 text that contains non-ASCII characters.

Your best option in that case is to convert the UTF-8 text to UTF-16 (using MultiByteToWideChar() or equivalent, or simply by calling sqlite3_column_text16() instead, which returns UTF-16 encoded text), and then use the LPCWSTR (const wchar_t*) constructor of CStringT, as Windows uses wchar_t for UTF-16 data.
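MultiByteToWideChar() with CP_UTF8 is Windows-only, so as a portable illustration of what that conversion step does, here is a minimal hand-written UTF-8 to UTF-16 decoder. It is a sketch, not a replacement for the Windows API: `utf8_to_utf16` is a hypothetical name, it returns a std::u16string instead of filling a CStringW, and it replaces malformed sequences with U+FFFD without rejecting every invalid form (overlong encodings and encoded surrogates slip through) that a production converter would check.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: decode null-terminated UTF-8 bytes (as returned by
// sqlite3_column_text()) into UTF-16 code units, the encoding Windows uses for wchar_t.
std::u16string utf8_to_utf16(const unsigned char* s) {
    std::u16string out;
    while (*s) {
        unsigned char lead = *s++;
        char32_t cp;
        int extra;  // number of continuation bytes that must follow
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xC0) { out.push_back(u'\uFFFD'); continue; }  // stray continuation byte
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else if (lead < 0xF8) { cp = lead & 0x07; extra = 3; }
        else                  { out.push_back(u'\uFFFD'); continue; }  // invalid lead byte

        bool ok = true;
        for (int i = 0; i < extra; ++i) {
            // Also stops at the terminating '\0', which is not a continuation byte.
            if ((*s & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (*s++ & 0x3F);
        }
        if (!ok || cp > 0x10FFFF) { out.push_back(u'\uFFFD'); continue; }

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));  // fits in one UTF-16 code unit
        } else {
            cp -= 0x10000;                             // split into a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

For "café" (bytes 63 61 66 C3 A9) this yields 4 UTF-16 code units, the last one U+00E9, which is what a CStringW should end up holding.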

tl;dr: Use either of the following:

  • CStringW value( static_cast<const wchar_t*>( sqlite3_column_text16(...) ) ); (optionally setting SQLite's internal encoding to UTF-16), or
  • CStringW value( CA2WEX<>( reinterpret_cast<const char*>( sqlite3_column_text(...) ), CP_UTF8 ) );

Everything else is just not going to work out, one way or another.


First things first: CStringT is a class template, parameterized (among other things) on the character type it uses to represent the stored sequence. This is passed as the BaseType template type argument. There are 2 concrete template instantiations, CStringA and CStringW, which use char and wchar_t, respectively, to store the sequence of characters [1].

CStringT exposes the following predefined types that describe the properties of the template instantiation:

  • XCHAR: Character type used to store the sequence.
  • YCHAR: Character type that an instance can be converted from/to.

The following table shows the concrete types for CStringA and CStringW:

         | XCHAR   | YCHAR
---------+---------+--------
CStringA | char    | wchar_t
CStringW | wchar_t | char
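The relationship in this table can be pictured with a small hypothetical template. `CStringTSketch` is not the real ATL class (which derives these typedefs through its base and traits classes); it only encodes the rule that XCHAR is the storage type and YCHAR is the "other" character type.

```cpp
#include <type_traits>

// Hypothetical sketch, not the ATL implementation: XCHAR is the storage
// character type, YCHAR is the character type an instance converts from/to.
template <typename BaseType>
struct CStringTSketch {
    using XCHAR = BaseType;
    using YCHAR = typename std::conditional<
        std::is_same<BaseType, char>::value, wchar_t, char>::type;
};

using CStringASketch = CStringTSketch<char>;     // stands in for CStringA
using CStringWSketch = CStringTSketch<wchar_t>;  // stands in for CStringW
```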

While the storage of the CStringT instantiations makes no restrictions with respect to the character encoding being used, the conversion constructors and operators are implemented based on the following assumptions:

  • char represents ANSI-encoded [2] code units.
  • wchar_t represents UTF-16 encoded code units.

If your program doesn't match these assumptions, it is strongly advised to disable implicit wide-to-narrow and narrow-to-wide conversions. To do this, define the _CSTRING_DISABLE_NARROW_WIDE_CONVERSION preprocessor symbol before including any ATL/MFC header files. Doing so is recommended even if your program does meet the assumptions, to prevent accidental conversions, which are both costly and potentially destructive.
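The effect of such a switch is that a narrow-to-wide construction stops compiling instead of silently converting. Here is a minimal sketch of that idea using a hypothetical class and a hypothetical macro, `SKETCH_DISABLE_NARROW_WIDE_CONVERSION`; it illustrates the outcome only and is not how ATL implements _CSTRING_DISABLE_NARROW_WIDE_CONVERSION.

```cpp
#include <type_traits>

// Hypothetical switch, playing the role _CSTRING_DISABLE_NARROW_WIDE_CONVERSION plays for ATL/MFC.
#define SKETCH_DISABLE_NARROW_WIDE_CONVERSION

// Hypothetical wide-string stand-in (not ATL code).
struct WideStringSketch {
    WideStringSketch(const wchar_t*) {}  // same character type: always available
#ifndef SKETCH_DISABLE_NARROW_WIDE_CONVERSION
    WideStringSketch(const char*) {}     // implicit narrow-to-wide conversion
#endif
};

// With the switch defined, passing narrow text becomes a compile-time error.
static_assert(std::is_constructible<WideStringSketch, const wchar_t*>::value, "wide input accepted");
static_assert(!std::is_constructible<WideStringSketch, const char*>::value, "narrow input rejected");
```

Every remaining conversion then has to be spelled out explicitly (for example with a CA2WEX and an explicit code page), which makes the assumed encoding visible at the call site.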

With that out of the way, let's move on to the questions:

Why are there two versions, one for const char* (LPCSTR) and one for unsigned char*?

That's easy: convenience. The overload simply allows you to construct a CString instance irrespective of the signedness of the character type [3]. The implementation of the overload taking a const unsigned char* argument reinterprets the pointer as const char* and forwards it, via assignment, to the code path that handles const char*:

CSTRING_EXPLICIT CStringT(_In_z_ const unsigned char* pszSrc) :
    CThisSimpleString( StringTraits::GetDefaultManager() )
{
    *this = reinterpret_cast< const char* >( pszSrc );
}
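The consequence is that both overloads store exactly the same bytes; no decoding or re-encoding happens on the unsigned char path. Here is a minimal sketch of that forwarding pattern with a hypothetical `NarrowStringSketch` class. It uses a delegating constructor instead of the assignment shown in the ATL excerpt, which is a deliberate simplification, but the effect on the stored bytes is the same.

```cpp
#include <cassert>
#include <string>

// Hypothetical narrow string (not ATL): the unsigned char overload only
// reinterprets the pointer and reuses the const char* path.
class NarrowStringSketch {
public:
    explicit NarrowStringSketch(const char* psz) : data_(psz) {}
    explicit NarrowStringSketch(const unsigned char* psz)
        : NarrowStringSketch(reinterpret_cast<const char*>(psz)) {}  // same bytes, no conversion

    const std::string& bytes() const { return data_; }

private:
    std::string data_;
};
```

Constructing from the UTF-8 bytes of "café" through either overload yields the same 5 stored bytes.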

Which version should I use for different cases?

It doesn't matter, as long as you are constructing a CStringA, i.e. no conversion is applied. If you are constructing a CStringW, you shouldn't be using either of them (as explained above).

For example, does CStringT("Hello") use the first or second version?

"Hello" is of type const char[6], which decays into a const char* pointing to the first element of the array when passed to the CString constructor. It therefore calls the overload taking a const char* argument.
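Both steps of that reasoning can be checked at compile time with standard type traits. This is a minimal sketch using only the standard library; nothing in it is specific to ATL.

```cpp
#include <type_traits>

// A string literal is an lvalue of type const char[N], where N counts the terminating '\0'.
static_assert(std::is_same<decltype("Hello"), const char (&)[6]>::value,
              "\"Hello\" is an lvalue of type const char[6]");
static_assert(sizeof("Hello") == 6, "five characters plus the terminator");

// When passed to a pointer parameter, the array decays to const char*, never to const unsigned char*.
static_assert(std::is_same<std::decay<decltype("Hello")>::type, const char*>::value,
              "decays to const char*");
```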

When getting a null-terminated string from a third party, such as sqlite3_column_text() (see here), should I convert it to char* or unsigned char*? i.e., should I use CString((LPCSTR)sqlite3_column_text(...)) or CString(sqlite3_column_text(...))?

SQLite assumes UTF-8 encoding in this case. CStringA can store UTF-8 encoded text, but it's really, really dangerous to do so: CStringA assumes ANSI encoding, and readers of your code likely will, too. It is recommended to change your SQLite database to store UTF-16 (and use sqlite3_column_text16) to construct a CStringW. If that is not feasible, manually convert from UTF-8 to UTF-16 before storing the data in a CStringW instance, for example using the CA2WEX class template:

CStringW data( CA2WEX<>( reinterpret_cast<const char*>( sqlite3_column_text(...) ), CP_UTF8 ) );

It seems that both will work, is that right?

That's not correct. Neither one works as soon as you get non-ASCII characters from your database.
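To make the failure concrete, here is a minimal, portable sketch of what happens when UTF-8 bytes are interpreted through a single-byte ANSI code page. It assumes ISO-8859-1 (Latin-1) as a stand-in for a code page such as Windows-1252; the two agree for bytes 0xA0 to 0xFF, which covers this example. `widen_as_latin1` is a hypothetical helper, not a Windows API.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: widen each byte as if it were ISO-8859-1 text, where
// every byte value equals its Unicode code point. This mimics a per-byte
// ANSI-to-Unicode conversion for a single-byte code page.
std::u16string widen_as_latin1(const unsigned char* s) {
    std::u16string out;
    for (; *s; ++s) {
        out.push_back(static_cast<char16_t>(*s));  // one byte becomes one character
    }
    return out;
}
```

Fed the UTF-8 bytes of "café", this produces 5 characters, "cafÃ©": the two-byte sequence C3 A9 for "é" turns into "Ã" followed by "©".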

Why does the char* version construct a "Unicode" CStringT, but the unsigned char* version constructs a CStringT?

That looks to be the result of the documentation trying to be compact. CStringT is a class template. It is neither Unicode nor ANSI; until it is instantiated, it isn't a concrete type at all. I'm guessing that the remarks section on the constructors is meant to highlight the ability to construct Unicode strings from ANSI input (and vice versa). This is briefly mentioned, too ("Note that some of these constructors act as conversion functions.").

To sum this up, here is a list of general advice for using MFC/ATL strings:

  • Prefer using CStringW. This is the only string type whose implied character encoding is unambiguous (UTF-16).
  • Use CStringA only when interfacing with legacy code. Make sure to unambiguously document the character encoding used. Also make sure to understand that the "currently active locale" can change at any time. See "Keep your eye on the code page: Is this string CP_ACP or UTF-8?" for more information.
  • Never use CString. Just by looking at the code, it's no longer clear what type this is (it could be either of 2 types). Likewise, when looking at a constructor invocation, it is no longer possible to see whether this is a copy or a conversion operation.
  • Disable implicit conversions for the CStringT class template instantiations.

[1] There's also CString, which uses the generic-text mapping TCHAR as its BaseType. TCHAR expands to either char or wchar_t, depending on preprocessor symbols. CString is thus an alias for either CStringA or CStringW, depending on those very same preprocessor symbols. Unless you are targeting Win9x, don't use any of the generic-text mappings.

[2] Unlike Unicode encodings, ANSI is not a self-contained representation. The interpretation of code units depends on external state (the currently active locale). Do not use it unless you are interfacing with legacy code.

[3] It is implementation-defined whether char is signed or unsigned. Either way, char, unsigned char, and signed char are 3 distinct types. By default, Visual Studio treats char as signed.
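Where the signedness of plain char shows up in practice is when a byte above 0x7F is read back as a number. This minimal sketch queries the implementation through std::numeric_limits; the -23 result assumes a two's-complement signed char, which is what mainstream compilers use. The compiler examples in the comment are illustrative, not exhaustive.

```cpp
#include <limits>
#include <type_traits>

// Implementation-defined: for example, MSVC makes plain char signed by default
// (the /J switch makes it unsigned), while GCC on ARM Linux makes it unsigned.
constexpr bool char_is_signed = std::numeric_limits<char>::is_signed;

// The same byte 0xE9 read through unsigned char and through plain char.
constexpr int via_unsigned_char = static_cast<unsigned char>(0xE9);  // always 233
constexpr int via_char = static_cast<char>(0xE9);  // -23 if char is signed, 233 otherwise
```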

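Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reproduce them, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.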

 