
Signedness of char and Unicode in C++0x

According to the C++0x working draft, the new char types (char16_t and char32_t) for handling Unicode will be unsigned (uint_least16_t and uint_least32_t will be the underlying types).

But as far as I can see (not very far, perhaps), a type char8_t (based on uint_least8_t) is not defined. Why?

It's even more confusing when you see that a new u8 encoding prefix is introduced for UTF-8 string literals, based on our old friend (signed/unsigned) char. Why?
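To make the asymmetry concrete, here is a minimal sketch that inspects the element type of each kind of Unicode string literal at compile time. The helper name code_unit is made up for this illustration. The sketch assumes a C++11 or later compiler; the __cpp_char8_t branch only applies to compilers that implement the later char8_t change, where u8 literals no longer use char.

```cpp
#include <type_traits>

// Hypothetical helper: strips the reference, the array extent, and const
// from the type of a string literal, leaving the code unit type.
template <typename Literal>
struct code_unit {
    using type = typename std::remove_cv<
        typename std::remove_extent<
            typename std::remove_reference<Literal>::type>::type>::type;
};

// u"" and U"" literals get the new dedicated types.
static_assert(std::is_same<code_unit<decltype(u"a")>::type, char16_t>::value,
              "u\"\" literals are arrays of char16_t");
static_assert(std::is_same<code_unit<decltype(U"a")>::type, char32_t>::value,
              "U\"\" literals are arrays of char32_t");

// u8"" literals reuse plain char unless the compiler provides char8_t.
#if defined(__cpp_char8_t)
static_assert(std::is_same<code_unit<decltype(u8"a")>::type, char8_t>::value,
              "u8\"\" literals are arrays of char8_t here");
#else
static_assert(std::is_same<code_unit<decltype(u8"a")>::type, char>::value,
              "u8\"\" literals are arrays of plain char here");
#endif

// True when u8 literals are built on plain char on this compiler.
constexpr bool u8_literal_uses_char()
{
    return std::is_same<code_unit<decltype(u8"a")>::type, char>::value;
}
```

If the translation unit compiles, the assertions hold for that compiler and language mode; u8_literal_uses_char() reports which of the two branches was taken.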

Update: There's a proposal to add a new type, char8_t:

char8_t: A type for UTF-8 characters and strings (Revision 1) http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0482r1.html

char will be the type used for UTF-8 because its definition has been changed to make sure it can hold UTF-8 code units:

For the purpose of enhancing support for Unicode in C++ compilers, the definition of the type char has been modified to be both at least the size necessary to store an eight-bit coding of UTF-8 and large enough to contain any member of the compiler's basic execution character set. It was previously defined as only the latter. There are three Unicode encodings that C++0x will support: UTF-8, UTF-16, and UTF-32. In addition to the previously noted changes to the definition of char, C++0x will add two new character types: char16_t and char32_t. These are designed to store UTF-16 and UTF-32 respectively.

Source: http://en.wikipedia.org/wiki/C%2B%2B0x

Most UTF-8 applications on PC/Mac already use char anyway.

char16_t and char32_t are supposed to be usable for representing code points. Since there are no negative code points, it's sensible for these to be unsigned.

UTF-8 does not represent code points directly; each code point is spread over one to four code units. So it doesn't matter whether the underlying type of a u8 literal is signed or not.
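As a small illustration of why the signedness of char need not matter for UTF-8 processing, the sketch below counts code points by looking only at the top two bits of each byte. The function name count_code_points is invented for this example, and it assumes well-formed UTF-8 input on a two's-complement platform; it does no validation.

```cpp
#include <cstddef>
#include <string>

// Counts code points in a UTF-8 string by counting every byte that is not a
// continuation byte (continuation bytes have the bit pattern 10xxxxxx).
// Assumption: the input is well-formed UTF-8; nothing is validated here.
std::size_t count_code_points(const std::string& utf8)
{
    std::size_t count = 0;
    for (char c : utf8) {
        // The mask keeps only the top two bits of the byte, so the comparison
        // gives the same answer whether plain char is signed or unsigned.
        if ((c & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For instance, count_code_points("h\xC3\xA9llo") treats the two bytes 0xC3 0xA9 as a single code point. Code that compares bytes numerically (for example c >= 0x80) is a different story and would normally convert to unsigned char first.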

The C++0x draft doesn't seem to indicate whether the new Unicode character types are signed or unsigned. However, as others have already mentioned, since there are no negative Unicode code points, it would make more sense for char16_t and char32_t to be unsigned. (Then again, it would have made sense for char to be unsigned, yet we've been dealing with "negative" characters since the 70s.)

Also, since UTF-16 code units range from 0x0 through 0xFFFF (ignoring surrogate pairs), you'd need the entire range of an unsigned 16-bit integer to properly represent all values. It would be awkward, to say the least, if the values 0x8000 through 0xFFFF were represented as negative numbers in a char16_t.

Anyway, until the C++0x committee says something definitive on the matter, you can always just check your implementation:

#include <type_traits>
#include <iostream>

int main()
{
    std::cout << std::boolalpha << std::is_signed<char16_t>::value << std::endl;
}

This prints out false using GCC 4.45 on Linux. So on one platform, at least, the new Unicode types are definitely unsigned.
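The same check can also be made at compile time, so that a build fails outright if an implementation ever disagrees. This is only a sketch: the size comparisons assume the underlying types are uint_least16_t and uint_least32_t, as the question describes, and the function name high_code_unit_is_positive is invented for the example.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Signedness of the new character types.
static_assert(std::is_unsigned<char16_t>::value, "char16_t should be unsigned");
static_assert(std::is_unsigned<char32_t>::value, "char32_t should be unsigned");

// Assumption taken from the question: the underlying types are
// uint_least16_t and uint_least32_t, so the sizes should match.
static_assert(sizeof(char16_t) == sizeof(std::uint_least16_t),
              "char16_t should have the size of uint_least16_t");
static_assert(sizeof(char32_t) == sizeof(std::uint_least32_t),
              "char32_t should have the size of uint_least32_t");

// Every 16-bit code unit must fit without wrapping to a negative value.
static_assert(std::numeric_limits<char16_t>::max() >= 0xFFFF,
              "char16_t should cover the range 0x0000 through 0xFFFF");

// True when the highest 16-bit code unit is stored as a positive value.
constexpr bool high_code_unit_is_positive()
{
    return char16_t(0xFFFF) > 0;
}
```

If this compiles, all of the properties hold for that compiler; otherwise the failing static_assert message names the one that does not.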


 