C++ Literals and Unicode

Question

C++ Literals

Environment:

OS: Windows 10 Pro
Compiler: GCC (latest)
IDE: Code::Blocks (latest)
Working on: console applications

My understanding of numeric literal prefixes and suffixes is that they determine the type of the value (though I am not sure). However, I am quite confused about the prefixes and suffixes on character and string literals. I have read a lot and spent days trying to understand them, but I ended up with more questions than answers, so I thought Stack Overflow could help.

Questions:

Q1: What is the correct use of the string prefixes u8, u, U and L?

I have the following code as an example:

#include <iostream>
#include <string>
using namespace std;

int main()
{
    cout << "\n\n Hello World! (plain) \n";
    cout << u8"\n Hello World! (u8) \n";
    cout << u"\n Hello World! (u) \n";
    cout << U"\n Hello World! (U) \n";
    cout << L"\n Hello World! (plain) \n\n";

    cout << "\n\n\n";
}

The output looks like this:

Hello World! (plain)

Hello World! (u8)

0x47f0580x47f0840x47f0d8

Q2: Why do the U, u and L literals produce this output? I expected the prefix only to determine the type, not to perform any encoding mapping (if that is what is happening).

Q3: Is there a simple, to-the-point reference on encodings such as UTF-8? I am confused about them, and I doubt that console applications are able to deal with them. I think it is crucial to understand them.

Q4: I would also appreciate a step-by-step reference that explains user-defined (custom type) literals.

Answer 1

First see: http://en.cppreference.com/w/cpp/language/string_literal

std::cout's class has operator<< overloaded to print const char*. That is why the first two strings are printed as text.

    cout << "\n\n Hello World! (plain) \n";
    cout << u8"\n Hello World! (u8) \n";

As expected, this prints¹:

    Hello World! (plain)
    Hello World! (u8)

Meanwhile, std::cout's class has no operator<< overload for const char16_t*, const char32_t* or const wchar_t*, so these arguments match the overload that prints pointers (const void*). That is why:

    cout << u"\n Hello World! (u) \n";
    cout << U"\n Hello World! (U) \n";
    cout << L"\n Hello World! (plain) \n\n";

prints:

 0x47f0580x47f0840x47f0d8

As you can see, there are actually three pointer values printed back to back: 0x47f058, 0x47f084 and 0x47f0d8.

However, you can get the last one to print properly by using std::wcout:

    std::wcout << L"\n Hello World! (plain) \n\n";

prints:

  Hello World! (plain)

¹ The u8 literal printed as expected because UTF-8 encodes the ASCII characters (the first 128 code points) as the same single bytes.

Answer 2

1) Narrow multibyte string literal. The type of an unprefixed string literal is const char[].

2) Wide string literal. The type of an L"..." string literal is const wchar_t[].

3) UTF-8 encoded string literal. The type of a u8"..." string literal is const char[] (const char8_t[] since C++20).

4) UTF-16 encoded string literal. The type of a u"..." string literal is const char16_t[].

5) UTF-32 encoded string literal. The type of a U"..." string literal is const char32_t[].

6) Raw string literal. Used to avoid escaping any character: anything between the delimiters becomes part of the string. The prefix, if present, has the same meaning as described above.

std::cout expects char-based strings; given a pointer to any other character type it prints the pointer value, which is where output such as 0x47f0580x47f0840x47f0d8 comes from. If you're trying to output literals made of wider character types, use std::wcout for wchar_t strings. char16_t and char32_t strings have no standard stream of their own, so convert them to a char (or wchar_t) string first.

Raw string literals are very handy for formatting output. An example of a raw string literal is R"~(This is the text that will be output just as I typed it into the code editor!)~", which is a char string. If it is prefixed with one of the other prefixes (L, u8, u or U), the raw string literal has the corresponding character type. Here is a very comprehensive reference on string literals.

2 answers

Answer 1
3, accepted, 2017-02-20 21:20:36

Answer 2
1, 2017-02-20 21:17:21
