Strange behavior of std::string with Unicode
I have the following piece of code:
#include <iostream>
#include <string>

std::string eps("ε");

int main()
{
    std::cout << eps << '\n';
    return 0;
}
Somehow it compiles with g++ and clang on Ubuntu, and even prints the right character, ε. I also have an almost identical piece of code that happily reads ε with cin into a std::string. By the way, eps.size() is 2.
My question is: how does that work? How can we insert a Unicode character into a std::string? My guess is that the operating system handles all this Unicode work, but I'm not sure.
EDIT
As for output, I understand that it is the terminal that is responsible for showing me the right character (ε in this case).
But with input: cin reads symbols up to ' ' or any other whitespace character (and, as I understand it, byte by byte). So if I take Ƞ, whose second byte is 32 (' '), it should read only the first byte and then stop. But it reads the whole Ƞ. How?
The most likely reason is that everything is encoded in UTF-8, as it is on my system:
$ xxd test.cpp
...
0000020: 2065 7073 2822 ceb5 2229 3b0a 0a69 6e74   eps("..");..int
                        ^^^^ ε in UTF-8                 ^^ TWO bytes!
...
$ g++ -o test.out test.cpp
$ ./test.out
ε
$ ./test.out | xxd
0000000: ceb5 0a
         ^^^^