How is a UTF-8 encoded string printed to the screen in C with printf?

For the code below in C:

char s[] = "这个问题";
printf("%s", s);

I know from the file command that the source file is "UTF-8 Unicode C program text".

How is the string encoded after compilation? Is it also UTF-8 in the .out file?

When the binary is executed in bash, how is the string encoded in memory? Is it also UTF-8?

Then, how does bash know the encoding scheme and show the right characters?

Lastly, once bash knows what to show, how are the bytes translated to pixels on the screen? Is there some mapping from bytes to pixels?

In all these processes, is there any encoding or decoding of UTF-8?

Assuming GCC, this manual page says that the preprocessor first translates the character set of the incoming files to the so-called source character set, which for GCC is UTF-8. So for a UTF-8 file, nothing happens. The default execution character set is then used for string constants, and that is (again, for GCC) UTF-8 by default.

So your UTF-8 string "survives" and exists in the executable as a bunch of bytes in UTF-8 encoding.

The terminal also has a character set, and that has to match: the C program does nothing to further translate strings when they are printed; they are just written out as they are, byte for byte. If the terminal isn't set to UTF-8, you will just get garbage.

As I noted in a comment, bash has nothing to do with this.
