
How is a UTF-8 encoded string printed to the screen in C with printf?

Consider the following C code:

char s[] = "这个问题";
printf("%s", s);

According to the file command, the source file is "UTF-8 Unicode C program text".

How is the string encoded after compilation? Is it also UTF-8 in the .out file?

When the binary is executed in bash, how is the string encoded in memory? Is it also UTF-8?

Then, how does bash know the encoding scheme and show the right characters?

Finally, once bash knows what to show, how are the bytes translated to pixels on the screen? Is there some mapping from bytes to pixels?

In all of these steps, is there any encoding or decoding of UTF-8?

Assuming GCC, this manual page says that the preprocessor will first translate the character set of the incoming files to the so-called source character set, which for GCC is UTF-8. So for a UTF-8 file, nothing happens. The default execution character set is then used for string constants, and that is (again, for GCC) UTF-8 by default.

So your UTF-8 string "survives" and exists in the executable as a bunch of bytes in UTF-8 encoding.

The terminal also has a character set, and it has to match. The C program does nothing further to translate strings when they are printed; they are just written out as they are, byte for byte. If the terminal isn't set for UTF-8, you will just get garbage.

As I noted in a comment, bash has nothing to do with this.
