How is a UTF-8 encoded string printed to the screen in C with printf?

Question

For the C code below:

char s[] = "这个问题";
printf("%s", s);

The file command reports the source file as "UTF-8 Unicode C program text".

How is the string encoded after compilation? Is it also UTF-8 in the .out file?

When the binary is executed in bash, how is the string encoded in memory? Is it also UTF-8?

Then, how does bash know the encoding and display the right characters?

Finally, once bash knows what to show, how are the bytes translated to pixels on the screen? Is there some mapping from bytes to pixels?

In all these steps, is there any encoding or decoding of UTF-8?

Answer 1

Assuming GCC, this manual page says that the preprocessor first converts the character set of the incoming files to the so-called source character set, which for GCC is UTF-8. So for a UTF-8 file, nothing happens. String constants are then converted to the execution character set, which (again, for GCC) is UTF-8 by default.

So your UTF-8 string "survives" and exists in the executable as a bunch of bytes in UTF-8 encoding.

The terminal also has a character set, and it has to match. The C program does nothing further to translate strings when printing; they are just written as they are, byte for byte. If the terminal isn't set for UTF-8, you will just get garbage.

As I noted in a comment, bash has nothing to do with this.

solution1
4 ACCEPTED 2016-02-26 09:23:44
