UTF-8 3- and 4-byte character representation

I have this C code that prints a character whose UTF-8 encoding is 2 bytes long:

printf("%c%c", 0xC0 + cp / 0x40, 0x80 + cp % 0x40);

How can I produce the 3- and 4-byte UTF-8 representations in the same way?

If you have called setlocale, the locale uses UTF-8, and wchar_t stores Unicode code point values, you can simply do:

printf("%lc", (wint_t)cp);
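For context, here is a minimal sketch of what that setup might look like. The helper name print_codepoint_lc is hypothetical, and the sketch assumes the environment selects a UTF-8 locale (for example LANG=en_US.UTF-8) and that wchar_t holds Unicode code points on the platform.

```c
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper, not part of the original answer: print code point cp
 * using the multibyte encoding of the current locale.
 * Returns 0 on success, -1 if the locale cannot be set or printf fails. */
int print_codepoint_lc(unsigned long cp)
{
    /* Adopt the locale named by the environment. A real program would
     * normally do this once at startup rather than on every call. */
    if (setlocale(LC_ALL, "") == NULL)
        return -1;

    /* %lc converts the wide character according to the current locale;
     * printf returns a negative value if the conversion is not possible. */
    return printf("%lc", (wint_t)cp) < 0 ? -1 : 0;
}
```

Calling print_codepoint_lc(0x20AC) should emit the euro sign as UTF-8 bytes only when the active locale really is a UTF-8 one; in other locales the conversion may fail or yield a different encoding.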

Otherwise, for 3-byte representations, you could do:

printf("%c%c%c", 0xE0 + cp/0x40/0x40, 0x80 + cp/0x40%0x40, 0x80 + cp%0x40);

The 4-byte form follows the same pattern, with a lead byte of 0xF0 and one more division by 0x40. Note that this approach is not recommended: nothing checks the range of cp, so you could easily output an invalid (overlong) 3-byte sequence for a character whose UTF-8 representation is actually 2 bytes.
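One way to avoid that pitfall is to pick the sequence length from the value of cp before emitting any bytes. The following is only a sketch under that assumption; the names utf8_encode and utf8_put are hypothetical and do not come from the original answer. It uses shifts and masks, which are equivalent to the divisions and remainders by 0x40 shown above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: encode code point cp into out[] as UTF-8.
 * Returns the number of bytes written (1 to 4), or 0 if cp is a
 * surrogate (U+D800..U+DFFF) or lies above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                      /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* surrogates are not valid in UTF-8 */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* beyond the Unicode range */
}

/* Hypothetical helper: write cp to stdout as UTF-8.
 * Returns 0 on success, -1 if cp is invalid or the write fails. */
int utf8_put(uint32_t cp)
{
    unsigned char buf[4];
    size_t n = utf8_encode(cp, buf);
    if (n == 0)
        return -1;
    return fwrite(buf, 1, n, stdout) == n ? 0 : -1;
}
```

With this, utf8_put(0x20AC) writes the three bytes E2 82 AC and utf8_put(0x1F600) writes the four bytes F0 9F 98 80, while a 2-byte character such as U+00E9 can never be emitted in an overlong 3-byte form.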
