If I write:
char a = 'A';
printf("%x %c", a, a);
it will produce the output "41 A". Similarly, when I write
char32_t c = U'🍌';
printf("%x %c", c, c); //even tried %lc and %llc
it will produce the output "1f34c L" instead of the expected "1f34c 🍌"!
Is there something wrong here? How can I print char16_t and char32_t characters onto stdout?
Also, which format specifier should I use to get char16_t / char32_t input from scanf?
char32_t c;
scanf("%c", &c); //🍌
printf("%x %c", c, c);
this will produce the output "f0 �".
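I have given the value in hex: symbol = 0x0001F34C.
There are other ways to solve this, but this is the one I know. In C you cannot print this symbol with %c or with plain printf; check the following code.
Here is why to use wchar_t instead of char: a char holds a single byte of a multibyte encoding such as UTF-8, while wchar_t on Linux with glibc is a 32-bit type that holds a whole UTF-32 code point, which increases its range. (On some platforms, such as Windows, wchar_t is only 16 bits wide.)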
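#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void) {
    /* Use the user's locale so wide characters can be converted on output. */
    setlocale(LC_CTYPE, "");
    wchar_t symbol = 0x0001F34C;
    /* %x expects unsigned int and %lc expects wint_t, so cast explicitly. */
    wprintf(L"%x %lc\n", (unsigned int)symbol, (wint_t)symbol);
    return 0;
}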
output: 1f34c 🍌
See also: Printing a Unicode Symbol in C, UNICODE of emoji banana, char32_t.
char16_t and char32_t are nothing special and they do not have great support. Only basic conversion functions are in the standard. They are really just uint_least16_t and uint_least32_t. The only thing they are used for is holding u and U literals, and they may not be UTF-16 and UTF-32: check __STDC_UTF_16__ and __STDC_UTF_32__ before assuming they are. To do anything more with them than converting from u / U literals to the multibyte encoding (and back), you have to implement it yourself.
The C language really has two encodings: the locale-dependent multibyte character representation and the wide character representation.
Is there something wrong here?
The '🍌' character you typed in your source file is interpreted by the compiler as some implementation-specific value. GCC encodes it as UTF-8 and then the GCC preprocessor shifts the bytes left into one value, so '🍌' is equal to (int)0xF09F8D8C on GCC; the behavior of multi-character literals such as 'something' is implementation-defined. The value of that literal is then assigned to the char32_t. That is not a UTF-32 value at all. Only a literal with the U prefix, such as U'🍌', has type char32_t.
How can I print char16_t and char32_t characters onto stdout?
Convert them to a multibyte string. Then just print it with %s.
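#include <uchar.h>
#include <stdio.h>
#include <wchar.h>
#include <limits.h>
#include <string.h>
#include <locale.h>

int main(void) {
    /* The conversion below depends on the locale, so select a UTF-8 one first. */
    if (setlocale(LC_ALL, "en_US.UTF-8") == NULL) {
        fputs("could not set locale en_US.UTF-8\n", stderr);
        return 1;
    }
    char32_t c = U'🍌';
    char buf[MB_LEN_MAX + 1] = {0};  /* zero-filled, so the result stays null-terminated */
    mbstate_t ps;
    memset(&ps, 0, sizeof(ps));      /* initial conversion state */
    /* c32rtomb returns (size_t)-1 if c has no representation in the locale. */
    if (c32rtomb(buf, c, &ps) == (size_t)-1) {
        perror("c32rtomb");
        return 1;
    }
    printf("%s\n", buf);
    return 0;
}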
Printing data is locale-dependent, as printing is done in the locale specified by the user. The default locale is C, which has no UTF support, so first you have to set your locale to something UTF-compatible. Then call c32rtomb. Note that in glibc a stream chooses its encoding the first time it is printed to, so make sure to call setlocale before doing anything with the stream you want to work with.
which format specifier should I use to get char16_t / char32_t input from scanf?
None, there is none. You should use wchar_t or plain char strings to read characters from the user in the encoding specified by their locale. Then you can convert to/from char16_t and char32_t if you want. If you specifically want to read UTF-32 characters, you have to write the reading code yourself to be sure it really reads UTF-32. I recommend libunistring.