
How to handle char16_t or char32_t with printf and scanf in C?

If I write:

char a = 'A';
printf("%x %c", a, a);

it will produce the output "41 A". Similarly, when I write

char32_t c = U'🍌';
printf("%x %c", c, c);  // also tried %lc and %llc

it will produce the output "1f34c L" instead of the expected "1f34c 🍌"!

Is there something wrong here? How can I print char16_t and char32_t characters onto stdout?

Also, which format specifier should I use to get char16_t / char32_t input from scanf?

char32_t c;
scanf("%c", &c); //🍌
printf("%x %c", c, c);

this will produce the output "f0 �".

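I gave the value in hex: symbol = 0x0001F34C. There are other ways to solve this, but this is the one I know. In C you cannot print this symbol with %c in a plain printf. The code below uses wchar_t instead of char: a char holds a single byte (one UTF-8 code unit in a UTF-8 locale), while wchar_t is wider (32 bits on Linux with glibc, where it can hold a whole code point), so the character fits in one value. Note that wchar_t is only 16 bits on some platforms, such as Windows, where this approach would not work as written.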

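#include <stdio.h>
#include <wchar.h>
#include <locale.h>
int main(void) {
    /* Use the user's locale so wide characters can be encoded on output. */
    setlocale(LC_CTYPE, "");
    wchar_t symbol = 0x0001F34C;  /* U+1F34C BANANA */
    /* %x expects an unsigned int and %lc a wint_t, so cast explicitly. */
    wprintf(L"%x %lc\n", (unsigned)symbol, (wint_t)symbol);
    return 0;
}
Output: 1f34c 🍌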

See also: Printing a Unicode Symbol in C, the Unicode entry for the banana emoji, and char32_t.

char16_t and char32_t are nothing special, and support for them is limited: the standard provides only basic conversion functions. They are really just typedefs for uint_least16_t and uint_least32_t. Their only real use is as the types of u and U literals, and those may not even be UTF-16 and UTF-32: check __STDC_UTF_16__ and __STDC_UTF_32__ before assuming they are. To do anything more with them than converting between u/U literals and the multibyte encoding, you have to implement it yourself.
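As a rough illustration of what "implement it yourself" can look like, here is a minimal sketch of a hand-written UTF-8 encoder. The name utf8_encode is hypothetical, not a standard function, and the sketch assumes the char32_t value is a Unicode code point (that is, __STDC_UTF_32__ is defined).

```c
#include <stddef.h>
#include <uchar.h>

/* Encode one Unicode code point as UTF-8 into out (room for 4 bytes).
 * Returns the number of bytes written, or 0 if cp is not a valid
 * Unicode scalar value (a surrogate, or above U+10FFFF).
 * utf8_encode is a hypothetical helper, not part of the C library. */
size_t utf8_encode(char32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                       /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 3 bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* surrogates cannot be encoded */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                              /* beyond the Unicode range */
}
```

For U+1F34C this yields the four bytes F0 9F 8D 8C. The first of them, F0, is the single byte that scanf("%c") picked up in the question.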

The C language really has two encodings: the locale-dependent multibyte character representation and the wide character representation.

Is there something wrong here?

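A plain '🍌' character constant (without the U prefix) in your source file is interpreted by the compiler as an implementation-specific value. GCC encodes it as UTF-8, then the preprocessor shifts each byte's value left in turn, so '🍌' is equal to (int)0xF09F8D8C on GCC; the behavior of multi-character constants such as 'something' is implementation-defined. When that value is assigned to a char32_t, the result is not a UTF-32 value at all. With the U prefix the value is the code point 0x1F34C (when __STDC_UTF_32__ is defined), but %c converts its argument to unsigned char, so only the low byte 0x4C, the letter L, is printed.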
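Both numbers in play here can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: percent_c_byte and pack_multichar are hypothetical helper names, the first mimics the conversion to unsigned char that %c performs, and the second mimics the shift-and-or scheme GCC documents for multi-character constants. It assumes 8-bit bytes and an unsigned int of at least 32 bits.

```c
#include <stddef.h>
#include <uchar.h>

/* The byte that printf("%c", c) ends up writing: the argument is
 * converted to unsigned char, so only the low 8 bits survive.
 * Hypothetical helper, for illustration only. */
unsigned char percent_c_byte(char32_t c)
{
    return (unsigned char)c;
}

/* Combine bytes the way GCC evaluates a multi-character constant:
 * shift the previous value left by one byte, then OR in the next byte.
 * Hypothetical helper; other compilers may do this differently. */
unsigned int pack_multichar(const unsigned char *bytes, size_t n)
{
    unsigned int value = 0;
    for (size_t i = 0; i < n; i++)
        value = (value << 8) | bytes[i];
    return value;
}
```

percent_c_byte(0x1F34C) gives 0x4C, which is 'L' in ASCII and matches the output in the question. Feeding the UTF-8 bytes F0 9F 8D 8C to pack_multichar gives 0xF09F8D8C.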

How can I print char16_t and char32_t characters onto stdout?

Convert them to a multibyte string, then print it with %s.

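#include <limits.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <uchar.h>
int main(void) {
    /* Select a UTF-8 locale before any output; the name is system-specific. */
    if (setlocale(LC_ALL, "en_US.UTF-8") == NULL) {
        fputs("locale en_US.UTF-8 is not available\n", stderr);
        return 1;
    }
    char32_t c = U'🍌';
    char buf[MB_LEN_MAX + 1] = {0};  /* zero-filled, so the result stays NUL-terminated */
    mbstate_t ps;
    memset(&ps, 0, sizeof(ps));      /* initial conversion state */
    /* Convert the char32_t to the locale's multibyte encoding. */
    if (c32rtomb(buf, c, &ps) == (size_t)-1) {
        perror("c32rtomb");
        return 1;
    }
    printf("%s\n", buf);
    return 0;
}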

Printing is locale-dependent: output is produced in the locale specified by the user. The default locale is "C", which has no UTF support, so first set your locale to something UTF-compatible, then call c32rtomb. Note that in glibc a stream chooses its encoding the first time it is written to, so make sure to call setlocale before doing anything with the stream you want to work with.
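If you print such characters in several places, the conversion can be wrapped in a small helper. This is a sketch only: fput_c32 is a hypothetical name, and it assumes the caller has already called setlocale with a locale that can represent the character.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <uchar.h>

/* Convert one char32_t to the current locale's multibyte encoding and
 * write it to stream. Returns 0 on success, or -1 if the character has
 * no representation in the current locale or the write fails.
 * fput_c32 is a hypothetical helper, not a standard function. */
int fput_c32(char32_t c, FILE *stream)
{
    char buf[MB_LEN_MAX];
    mbstate_t ps;
    memset(&ps, 0, sizeof ps);            /* initial conversion state */

    size_t n = c32rtomb(buf, c, &ps);
    if (n == (size_t)-1)
        return -1;                        /* errno is set to EILSEQ */

    /* buf holds n bytes and is not NUL-terminated, so write by length. */
    return fwrite(buf, 1, n, stream) == n ? 0 : -1;
}
```

A typical call sequence would be setlocale(LC_ALL, "") once at startup, then fput_c32(U'🍌', stdout). In the default "C" locale the call is expected to fail for non-ASCII characters rather than print garbage.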

which format specifier should I use to get char16_t / char32_t input from scanf?

None; there is no such specifier. Use wchar_t or plain char strings to read characters from the user in the encoding specified by their locale, then convert to or from char16_t and char32_t if you want. If you specifically want to read UTF-32 characters, you have to write that code yourself to be sure it reads UTF-32. I recommend libunistring.
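One way to follow this advice is to read plain char input and decode it afterwards with mbrtoc32 from <uchar.h>. The sketch below shows only the decoding step; first_c32 is a hypothetical helper name, and it assumes the input is in the current locale's multibyte encoding.

```c
#include <stddef.h>
#include <string.h>
#include <uchar.h>

/* Decode the first character of the multibyte string s (in the current
 * locale's encoding) into *out. Returns the number of bytes consumed,
 * or 0 if s is empty, truncated, or not valid in this locale.
 * first_c32 is a hypothetical helper, not a standard function. */
size_t first_c32(const char *s, char32_t *out)
{
    size_t len = strlen(s);
    if (len == 0)
        return 0;

    mbstate_t ps;
    memset(&ps, 0, sizeof ps);            /* initial conversion state */

    size_t n = mbrtoc32(out, s, len, &ps);
    /* (size_t)-1: invalid sequence, (size_t)-2: incomplete sequence,
     * (size_t)-3: no input consumed. All are treated as failure here. */
    if (n == 0 || n == (size_t)-1 || n == (size_t)-2 || n == (size_t)-3)
        return 0;
    return n;
}
```

A possible use is to call setlocale(LC_ALL, "") first, read a word with scanf("%15s", buf) into a char buf[16], and then call first_c32(buf, &c). In a UTF-8 locale, the banana input from the question should decode to 0x1F34C with four bytes consumed.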


 