
What are (some of the) UTF-8 string functions in C?

For dealing with ASCII we have strlen, strcat, etc. For UTF-16 (i.e., UCS-2) we have the wcscat and wcslen functions.

What functions are available in C for dealing with UTF-8 and UCS-4? Assume Linux/gcc.

I don't think the standard C library has UTF-8-specific functions. There are certainly third-party libraries for it.

However, normal str functions can be used with UTF-8 in many cases.
strlen works well, returning the number of bytes (not characters). strcat works (it also overruns your buffer easily, but this is normal for strcat).

The reason is that the 0 byte can't appear inside a multi-byte UTF-8 sequence: every byte of such a sequence has its high bit set. So if a 0 byte appears in a UTF-8 string, it is surely the end of the string, just like in ASCII.
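To make the bytes-versus-characters distinction concrete, here is a minimal sketch of counting code points instead of bytes. The function name utf8_codepoints is made up for illustration, and the code assumes the input is already valid UTF-8 (it does no validation).

```c
#include <stddef.h>

/* Hypothetical helper: count code points in a NUL-terminated UTF-8 string.
 * Assumes valid UTF-8; malformed input is not detected. */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; ++s) {
        /* Continuation bytes have the form 10xxxxxx; any other byte
         * (ASCII or a lead byte) starts a new code point. */
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}
```

For the string "h\xC3\xA9llo" ("héllo" encoded as UTF-8), strlen reports 6 bytes while this function reports 5 code points. Note that a code point is still not always what a user perceives as one character (combining marks, for example).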

The standard does not specify the encoding or size used for the wide character functions, so assuming it to be UCS-2, UCS-4 or anything else is not portable. C11 brings standardized Unicode support, but I think it's too early to rely on that being implemented yet. Your best bet is to find a library to handle conversion to/from UTF-8/UCS-4 or any other encoding you may need.
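For reference, the C11 support mentioned here lives in <uchar.h>. The following is only a sketch under a few assumptions: the helper name mb_to_c32 is made up, the platform must actually ship <uchar.h> (recent glibc does), and mbrtoc32 decodes according to the current locale, so the caller needs to have selected a UTF-8 locale with setlocale for non-ASCII input to work.

```c
#include <stddef.h>
#include <string.h>
#include <uchar.h>

/* Hypothetical helper: decode a NUL-terminated multibyte string (in the
 * current locale's encoding) into char32_t values.
 * Returns the number of values stored, or (size_t)-1 on invalid input.
 * Stops early if dst_cap values have been written. */
size_t mb_to_c32(const char *src, char32_t *dst, size_t dst_cap)
{
    mbstate_t st;
    size_t left = strlen(src);
    size_t n = 0;

    memset(&st, 0, sizeof st);          /* initial conversion state */

    while (left > 0 && n < dst_cap) {
        size_t rc = mbrtoc32(&dst[n], src, left, &st);

        if (rc == (size_t)-1 || rc == (size_t)-2)
            return (size_t)-1;          /* invalid or truncated sequence */
        if (rc == (size_t)-3) {         /* value stored, no input consumed */
            ++n;
            continue;
        }
        if (rc == 0)                    /* decoded a NUL; not expected here */
            break;

        src += rc;
        left -= rc;
        ++n;
    }
    return n;
}
```

Because the result depends on the locale and on library support, this is less predictable across systems than a dedicated conversion library, which is the point made above.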

Have a look at iconv, or the chapter on character handling in the GNU C library manual.
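As a rough sketch of what the iconv route can look like on Linux, the function below converts UTF-8 to UTF-32 code points. The names utf8_to_utf32 and utf32_native are made up for illustration. The encoding names "UTF-8", "UTF-32LE" and "UTF-32BE" are assumed to be supported by the installed iconv (glibc supports them); the explicit-endian names are used so that no byte order mark is written.

```c
#include <iconv.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pick the UTF-32 variant that matches the host byte order, so the
 * output bytes can be read directly as uint32_t values. */
static const char *utf32_native(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe ? "UTF-32LE" : "UTF-32BE";
}

/* Hypothetical helper: convert a NUL-terminated UTF-8 string into
 * UTF-32 code points. Returns the number of code points written,
 * or (size_t)-1 on failure (invalid input or dst too small). */
size_t utf8_to_utf32(const char *src, uint32_t *dst, size_t dst_cap)
{
    iconv_t cd = iconv_open(utf32_native(), "UTF-8"); /* (to, from) */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *in = (char *)src;     /* iconv wants char **; input is not modified */
    size_t in_left = strlen(src);
    char *out = (char *)dst;
    size_t out_left = dst_cap * sizeof *dst;

    size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;      /* errno distinguishes EILSEQ, EINVAL, E2BIG */

    return dst_cap - out_left / sizeof *dst;
}
```

With a buffer of 8 elements, converting "h\xC3\xA9" should yield two values, 0x68 and 0xE9. On glibc nothing extra needs to be linked; some other systems need -liconv.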
