
Manipulating strings of multibyte characters

I am a novice C programmer. I am trying to write a C program which sometimes deals with English text (fits into 8-bit chars) and sometimes Japanese text (needs 16 bits).

Do I need to set aside 16 bits for every character, even in English text, if I use the same code to manipulate text in either language?

What are some of the ways of encoding multibyte characters?

What if the compiler can't store multibyte strings compactly?

I'm confused. Please help me out here, and kindly support your answers with code examples. Please also explain the same in the context of C++, as I am learning C++ too and have only beginner-level experience in it.

Thanks in advance.

This was an interview question asked of one of my acquaintances a few days back.

In C++ you can use std::wstring, which uses wchar_t as its underlying character type. In C++11 you can also use std::u16string or std::u32string (built on char16_t and char32_t), depending on how much storage per character you need.

C also has wchar_t, defined in <wchar.h> (and in <stddef.h>).
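To make the storage trade-off concrete, here is a minimal sketch comparing how many bytes the same three Japanese characters occupy in a UTF-8 std::string, a std::u16string and a std::u32string. The helper names (storage_bytes, japanese_utf8 and so on) are made up for illustration, and the byte counts assume the usual 8-bit bytes with a 2-byte char16_t and a 4-byte char32_t.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Bytes occupied by the characters of a string (terminator excluded).
// Hypothetical helper, not part of the standard library.
template <class String>
std::size_t storage_bytes(const String& s) {
    return s.size() * sizeof(typename String::value_type);
}

// The three characters U+65E5 U+672C U+8A9E in three encodings.
// The UTF-8 bytes are spelled out so the result does not depend on the
// encoding of the source file.
inline std::string japanese_utf8() {
    return "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E";
}
inline std::u16string japanese_utf16() { return u"\u65E5\u672C\u8A9E"; }
inline std::u32string japanese_utf32() { return U"\u65E5\u672C\u8A9E"; }

// Usage sketch: same text, different storage cost.
// Assumes sizeof(char16_t) == 2 and sizeof(char32_t) == 4.
inline void check_example_sizes() {
    assert(storage_bytes(std::string("abc")) == 3);  // English: 1 byte each
    assert(storage_bytes(japanese_utf8()) == 9);     // 3 bytes per character
    assert(storage_bytes(japanese_utf16()) == 6);    // 2 bytes per character
    assert(storage_bytes(japanese_utf32()) == 12);   // 4 bytes per character
}
```

With a variable-width encoding such as UTF-8, English text stays at one byte per character, so you do not have to reserve 16 bits for every character. The price is that the element count of the string no longer equals the character count.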

Okay, after doing a little bit of research, I think I got an answer:

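mbstowcs ("multibyte string to wide character string") and wcstombs ("wide character string to multibyte string") convert between arrays of wchar_t (in which every element has the same fixed size, commonly 16 bits on Windows and 32 bits on Linux and macOS) and multibyte strings (in which a character takes one byte where possible and several bytes otherwise, so English text stays compact). Both functions are declared in <stdlib.h> and use the encoding of the current locale.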
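A rough sketch of how the two functions might be wrapped in C++ follows. The wrapper names to_wide and to_multibyte are hypothetical, and throwing on failure is just one possible design choice. The conversion depends on the current locale: in the default "C" locale only plain ASCII is guaranteed to convert, so for Japanese text you would first call std::setlocale(LC_ALL, "") or select a suitable locale by name (locale names are system-dependent).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Multibyte string (current locale's encoding) -> wide string.
inline std::wstring to_wide(const std::string& mb) {
    // The result never has more elements than the input has bytes,
    // so size() + 1 leaves room for the terminating null as well.
    std::vector<wchar_t> buf(mb.size() + 1);
    std::size_t n = std::mbstowcs(buf.data(), mb.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multibyte sequence for this locale");
    return std::wstring(buf.data(), n);
}

// Wide string -> multibyte string (current locale's encoding).
inline std::string to_multibyte(const std::wstring& ws) {
    // Each wide character needs at most MB_CUR_MAX bytes in this locale.
    std::vector<char> buf(ws.size() * MB_CUR_MAX + 1);
    std::size_t n = std::wcstombs(buf.data(), ws.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("character not representable in this locale");
    return std::string(buf.data(), n);
}

// Usage sketch: an ASCII round trip, which works even in the "C" locale.
inline void check_round_trip() {
    const std::string text = "hello";
    assert(to_wide(text) == L"hello");
    assert(to_multibyte(to_wide(text)) == text);
}
```

Both wrappers stop at the first null character because they pass c_str() to the C functions. In plain C the same calls work unchanged with fixed-size arrays or malloc in place of std::vector.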
