
std::string conversion to char32_t (unicode characters)

I need to read a file in C++ using fstream and the getline function. The file contains ASCII as well as Unicode characters.
But getline only works with std::string, and the characters of such a plain string cannot be converted to char32_t so that I can compare them with Unicode characters. Could anyone suggest a fix?

char32_t corresponds to the UTF-32 encoding, which is rarely used for text files (and often poorly supported). Are you sure that your file is encoded in UTF-32?

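If you are sure, then you need to use std::u32string to store your string. For reading, you can try std::basic_stringstream<char32_t>, for instance. However, please note that streams over char32_t are generally poorly supported: standard library implementations are only required to provide the necessary locale facets for char and wchar_t.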

Unicode is generally encoded using:

  • UTF-8 in text files (and web pages, etc.)

  • A platform-specific 16-bit or 32-bit encoding in programs, using the type wchar_t (UTF-16 on Windows, UTF-32 on most Unix-like systems)

So, in general, Unicode text files are encoded in UTF-8, which uses a variable number of bytes per character, from 1 (ASCII characters) to 4. This means you cannot directly test individual characters by indexing a std::string, because each char holds only one byte of the encoding.
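To make the byte-versus-character distinction concrete, here is a minimal sketch that counts the code points in a UTF-8 std::string by skipping continuation bytes (those of the form 10xxxxxx). The function name count_code_points is made up for illustration, and the sketch assumes the input is well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-8 string.
// Assumes well-formed UTF-8; it does not validate the input.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        // Continuation bytes look like 10xxxxxx; every other byte
        // starts a new code point.
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the string "a\xE4\xBB\xAE" (the letter a followed by 仮, U+4EEE), size() reports 4 bytes while count_code_points reports 2 characters.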

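To test individual characters, convert the UTF-8 string to a wchar_t string, stored in a std::wstring.

Use a converter defined like this (it needs the <locale> and <codecvt> headers; note that both facilities are deprecated since C++17):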

std::wstring_convert<std::codecvt_utf8<wchar_t> > converter;

And convert like this:

std::wstring unicodeString = converter.from_bytes(utf8String);
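Put together with the required headers, the two lines above might look like the following sketch. The wrapper name utf8_to_wide is an assumption added for illustration; std::wstring_convert and std::codecvt_utf8 are the standard facilities used above, deprecated since C++17, so a compiler may emit warnings.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical wrapper around the converter shown above.
// from_bytes throws std::range_error if the input is not valid UTF-8.
std::wstring utf8_to_wide(const std::string& utf8String) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    return converter.from_bytes(utf8String);
}
```

Each line obtained from getline can be passed through such a function before its characters are compared.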

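You can then access the individual Unicode characters. Don't forget to put an L before each character or string literal to make it a wide literal. Note that where wchar_t is 16 bits (Windows), characters outside the Basic Multilingual Plane do not fit in a single wchar_t. For instance: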

// info() stands in for whatever logging function you use
if (unicodeString[i] == L'仮')
{
    info("this is a Japanese character");
}

