
How to read a file's contents into a char16_t array in C++?

You can read a file's contents into a char array using the following function:

void readFileContentsIntoCharArray(char* charArray, size_t sizeOfArray) {
    std::ifstream inputFileStream;
    inputFileStream.read(charArray, sizeOfArray);
}

The file is encoded in UTF-16LE, so I want to read its contents into a char16_t array to make later processing easier. I tried the following code:

void readUTF16FileContentsIntoChar16Array(char16_t* char16Array, size_t sizeOfArray) {
    std::ifstream inputFileStream;
    inputFileStream.read(char16Array, sizeOfArray);
}

Of course it didn't work: std::ifstream::read takes a char*, not a char16_t*. I've been searching for a solution for a long time, but the only relevant one I've found so far is https://stackoverflow.com/a/10504278/1031769, which doesn't help because it uses wchar_t instead of char16_t.

How can I make it work with char16_t?

I created a sample UTF-16LE file, and this code read it correctly. You can give it a try:

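#include <codecvt>   // std::codecvt_utf16, std::codecvt_utf8 (deprecated since C++17)
#include <fstream>
#include <locale>
#include <string>

// Reads a UTF-16LE file and returns its contents re-encoded as UTF-8.
std::string readUTF16(const char* filename) {
    std::wifstream file(filename, std::ios::binary);
    // Decode the UTF-16LE bytes into wchar_t as they are read.
    file.imbue(std::locale(file.getloc(), new std::codecvt_utf16<wchar_t, 0x10ffff, std::little_endian>));

    std::wstring ws;
    for(wchar_t c; file.get(c); ) {
        ws += c;  // no cast: narrowing to char16_t would truncate a 32-bit wchar_t
    }
    // codecvt_utf8 matches what codecvt_utf16 produced (UCS-2 or UCS-4, depending on sizeof(wchar_t)).
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    return converter.to_bytes(ws);
}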
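
To try the function, a small UTF-16LE test file can be produced by writing raw bytes. The following is only a sketch: the helper name write_sample_utf16le and the sample text are assumptions for illustration, not part of the answer above.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: writes the UTF-16LE encoding of "A€"
// (U+0041, U+20AC) to `path`, low byte of each code unit first.
bool write_sample_utf16le(std::string const& path)
{
    static const char bytes[] = {'\x41', '\x00', '\xAC', '\x20'};

    std::ofstream out(path, std::ios::binary);
    out.write(bytes, sizeof bytes);
    out.close();          // flush now so a write error is visible below
    return !out.fail();
}
```

The file is opened in binary mode so that no newline translation alters the bytes on platforms that distinguish text and binary streams.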

You could read the bytes into a char16_t array and then convert the endianness manually, since different architectures store multi-byte values in different byte orders.

To do that you have to be able to detect the endianness of the machine you are running on.

I use the following run-time check in this example, but you may prefer a proper library facility with portable compile-time checking:

bool is_little_endian()
{
    char16_t const c = 0x0001;
    return *reinterpret_cast<char const*>(&c);
}
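
For the compile-time check mentioned above, one possible sketch uses std::endian from the C++20 header <bit> and falls back to the predefined __BYTE_ORDER__ macros of GCC and Clang when compiling for an older standard. The name host_is_little_endian is hypothetical, and the fallback is an assumption about the compiler in use.

```cpp
#if defined(__has_include)
#  if __has_include(<bit>)
#    include <bit>   // std::endian (C++20)
#  endif
#endif

// Hypothetical name; the result is a compile-time constant.
constexpr bool host_is_little_endian()
{
#if defined(__cpp_lib_endian)
    return std::endian::native == std::endian::little;
#elif defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__)
    return __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__;  // GCC/Clang predefined macros
#else
#  error "no compile-time endianness information available"
#endif
}
```

Because the function is constexpr, it can be used in a static_assert or an ordinary if, and the compiler can discard the byte-swapping branch on little-endian hosts.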

Then you could do this:

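#include <algorithm>  // std::transform
#include <cerrno>     // errno
#include <cstring>    // std::strerror
#include <fstream>
#include <stdexcept>  // std::runtime_error
#include <string>
#include <utility>    // std::swap

std::u16string read_utf16le(std::string const& filename)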
{
    // open at end to get size.
    std::ifstream ifs(filename, std::ios::binary|std::ios::ate);

    if(!ifs)
        throw std::runtime_error(std::strerror(errno));

    auto end = ifs.tellg();
    ifs.seekg(0, std::ios::beg);
    auto size = std::size_t(end - ifs.tellg());

    if(size % 2)
        throw std::runtime_error("bad utf16 format (odd number of bytes)");

    std::u16string u16;
    u16.resize(size / 2);

    if(u16.empty())
        throw std::runtime_error("empty file");

    if(!ifs.read((char*)&u16[0], size))
        throw std::runtime_error("error reading file");

    if(!is_little_endian())
    {
        // big-endian host: swap the bytes of each code unit read in the file's little-endian order
        std::transform(std::begin(u16), std::end(u16), std::begin(u16), [](char16_t c){
            auto p = reinterpret_cast<char*>(&c);
            std::swap(p[0], p[1]);
            return c;
        });
    }

    return u16;
}


 