简体   繁体   中英

How to write and read UTF16 file on Win using C++

There is a plenty of questions on SO regarding this, but most of them do not mention writing wstring back to file. So for example I found this for reading:

// open as a byte stream
std::wifstream fin("/testutf16.txt", std::ios::binary);
// apply BOM-sensitive UTF-16 facet
fin.imbue(std::locale(fin.getloc(),
    new std::codecvt_utf16<wchar_t, 0x10ffff, std::consume_header>));
// read  
std::wstring ws;
for(wchar_t c; fin.get(c); )
{
    std::cout << std::showbase << std::hex << c << '\n';
    ws.push_back(c);
}

I tried similar stuff for writing:

    std::wofstream wofs("/utf16dump.txt", std::ios::binary);
    wofs.imbue(std::locale(wofs.getloc(),
        new std::codecvt_utf16<wchar_t, 0x10ffff, std::consume_header>));
    wofs << ws;

but it produces garbage, (or Notpad++ and vim cant interpret it). As mentioned in the title Im on Win, native C++, VS 2010.

Input file:

t€stUTF16✡
test

This is what is the result:

t€stUTF16✡
test

convert to hex:

0000000: 7400 ac20 7300 7400 5500 5400 4600 3100  t.. s.t.U.T.F.1.
0000010: 3600 2127 0d00 0a00 7400 6500 7300 7400  6.!'....t.e.s.t.
0000020: 0a                                       
                     ...

vim normal output:

t^@¬ s^@t^@U^@T^@F^@1^@6^@!'^M^@ ^@t^@e^@s^@t^@

EDIT: I ended up using UTF8. Andrei Alexandrescu says it is the best encoding so no big loss. :)

Your similar code -- isn't. You removed the std::ios::binary style, despite the fact that the documentation says

The byte stream should be written to a binary file; it can be corrupted if written to a text file.

NL->CRLF conversion in ASCII mode isn't going to do pretty things to UTF-16 files, since it will insert one byte 0x0D instead of two bytes 0x00 0x0D.

It is easy if you use the C++11 standard (because there are a lot of additional includes like "utf8" which solves this problems forever).

But if you want to use multi-platform code with older standards, you can use this method to write with streams:

  1. Read the article about UTF converter for streams
  2. Add stxutif.h to your project from sources above
  3. Open the file in ANSI mode and add the BOM to the start of a file, like this:

     std::ofstream fs; fs.open(filepath, std::ios::out|std::ios::binary); unsigned char smarker[3]; smarker[0] = 0xEF; smarker[1] = 0xBB; smarker[2] = 0xBF; fs << smarker; fs.close(); 
  4. Then open the file as UTF and write your content there:

     std::wofstream fs; fs.open(filepath, std::ios::out|std::ios::app); std::locale utf8_locale(std::locale(), new utf8cvt<false>); fs.imbue(utf8_locale); fs << .. // Write anything you want... 

For output, you want to use generate_header instead of consume_header .

See http://en.cppreference.com/w/cpp/locale/codecvt_mode

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM