How to write and read UTF16 file on Win using C++

There are plenty of questions on SO regarding this, but most of them do not mention writing a wstring back to a file. So, for example, I found this for reading:

// open as a byte stream
std::wifstream fin("/testutf16.txt", std::ios::binary);
// apply BOM-sensitive UTF-16 facet
fin.imbue(std::locale(fin.getloc(),
    new std::codecvt_utf16<wchar_t, 0x10ffff, std::consume_header>));
// read  
std::wstring ws;
for(wchar_t c; fin.get(c); )
{
    std::cout << std::showbase << std::hex << c << '\n';
    ws.push_back(c);
}

I tried similar stuff for writing:

    std::wofstream wofs("/utf16dump.txt", std::ios::binary);
    wofs.imbue(std::locale(wofs.getloc(),
        new std::codecvt_utf16<wchar_t, 0x10ffff, std::consume_header>));
    wofs << ws;

but it produces garbage (or Notepad++ and vim can't interpret it). As mentioned in the title, I'm on Win, native C++, VS 2010.

Input file:

t€stUTF16✡
test

This is the result:

t€stUTF16✡
test

Converted to hex:

0000000: 7400 ac20 7300 7400 5500 5400 4600 3100  t.. s.t.U.T.F.1.
0000010: 3600 2127 0d00 0a00 7400 6500 7300 7400  6.!'....t.e.s.t.
0000020: 0a                                       
                     ...

vim normal output:

t^@¬ s^@t^@U^@T^@F^@1^@6^@!'^M^@ ^@t^@e^@s^@t^@

EDIT: I ended up using UTF8. Andrei Alexandrescu says it is the best encoding, so no big loss. :)

Your similar code -- isn't. You removed the std::ios::binary style, despite the fact that the documentation says:

The byte stream should be written to a binary file; it can be corrupted if written to a text file.

NL->CRLF conversion in text (ASCII) mode isn't going to do pretty things to UTF-16 files, since it will insert one byte 0x0D instead of two bytes 0x00 0x0D.
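
To see why, here is a small sketch that simulates the translation on raw bytes instead of relying on an actual Windows text-mode stream. The names `simulate_text_mode` and `utf16le_t_newline` are made up for this illustration, and the sample assumes little-endian UTF-16.

```cpp
#include <cstddef>
#include <string>

// Mimics what text-mode output does on Windows:
// a 0x0D byte is inserted before every 0x0A byte.
std::string simulate_text_mode(const std::string& bytes)
{
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        if (bytes[i] == '\n')
            out.push_back('\r');   // a single byte, not a full UTF-16 code unit
        out.push_back(bytes[i]);
    }
    return out;
}

// UTF-16LE bytes for the two characters 't' and '\n': 74 00 0A 00
std::string utf16le_t_newline()
{
    return std::string("\x74\x00\x0A\x00", 4);
}
```

Passing the four bytes 74 00 0A 00 through `simulate_text_mode` gives 74 00 0D 0A 00. That is an odd number of bytes, so every code unit after the inserted 0x0D is misaligned.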

It is easy if you can use the C++11 standard, because it adds a number of includes, such as "utf8", that solve this problem for good.
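
As one possible illustration of the C++11 route, the sketch below converts a wide string to UTF-8 bytes with `std::wstring_convert` and `std::codecvt_utf8`. The helper name `to_utf8` is an assumption for this example, and note that `<codecvt>` was deprecated in C++17, so this is a sketch for C++11/14 toolchains rather than a recommendation.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Converts a wide string to a UTF-8 encoded byte string.
// to_utf8 is a hypothetical helper name, not part of any library.
std::string to_utf8(const std::wstring& ws)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
    return conv.to_bytes(ws);
}
```

The resulting `std::string` can then be written with a plain `std::ofstream` opened with `std::ios::binary`.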

But if you want to use multi-platform code with older standards, you can use this method to write with streams:

  1. Read the article about the UTF converter for streams
  2. Add stxutif.h to your project from the sources above
  3. Open the file in ANSI mode and add the BOM to the start of the file, like this:

     std::ofstream fs;
     fs.open(filepath, std::ios::out|std::ios::binary);
     // UTF-8 BOM; write() is used because the array is not null-terminated
     const unsigned char smarker[3] = { 0xEF, 0xBB, 0xBF };
     fs.write(reinterpret_cast<const char*>(smarker), sizeof(smarker));
     fs.close();
  4. Then open the file as UTF and write your content there:

     std::wofstream fs;
     fs.open(filepath, std::ios::out|std::ios::app);
     std::locale utf8_locale(std::locale(), new utf8cvt<false>);
     fs.imbue(utf8_locale);
     fs << .. // Write anything you want...

For output, you want to use generate_header instead of consume_header.
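
A minimal sketch of what that might look like follows. The helper names `write_utf16le` and `read_bytes` and the file path passed to them are placeholders for this example, and adding `std::little_endian` is an assumption about the byte order wanted on Windows. Also note that `<codecvt>` was deprecated in C++17.

```cpp
#include <codecvt>
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Writes ws as UTF-16LE, asking the facet to emit a BOM first.
// write_utf16le is a hypothetical helper; path is a placeholder.
bool write_utf16le(const std::string& path, const std::wstring& ws)
{
    std::wofstream out(path, std::ios::binary);   // binary: no NL->CRLF translation
    if (!out)
        return false;
    out.imbue(std::locale(out.getloc(),
        new std::codecvt_utf16<wchar_t, 0x10ffff,
            static_cast<std::codecvt_mode>(std::generate_header |
                                           std::little_endian)>));
    out << ws;
    out.close();
    return !out.fail();
}

// Reads the file back as raw bytes so the BOM can be inspected.
std::string read_bytes(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

After a call such as `write_utf16le("utf16dump.txt", ws)`, the file should start with the bytes FF FE, which is the little-endian BOM that editors use to detect UTF-16.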

See http://en.cppreference.com/w/cpp/locale/codecvt_mode
