How to write and read a UTF-16 file on Windows using C++
There are plenty of questions on SO about this, but most of them do not mention writing a wstring back to a file. For example, I found this for reading:
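#include <codecvt>
#include <fstream>
#include <iostream>
#include <locale>
#include <string>

// open as a byte stream
std::wifstream fin("/testutf16.txt", std::ios::binary);
// apply BOM-sensitive UTF-16 facet
fin.imbue(std::locale(fin.getloc(),
    new std::codecvt_utf16<wchar_t, 0x10ffff, std::consume_header>));
// read, printing each code unit as a hex number
std::wstring ws;
for (wchar_t c; fin.get(c); )
{
    // cast: streaming a wchar_t into std::cout is ill-formed since C++20
    std::cout << std::showbase << std::hex << static_cast<unsigned long>(c) << '\n';
    ws.push_back(c);
}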
I tried something similar for writing:
std::wofstream wofs("/utf16dump.txt", std::ios::binary);
wofs.imbue(std::locale(wofs.getloc(),
new std::codecvt_utf16<wchar_t, 0x10ffff, std::consume_header>));
wofs << ws;
but it produces garbage (or at least Notepad++ and vim can't interpret it). As mentioned in the title, I'm on Windows, using native C++ and VS 2010.
Input file:
t€stUTF16✡
test
This is the result:
t€stUTF16✡
test
Converted to hex:
0000000: 7400 ac20 7300 7400 5500 5400 4600 3100 t.. s.t.U.T.F.1.
0000010: 3600 2127 0d00 0a00 7400 6500 7300 7400 6.!'....t.e.s.t.
0000020: 0a
...
vim normal output:
t^@¬ s^@t^@U^@T^@F^@1^@6^@!'^M^@ ^@t^@e^@s^@t^@
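Decoding the first bytes of the dump by hand shows the file is UTF-16 little-endian with no BOM at offset 0: each byte pair is low byte then high byte, so 74 00 is U+0074 't', ac 20 is U+20AC '€', and 21 27 is U+2721 '✡'. That is also why vim, reading the file as an 8-bit encoding, shows each 0x00 byte as ^@ and the pair 0xAC 0x20 as '¬ '. A minimal sketch of that byte-pair decoding (decode_utf16le is a hypothetical helper name; it does not combine surrogate pairs):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode UTF-16LE bytes into 16-bit code units: each unit is the low byte
// followed by the high byte. Surrogate pairs are left as two separate
// units, and a trailing odd byte is ignored.
std::vector<std::uint16_t> decode_utf16le(const std::vector<std::uint8_t>& bytes) {
    std::vector<std::uint16_t> units;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        units.push_back(static_cast<std::uint16_t>(bytes[i] | (bytes[i + 1] << 8)));
    }
    return units;
}
```

For example, decode_utf16le({0x74, 0x00, 0xac, 0x20, 0x21, 0x27}) yields the code units 0x0074, 0x20AC and 0x2721.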
EDIT: I ended up using UTF-8. Andrei Alexandrescu says it is the best encoding, so no big loss. :)
Your "similar" code isn't actually similar. You removed the std::ios::binary mode, despite the fact that the documentation says:
The byte stream should be written to a binary file; it can be corrupted if written to a text file.
NL->CRLF conversion in text (ASCII) mode isn't going to do pretty things to UTF-16 files, since it inserts a single 0x0D byte instead of the two-byte UTF-16 code unit for CR (0x0D 0x00 in little-endian, 0x00 0x0D in big-endian).
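To make the byte-level effect concrete, here is a simplified model (an assumption for illustration, not the actual C runtime code) of text-mode output: a 0x0D byte is inserted before every 0x0A byte, with no regard for UTF-16 code-unit boundaries. The name text_mode_translate is hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Simplified model of Windows text-mode output: every 0x0A byte gets a
// 0x0D byte inserted in front of it. The translation works on raw bytes,
// so it knows nothing about two-byte UTF-16 code units.
std::vector<std::uint8_t> text_mode_translate(const std::vector<std::uint8_t>& in) {
    std::vector<std::uint8_t> out;
    out.reserve(in.size());
    for (std::uint8_t b : in) {
        if (b == 0x0A) {
            out.push_back(0x0D);
        }
        out.push_back(b);
    }
    return out;
}
```

For a UTF-16LE newline (0A 00) this produces 0D 0A 00, three bytes, so every code unit after it is shifted by one byte and a reader pairs the wrong bytes together.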
It is easy if you use the C++11 standard (because it adds facilities such as the UTF-8 conversion facets in <codecvt>, which solve this problem for good).
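As a minimal sketch of the C++11 route, assuming a standard library that ships <codecvt> (its facets are deprecated since C++17 but still usable), std::codecvt_utf8 can be imbued into a wide file stream. write_utf8 is a hypothetical helper name.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: encodes wide text as UTF-8 (no BOM) using the
// C++11 std::codecvt_utf8 facet.
void write_utf8(const char* path, const std::wstring& text) {
    std::wofstream out(path, std::ios::binary);
    out.imbue(std::locale(out.getloc(), new std::codecvt_utf8<wchar_t>));
    out << text;
}
```

On Windows, where wchar_t is 16 bits, std::codecvt_utf8<wchar_t> treats each wchar_t as a UCS-2 character; std::codecvt_utf8_utf16<wchar_t> is the facet that understands surrogate pairs.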
But if you want multi-platform code that also works with older standards, you can use this method to write with streams:
Add stxutif.h to your project from the sources above. Open the file in ANSI mode (a narrow std::ofstream, opened in binary mode) and write the UTF-8 BOM at the start of the file, like this:
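std::ofstream fs;
fs.open(filepath, std::ios::out | std::ios::binary);
// UTF-8 BOM: EF BB BF. Use write() rather than <<, because the array is not NUL-terminated.
unsigned char smarker[3] = { 0xEF, 0xBB, 0xBF };
fs.write(reinterpret_cast<const char*>(smarker), sizeof smarker);
fs.close();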
Then open the file as UTF-8 and write your content there:
std::wofstream fs;
fs.open(filepath, std::ios::out | std::ios::app);
std::locale utf8_locale(std::locale(), new utf8cvt<false>);
fs.imbue(utf8_locale);
fs << .. // Write anything you want...
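If only standard C++11 facilities are available, one possible alternative (an assumption, not the method above) is to skip the separate narrow stream and write the BOM as the character U+FEFF through a UTF-8 facet: U+FEFF encodes to exactly the bytes EF BB BF. write_utf8_with_bom is a hypothetical helper name.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: writes a UTF-8 file that starts with a BOM, using
// the standard <codecvt> facet instead of stxutif.h's utf8cvt.
void write_utf8_with_bom(const char* path, const std::wstring& text) {
    std::wofstream out(path, std::ios::binary);
    out.imbue(std::locale(out.getloc(), new std::codecvt_utf8<wchar_t>));
    out << L'\uFEFF';  // the facet encodes U+FEFF as EF BB BF
    out << text;
}
```

This keeps a single stream open for the whole file, so there is no need to close and reopen it in append mode.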
For output, you want to use generate_header instead of consume_header.
See http://en.cppreference.com/w/cpp/locale/codecvt_mode
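Putting that advice together, a minimal write/read sketch, assuming a C++11 standard library with <codecvt>: it writes little-endian UTF-16 preceded by a BOM (FF FE) and keeps std::ios::binary so no newline translation happens after encoding. Note that std::codecvt_utf16 defaults to big-endian unless std::little_endian is given. The names write_utf16, read_utf16, kWriteMode and kReadMode are hypothetical.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

// Writing: emit a BOM and use little-endian byte order (FF FE, then low
// byte first). The flags are combined with a cast because the template
// parameter has type std::codecvt_mode, not int.
constexpr std::codecvt_mode kWriteMode =
    static_cast<std::codecvt_mode>(std::generate_header | std::little_endian);

// Reading: consume_header skips a BOM and follows the byte order it
// announces; little_endian is only the fallback for files without a BOM.
constexpr std::codecvt_mode kReadMode =
    static_cast<std::codecvt_mode>(std::consume_header | std::little_endian);

void write_utf16(const char* path, const std::wstring& text) {
    // Binary mode, so no NL->CRLF translation is applied to the encoded bytes.
    std::wofstream out(path, std::ios::binary);
    out.imbue(std::locale(out.getloc(),
        new std::codecvt_utf16<wchar_t, 0x10ffff, kWriteMode>));
    out << text;
}

std::wstring read_utf16(const char* path) {
    std::wifstream in(path, std::ios::binary);
    in.imbue(std::locale(in.getloc(),
        new std::codecvt_utf16<wchar_t, 0x10ffff, kReadMode>));
    std::wstring text;
    for (wchar_t c; in.get(c); ) {
        text.push_back(c);
    }
    return text;
}
```

Compared with the question's writer, the only changes are generate_header in place of consume_header and an explicit byte order; the reader is the question's reader with little_endian as the no-BOM fallback.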