How to create a Unicode txt file in C++

There are four kinds of encoding options when creating a .txt file in Windows.

  • ANSI
  • UNICODE (little endian)
  • UNICODE (big endian)
  • UTF-8

[Image: text file encoding options]

The C standard library supports this option through FILE.

C STL

FILE* file;
file = _wfopen(L"test.txt", L"wt+,ccs=UTF-16LE");

It has been working great, but I found there is no parameter for this in std::ofstream.

wofstream myfile;
myfile.open("example.txt", ?????????);

So, I want to know how to create files like this in C++. Is there any solution for this in the C++ STL?

Starting with C++11, the C++ standard library lets you generate UTF-16 text files with the following steps (note that the <codecvt> header used here was deprecated in C++17):

  • build a locale using the C++11 class std::codecvt_utf16; you can specify endianness through its Mode template parameter (big-endian is the default, std::little_endian selects little-endian)
  • open a file using a std::wofstream, into which you will write the Unicode text
  • imbue the locale into the wide stream and start writing, optionally starting with a Byte Order Mark character (U+FEFF)

Here is an example adapted from the page referenced by @HansPassant in his comment:

// codecvt_utf16: writing a Unicode string as UTF-16
#include <iostream>
#include <locale>
#include <string>
#include <codecvt>
#include <fstream>

int main ()
{
  std::wstring str ( { 0xa8, 0xa9 });  // U+00A8 and U+00A9

  // codecvt_utf16 produces big-endian output unless a Mode is given
  std::locale loc (std::locale(), new std::codecvt_utf16<wchar_t>);
  // binary mode keeps Windows newline translation away from the 16-bit units
  std::basic_ofstream<wchar_t> ofs ("test.txt", std::ios::binary);
  ofs.imbue(loc);

  std::cout << "Writing to file (UTF-16)... ";
  ofs << (wchar_t) 0xfeff; // BOM
  ofs << str;
  std::cout << "done!\n";

  return 0;
}

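You get a UTF-16 file starting with a big-endian BOM (bytes FE FF) and containing the two characters U+00A8 and U+00A9 (¨©).

(hexadecimal dump; od -x prints 16-bit words in host byte order, which is why the bytes FE FF appear as fffe on a little-endian machine: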

$ od -xc test.txt
0000000      fffe    a800    a900
         376 377  \0 250  \0 251

)
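
Since the question asks for UTF-16LE specifically, here is a minimal sketch of the same approach with the byte order switched. It assumes a compiler that still ships the deprecated <codecvt> header; write_utf16le is a hypothetical helper name, not a standard function.

```cpp
// Sketch: write a wide string as UTF-16LE, preceded by a BOM.
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

// write_utf16le is a hypothetical helper name used for illustration.
bool write_utf16le(const std::string& path, const std::wstring& text)
{
    // Binary mode: no newline translation inside the 16-bit code units.
    std::wofstream ofs(path, std::ios::binary);
    if (!ofs) return false;

    // The third template argument (Mode) selects little-endian output.
    ofs.imbue(std::locale(ofs.getloc(),
        new std::codecvt_utf16<wchar_t, 0x10ffff, std::little_endian>));

    ofs << static_cast<wchar_t>(0xfeff); // BOM, serialized as bytes FF FE
    ofs << text;
    ofs.flush();                         // surface write errors before returning
    return static_cast<bool>(ofs);
}
```

Calling write_utf16le("test.txt", L"\u00a8\u00a9") should produce the bytes FF FE A8 00 A9 00, which is the layout Notepad labels "Unicode".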

There is no "C STL". STL stands for Standard Template Library, and C does not have templates. You may be referring to the C standard library and the C++ standard library.

The C standard library has no functions for "creating Unicode" or for converting text to or from Unicode. There is no _wfopen in the C standard library. You're using a function from the Microsoft C Run-Time Library.

The C++ library does have an API to convert between (UTF-8 and UTF-16), (UTF-8 and UTF-32), and (system native wide and system native multibyte) encodings: http://en.cppreference.com/w/cpp/locale/codecvt
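
As an illustration of that conversion API, here is a small sketch that converts between UTF-8 and UTF-16 in memory with std::wstring_convert and std::codecvt_utf8_utf16. Both are deprecated since C++17, and the two helper names are made up for this example.

```cpp
// Sketch: in-memory UTF-8 <-> UTF-16 conversion with the <codecvt> facets.
#include <codecvt>
#include <locale>
#include <string>

// utf8_to_utf16 and utf16_to_utf8 are hypothetical helper names.
std::u16string utf8_to_utf16(const std::string& utf8)
{
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.from_bytes(utf8);  // throws std::range_error on malformed input
}

std::string utf16_to_utf8(const std::u16string& utf16)
{
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.to_bytes(utf16);   // throws std::range_error on malformed input
}
```

The converted string still has to be written to a file byte by byte; the facet only changes the in-memory representation.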

There is hardly any other support for Unicode in the standard library. You must take care that the string you're writing is in the encoding you want, and you must explicitly write a BOM if you need one.
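
To show what "taking care of the encoding yourself" can look like, here is a sketch that avoids <codecvt> entirely: it assumes the text is already held as UTF-16 code units in a std::u16string and writes each unit low byte first after an explicit BOM. The helper names are hypothetical.

```cpp
// Sketch: serialize UTF-16 code units as little-endian bytes by hand.
#include <fstream>
#include <string>

// put_le16 is a hypothetical helper: write one code unit, low byte first.
void put_le16(std::ofstream& out, char16_t unit)
{
    out.put(static_cast<char>(unit & 0xff));
    out.put(static_cast<char>((unit >> 8) & 0xff));
}

// write_utf16le_manual is a hypothetical helper name.
bool write_utf16le_manual(const std::string& path, const std::u16string& text)
{
    std::ofstream out(path, std::ios::binary); // raw bytes, no newline translation
    if (!out) return false;

    put_le16(out, 0xfeff);                     // explicit BOM: bytes FF FE
    for (char16_t unit : text) put_le16(out, unit);
    out.flush();
    return static_cast<bool>(out);
}
```

Surrogate pairs need no special handling here because they are already two separate code units in the std::u16string.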
