
How to write a Unicode string to a file with a UTF-8 BOM in C++?

I can use ofstream to write a file with a UTF-8 BOM. I can also write a Unicode string to a file using wofstream imbued with a UTF-8 locale (codecvt_utf8). However, I cannot work out how to write a Unicode string to a file that starts with a UTF-8 BOM.

The BOM is an optional sequence of bytes at the very beginning of a file that signals its encoding; for UTF-8 it is the three bytes EF BB BF. It has nothing to do with std::fstream directly, because fstream is just a file stream for reading and writing arbitrary bytes/characters.
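
Because the BOM is nothing more than three leading bytes, checking for it is a plain byte comparison. The sketch below is illustrative only; `has_utf8_bom` is a hypothetical helper name, not part of the standard library, and it assumes the file contents have already been read into a std::string.

```cpp
#include <string>

// Hypothetical helper: returns true when `bytes` begins with the
// UTF-8 BOM (EF BB BF). Nothing after the first three bytes is inspected.
bool has_utf8_bom(const std::string& bytes)
{
    return bytes.size() >= 3 &&
           static_cast<unsigned char>(bytes[0]) == 0xEF &&
           static_cast<unsigned char>(bytes[1]) == 0xBB &&
           static_cast<unsigned char>(bytes[2]) == 0xBF;
}
```

The casts to unsigned char matter because plain char may be signed, in which case a byte such as 0xEF would compare as a negative value.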

You just need to write the BOM manually before you write the UTF-8 encoded string.

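const unsigned char utf8BOM[] = {0xEF, 0xBB, 0xBF};
// write() takes a const char*, so the byte array has to be cast
fileStream.write(reinterpret_cast<const char*>(utf8BOM), sizeof(utf8BOM));
// write the rest of the UTF-8 encoded string...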

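The example below works in VS 2015 and in recent gcc versions when compiled as C++11 through C++17 (from C++20 on, a u8 literal has type char8_t and no longer initializes a std::string directly):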

#include <iostream>
#include <string>
#include <fstream>
#include <codecvt>

int main()
{
    std::string utf8 = u8"日本医療政策機構\nPhở\n";
    std::ofstream f("c:\\test\\ut8.txt");

    unsigned char bom[] = { 0xEF,0xBB,0xBF };
    f.write((char*)bom, sizeof(bom));

    f << utf8;
    return 0;
}
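
The same steps can be wrapped in a small reusable function. This is a minimal sketch rather than a definitive implementation: `write_utf8_with_bom` is a hypothetical name, and opening the stream in binary mode is a choice made here (not something the answer requires) so that '\n' is not expanded to "\r\n" on Windows.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: writes the UTF-8 BOM followed by `utf8_text`.
// Returns false if the file cannot be opened or a write fails.
bool write_utf8_with_bom(const std::string& path, const std::string& utf8_text)
{
    // Binary mode: bytes are written exactly as given (assumption of this sketch).
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;

    static const char bom[] = { '\xEF', '\xBB', '\xBF' };
    out.write(bom, sizeof(bom));
    out.write(utf8_text.data(), static_cast<std::streamsize>(utf8_text.size()));

    // Flush so that a failed write is reflected in the returned status.
    out.flush();
    return out.good();
}
```

The caller is responsible for passing text that is already UTF-8 encoded; the function does no conversion or validation of its own.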

In older versions of Visual Studio you have to declare a UTF-16 string (with the L prefix) and then convert it from UTF-16 to UTF-8:

#include <iostream>
#include <string>
#include <fstream>
#include <Windows.h>

std::string get_utf8(const std::wstring &wstr)
{
    if (wstr.empty()) return std::string();
    int sz = WideCharToMultiByte(CP_UTF8, 0, &wstr[0], (int)wstr.size(), 0, 0, 0, 0);
    std::string res(sz, 0);
    WideCharToMultiByte(CP_UTF8, 0, &wstr[0], (int)wstr.size(), &res[0], sz, 0, 0);
    return res;
}

std::wstring get_utf16(const std::string &str)
{
    if (str.empty()) return std::wstring();
    int sz = MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), 0, 0);
    std::wstring res(sz, 0);
    MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), &res[0], sz);
    return res;
}

int main()
{
    std::string utf8 = get_utf8(L"日本医療政策機構\nPhở\n");

    std::ofstream f("c:\\test\\ut8.txt");

    unsigned char bom[] = { 0xEF,0xBB,0xBF };
    f.write((char*)bom, sizeof(bom));

    f << utf8;
    return 0;
}
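
WideCharToMultiByte is only available on Windows. Where it is not, one possible alternative is to hold the text as UTF-32 (std::u32string, U"" literals) and encode it to UTF-8 by hand before writing it after the BOM. The sketch below is an assumption-laden illustration and not part of the original answer: `to_utf8` is a hypothetical helper, and replacing invalid code points with U+FFFD is a design choice made here.

```cpp
#include <string>

// Hypothetical helper: encodes UTF-32 code points as UTF-8 bytes.
// Surrogates and values above U+10FFFF are replaced with U+FFFD.
std::string to_utf8(const std::u32string& text)
{
    std::string out;
    for (char32_t cp : text) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            cp = 0xFFFD;

        if (cp < 0x80) {                      // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {              // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {            // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                              // 4 bytes: 11110xxx followed by three 10xxxxxx
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The resulting std::string can then be written with `f << to_utf8(U"...")` right after the three BOM bytes, in the same way as in the examples above.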

