
How to write a Unicode string to a file with a UTF-8 BOM in C++?

I can use ofstream to write a file that starts with a UTF-8 BOM. I can also write a Unicode string to a file using wofstream imbued with a UTF-8 locale (codecvt_utf8). However, I cannot figure out how to write a Unicode string to a file encoded as UTF-8 with a BOM.

The BOM is just an optional sequence of bytes at the very beginning of a file that indicates its encoding. It has nothing directly to do with std::fstream, which is simply a file stream for reading and writing arbitrary bytes/characters.

You just need to write the BOM manually before you write your UTF-8 encoded string.

const unsigned char utf8BOM[] = { 0xEF, 0xBB, 0xBF };
fileStream.write(reinterpret_cast<const char*>(utf8BOM), sizeof(utf8BOM));
// write the rest of the UTF-8 encoded string...

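The example below works in Visual Studio 2015 and later, and in recent GCC versions. Note that it relies on u8 literals being char-based, which holds before C++20; from C++20 on, u8 literals have type char8_t and no longer initialize a std::string directly.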

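#include <fstream>
#include <string>

int main()
{
    // The u8 prefix asks the compiler to store the literal as UTF-8
    std::string utf8 = u8"日本医療政策機構\nPhở\n";
    std::ofstream f("c:\\test\\ut8.txt");

    // Write the three-byte UTF-8 BOM first
    const unsigned char bom[] = { 0xEF, 0xBB, 0xBF };
    f.write(reinterpret_cast<const char*>(bom), sizeof(bom));

    // Then write the UTF-8 encoded text
    f << utf8;
    return 0;
}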

In older versions of Visual Studio, you have to declare a UTF-16 string (with the L prefix) and then convert it from UTF-16 to UTF-8:

#include <iostream>
#include <string>
#include <fstream>
#include <Windows.h>

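// Converts a UTF-16 wide string to a UTF-8 encoded std::string
std::string get_utf8(const std::wstring &wstr)
{
    if (wstr.empty()) return std::string();
    // First call: ask for the required size of the output buffer, in bytes
    int sz = WideCharToMultiByte(CP_UTF8, 0, &wstr[0], (int)wstr.size(), 0, 0, 0, 0);
    std::string res(sz, 0);
    // Second call: perform the conversion into the allocated buffer
    WideCharToMultiByte(CP_UTF8, 0, &wstr[0], (int)wstr.size(), &res[0], sz, 0, 0);
    return res;
}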

std::wstring get_utf16(const std::string &str)
{
    if (str.empty()) return std::wstring();
    int sz = MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), 0, 0);
    std::wstring res(sz, 0);
    MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), &res[0], sz);
    return res;
}

int main()
{
    std::string utf8 = get_utf8(L"日本医療政策機構\nPhở\n");

    std::ofstream f("c:\\test\\ut8.txt");

    unsigned char bom[] = { 0xEF,0xBB,0xBF };
    f.write((char*)bom, sizeof(bom));

    f << utf8;
    return 0;
}
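WideCharToMultiByte is Windows-only. Where it is not available, the conversion step could be done by hand. The sketch below is not part of the answer above: it assumes the text is held as UTF-32 code points in a std::u32string, and the names append_utf8 and to_utf8 are hypothetical. It also shows where the BOM bytes come from: they are simply the UTF-8 encoding of U+FEFF.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: appends the UTF-8 encoding of one code point to `out`.
// Throws for surrogates and for values above U+10FFFF.
void append_utf8(std::string& out, char32_t cp)
{
    if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        throw std::invalid_argument("not a Unicode scalar value");

    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: encodes a whole UTF-32 string as UTF-8.
std::string to_utf8(const std::u32string& text)
{
    std::string out;
    for (char32_t cp : text)
        append_utf8(out, cp);
    return out;
}
```

The resulting std::string can then be written after the BOM exactly as in the examples above.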


 