I'm trying to read a file that is encoded as UTF-16LE with a BOM. I tried this code:
#include <iostream>
#include <fstream>
#include <locale>
#include <codecvt>

int main() {
    std::wifstream fin("/home/asutp/test");
    fin.imbue(std::locale(fin.getloc(), new std::codecvt_utf16<wchar_t, 0x10ffff, std::consume_header>));
    if (!fin) {
        std::cout << "!fin" << std::endl;
        return 1;
    }
    if (fin.eof()) {
        std::cout << "fin.eof()" << std::endl;
        return 1;
    }
    std::wstring wstr;
    getline(fin, wstr);
    std::wcout << wstr << std::endl;
    if (wstr.find(L"Test") != std::string::npos) {
        std::cout << "Found" << std::endl;
    } else {
        std::cout << "Not found" << std::endl;
    }
    return 0;
}
The file can contain Latin and Cyrillic characters. I created the file with the string "Test тест", and this code prints:
/home/asutp/CLionProjects/untitled/cmake-build-debug/untitled
Not found
Process finished with exit code 0
I'm on Linux Mint 18.3 x64, CLion 2018.1.
Tried
Ideally you should save files in UTF-8, because Windows has much better UTF-8 support (aside from displaying Unicode in the console window), while POSIX has limited UTF-16 support. Even Microsoft products favor UTF-8 for saving files on Windows.
As an alternative, you can read the UTF-16 file into a buffer and convert that to UTF-8:
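std::ifstream fin("utf16.txt", std::ios::binary);
fin.seekg(0, std::ios::end);
size_t size = (size_t)fin.tellg();
// skip the 2-byte BOM
fin.seekg(2, std::ios::beg);
size -= 2;
// one char16_t per two bytes; reading the bytes directly like this
// assumes the host itself is little-endian
std::u16string u16(size / 2, u'\0');
fin.read((char*)&u16[0], u16.size() * 2);
// note: std::wstring_convert and <codecvt> are deprecated since C++17
std::string utf8 = std::wstring_convert<
    std::codecvt_utf8_utf16<char16_t>, char16_t>{}.to_bytes(u16);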
Or decode the bytes manually:
std::ifstream fin("utf16.txt", std::ios::binary);
//skip BOM
fin.seekg(2);
//read as raw bytes
std::stringstream ss;
ss << fin.rdbuf();
std::string bytes = ss.str();
//make sure len is divisible by 2
size_t len = bytes.size();
if (len % 2) len--;
std::wstring sw;
for (size_t i = 0; i < len;) {
    //little-endian
    int lo = bytes[i++] & 0xFF;
    int hi = bytes[i++] & 0xFF;
    sw.push_back(hi << 8 | lo);
}
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> convert;
std::string utf8 = convert.to_bytes(sw);
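The manual decoding above can also be written without std::wstring_convert, which is deprecated since C++17. The following is only a sketch of that idea, not the answerer's code: the names `append_utf8`, `utf16le_to_utf8` and `read_utf16le_file_as_utf8` are hypothetical. It assumes the input is UTF-16LE with an optional FF FE BOM, and it does no validation (an unpaired surrogate is passed through as-is).

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: appends one Unicode code point to a UTF-8 string.
void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Hypothetical helper: converts UTF-16LE bytes (optional FF FE BOM) to UTF-8.
// No validation is done; a trailing odd byte is ignored.
std::string utf16le_to_utf8(const std::string& bytes) {
    std::size_t i = 0;
    // Skip the BOM if it is present
    if (bytes.size() >= 2 &&
        static_cast<unsigned char>(bytes[0]) == 0xFF &&
        static_cast<unsigned char>(bytes[1]) == 0xFE) {
        i = 2;
    }
    // Assemble one 16-bit code unit from two bytes, low byte first
    auto unit_at = [&bytes](std::size_t pos) -> std::uint32_t {
        std::uint32_t lo = static_cast<unsigned char>(bytes[pos]);
        std::uint32_t hi = static_cast<unsigned char>(bytes[pos + 1]);
        return (hi << 8) | lo;
    };
    std::string out;
    while (i + 1 < bytes.size()) {
        std::uint32_t cp = unit_at(i);
        i += 2;
        // Combine a high surrogate with the low surrogate that follows it
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < bytes.size()) {
            std::uint32_t low = unit_at(i);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                i += 2;
            }
        }
        append_utf8(out, cp);
    }
    return out;
}

// Hypothetical helper: reads a whole file as raw bytes and converts it.
std::string read_utf16le_file_as_utf8(const std::string& path) {
    std::ifstream fin(path, std::ios::binary);
    std::string bytes((std::istreambuf_iterator<char>(fin)),
                      std::istreambuf_iterator<char>());
    return utf16le_to_utf8(bytes);
}
```

Because the bytes are assembled explicitly, this version does not depend on the byte order of the host or on the size of wchar_t.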
Compare against std::wstring::npos rather than std::string::npos, so that the constant matches the type of wstr:
...
//std::wcout << wstr << std::endl;
if (wstr.find(L"Test") == std::wstring::npos) {
    std::cout << "Not Found" << std::endl;
} else {
    std::cout << "found" << std::endl;
}
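The same check can be wrapped in a small function, which keeps the npos constant next to the string type it belongs to. This is only an illustrative sketch; `contains` is a hypothetical helper name. Note that with the default allocator both std::string::npos and std::wstring::npos are the largest std::size_t value, so using the matching constant is mainly a matter of clarity.

```cpp
#include <string>

// Hypothetical helper: true if haystack contains needle.
bool contains(const std::wstring& haystack, const std::wstring& needle) {
    // find() returns std::wstring::npos when there is no match
    return haystack.find(needle) != std::wstring::npos;
}
```

For the string from the question, contains(wstr, L"Test") is true once wstr actually holds the decoded line.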