C++ UTF-8/ASCII to UTF-16 in MFC
How can I convert a (text) file from UTF-8/ASCII to UTF-16 before it is displayed in an MFC program? I ask because MFC uses 16 bits per character, while most (text) files on Windows use UTF-8 or ASCII.
The simple answer is to call MultiByteToWideChar, with WideCharToMultiByte doing the reverse conversion. There are also CA2W and CW2A, which are a little simpler to use.
However, I would strongly recommend against using these functions directly. You have the pain of handling character buffers manually, with the risk of creating memory corruption or security holes.
It's much better to use a library based on std::string and/or iterators, for example utf8cpp. This one has the advantage of being small, header-only and multiplatform.
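To illustrate what a std::string-based interface looks like to the caller, here is a minimal hand-written sketch in standard C++. It is not the utf8cpp API; the function name utf8_to_utf16 is made up for this example, and a maintained library remains the better choice for real code. It decodes each UTF-8 sequence to a code point, rejects malformed input, and emits surrogate pairs for code points above U+FFFF.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper for illustration only (not part of utf8cpp).
std::u16string utf8_to_utf16(const std::string& in)
{
    // Smallest code point that legitimately needs 1, 2, 3 or 4 bytes.
    static const std::uint32_t minForLen[] = {0x0, 0x80, 0x800, 0x10000};

    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0; // number of continuation bytes expected

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        // Reject overlong encodings, surrogate code points and out-of-range values.
        if (cp < minForLen[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point in UTF-8 input");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Split into a high/low surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

The caller never touches a raw buffer or a length parameter, which is the main point of preferring this style. As far as I know, utf8cpp offers an iterator-based utf8::utf8to16 for the same job, but check its documentation for the exact signature.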
Actually, you can do it very simply, using the CStdioFile and CString classes provided by MFC. The MFC library is a very powerful and comprehensive one (notwithstanding some major oddities, and even bugs), so if you're already using it, then use it to its fullest extent:
...
const wchar_t* inpPath = L"<path>\\InpFile.txt"; // These values are given just...
const wchar_t* outPath = L"<path>\\outFile.txt"; // ... for illustrative purposes!
// Input is opened as plain (narrow) text; ReadString widens each line into the Unicode CString.
CStdioFile inpFile(inpPath, CFile::modeRead | CFile::typeText);
CStdioFile outFile(outPath, CFile::modeWrite | CFile::modeCreate | CFile::typeText
    | CFile::typeUnicode); // Note the Unicode flag - will create UTF-16LE file!
CString textBuff;
while (inpFile.ReadString(textBuff)) { // ReadString drops the trailing newline...
    outFile.WriteString(textBuff);
    outFile.WriteString(L"\n");        // ... so write it back explicitly
}
inpFile.Close();
outFile.Close();
...
Of course, you will need to change the code (a bit) if you want the input and output files to have the same path, but that wouldn't mean changing the basic premise!
With this approach, you don't need to worry about any library calls to convert character strings; just let MFC do it for you when it's reading/writing its (Unicode) CString object!
Note: Compiled and tested with MSVC (VS-2019), 64-bit, in Unicode mode.
EDIT: Maybe I misunderstood your question. If you don't want to actually convert the file, but just display the contents, then take away all references in my code to outFile and just do stuff with each textBuff object you read. The CString class takes care of all the required ASCII/UTF-8/UTF-16LE conversions.