C++: Reading Alt-key symbols from a file

Question

I am trying to read Alt-key symbols from a Unicode UTF-8 file and then write them to another file.

The input file looks like this:

ỊịỌọỤụṄṅ

The output file looks like this:

239 187 191 225 187 138 225 187 139 225 187 140 225 187 141 225 187 164 225 187 165 225 185 132 225 185 133 (in the actual file each three-digit number is followed by '\n' rather than a space)

Code:

#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
#include <Windows.h>


///convert as ANSI - display as Unicode
std::wstring test1(const char* filenamein)
{
    std::wifstream fs(filenamein);
    if(!fs.good()) 
    { 
        std::cout << "cannot open input file [" << filenamein << "]\n" << std::endl;  

        return std::wstring();  // was "return NULL;", which builds a wstring from a null pointer (undefined behavior)
    }

    wchar_t c; 
    std::wstring s;

    while(fs.get(c)) 
    { 
        s.push_back(c); 
        std::cout << '.' << std::flush; 
    }

    return s;

}

int printToFile(const char* filenameout, std::wstring line)
{
    std::wofstream fs;

    fs.open(filenameout);

    if(!fs.is_open())
        return -1;

    for(unsigned i = 0; i < line.length(); i++)
    {
        if(line[i] <= 126)  //if its standard letter just print to file
            fs << (char)line[i];
        else  //otherwise do this.
        {
            std::wstring write = L"";

            std::wostringstream ss;
            ss << (int)line[i];

            write = ss.str();

            fs << write;
            fs << std::endl;
        }
    }

    fs << std::endl;


    //2nd test, also fails
    char const *special_character[] = { "\u2780", "\u2781", "\u2782",
  "\u2783", "\u2784", "\u2785", "\u2786", "\u2787", "\u2788", "\u2789" };

    //prints out four '?'
    fs << special_character[0] << std::endl;
    fs << special_character[1] << std::endl;
    fs << special_character[2] << std::endl;
    fs << special_character[3] << std::endl;

    fs.close();

    return 1;
}

int main(int argc, char* argv[])
{
    std::wstring line = test1(argv[1]);

    if(printToFile(argv[2], line) == 1)
        std::cout << "Writing success!" << std::endl;
    else std::cout << "Writing failed!" << std::endl;



    return 0;
}

What I expected was values similar to the ones in this table:

http://tools.oratory.com/altcodes.html

Answer 1

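OK, from your code and comments, this is what I understand:

You have an input file containing a UTF-8 encoded string
You are reading it on Windows as wide characters, but without imbuing any locale

So here is what actually happens: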

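Your code correctly reads the file one byte at a time, as an ANSI file (as if it were Windows-1252 encoded). The program then prints the code value of every byte. I can confirm that the byte list shown in your post is the UTF-8 encoding of the string ỊịỌọỤụṄṅ, except that Notepad++ added a byte order mark (U+FEFF) at the start, which is normally not used in UTF-8 files. The BOM is the 3 bytes 239 187 191 (decimal) or 0xEF 0xBB 0xBF (hexadecimal).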
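To make the byte-to-character relationship concrete, here is a minimal sketch of a UTF-8 decoder. The function name `decodeUtf8` is hypothetical and not part of the original code, and the sketch assumes well-formed input (it does not diagnose malformed sequences). With it, the bytes 225 187 138 come out as the single code point U+1ECA (Ị).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): decodes a UTF-8 byte
// string into Unicode code points, skipping a leading BOM (EF BB BF).
// Assumes well-formed input; malformed sequences are not diagnosed.
std::vector<std::uint32_t> decodeUtf8(const std::string& bytes)
{
    std::vector<std::uint32_t> out;
    std::size_t i = 0;

    // Skip the byte order mark if present.
    if (bytes.size() >= 3 &&
        static_cast<unsigned char>(bytes[0]) == 0xEF &&
        static_cast<unsigned char>(bytes[1]) == 0xBB &&
        static_cast<unsigned char>(bytes[2]) == 0xBF)
    {
        i = 3;
    }

    while (i < bytes.size())
    {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::uint32_t cp;
        std::size_t extra;  // number of continuation bytes that follow

        if (lead < 0x80)      { cp = lead;        extra = 0; }  // 1-byte (ASCII)
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }  // 2-byte sequence
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }  // 3-byte sequence
        else                  { cp = lead & 0x07; extra = 3; }  // 4-byte sequence
        ++i;

        // Each continuation byte contributes its low 6 bits.
        for (; extra > 0 && i < bytes.size(); --extra, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i]) & 0x3F);

        out.push_back(cp);
    }
    return out;
}
```

Reading the input file with a plain `std::ifstream` in binary mode and passing its contents to a function like this would give one value per character instead of one value per byte.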

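So what should you do?

One simple solution (when working on Windows) is to ask Notepad++ to encode the file as UTF-16LE, which is the native Unicode format on Windows. That way you will actually read Unicode characters.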
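As an illustration of the UTF-16LE route, the sketch below reads such a file byte by byte and combines each pair of bytes (low byte first) into one 16-bit code unit. The name `readUtf16LeFile` is hypothetical, and the sketch assumes the file really is UTF-16LE, optionally starting with the BOM bytes FF FE.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper (not from the original post): reads a UTF-16LE file
// into 16-bit code units. Returns an empty string if the file cannot be read.
std::u16string readUtf16LeFile(const char* filename)
{
    std::ifstream fs(filename, std::ios::binary);
    std::string bytes((std::istreambuf_iterator<char>(fs)),
                      std::istreambuf_iterator<char>());

    std::u16string units;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2)
    {
        // Little-endian: the low byte comes first, the high byte second.
        unsigned lo = static_cast<unsigned char>(bytes[i]);
        unsigned hi = static_cast<unsigned char>(bytes[i + 1]);
        units.push_back(static_cast<char16_t>(lo | (hi << 8)));
    }

    // Drop the byte order mark (U+FEFF) if the editor wrote one.
    if (!units.empty() && units[0] == 0xFEFF)
        units.erase(0, 1);

    return units;
}
```

All the characters in the question's input file fit in a single 16-bit code unit, so each element of the result corresponds to one character. Characters outside the Basic Multilingual Plane would appear as surrogate pairs.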

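Another way is to instruct your code to process the file as UTF-8. This is trivial on Linux, but harder on Windows, because UTF-8 has only been handled correctly since VC2010. Another SO post shows how to imbue a UTF-8 locale into a C++ stream.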
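One possible way to imbue a UTF-8 conversion into the stream is sketched below. It is not the code from the SO post mentioned above. It assumes a C++11 standard library that provides `std::codecvt_utf8` (deprecated since C++17, so newer compilers may warn), and `readUtf8File` is a hypothetical stand-in for `test1` from the question.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical replacement for test1(): reads a UTF-8 file into a wstring.
// Returns an empty string if the file cannot be opened.
std::wstring readUtf8File(const char* filename)
{
    std::wifstream fs;

    // Imbue the conversion facet before opening the file.
    // std::consume_header asks the facet to skip a leading UTF-8 BOM.
    fs.imbue(std::locale(fs.getloc(),
        new std::codecvt_utf8<wchar_t, 0x10ffff, std::consume_header>));

    fs.open(filename, std::ios::binary);
    if (!fs)
        return std::wstring();

    std::wstring s;
    wchar_t c;
    while (fs.get(c))  // each get() now yields a decoded character
        s.push_back(c);

    return s;
}
```

With a stream set up like this, the loop in `printToFile` would see one value per character, for example 7882 (U+1ECA) for Ị, rather than three separate byte values.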

Sorry for not providing code, but I only have an old VC2008 that does not support UTF-8 streams, and I hate giving untested code.


Solution 1
2 Accepted 2016-03-08 19:59:57
