Passing the first character of a string into another string and using std::stoi to get its integer value, to test whether a file uses UTF-8 or Unicode (UTF-16)

Question

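I was wondering whether anyone could help with this problem?

As is well known, .txt files encoded as UTF-8 and Unicode (UTF-16) contain hidden characters.

I am writing a program that takes a selected .txt file, which may be encoded as either UTF-8 or Unicode (UTF-16). I need to get the first character of the file and store it. What I then need to do with that character is put it into a separate string and use std::stoi to get the int value of the hidden character.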

//OPEN THE FILE IN BINARY
std::fstream mazeFile(mazeFileLoc, std::ios::in | std::ios::binary);

if (mazeFile.is_open())
{
    //STORE THE FIRST CHARACTER AS A CHAR VALUE
    char test = mazeFile.get();
    std::cout << "First Character is : " << test << std::endl;

    //PUT THE CHAR VALUE IN A STRING
    std::string strTest;
    strTest.insert(strTest.begin(), test);
    std::cout << "String First Character is : " << strTest << std::endl;

    //USE STOI TO GET THE INT VALUE OF STRING
    int testIntVal = std::stoi(strTest);
    std::cout << "Int Value of first character is : " << testIntVal << std::endl;

    mazeFile.close();
}

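The problem I am having is that the call to stoi raises an error at runtime.

Does anyone know why it raises an error instead of performing the conversion?

Git link: https://github.com/xSwalshx/ANN.git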

Answer 1

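std::stoi throws an exception when the string does not start with a valid integer, so it needs exception handling like this: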

int testIntVal; 
try
{
    testIntVal = std::stoi(strTest);
    std::cout << "Int Value of first character is : " << testIntVal << std::endl;
}
catch(...)
{
    std::cout << "not a valid integer\n";
}
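
The exception occurs because std::stoi parses decimal digit text, and a BOM byte such as 0xEF is not a digit. If the goal is simply the numeric value of the first byte, a cast is enough. Here is a minimal sketch; the helper name `first_byte_value` is made up for illustration and is not part of the question's code:

```cpp
#include <string>

// Hypothetical helper: returns the numeric value (0-255) of the first byte
// of `s`, or -1 if the string is empty.
int first_byte_value(const std::string& s)
{
    if (s.empty())
        return -1;
    // char may be signed, so convert through unsigned char
    // to avoid negative values for bytes >= 0x80.
    return static_cast<unsigned char>(s[0]);
}
```

In the question's code, `mazeFile.get()` already returns an int, so storing the result in an `int` instead of a `char` would give the same value without any string conversion.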

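That said, this is not the right way to check a file's encoding.

You have to check for a BOM (byte order mark). If the file has a BOM, you can determine the format from it.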

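If the file has no BOM, you have to guess the format, and you cannot be certain of the result. If a text viewer shows the content as "123", it is stored as

0x31 0x32 0x33 //in UTF8 (identical to ASCII)
0x31 0x00 0x32 0x00 0x33 0x00 //in UTF16 little-endian
0x00 0x31 0x00 0x32 0x00 0x33 //in UTF16 big-endian

Note that for ASCII characters, UTF16-LE has a zero as the second byte of every pair (offsets 1, 3, 5, ...), UTF16-BE has a zero as the first byte of every pair (offsets 0, 2, 4, ...), and UTF8 has no zeros at all. You can start from the very weak assumption that the file contains only ASCII characters, and then guess the encoding from where the zeros fall. See the example below.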
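
To make the byte layouts above concrete, here is a small sketch that produces the UTF-16 bytes for ASCII-only text in either byte order. The helper `ascii_to_utf16_bytes` is hypothetical, and it assumes every input character is below 0x80:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: encodes ASCII-only text as UTF-16 bytes (no BOM).
// Assumes every character in `text` is below 0x80.
std::vector<unsigned char> ascii_to_utf16_bytes(const std::string& text, bool big_endian)
{
    std::vector<unsigned char> out;
    out.reserve(text.size() * 2);
    for (unsigned char c : text)
    {
        if (big_endian)
        {
            out.push_back(0x00); // high byte first
            out.push_back(c);
        }
        else
        {
            out.push_back(c);    // low byte first
            out.push_back(0x00);
        }
    }
    return out;
}
```

Calling it with "123" reproduces the little-endian and big-endian byte sequences listed above, which is the pattern the guessing step relies on.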

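To keep things simple, you should store text as UTF8. On Windows, just convert UTF16 to UTF8 before writing, then read the UTF8 back and convert it to UTF16. This also keeps the files compatible with other systems.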
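
As a rough illustration of the UTF16 to UTF8 direction, here is a portable sketch written by hand. On Windows the Win32 function WideCharToMultiByte with CP_UTF8 does the same job; the helper name `utf16_to_utf8` and the choice to replace unpaired surrogates with U+FFFD are assumptions made for this example:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: converts UTF-16 code units to a UTF-8 byte string.
// Unpaired surrogates are replaced with U+FFFD (a choice made for this sketch).
std::string utf16_to_utf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF)
        {
            // High surrogate: must be followed by a low surrogate
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF)
            {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i; // consume the low surrogate as well
            }
            else
            {
                cp = 0xFFFD;
            }
        }
        else if (cp >= 0xDC00 && cp <= 0xDFFF)
        {
            cp = 0xFFFD; // lone low surrogate
        }

        if (cp < 0x80)
        {
            out += static_cast<char>(cp);
        }
        else if (cp < 0x800)
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

ASCII input passes through unchanged, which is why a UTF8 file containing only ASCII text is byte-for-byte identical to an ASCII file.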

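#include <cstdio>   // printf
#include <cstring>  // memcmp
#include <fstream>  // std::ifstream

const int FORMAT_UTF8 = 0;
const int FORMAT_UTF16 = 1;   // little-endian
const int FORMAT_UTF16BE = 2;

int get_file_encoding(const char* filename)
{
    printf("filename: %s ", filename);
    unsigned char buf[100] = { 0 };
    std::ifstream fin(filename, std::ios::binary);
    //read up to the first 100 bytes; gcount() reports how many were actually read
    fin.read((char*)buf, sizeof(buf));
    int size = static_cast<int>(fin.gcount());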

    //check for BOM
    if(size >= 3 && memcmp(buf, "\xef\xbb\xbf", 3) == 0)
    {
        printf("UTF8\n");
        return FORMAT_UTF8;
    }

    if(size >= 2 && memcmp(buf, "\xff\xfe", 2) == 0)
    {
        printf("UTF16\n");
        return FORMAT_UTF16;
    }

    if(size >= 2 && memcmp(buf, "\xfe\xff", 2) == 0)
    {
        printf("UTF16 big endian\n");
        return FORMAT_UTF16BE;
    }

    //BOM not found, let's take a guess!
    for(int i = 0; i < size - 1; i += 2)
    {
        if(buf[i + 1] == 0)
        {
            printf("assume UTF16\n");
            return FORMAT_UTF16;
        }

        if(buf[i] == 0)
        {
            printf("assume UTF16 big endian\n");
            return FORMAT_UTF16BE;
        }
    }

    printf("Assume ASCII or UTF8\n");
    return FORMAT_UTF8;
}

Answer 1 was posted on 2018-11-30 16:57:50.
