Handling non-ASCII chars in C++

Question

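I'm having some trouble with non-ASCII characters in C++. I have a file containing non-ASCII characters, which I read in C++ using file handling. After reading the file (say 1.txt), I store the data in a stringstream and write it to another file (say 2.txt).

Suppose 1.txt contains: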

ação

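In 2.txt I should get the same output, but the non-ASCII characters are printed as their hex values instead.

Also, I'm fairly sure that C++ handles ASCII characters as ASCII only.

Please help me print these characters correctly in 2.txt.

EDIT:

First, pseudocode for the whole process:

1. Shell script reads one value from the DB and stores it in 11.txt
2. C++ code (a.cpp) reads 11.txt and writes to f.txt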

Data present in the DB being read: Instalação

File 11.txt contains: InstalaÃ§Ã£o

File f.txt contains: InstalaÃ§Ã£o

Output on screen: Instalação

a.cpp

#include <iterator>
#include <iostream>
#include <algorithm>
#include <sstream>
#include <fstream>
#include <iomanip>
#include <string>

using namespace std;

int main()
{
    ifstream myReadFile;
    ofstream f2;
    myReadFile.open("11.txt");
    f2.open("f2.txt");
    string output;
    if (myReadFile.is_open())
    {
        // Test the extraction itself: looping on !eof() can repeat the
        // last token when the file ends with trailing whitespace.
        while (myReadFile >> output)
        {
            cout << "\n";

            // Pass each whitespace-delimited token through a stringstream,
            // then echo it to the screen and write it to the output file.
            std::stringstream tempDummyLineItem;
            tempDummyLineItem << output;
            cout << tempDummyLineItem.str();
            f2 << tempDummyLineItem.str();
        }
    }
    myReadFile.close();
    return 0;
}

locale says:

LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Answer 1

At least if I understand what you're after, I'd do something like this:

#include <iterator>
#include <iostream>
#include <algorithm>
#include <sstream>
#include <iomanip>
#include <string>

std::string to_hex(char ch) {
    std::ostringstream b;
    b << "\\x" << std::setfill('0') << std::setw(2) << std::setprecision(2)
        << std::hex << static_cast<unsigned int>(ch & 0xff);
    return b.str();
}

int main(){
    // for test purposes, we'll use a stringstream for input
    std::stringstream infile("normal stuff. weird stuff:\x01\xee:back to normal");

    infile << std::noskipws;

    // copy input to output, converting non-ASCII to hex:
    std::transform(std::istream_iterator<char>(infile),
        std::istream_iterator<char>(),
        std::ostream_iterator<std::string>(std::cout),
        [](char ch) {
            return (ch >= ' ') && (ch < 127) ?
                std::string(1, ch) :
                to_hex(ch);
    });
}

Answer 2

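Sounds like a UTF-8 problem. Since you didn't tag your question with C++11, this is an excellent article on Unicode and C++ streams.

From your updated code, let me explain what is happening. You create a file stream to read your file. Internally, the file stream only recognizes chars unless you tell it otherwise. On most machines a char can hold only 8 bits of data, but the characters in your file use more than 8 bits. To read your file correctly, you need to know how it is encoded. The most common encoding is UTF-8, which uses 1 to 4 chars per character.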

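Once you know the encoding, you can use wifstream (for UTF-16) or imbue() the stream with a locale for other encodings.

Update: If your file is ISO-8859-1 (from the comments above), try this.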

wifstream myReadFile;
myReadFile.imbue(std::locale("en_US.iso88591"));
myReadFile.open("11.txt");

Answer 1
2 2013-07-15 08:04:18

Answer 2
1 2013-07-15 08:31:58
