
How to correctly read European characters (from a file and from the command shell) in C++?

In my program I read a text file by opening it with ifstream and using a stringstream to tokenize each line with getline. When I read a European character such as "è" from the file, it is stored as "├¿"; I expected this, because I'm using string and not wstring. But when I read a line from cmd (I'm on Windows), the same "è" is stored as "è" in the string. My goal is to compare strings read from the file with strings read from the command shell, but if they are encoded differently I'm stuck, because "è".compare("├¿") is naturally != 0. I would like both to be "wrong" or both to be correct, since my aim is not to display them but only to count occurrences. I'm using the latest version of Code::Blocks with 32-bit MinGW and gcc 4.7.1.
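One way to confirm what is happening is to look at the raw bytes of both strings. The sketch below uses a hypothetical helper, to_hex, which is not part of the original program. It assumes the file is saved as UTF-8 and the console uses an OEM code page such as 850 (neither is stated in the question). Under those assumptions, "è" from the file would be the two bytes C3 A8, which a code page 850 console displays as "├¿", while "è" typed in cmd would be the single byte 8A.

```cpp
#include <cstdio>
#include <string>

// Hypothetical diagnostic helper: returns the bytes of s as
// space-separated uppercase hex, e.g. "C3 A8".
std::string to_hex(const std::string& s) {
    std::string out;
    char buf[4];
    for (unsigned char c : s) {              // requires -std=c++11
        std::snprintf(buf, sizeof buf, "%02X", c);
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}
```

Printing to_hex(word) for a token read from the file and to_hex(token) for the same word typed in cmd shows whether the two sources really use different encodings.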

UPDATE (code)

ifstream file;

file.open(path);

// getline returns the stream, so the loop stops at EOF or on a read error
while( getline(file,line) ){

    // skip empty lines
    if( line.empty() ){
        continue;
    }

    // a fresh stream per line, so no old data or error flags carry over
    istringstream stream(line);
    it = 1;

    // split the line on tabs; 'it' is the 1-based column index
    while( getline(stream,token,'\t') ){

        if( it == 1 ){
            ID = atoi( token.c_str() );
        }
        if( it == 2 ){
            word = token;

            if( !case_sensitive ){
                word = get_lower_case( word );
            }
        }
        if( it == tags_index ){
            pos = token;
        }

        it++;
    }

    data.push_back(make_row(ID,word,pos));
}

This is part of the function I use to read the file (I have a struct that stores each entry of a tab-separated file; my problem is with "word").

getline(cin,sentence);

[...]

stringstream stream;
string token;
vector<string> tokens;

stream << sentence;
while( getline(stream,token,' ') ){
    tokens.push_back(token);
}
stream.clear();

This is how I read the input from the command shell.
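Since the goal is only to compare and count, one option is to convert the console input to the file's encoding before comparing. The following is a minimal sketch, assuming the file is UTF-8 and the console uses code page 850; both are assumptions, not facts from the question. The helper names append_utf8 and cp850_to_utf8 are hypothetical, and the table covers only a few accented lowercase vowels for illustration.

```cpp
#include <string>

// Appends the UTF-8 encoding of a code point below U+0800 to out.
void append_utf8(std::string& out, unsigned cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);                  // 1-byte form
    } else {
        out += static_cast<char>(0xC0 | (cp >> 6));    // lead byte
        out += static_cast<char>(0x80 | (cp & 0x3F));  // continuation byte
    }
}

// Hypothetical helper: converts code page 850 text to UTF-8.
// Only a few letters are mapped; any other byte >= 0x80 becomes '?'.
std::string cp850_to_utf8(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {                       // requires -std=c++11
        unsigned cp;
        switch (c) {
            case 0x82: cp = 0xE9; break;               // é
            case 0x85: cp = 0xE0; break;               // à
            case 0x8A: cp = 0xE8; break;               // è
            case 0x8D: cp = 0xEC; break;               // ì
            case 0x95: cp = 0xF2; break;               // ò
            case 0x97: cp = 0xF9; break;               // ù
            default:   cp = (c < 0x80) ? c : '?'; break;
        }
        append_utf8(out, cp);
    }
    return out;
}
```

Each token read from cin would be passed through cp850_to_utf8 before being compared with words loaded from the file. For full coverage on Windows, the Win32 functions MultiByteToWideChar and WideCharToMultiByte, combined with GetConsoleCP, can do the same conversion without a hand-written table.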

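You can try setting (imbuing) a locale on the streams. Note that the accepted locale names are platform-specific, and the std::locale constructor throws std::runtime_error if the name is not supported: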

#include <iostream>
#include <locale>

int main()
{
    auto loc = std::locale("it_IT"); // Example: Italian locale
    std::cin.imbue(loc); // imbue it to input stream, can use a fstream here
    std::cout.imbue(loc); // imbue it to output stream

    // rest of the program
}
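Because the set of valid locale names depends on the platform and runtime, it can help to check a name before imbuing it. The sketch below uses a hypothetical helper, locale_available, that relies only on the documented behavior of the std::locale constructor, which throws std::runtime_error for an unsupported name.

```cpp
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns true if the runtime accepts this locale name.
bool locale_available(const std::string& name) {
    try {
        std::locale loc(name.c_str());   // throws if the name is unknown
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

A program could call locale_available("it_IT") first and keep the default "C" locale when it returns false, rather than terminating on an uncaught exception.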
