
getline() doesn't read accented characters correctly

I'm trying to read accented characters from the user with the getline() function, but they are not printed correctly.

I tried including some headers such as <locale>, but it was in vain.

Here's my code:

#include <iostream>
#include <cstdlib>
#include <string>
#include <locale>
#include <clocale>  // declares setlocale

using namespace std;

class Pers {
public:
    string name;
    int age;
    string weapon;
};

int main()
{
    setlocale(LC_ALL, "");
    Pers pers;

    cout << "Say the name of your character: ";
    getline(cin, pers.name);
    cout << pers.name;
}

When I type: Mark Coração, this is what I get:

[Screenshot: the accented characters are not displayed correctly]

How do I fix it?

Actually, the problem does not come from getline().

std::cout (respectively std::cin) works with narrow characters of type char, one byte each, so any character outside ASCII has to be encoded in whatever narrow character set the console and the locale happen to use. To avoid depending on that, you can use std::wcout (respectively std::wcin), which work with wide characters (wchar_t).
A wide character is large enough to store each of these accented characters as a single unit.
std::string handles narrow characters, std::wstring handles wide characters.
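To make the narrow/wide difference concrete, here is a small sketch storing the same letter "ç" (U+00E7) three ways. The encodings chosen (UTF-8 and Windows-1252) are just two common examples, not necessarily what your console uses; the point is that a narrow std::string holds bytes whose count and meaning depend on the encoding, while a std::wstring holds one wchar_t for this letter. Using \u escapes also keeps the result independent of how the source file itself is saved.

```cpp
#include <string>

// Narrow, UTF-8: "ç" takes two bytes (0xC3 0xA7), i.e. two chars.
const std::string kUtf8Cedilla = "\xC3\xA7";

// Narrow, Windows-1252 / Latin-1: "ç" is the single byte 0xE7.
const std::string kCp1252Cedilla = "\xE7";

// Wide: one wchar_t (16 bits on Windows, 32 bits on Linux/glibc).
const std::wstring kWideCedilla = L"\u00E7";
```

So kUtf8Cedilla.size() is 2 while kCp1252Cedilla.size() and kWideCedilla.size() are both 1.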

A way to do this could be:

std::wstring a(L"Coração");
std::wcout << a << std::endl;

Output:

Coração


To make it work with getline():

std::wstring a;
std::getline(std::wcin, a);
std::wcout << a << std::endl;
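A slightly more reusable sketch of the same idea: read_name is a hypothetical helper that takes any wide input stream, so you can pass std::wcin in your program, or a std::wistringstream to check the behaviour without a console. String streams store wchar_t directly, so no code page is involved in the check itself.

```cpp
#include <istream>
#include <sstream>  // std::wistringstream, for exercising read_name without a console
#include <string>

// Hypothetical helper: reads one line of wide characters from a wide stream
// (e.g. std::wcin) into a std::wstring.
std::wstring read_name(std::wistream& in) {
    std::wstring name;
    std::getline(in, name);
    return name;
}
```

In main you would call read_name(std::wcin) and print the result with std::wcout.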

I hope it can help.

There are two levels to this problem. You are using characters outside the ASCII character set, and they are affected at two points:

  • how they are converted to narrow characters on input
  • how they will be displayed on output

The Windows console is a rather confusing application in that respect: internally it can process UCS-2 characters, that is, any Unicode character in the Basic Multilingual Plane (any character with a code point of at most 0xFFFF). On input into narrow characters, it tries to map any character not represented in the current character set to whatever it considers the closest one. On output, it simply interprets each byte according to its current character set. So the most reliable approach is to make sure that the current locale uses the right character set and that the console uses a matching code page (the Windows term for a character set). Judging from the displayed output, I assume that you are using code page 437, which contains semi-graphic (box-drawing) characters but only a few non-ASCII letters.
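To illustrate the output side, here is a sketch that decodes the same bytes through two code pages. The tables are deliberately partial and only list the bytes needed for "Coração"; a real code page defines all 128 upper byte values. If the bytes were produced in Windows-1252 but displayed with code page 437, ç (0xE7) and ã (0xE3) would show up as τ and π. I have not checked that this matches the exact garbling in your screenshot.

```cpp
#include <map>
#include <string>

// Partial Windows-1252 table: only the bytes used by "Coração".
const std::map<unsigned char, wchar_t> kCp1252Partial = {
    {0xE3, L'\u00E3'},  // ã
    {0xE7, L'\u00E7'},  // ç
};

// Partial code page 437 table: the same byte values mean other characters.
const std::map<unsigned char, wchar_t> kCp437Partial = {
    {0x87, L'\u00E7'},  // ç lives at 0x87 in code page 437
    {0xE3, L'\u03C0'},  // π
    {0xE7, L'\u03C4'},  // τ
};

// Interprets each byte through the given table, the way the console turns
// bytes into glyphs with its current code page. ASCII bytes map to themselves;
// bytes missing from the partial table become U+FFFD (replacement character).
std::wstring decode(const std::string& bytes,
                    const std::map<unsigned char, wchar_t>& table) {
    std::wstring out;
    for (char c : bytes) {
        const auto b = static_cast<unsigned char>(c);
        if (b < 0x80) {
            out += static_cast<wchar_t>(b);
        } else {
            const auto it = table.find(b);
            out += (it != table.end()) ? it->second : L'\uFFFD';
        }
    }
    return out;
}
```

decode("Cora\xE7\xE3o", kCp1252Partial) gives "Coração", while decode("Cora\xE7\xE3o", kCp437Partial) gives "Coraτπo".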

As you only need Western European characters, I would advise you to use code page 1252. It is the Windows variant of the standard Latin-1 (ISO-8859-1) character set, whose characters all have code points of at most 0xFF.
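A short sketch of why Latin-1 is limited to 0xFF: each byte value N is simply the Unicode code point N, so decoding is a plain cast. Windows-1252 agrees with it everywhere except 0x80 to 0x9F, where it puts printable characters (for example 0x80 is the euro sign) instead of control codes. The function names are illustrative only.

```cpp
#include <string>

// ISO-8859-1 (Latin-1): byte value N is Unicode code point N.
std::wstring decode_latin1(const std::string& bytes) {
    std::wstring out;
    out.reserve(bytes.size());
    for (char c : bytes) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        out += static_cast<wchar_t>(static_cast<unsigned char>(c));
    }
    return out;
}

// True where Windows-1252 and Latin-1 assign the same character.
bool cp1252_matches_latin1(unsigned char b) {
    return b < 0x80 || b > 0x9F;
}
```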

So, if possible, you should configure the system for a non-English Western European language (Portuguese would be fine; French seems to be enough, so I assume Spanish would work too).

You must also set the console to the correct code page: chcp 1252.
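If you prefer to do this from inside the program rather than running chcp beforehand, the Win32 functions SetConsoleCP (input) and SetConsoleOutputCP (output) set the console code pages. This is a sketch I have not run against your setup; set_console_code_page is a hypothetical wrapper, and the non-Windows branch is a no-op because terminals there have no console code page to change.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Rough programmatic counterpart of "chcp <code_page>": sets both the input
// and the output code page of the attached console. Returns false if either
// call fails (for example when no console is attached).
bool set_console_code_page(unsigned int code_page) {
#ifdef _WIN32
    return SetConsoleCP(code_page) != 0 && SetConsoleOutputCP(code_page) != 0;
#else
    (void)code_page;  // Nothing to do outside the Windows console.
    return true;
#endif
}
```

Call set_console_code_page(1252) at the start of main, before the first getline().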

If that is not enough (I cannot test anything at the moment), you could try wide characters (wstring, wcin, wcout). But unless you change the code page from 437, the console will not display accented characters.
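Another technique, different from changing the code page, is sometimes used with the Microsoft C runtime: switching stdin and stdout to UTF-16 mode with _setmode so that wcin/wcout talk to the console in Unicode. This is an untested sketch; use_wide_console is a hypothetical name, and the non-Windows branch assumes that adopting the environment's locale via setlocale is enough for the wide streams there.

```cpp
#include <clocale>
#ifdef _WIN32
#include <fcntl.h>  // _O_U16TEXT
#include <io.h>     // _setmode
#include <cstdio>   // _fileno, stdin, stdout
#endif

// Prepares the console for std::wcin / std::wcout. Returns false on failure.
bool use_wide_console() {
#ifdef _WIN32
    // MSVC CRT: put stdin and stdout in UTF-16 mode; _setmode returns -1 on error.
    return _setmode(_fileno(stdin), _O_U16TEXT) != -1 &&
           _setmode(_fileno(stdout), _O_U16TEXT) != -1;
#else
    // Assumption: elsewhere the wide streams convert through the C locale,
    // so select the user's locale from the environment.
    return std::setlocale(LC_ALL, "") != nullptr;
#endif
}
```

Call use_wide_console() first in main, then read with std::getline(std::wcin, name) and print with std::wcout. As far as I know, once a stream is in UTF-16 mode you should not also write to it with narrow functions such as std::cout or printf.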
