
Recognizing Lithuanian letters from fstream in C++

My IT teacher gave me a task: find out how many letters, digits, whitespace characters and other symbols there are in a given text. The problem is that the text contains Lithuanian letters (Š, š, Ę, ę, Ų, ų, etc.) and I don't know how to recognize them in C++. To count each type of symbol, I read the text line by line with getline() from an fstream into a string and then iterate through the string, comparing each character with literals. For example, (c >= 'A' && c <= 'Z') means that it's an uppercase letter, but this doesn't work with Lithuanian characters. I guess the text file is saved in a Unicode format. Please help me recognize Lithuanian letters in the text.

I think you probably have to open your file in binary mode, for example with (fileName, ios::in | ios::binary), and read the file byte by byte.

As I understand it, your text is stored in UTF-8 encoding. If it were UTF-16 or UTF-32, your getline() function would almost always return one or zero symbols, and I think you would have noticed this. UTF-8 is described here: https://ru.wikipedia.org/wiki/UTF-8 . You can use the standard library to convert a UTF-8 string to a wstring: see "UTF8 to/from wide char conversion in STL". Then you can use a map<wchar_t, int> to count the different symbols.
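A rough sketch of the decode-then-count approach, again assuming UTF-8 input. It uses a small hand-written decoder rather than the standard conversion facilities, because std::wstring_convert is deprecated since C++17. It also uses char32_t instead of wchar_t and counts per category rather than per symbol, since that is what the question asks for. The names decode_utf8, classify, count_kinds and Kind are made up for illustration:

```cpp
#include <cstddef>
#include <map>
#include <string>

// Decodes UTF-8 bytes into code points.
// Assumption: well-formed input; malformed sequences are not detected.
std::u32string decode_utf8(const std::string& bytes) {
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        int extra = 0;          // number of continuation bytes to read
        char32_t cp = lead;     // plain ASCII unless changed below
        if (lead >= 0xF0)      { extra = 3; cp = lead & 0x07; }
        else if (lead >= 0xE0) { extra = 2; cp = lead & 0x0F; }
        else if (lead >= 0xC0) { extra = 1; cp = lead & 0x1F; }
        ++i;
        for (; extra > 0 && i < bytes.size(); --extra, ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i]) & 0x3F);
        }
        out.push_back(cp);
    }
    return out;
}

enum class Kind { Letter, Digit, Space, Other };

Kind classify(char32_t cp) {
    // The Lithuanian letters outside ASCII: Ą ą Č č Ę ę Ė ė Į į Š š Ų ų Ū ū Ž ž
    static const std::u32string lithuanian =
        U"\u0104\u0105\u010C\u010D\u0118\u0119\u0116\u0117\u012E\u012F"
        U"\u0160\u0161\u0172\u0173\u016A\u016B\u017D\u017E";
    if ((cp >= U'A' && cp <= U'Z') || (cp >= U'a' && cp <= U'z')) return Kind::Letter;
    if (lithuanian.find(cp) != std::u32string::npos) return Kind::Letter;
    if (cp >= U'0' && cp <= U'9') return Kind::Digit;
    if (cp == U' ' || cp == U'\t' || cp == U'\n' || cp == U'\r') return Kind::Space;
    return Kind::Other;
}

// Counts how many code points of each kind the UTF-8 text contains.
std::map<Kind, int> count_kinds(const std::string& utf8_text) {
    std::map<Kind, int> counts;
    for (char32_t cp : decode_utf8(utf8_text)) {
        ++counts[classify(cp)];
    }
    return counts;
}
```

Each line returned by getline() can be passed to count_kinds and the per-line results summed. Letters from other alphabets fall into Other here; extending the lithuanian string is one way to cover them.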

I had to handle UTF-8 and ended up using utf8-cpp.

For all practical UTF-8 related problems, I recommend reading this: UTF-8 Everywhere
