
Recognizing Lithuanian letters from fstream in C++

My IT teacher gave me a task: find out how many letters, digits, whitespace characters and other symbols there are in a given text. The problem is that the text is written with Lithuanian letters (Š, š, Ę, ę, Ų, ų, etc.) and I don't know how to recognize them in C++. To count each type of symbol, I read the text line by line from an fstream into a string with getline(), then iterate through the string and compare each character against literals. For example, (c >= 'A' && c <= 'Z') means that it's an uppercase letter, but this doesn't work with Lithuanian characters. I guess the text file is saved in a Unicode format. Please help me recognize Lithuanian letters in the text.

I think you probably have to open your file in binary mode, e.g. (fileName, ios::in | ios::binary), and read the file byte by byte.
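To illustrate the byte-by-byte suggestion, here is a minimal sketch, assuming the file is valid UTF-8. The names read_bytes and count_code_points are made up for this example. It reads the file in binary mode and counts characters by skipping UTF-8 continuation bytes (those of the form 10xxxxxx), which also shows why a letter like Š takes up two char values and never matches a single-char comparison.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: reads the whole file as raw bytes.
// Binary mode prevents any newline translation by the stream.
std::string read_bytes(const std::string& path) {
    std::ifstream file(path, std::ios::in | std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(file),
                       std::istreambuf_iterator<char>());
}

// Hypothetical helper: counts UTF-8 code points, assuming valid UTF-8.
// Every code point has exactly one byte that is NOT a continuation byte.
std::size_t count_code_points(const std::string& bytes) {
    std::size_t count = 0;
    for (char ch : bytes) {
        unsigned char byte = static_cast<unsigned char>(ch);
        if ((byte & 0xC0) != 0x80) {  // skip 10xxxxxx continuation bytes
            ++count;
        }
    }
    return count;
}
```

For a file that contains just "Ša" in UTF-8, read_bytes would return 3 bytes, while count_code_points would report 2 characters.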

As I understand it, your text is stored in UTF-8 encoding. If it were UTF-16 or UTF-32, your getline() call would almost always return one or zero symbols, and I think you would have noticed this. UTF-8 is described here: https://ru.wikipedia.org/wiki/UTF-8 . You can use the standard library to convert a UTF-8 string to a wstring: see "UTF8 to/from wide char conversion in STL". Then you can use map<wchar_t, int> to count the different symbols.
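A rough sketch of the decode-then-count idea follows. Note that it swaps in a small hand-written UTF-8 decoder instead of the standard library conversion, because std::wstring_convert and the codecvt header are deprecated since C++17, and it uses char32_t rather than wchar_t, which is only 16 bits wide on Windows. All names here (decode_utf8, is_letter, Counts, classify, count_symbols) are invented for the example, and the decoder does only minimal validation: malformed sequences become U+FFFD.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: decodes UTF-8 bytes into code points.
// Invalid or truncated sequences are replaced with U+FFFD.
std::u32string decode_utf8(const std::string& bytes) {
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else                            { cp = 0xFFFD;      extra = 0; }
        ++i;
        for (; extra > 0 && i < bytes.size(); --extra, ++i) {
            unsigned char next = static_cast<unsigned char>(bytes[i]);
            if ((next & 0xC0) != 0x80) break;  // not a continuation byte
            cp = (cp << 6) | (next & 0x3F);
        }
        if (extra > 0) cp = 0xFFFD;  // sequence was cut short
        out.push_back(cp);
    }
    return out;
}

// ASCII letters plus the Lithuanian letters with diacritics:
// Ą ą Č č Ė ė Ę ę Į į Š š Ū ū Ų ų Ž ž
bool is_letter(char32_t cp) {
    static const std::u32string lithuanian =
        U"\u0104\u0105\u010C\u010D\u0116\u0117\u0118\u0119\u012E\u012F"
        U"\u0160\u0161\u016A\u016B\u0172\u0173\u017D\u017E";
    return (cp >= U'A' && cp <= U'Z') || (cp >= U'a' && cp <= U'z') ||
           lithuanian.find(cp) != std::u32string::npos;
}

struct Counts {
    int letters = 0;
    int digits = 0;
    int spaces = 0;
    int other = 0;
};

// Sorts every code point into one of the four categories from the task.
Counts classify(const std::string& utf8_text) {
    Counts c;
    for (char32_t cp : decode_utf8(utf8_text)) {
        if (is_letter(cp)) ++c.letters;
        else if (cp >= U'0' && cp <= U'9') ++c.digits;
        else if (cp == U' ' || cp == U'\t' || cp == U'\n' || cp == U'\r') ++c.spaces;
        else ++c.other;
    }
    return c;
}

// Per-symbol tally, analogous to the map suggested above.
std::map<char32_t, int> count_symbols(const std::string& utf8_text) {
    std::map<char32_t, int> tally;
    for (char32_t cp : decode_utf8(utf8_text)) ++tally[cp];
    return tally;
}
```

Each line returned by getline() can be passed straight to classify, since getline() hands back the raw UTF-8 bytes unchanged. Letters from other alphabets fall into the "other" bucket in this sketch; extending is_letter is the place to change that.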

I had to handle UTF-8 and ended up using utf8-cpp.

For all practical UTF-8 related problems, I recommend reading this:

UTF-8 Everywhere

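Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original source address. For any questions, contact: yoyou2525@163.com.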

 