I've faced an issue and couldn't find an answer on the internet. Even though I found many similar questions, none of the answers worked for me. I'm using Visual Studio 2015 on Windows 10.
So part of my code is:
wstring books[50];
wstring authors[50];
wstring genres[50];
wstring takenBy[50];
wstring additional;
bool taken[50];
_setmode(_fileno(stdout), _O_U8TEXT);
wifstream fd("bookList.txt");
i = 0;
while (!fd.eof())
{
    getline(fd, books[i]);
    getline(fd, authors[i]);
    getline(fd, genres[i]);
    getline(fd, takenBy[i]);
    fd >> taken[i];
    getline(fd, additional);
    i++;
}
What I need is to read a UTF-8 encoded text file in C++. However, when I read the file, the wide strings come out altered, and when I print them the output is completely different from the input.
Input:
ąčę
Output:
ÄÄÄ
How do I avoid it and read the text correctly?
UTF-8 data is (probably) not stored in wide strings. Read about UTF-8 Everywhere. UTF-8 uses 8-bit bytes (sometimes several of them) to encode Unicode characters. So in C++ a Unicode character is parsed from a sequence of 1 to 4 bytes (i.e. `char`s).
You need a UTF-8 parser, and the C11 and C++11 standards provide little beyond the conversion facets in `<codecvt>` (deprecated since C++17). So you will likely want an external library. Look into libunistring (a simple C UTF-8 parsing library) or something else (Qt, POCO, Glib, ICU, ...). You could parse and convert UTF-8 into wide UTF-32 (using `std::u32string` and `char32_t`) and back, or, better, work internally in UTF-8 (using `std::string` and `char`).
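To make the byte-level structure concrete, here is a minimal sketch of a UTF-8 to UTF-32 decoder. The name `decode_utf8` is hypothetical and not part of any library. It assumes reasonably well-formed input: it rejects bad lead bytes, bad continuation bytes, and truncated sequences, but it does not check for overlong encodings or surrogates, which a real library would do.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-32 code points.
// Sketch only: no checks for overlong encodings or surrogate code points.
std::u32string decode_utf8(const std::string& bytes)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size())
    {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        std::size_t extra = 0; // number of continuation bytes expected

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (extra >= bytes.size() - i)
            throw std::runtime_error("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k)
        {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F); // append the 6 payload bits
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

With this sketch, the three characters from the question (six bytes in UTF-8) decode to three code points: U+0105, U+010D and U+0119.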
Hence you'll parse and print sequences of `char`s (using UTF-8 encoding), and your program will use plain `std::string`s and plain `char`s (not `std::wstring` or `wchar_t`) but process UTF-8 sequences.
This is easy with Boost.Spirit:
#define BOOST_SPIRIT_UNICODE
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
using namespace boost::spirit;
int main()
{
    std::string in("ąčę");
    std::string out;
    qi::parse(in.begin(), in.end(), +unicode::char_, out);
    std::cout << out << std::endl;
}
The following example reads a sequence of tuples (book, authors, takenBy):
#define BOOST_SPIRIT_UNICODE
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/adapted/std_tuple.hpp>
#include <iostream>
#include <string>
#include <tuple>
#include <vector>
using namespace boost::spirit;
int main()
{
    std::string in("Book_1\nAuthors_1\nTakenBy_1\n"\
                   "Book ąčę\nAuthors_2\nTakenBy_2\n");
    std::vector<
        std::tuple<
            std::string, /* book */
            std::string, /* authors */
            std::string  /* takenBy */
        >
    > out;
    auto ok = qi::parse(in.begin(), in.end(),
        *(
            +(unicode::char_ - qi::eol) >> qi::eol /* book */
            >> +(unicode::char_ - qi::eol) >> qi::eol /* authors */
            >> +(unicode::char_ - qi::eol) >> qi::eol /* takenBy */
        ),
        out);
    if(ok)
    {
        for(auto& entry : out)
        {
            std::string book, authors, takenBy;
            std::tie(book, authors, takenBy) = entry;
            std::cout << "book: " << book << std::endl
                      << "authors: " << authors << std::endl
                      << "takenBy: " << takenBy << std::endl;
        }
    }
}
It's only a demo using `std::tuple` and an unnamed parser, which is the third parameter of `qi::parse`. You can use a struct instead of the tuple to represent books, authors, genres, etc. The unnamed parser may be replaced by a grammar, and you can read the content of the file into a string to be passed to `qi::parse`.
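Reading the file into a string could look like the following sketch. `read_file` is a hypothetical helper name that uses only the standard library. It opens the file in binary mode so that the UTF-8 bytes reach the parser unchanged, which also means Windows line endings are not translated.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical helper: return the whole file as raw bytes in a std::string.
std::string read_file(const std::string& path)
{
    std::ifstream file(path, std::ios::binary);
    if (!file)
        throw std::runtime_error("cannot open " + path);
    return std::string(std::istreambuf_iterator<char>(file),
                       std::istreambuf_iterator<char>());
}
```

The result of `read_file("bookList.txt")` can then be stored in a `std::string` and its `begin()` and `end()` iterators passed to `qi::parse`, as in the examples above.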