std::string is natively encoded in UTF-8 but char cannot hold UTF characters?

After reading std::wstring VS std::string, I was under the impression that on Linux I don't need to worry about using any of the language's wide-character facilities.
*Things like: std::wifstream, std::wofstream, std::wstring, wchar_t, etc.

This seems to work fine when I use only std::string for the non-ASCII characters, but not when I use chars to handle them.

For example: I have a file with just a Unicode checkmark in it.
I can read it in, print it to the terminal, and output it to a file.

// ✓ reads in unicode to string
// ✓ outputs unicode to terminal
// ✓ outputs unicode back to the file
#include <iostream>
#include <string>
#include <fstream>

int main(){
  std::ifstream in("in.txt");
  std::ofstream out("out.txt");

  std::string checkmark;
  std::getline(in,checkmark); //size of string is actually 3 even though it just has 1 unicode character

  std::cout << checkmark << std::endl;
  out << checkmark;

}

The same program does not work, however, if I use a char in place of the std::string:

// ✕ only partially reads in unicode to char
// ✕ does not output unicode to terminal
// ✕ does not output unicode back to the file
#include <iostream>
#include <string>
#include <fstream>

int main(){
  std::ifstream in("in.txt");
  std::ofstream out("out.txt");

  char checkmark;
  checkmark = in.get();

  std::cout << checkmark << std::endl;
  out << checkmark;

}

Nothing appears in the terminal (apart from a newline).
The output file contains â instead of the checkmark character.

Since a char is only one byte, I could try to use a wchar_t, but it still does not work:

// ✕ only partially reads in unicode to wchar_t
// ✕ does not output unicode to terminal
// ✕ does not output unicode back to the file
#include <iostream>
#include <string>
#include <fstream>

int main(){
  std::wifstream in("in.txt");
  std::wofstream out("out.txt");

  wchar_t checkmark;
  checkmark = in.get();

  std::wcout << checkmark << std::endl;
  out << checkmark;

}

I've also read about setting the following locale, but it does not appear to make a difference.

setlocale(LC_ALL, "");

In the std::string case you read one line, which in this case contains a multi-byte Unicode character. In the char case you read a single byte, which is not even one complete character.
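
To make the byte counts concrete, here is a small sketch. The checkmark U+2713 is encoded in UTF-8 as the three bytes E2 9C 93, which is why the string has size 3. A single char holds only the first byte, 0xE2; that byte read as Latin-1 is â, which would match what showed up in the output file, assuming the file was viewed in such an encoding. The helper name hex_bytes and the function check_claims are made up for this illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: returns the bytes of s as space-separated
// uppercase hex, e.g. "E2 9C 93".
std::string hex_bytes(const std::string& s) {
    std::string out;
    char buf[4];
    for (unsigned char byte : s) {
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(byte));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}

// The checkmark U+2713, spelled out as its three UTF-8 bytes so the
// example does not depend on the encoding of the source file.
const std::string checkmark = "\xE2\x9C\x93";

// Self-check of the claims above; call it from main() to try it out.
void check_claims() {
    // One Unicode character, but three chars in the std::string.
    assert(checkmark.size() == 3);
    assert(hex_bytes(checkmark) == "E2 9C 93");

    // What `char c = in.get();` ends up with: only the lead byte.
    const char first = checkmark[0];
    assert(hex_bytes(std::string(1, first)) == "E2");
}
```

Writing that lone 0xE2 byte to the terminal or to a file produces an incomplete UTF-8 sequence, so a UTF-8 terminal has nothing valid to display.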

Edit: for UTF-8 you should read into an array of char, or just use std::string, since that already works.
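
If you really want to read one character at a time rather than a whole line, one possible approach is to look at the lead byte, which tells you how many bytes the UTF-8 sequence has, and then read that many bytes into a std::string. This is only a sketch: the names utf8_length, read_utf8_char and check_read are invented here, and the validation is minimal (it does not reject overlong encodings or surrogates).

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: number of bytes in the UTF-8 sequence that starts
// with `lead`, or 0 if `lead` cannot start a sequence.
std::size_t utf8_length(unsigned char lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;                             // continuation or invalid byte
}

// Hypothetical helper: reads one UTF-8 encoded character from `in` and
// returns its bytes. Returns an empty string at end of input or when the
// sequence is malformed or truncated.
std::string read_utf8_char(std::istream& in) {
    const int eof = std::istream::traits_type::eof();
    const int first = in.get();
    if (first == eof) return {};

    const std::size_t len = utf8_length(static_cast<unsigned char>(first));
    if (len == 0) return {};

    std::string ch(1, static_cast<char>(first));
    for (std::size_t i = 1; i < len; ++i) {
        const int next = in.get();
        // Continuation bytes must have the form 10xxxxxx.
        if (next == eof || (next & 0xC0) != 0x80) return {};
        ch += static_cast<char>(next);
    }
    return ch;
}

// Usage sketch: a checkmark (three bytes) followed by an ASCII 'a'.
void check_read() {
    std::istringstream in("\xE2\x9C\x93" "a");
    assert(read_utf8_char(in) == "\xE2\x9C\x93");
    assert(read_utf8_char(in) == "a");
    assert(read_utf8_char(in).empty());  // end of input
}
```

In the program from the question, this would mean replacing the char variable with std::string checkmark = read_utf8_char(in); and writing that string to std::cout and to the output file as before.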
