Print Unicode char

I tried a very simple program in C++:

#include <iostream>
#include <string>

int main()
{
  std::wstring test = L"asdfa-";
  test += u'ç';
  std::wcout << test;
}

But the result was:

asdfa-?

I could not print 'ç' with either cout or wcout. How can I print this string correctly?

OS: Linux.

PS: I use wstring instead of string because I sometimes need to calculate the length of the string, and that length must match what appears on the screen.

PS: I need to concatenate the Unicode char; it can't go in the string constructor.

First, here's something that does work:

#include <iostream>
#include <string>

int main() {
    std::string test = "asdfa-";
    test += "ç";
    std::cout << test;
}

I used just regular strings here and let C++ keep everything in UTF-8. I think you already know that this would work, because you mentioned that you wanted to concatenate the ç rather than just leaving it in the string constructor.

Dealing with char, char16_t, char32_t, and wchar_t in C++ has never really been fun. You have to be careful with the L, u, and U prefixes.

However, where possible, if you deal with UTF-8 strings and avoid individual characters, you can generally get things to work much better. And since most consoles (with the possible exception of old Windows machines) understand UTF-8 pretty well, this is the approach that often just works best. So if you have wide characters, see if you can convert them to regular std::string objects and work in that domain.
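Since the question also needs a length that matches what is on screen, here is a minimal sketch of counting code points in a UTF-8 std::string. The helper name utf8_length is hypothetical, and the code assumes the input is valid UTF-8. It counts code points, which is not always the same as screen columns (combining marks and double-width characters differ).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of Unicode code points in a UTF-8 string.
// Assumption: `s` holds valid UTF-8. Continuation bytes have the bit
// pattern 10xxxxxx, so every byte that is NOT a continuation byte
// starts a new code point.
std::size_t utf8_length(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the string from the question, "asdfa-" followed by the two-byte encoding of ç, size() reports 8 bytes while this helper reports 7 code points.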

One general way of handling this would be:

  1. Input (convert from multibyte to wide using the current locale)

  2. Your app: work with wide strings

  3. Output or saving to a file (convert from wide to multibyte)
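The two conversion steps above can be sketched with the standard C functions std::mbstowcs and std::wcstombs. The helper names to_wide and to_multibyte are made up for illustration, and the sketch assumes the program has already called std::setlocale(LC_ALL, "") so that the current locale (for example a UTF-8 one) drives the conversion. Strings with embedded null characters are not handled.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Step 1 (hypothetical helper): multibyte -> wide, using the current C locale.
// Returns an empty string if the input is not valid in that locale.
std::wstring to_wide(const std::string& in) {
    // A multibyte string never yields more wide characters than it has bytes.
    std::vector<wchar_t> buf(in.size() + 1);
    std::size_t n = std::mbstowcs(buf.data(), in.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) {
        return std::wstring();
    }
    return std::wstring(buf.data(), n);
}

// Step 3 (hypothetical helper): wide -> multibyte, using the current C locale.
// Returns an empty string if a character cannot be represented.
std::string to_multibyte(const std::wstring& in) {
    // Each wide character needs at most MB_CUR_MAX bytes.
    std::vector<char> buf(in.size() * MB_CUR_MAX + 1);
    std::size_t n = std::wcstombs(buf.data(), in.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) {
        return std::string();
    }
    return std::string(buf.data(), n);
}
```

Step 2 then works entirely on std::wstring, and only the program's edges touch the multibyte encoding.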

For wide string manipulations such as counting characters or finding substrings, there is the wcsXXX family of functions.
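As a small illustration, std::wcslen and std::wcsstr from <cwchar> are two members of that family. The wrapper names below are hypothetical, and the character counts assume a platform such as Linux where wchar_t is 32 bits wide, so one wchar_t holds one code point.

```cpp
#include <cstddef>
#include <cwchar>

// Hypothetical wrapper: number of wchar_t units before the terminator.
std::size_t wide_length(const wchar_t* s) {
    return std::wcslen(s);
}

// Hypothetical wrapper: index of the first occurrence of `needle`
// inside `haystack`, or -1 if it does not occur.
std::ptrdiff_t wide_find(const wchar_t* haystack, const wchar_t* needle) {
    const wchar_t* hit = std::wcsstr(haystack, needle);
    return hit ? hit - haystack : -1;
}
```

With std::wstring, the member functions size(), find(), and substr() cover the same ground without raw pointers.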

If you are using libstdc++ on Linux: you forgot an essential call at the beginning of the program:

std::locale::global(std::locale(""));

This assumes you are on Linux and your locale supports UTF-8.
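Put together with the code from the question, this might look like the sketch below. The function names are hypothetical, and the try/catch is an addition of mine: std::locale("") throws std::runtime_error when the environment names a locale the system does not have. Whether ç actually appears still depends on the environment providing a UTF-8 locale.

```cpp
#include <iostream>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: adopt the locale named by the environment
// (LANG, LC_ALL, ...). Returns false if that locale is unavailable.
bool use_environment_locale() {
    try {
        std::locale::global(std::locale(""));
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}

// Builds the string from the question; note the L prefix on the
// character literal, matching the wchar_t element type of std::wstring.
std::wstring build_test_string() {
    std::wstring test = L"asdfa-";
    test += L'\u00e7';  // ç
    return test;
}

// Call use_environment_locale() before the first write to std::wcout.
void print_test_string() {
    std::wcout << build_test_string() << L'\n';
}
```

A main function would call use_environment_locale() first and print_test_string() afterwards.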

If you are using libc++: forget about using wide streams. This library does not support I/O of wide characters in a useful way (i.e., translation to UTF-8 like libstdc++ does).

Windows has a wholly separate set of quirks regarding Unicode. You are lucky if you don't have to deal with them.

Demo with gcc/libstdc++ and a call to std::locale

Demo with gcc/libstdc++ and no call to std::locale

Different versions of clang/libc++ behave differently with this example: some output ? instead of the non-ASCII char, some output nothing; some crash on the call to std::locale, some don't. None do the right thing, which is printing the ç, or maybe I just haven't found one that works. I don't recommend using libc++ if you need anything related to locale or wchar_t.

I solved this problem using a conversion function:

#include <iostream>
#include <string>
#include <codecvt>
#include <locale>

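// Converts a wide string to a UTF-8 encoded std::string.
// Note: std::wstring_convert and <codecvt> are deprecated since C++17.
std::string wstr2str(const std::wstring& wstr) {
  std::wstring_convert<std::codecvt_utf8<wchar_t>> myconv;
  return myconv.to_bytes(wstr);
}

int main()
{
  std::wstring test = L"asdfa-";
  test += L'ç';                      // wide character literal, note the L prefix
  std::string str = wstr2str(test);  // UTF-8 bytes, safe to send to std::cout
  std::cout << str;
}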
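Because std::wstring_convert is deprecated since C++17, a hand-written encoder is one possible alternative when all you need is to append a single Unicode character to a UTF-8 std::string. This is only a sketch: the name append_utf8 is hypothetical, and it assumes the caller passes a valid code point (it does not reject surrogates or values above U+10FFFF).

```cpp
#include <string>

// Hypothetical helper: append the UTF-8 encoding of code point `cp` to `out`.
// Assumption: `cp` is a valid Unicode scalar value; no validation is done.
void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {                 // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                         // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}
```

Starting from std::string test = "asdfa-", calling append_utf8(test, U'\u00e7') appends the two bytes 0xC3 0xA7, and the result can be written to std::cout directly on a UTF-8 terminal.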
