Unicode hello world for C?

I am trying to output characters like 안, 蠀, ☃ from C:

#include <stdio.h>   /* stdout */
#include <wchar.h>   /* fwprintf */
int main(void)
{
    fwprintf(stdout, L"안, 蠀, ☃\n");
    return 0;
}

The output is ?, ?, ?

How do I print those characters?

Edit:

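#include <stdio.h>   /* stdout */
#include <wchar.h>   /* fwprintf */
#include <locale.h>  /* setlocale */
int main(void)
{
    /* Adopt the character encoding of the user's environment. */
    setlocale(LC_CTYPE, "");
    fwprintf(stdout, L"안, 蠀, ☃\n");
    return 0;
}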

This did the trick. The output is 안, 蠀, ☃, except that the Chinese character and the snowman appear as boxes in my urxvt, probably because I did not enable those locales.

$ locale -a
C
en_US
en_US.iso88591
en_US.iso885915
en_US.utf8
ja_JP.utf8
ko_KR
ko_KR.euckr
ko_KR.utf8
korean
korean.euc
POSIX
zh_CN.utf8

Which locale do I have to enable additionally so that it will display the Chinese character and the snowman? Or do I need a font?

Will the above program work on Windows?

You have to set your output terminal as Unicode compatible.

On Linux (with Bash shell), try:

$ export LANG=en_US.UTF-8

and also make sure that your terminal emulator can actually display Unicode and is configured to do so.

There are many individual stages in the process of getting Unicode output, all of which must be correctly configured.

First, are you compiling with Unicode support enabled? You will need to do so under Windows (-DUNICODE -D_UNICODE).

Second, are you writing to a command line that supports Unicode, both in principle and by having a font that contains the glyphs for the characters you are emitting?

Third, do the Unicode encodings used by your compiler and your command line match? It's no use having UCS-2 in your binary when your command line expects UTF-8.

You need to really understand Unicode and its encodings to get this right. Don't imagine that it's straightforward or that you can skip the underlying concepts; this stuff doesn't work by accident, because too many things have to be exactly correct.

wchar_t is defined as follows (this wording is from the C++ standard; C describes the type in similar terms):

Type wchar_t is a distinct type whose values can represent distinct codes for all members of the largest extended character set specified among the supported locales (22.1.1). [...]

The difference between multibyte characters and wchar_t:

multibyte characters may require more than one byte for a given character, depending on the encoding (e.g. UTF-8, UTF-16)

whereas

wchar_t has a fixed size, i.e. sizeof(wchar_t), which is implementation-defined. Note that this width determines which encoding(s) your wchar_t can support. So, if sizeof(wchar_t) == 2, there's no way you'd be able to use UTF-32 encoding.

Also remember that wchar_t has no sense of encoding by itself. You first have to tell the compiler (and, at run time, the C library via setlocale) what encoding to use for wchar_t data. The erroneous output is most probably because the characters are being treated in the default encoding, which can't represent them, and the failed match leads to a 'notdef'-style '?' in the output.

You have to configure your system to accept those characters. What are you using? Windows, Linux?

Just as Alnitak suggested, you have to specify a locale whose character set/encoding includes the characters you want to show. (Unicode/)UTF-8 should cover all Unicode characters.

Your terminal should use a font that has the respective glyphs.

Windows' CMD.EXE is notoriously weak when it comes to character sets beyond 8 bits. You might need a GUI pane instead of relying on stdout.
