简体   繁体   中英

Linux vs. Windows: How does the console render unicode characters?

This is quite a low-level (low in the sense of "closer to the metal") question.

I was wondering if any of you could point me to documentation, explanations, etc. of how, upon receiving a Unicode character (or any character code, but I'm particularly interested in the Unicode Standard) the console in Windows, good ol' cmd.exe (using, say, codepage 65001) and xterm in Linux started with, say, LC_CTYPE=en_US.UTF-8 look up the corresponding glyph (and where).

I know it may be harder to know in Windows, but I can't really find much information.

Thank you.

As far as I can tell, cmd.exe is bound to whatever 256-character code page you defined as the "codepage for non-Unicode programs" or whatever it was called.

To elaborate, if I set the above setting to Japanese, cmd.exe suddenly replaces backslashes with yen signs (as does every other non-Unicode app on the system) and correctly interprets ShiftJIS codes, for example. Setting it to Dutch gives me an accented I (I forgot which), while another codepage would give a half-filled vertical solid instead on the same character.

Not Unicode. Unicode would let me do all three at the same time.

The console uses a TextWriter with an encoding created from the codepage. That means that the characters written are encoded into bytes using the specific Encoding object for the codepage.

the console doesn't support Unicode. :)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM