UTF-8 on Windows with Ada

Original 2018-02-16 15:27:32 7 2 windows/ character-encoding/ ada

It is my understanding that by default, Character is Latin_1, Wide_Character is UCS-2, and Wide_Wide_Character is UCS-4, but that with GNAT, specifying pragma Wide_Character_Encoding(UTF8); or -gnatW8 makes those characters and their strings UTF-8 encoded instead.
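For reference, a minimal sketch of the kind of test program this describes. The procedure name Hello_UTF8 and the sample text are illustrative assumptions, not the asker's actual code; the pragma is placed at the start of the source file, and building with -gnatW8 instead is assumed to have the same effect.

```ada
--  Sketch only: tells GNAT that wide character literals in this source
--  file are UTF-8 encoded (alternative: compile with -gnatW8).
pragma Wide_Character_Encoding (UTF8);

with Ada.Wide_Wide_Text_IO;

procedure Hello_UTF8 is
begin
   --  Text beyond the ASCII range, written to Standard_Output.
   Ada.Wide_Wide_Text_IO.Put_Line ("Grüße, Ελληνικά, 日本語");
end Hello_UTF8;
```

On Linux and FreeBSD with a UTF-8 terminal, a program of this shape is what is expected to print cleanly; on Windows it is the case that misbehaves as described below.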

At least on Linux and FreeBSD, the results fit with my expectations. But on Windows the results are odd.

For either the Wide or Wide_Wide variants, once a character moves beyond the ASCII set, I get a garbled mess. I believe this is called mojibake by some. So I figured it was a codepage issue. After all, the default codepage in Windows, and therefore what the Console Host would load with, is 437, which isn't the UTF-8 codepage. After running chcp 65001, instead of the mess of extra characters, an exception is raised immediately: ADA.IO_EXCEPTIONS.DEVICE_ERROR : a-ztexio.adb:1295. Looking at where the exception occurred, it seems to be in the putc binding of fputc(). But this is Standard_Output; an EOF should never happen there, should it?

Is there some kind of special consideration Windows needs? How can I get UTF-8 output?

Edit:
I tried piping the output into a text file. The supposedly UTF-8 encoded program still generates mojibake in the file. I'm not sure why it would immediately raise an exception in the console, though.

So then I tried directly opening and writing to a file instead of the console/pipe. Oddly, this works exactly as it should. The text is completely correct.
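A sketch of that file-based test, for comparison. The file name out.txt and the sample text are assumptions; the Form string "WCEM=8" is GNAT's documented way to request UTF-8 for a single file, and is assumed to be redundant when the program is already built with -gnatW8.

```ada
with Ada.Wide_Wide_Text_IO; use Ada.Wide_Wide_Text_IO;

procedure Write_File is
   F : File_Type;

   --  Built from code points so the example does not depend on the
   --  encoding of this source file: "cafe" with e-acute, then U+65E5.
   Text : constant Wide_Wide_String :=
     "caf" & Wide_Wide_Character'Val (16#E9#)
     & " " & Wide_Wide_Character'Val (16#65E5#);
begin
   --  "WCEM=8" selects UTF-8 as the wide character encoding method
   --  for this file (GNAT-specific Form parameter).
   Create (F, Out_File, "out.txt", Form => "WCEM=8");
   Put_Line (F, Text);
   Close (F);
end Write_File;
```

Inspecting out.txt with a hex viewer is a quick way to confirm that the bytes on disk are UTF-8, independent of what the console displays.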

I've never seen this kind of behavior with any other language, so it should still be possible to get proper UTF-8 at the console, right?

2 answers

The deficiency that so many others, not just here, describe in the Windows Console Host has either been fixed or never existed in the first place. Based on this document, I feel it was probably always very misunderstood. Windows doesn't treat the console like files, and it's easy to fall into that trap.

Using this very straightforward code, along with what Windows needs and expects behind the scenes...

It correctly produces the following, as long as either pragma Wide_Character_Encoding(UTF8); or -gnatW8 is used.

Piping the output of this test program into a file works as it should. Similarly, piping the output of this test program into another program works as it should. And also similarly, taking the file from piped output, and piping it into another program works as it should.

Full UTF-8 behavior as one would expect under Linux, on Windows.

What needs to be done is twofold. In the package initializer, the Console Host needs to be told what it's working with, which can be done like this.

Character output is then done through fputwc. According to MS Docs, fputc should never be used for Unicode on Windows, which is part of the problem GNAT has. String output and character/string input are all handled similarly.
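The two steps above can be sketched roughly as follows. This is not the answer's original code, which is not reproduced here. It assumes that "telling the Console Host what it's working with" means calling the Microsoft C runtime's _setmode with _O_U8TEXT on standard output; the names Console_UTF8, Set_Mode, Put_WC and Put are hypothetical; and it only links on Windows, where wchar_t and wint_t are 16 bits wide.

```ada
with Interfaces.C;
with Interfaces.C_Streams;  --  GNAT-specific binding to C stdio

procedure Console_UTF8 is
   use type Interfaces.C.int;
   use type Interfaces.C.unsigned_short;

   --  Value of _O_U8TEXT in the Microsoft C runtime's <fcntl.h>.
   O_U8TEXT : constant Interfaces.C.int := 16#40000#;

   --  WEOF in the Microsoft C runtime (wint_t is unsigned short there).
   WEOF : constant Interfaces.C.unsigned_short := 16#FFFF#;

   --  int _setmode (int fd, int mode); returns -1 on failure.
   function Set_Mode (Fd, Mode : Interfaces.C.int) return Interfaces.C.int
     with Import, Convention => C, External_Name => "_setmode";

   --  wint_t fputwc (wchar_t c, FILE *stream);
   function Put_WC
     (Item   : Interfaces.C.wchar_t;
      Stream : Interfaces.C_Streams.FILEs)
      return Interfaces.C.unsigned_short
     with Import, Convention => C, External_Name => "fputwc";

   --  Hypothetical helper: writes a string one UTF-16 code unit at a time.
   procedure Put (Item : Wide_String) is
   begin
      for Ch of Item loop
         if Put_WC (Interfaces.C.To_C (Ch), Interfaces.C_Streams.stdout) = WEOF
         then
            raise Program_Error with "fputwc failed";
         end if;
      end loop;
   end Put;

begin
   --  Step 1: switch standard output (file descriptor 1) to UTF-8 text
   --  mode. In a package this call would sit in the body's
   --  initialization part.
   if Set_Mode (1, O_U8TEXT) = -1 then
      raise Program_Error with "_setmode failed";
   end if;

   --  Step 2: all output now goes through the wide-character C routine.
   Put ("caf" & Wide_Character'Val (16#E9#)
        & " " & Wide_Character'Val (16#3042#));
   Put ((1 => Wide_Character'Val (10)));  --  line feed
end Console_UTF8;
```

Once a stream is in this mode, the Microsoft runtime expects only wide-character routines on it, so output through Ada.Text_IO should not be mixed in on the same stream. Characters outside the Basic Multilingual Plane would have to be passed as surrogate pairs, which this sketch does not handle.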

Based on others' comments and some further research to confirm, I'm pretty sure this is a deficiency of the Windows Console Host.

Edit: don't listen to this.




solution1
1 ACCEPTED 2018-02-20 07:16:39

solution2
0 2018-02-16 16:16:32
