(C++). Non-unicode language (Simplified Chinese) input/output

Question

I am new to programming and have been working through the examples in my C++ textbook. I managed most of them, but ran into problems when I tried to display Chinese characters in a program similar to "Hello World!".

Regarding input/output of non-Unicode characters such as Simplified Chinese, here is what I have tried so far:
I was running the "Hello world!" program in Code::Blocks and replaced the text "Hello world" with the Chinese characters "你好". When I ran the program, the output in the Command Prompt was just gibberish (乱码, "garbled characters"). I searched online and found that I had to change my regional setting to "Chinese (Simplified, China)". I did this, rebooted my computer and ran the program again. This time the output did consist of Chinese-looking characters, but they were the wrong ones (these: 浣犲ソ锛), and some of them look Japanese. Some Chinese-language resources on the internet say this is an encoded form of "你好", but I'm not sure. I just want the text I write in (std::cout << "---\n";) to display correctly, as it does when I use English. How can I get the Command Prompt to display exactly what I write in Code::Blocks?

Lastly, a prompt popped up saying that the file's encoding had been changed because I used illegal characters.

Answer 1

I tried the following:

#include <iostream>

int main()
{
        std::cout << "你好" << std::endl;
        return 0;
}

I got the output:

你好

To me these appear to be the same characters as in the source (I humbly apologise if I do not see a difference that you do). This makes me think the problem is a mismatch between the character-to-byte conversion used when saving and/or compiling the file, on one hand, and the byte-to-character conversion used to display the output at run time, on the other.

I got this correct output on Xubuntu using g++ 4.8.4. The .cpp file was saved with vim, and its hex dump looks like this:

 00000000:  23 69 6e 63 6c 75 64 65  20 3c 69 6f 73 74 72 65  #include <iostre
 00000010:  61 6d 3e 0a 0a 69 6e 74  20 6d 61 69 6e 28 29 0a  am>..int main().
 00000020:  7b 0a 09 73 74 64 3a 3a  63 6f 75 74 20 3c 3c 20  {..std::cout << 
 00000030:  22 e4 bd a0 e5 a5 bd 22  20 3c 3c 20 73 74 64 3a  "......" << std:
 00000040:  3a 65 6e 64 6c 3b 0a 09  72 65 74 75 72 6e 20 30  :endl;..return 0
 00000050:  3b 0a 7d 0a -- -- -- --  -- -- -- -- -- -- -- --  ;.}.------------

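As you can see, each character is saved as a sequence of 3 UTF-8 bytes (the leading 1110 and 10 in each byte are the UTF-8 marker bits; the remaining bits carry the code point):

你: 1110 0100 | 10 111101 | 10 100000 (bytes E4 BD A0), code point U+4F60 = 20320
好: 1110 0101 | 10 100101 | 10 111101 (bytes E5 A5 BD), code point U+597D = 22909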

Since you got 4 characters of text (浣犲ソ锛), I believe these bytes are compiled as UTF-8 just fine but are then read as something else. If they were read as UTF-16, the 6 bytes would produce 3 characters (2 bytes per character). That is not a likely scenario, because the encodings are designed to avoid such confusion, and because you actually got 4 characters: UTF-16 cannot use fewer than 2 bytes for a character, so the 6 bytes of "你好" could never yield 4.

At this point I do not have enough information to help you further. Please consider posting the exact code you are trying to compile and, if possible, a hexadecimal representation of it as well.

solution1
1 2015-09-08 10:30:09
