
Why are these Unicode characters not printed although I set my environment to UTF8?

How can I print some Unicode characters? Although my environment is set to UTF-8, nothing is printed. I receive the characters as a continuous stream of bytes, in hex: c2 82 c2 81 c2 80 00. When I point a character pointer at the beginning of that stream and print it as a string, nothing appears. Why?

   char s[]={0xc2,0x82,0xc2,0x81,0xc2,0x80,0x00};
   printf("%s",s);

Using C in a Linux environment.

You won't see much even if your terminal is configured to work with UTF-8 because the characters you are 'displaying' are:

0xC2 0x82 = U+0082
0xC2 0x81 = U+0081
0xC2 0x80 = U+0080

These are control characters from the C1 set. I have a data file which documents them:

# C1 Controls (0x80 - 0x9F) are from ISO/IEC 6429:1992
# It does not define names for 80, 81, or 99.

80 U+0080
81 U+0081
82 U+0082 BPH BREAK PERMITTED HERE

So you don't see anything because you aren't displaying any graphic characters. If you change your 0x82 to 0xA2, for example (and 0x81 to 0xA1, and 0x80 to 0xA0), then you'll be more likely to get some visible output:

0xC2 0xA2 = U+00A2
0xC2 0xA1 = U+00A1
0xC2 0xA0 = U+00A0

A0 U+00A0 NO-BREAK SPACE
A1 U+00A1 INVERTED EXCLAMATION MARK
A2 U+00A2 CENT SIGN

$ ./x
¢¡ 
$

And if you're really good, you can see the no-break space after the inverted exclamation mark¡

A literal such as 0xc282c281c280 is a single integer, not a sequence of bytes. You want to initialise the array with a sequence: char s[] = { 0xc2, 0x82, 0xc2, 0x81, 0xc2, 0x80, 0x00 };


 