Widechar to Bytes using bits pattern?

If the number of bytes in a UTF-8 encoded wide character is known, is it possible to get the bytes using the following method?

For example:

Encoding the wide character ¿ (code point 191) to bytes gives -62 and -65.

I tried to fit the 8 bits of 191 into the slots, but didn't get the same result:

110[0][0][0][1][0]   10[1][1][1][1][1][1]

      127                   255

First, don't convert to signed bytes; that just confuses matters. Code point 191 yields the byte sequence 194 191 (0xC2 0xBF). The values -62 and -65 are the same two bytes read as signed 8-bit integers.
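A small illustration of the signed/unsigned point, assuming the -62 and -65 came from a language with signed bytes (such as Java): masking with 0xFF recovers the unsigned values, which match what Python's own UTF-8 encoder returns.

```python
# Signed byte values as reported in the question (assumed Java-style bytes)
signed = [-62, -65]

# Masking with 0xFF reinterprets each value as an unsigned byte (0..255)
unsigned = [b & 0xFF for b in signed]
print(unsigned)  # [194, 191]

# Python's built-in encoder yields the same unsigned byte values for U+00BF
print(list("\u00bf".encode("utf-8")))  # [194, 191]
```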

Decimal: 194                   191
Binary:  110[0][0][0][1][0]    10[1][1][1][1][1][1]
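The layout above can be produced with shifts and masks. This is a minimal sketch that only handles the two-byte case (code points U+0080 to U+07FF); the function name `encode_two_byte` is hypothetical. The low six bits of the code point go after the `10` prefix of the continuation byte, and the remaining bits go after the `110` prefix of the lead byte.

```python
def encode_two_byte(code_point: int) -> bytes:
    """Hypothetical helper: encode U+0080..U+07FF as a 2-byte UTF-8 sequence."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("code point does not use a 2-byte UTF-8 sequence")
    lead = 0b110_00000 | (code_point >> 6)        # upper 5 payload bits after 110
    cont = 0b10_000000 | (code_point & 0b111111)  # lower 6 payload bits after 10
    return bytes([lead, cont])


print(list(encode_two_byte(191)))  # [194, 191]
```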

To see how these bytes map to the code point, start from the right edge. You get six bits from the 191 and two more from the 194; the remaining three payload bits of the 194 are zero. That yields:

Binary:  00000[0][0][0]    [1][0][1][1][1][1][1][1]
Decimal: 0                 191
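The right-to-left reading above can be sketched as a decoder for the two-byte case. The name `decode_two_byte` is hypothetical, the inputs are assumed to be unsigned byte values, and, unlike a strict decoder, this sketch does not reject overlong forms (lead bytes 0xC0 and 0xC1).

```python
def decode_two_byte(lead: int, cont: int) -> int:
    """Hypothetical helper: recover a code point from a 2-byte UTF-8 sequence."""
    if lead >> 5 != 0b110 or cont >> 6 != 0b10:
        raise ValueError("not a 2-byte UTF-8 sequence")
    high = lead & 0b11111   # 5 payload bits from the lead byte
    low = cont & 0b111111   # 6 payload bits from the continuation byte
    # Note: overlong forms (lead 0xC0/0xC1) are not rejected in this sketch
    return (high << 6) | low


print(decode_two_byte(194, 191))  # 191
```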

Wikipedia has a surprisingly good writeup on how this all works.
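The same fill-from-the-right idea extends to the 1- to 4-byte UTF-8 patterns. Below is a sketch of a single-code-point encoder; the name `utf8_encode` is hypothetical, and it is meant for understanding the bit layout rather than replacing a language's built-in encoder.

```python
def utf8_encode(code_point: int) -> bytes:
    """Hypothetical helper: encode one code point using the UTF-8 bit patterns."""
    if code_point < 0:
        raise ValueError("negative code point")
    if code_point < 0x80:
        return bytes([code_point])            # 0xxxxxxx
    if code_point < 0x800:
        n, lead_mark = 2, 0b1100_0000         # 110xxxxx 10xxxxxx
    elif code_point < 0x10000:
        if 0xD800 <= code_point <= 0xDFFF:
            raise ValueError("surrogate code points cannot be encoded")
        n, lead_mark = 3, 0b1110_0000         # 1110xxxx + 2 continuation bytes
    elif code_point <= 0x10FFFF:
        n, lead_mark = 4, 0b1111_0000         # 11110xxx + 3 continuation bytes
    else:
        raise ValueError("code point above U+10FFFF")

    out = []
    # Fill continuation bytes from the right edge, six bits at a time
    for _ in range(n - 1):
        out.append(0b1000_0000 | (code_point & 0b11_1111))
        code_point >>= 6
    # The bits that remain fit in the lead byte's payload slots
    out.append(lead_mark | code_point)
    return bytes(reversed(out))


print(utf8_encode(191).hex())     # c2bf
print(utf8_encode(0x20AC).hex())  # e282ac
```

The number of bytes is chosen from the code point's range, so once that count is known the rest is just shifting six bits at a time into continuation bytes.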