
Format of v in the JVM modified UTF-8

Original 2017-01-10 01:21:41 0 1 java/ utf-8/ jvm

In the JVM specification, the description of modified UTF-8 gives the format of v for the "two-times-three-byte format":

This means supplementary characters are represented by six bytes, u, v, w, x, y, and z

Table 4.14. v: 1010 (bits 20-16)-1

Since v is 8 bits, (bits 20-16)-1 has to fit in 4 bits. How can the -1 shrink bits 20-16 from 5 bits to 4?

(Supplementary question: is there any reason to say "two-times-three-byte" rather than "six-byte"?)

1 answer

Unicode code points range from U+0000 to U+10FFFF.

Values greater than U+FFFF are called supplementary code points. Their binary representation is uuuuuxxxxxxxxxxxxxxxx (21 bits), where uuuuu is between 00001 and 10000.

In UTF-16, supplementary code points are encoded as surrogate pairs, as described in 3.9 Unicode Encoding Forms, D91. That is, uuuuuxxxxxxxxxxxxxxxx is represented by two 16-bit characters:
110110wwwwxxxxxx 110111xxxxxxxxxx, where wwww = uuuuu - 1.

Since 00001 ≤ uuuuu ≤ 10000, it follows that 0000 ≤ wwww ≤ 1111, so after subtracting 1 the value always fits in 4 bits.
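The arithmetic above can be checked with a minimal Java sketch. The class and method names (`SurrogateSplit`, `uuuuu`, `wwww`, `highSurrogate`, `lowSurrogate`) are made up for illustration; only `Character.highSurrogate` and `Character.lowSurrogate` are real JDK methods, used here as a cross-check.

```java
class SurrogateSplit {
    // Bits 20-16 of a supplementary code point (the "uuuuu" field).
    static int uuuuu(int codePoint) {
        return (codePoint >>> 16) & 0x1F;
    }

    // The four-bit field stored in the high surrogate: uuuuu - 1.
    static int wwww(int codePoint) {
        return uuuuu(codePoint) - 1;
    }

    // High surrogate built by hand: 110110 wwww xxxxxx.
    static char highSurrogate(int codePoint) {
        int topX = (codePoint >>> 10) & 0x3F; // upper 6 of the 16 x bits
        return (char) (0xD800 | (wwww(codePoint) << 6) | topX);
    }

    // Low surrogate built by hand: 110111 xxxxxxxxxx.
    static char lowSurrogate(int codePoint) {
        return (char) (0xDC00 | (codePoint & 0x3FF)); // lower 10 x bits
    }

    public static void main(String[] args) {
        int[] samples = {0x10000, 0x1F600, 0x10FFFF};
        for (int cp : samples) {
            System.out.printf("U+%X: uuuuu=%d wwww=%d high=%04X low=%04X (JDK: %04X %04X)%n",
                    cp, uuuuu(cp), wwww(cp),
                    (int) highSurrogate(cp), (int) lowSurrogate(cp),
                    (int) Character.highSurrogate(cp), (int) Character.lowSurrogate(cp));
        }
    }
}
```

For every supplementary code point, uuuuu lies between 1 and 16, so wwww lies between 0 and 15 and needs only 4 bits.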

Now, modified UTF-8 encodes supplementary code points as if they were two characters: a high surrogate and a low surrogate. Each of these surrogate characters is represented by 3 bytes in UTF-8. Hence the 'two-times-three' figure.
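As a rough illustration, the six bytes can be assembled directly from the bit fields and compared with what `DataOutputStream.writeUTF` produces (it writes a 2-byte length prefix followed by modified UTF-8). The class name `ModifiedUtf8Sketch` and its methods are hypothetical helpers, not part of any library, and the sketch assumes the input is a supplementary code point.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

class ModifiedUtf8Sketch {
    // Six bytes u, v, w, x, y, z for a supplementary code point (U+10000..U+10FFFF).
    static byte[] encodeSupplementary(int cp) {
        return new byte[] {
            (byte) 0xED,                                // u: 11101101
            (byte) (0xA0 | (((cp >>> 16) & 0x1F) - 1)), // v: 1010 + ((bits 20-16) - 1)
            (byte) (0x80 | ((cp >>> 10) & 0x3F)),       // w: 10 + bits 15-10
            (byte) 0xED,                                // x: 11101101
            (byte) (0xB0 | ((cp >>> 6) & 0x0F)),        // y: 1011 + bits 9-6
            (byte) (0x80 | (cp & 0x3F))                 // z: 10 + bits 5-0
        };
    }

    // Reference bytes from the JDK: writeUTF output minus its 2-byte length prefix.
    static byte[] viaWriteUTF(int cp) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(new String(Character.toChars(cp)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        int cp = 0x1F600;
        System.out.println("by hand:  " + hex(encodeSupplementary(cp)));
        System.out.println("writeUTF: " + hex(viaWriteUTF(cp)));
    }
}
```

The first three bytes are the ordinary 3-byte UTF-8 form of the high surrogate and the last three are that of the low surrogate, which is why the v byte carries wwww rather than the full 5-bit uuuuu.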

solution1 1 ACCEPTED 2017-01-11 01:07:15
