简体   繁体   中英

Bit shift and convert character to unicode escape string

I found a java class that convert byte or char to hexadecimal value. But I cannot understand the code clearly. Can you explain what the code do or where I can find more resources about this?

public class UnicodeFormatter {

    static public String byteToHex(byte b) {
        // Returns hex String representation of byte b
        char hexDigit[] = {
            '0', '1', '2', '3', '4', '5', '6', '7',
            '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'
        };
        char[] array = {hexDigit[(b >> 4) & 0x0f], hexDigit[b & 0x0f]};
        return new String(array);
    }

    static public String charToHex(char c) {
        // Returns hex String representation of char c
        byte hi = (byte) (c >>> 8);
        byte lo = (byte) (c & 0xff);
        return byteToHex(hi) + byteToHex(lo);
    }
} // class

First, let us start with some definitions:

  • A char , in Java, occupies 2 bytes ;
  • Each byte is composed by 8 bits ;
  • Each hexadecimal digit represents 4 binary digits or bits ;

Therefore, a byte can be represented by 2 hexadecimal digits, ie, two groups of 4 bits. This is exactly what is being done in the byteToHex method: it firsts splits the byte in two groups of 4 bits, and then maps each into an hexadecimal symbol, using the hexDigit array. Since the decimal value of each group of 4 bits can never be greater or equal than 16 ( 2^4 ), each group will always have a mapping in the hexDigits array.

For example, suppose you want to convert the number 29 to hexadecimal:

  1. 29 is represented in binary as 00011101 ;
  2. Splitting 00011101 in two groups of 4 bits yields 0001 and 1101 ;
  3. Programatically, the first group, 0001 can be obtained by shifting away the least significant 4 bits ( 1101 ) from the binary representation of 29 . Then, 0001 would become the first 4 bits. This is accomplished in Java with ( b >> 4 );
  4. The second group, is obtained by b & 0x0f , which is equivalent to 00011101 & 00001111 = 00001101 = 1101 . By bit- AND ing the binary number with 0x0f you are clearing (setting to 0) everything except the least significant 4 bits.
  5. Finally, each group is converted to a decimal number, yielding 1 ( 0001 ) and 13 ( 1101 ), which are then mapped to 1 and D respectively, in the hexadecimal system.
  6. The number 29 is therefore represented by 1D in hexadecimal.

A similar logic can be applied to the method charToHex . The only difference is instead of converting a single byte, you are converting 2, since a char is 2 bytes.

Basically what it is doing here is the same as turning 23 into a string by changing it to 2*10+3, then turning 2 and 3 into characters.

To break it down, we first divide by 16, since we're working in hex.

b >> 4 means shift the bits 4 spaces, so

12345678 >> 4 = 00001234  

then the value in positions 1234 get looked up in the hexDigit array.

Then we do a modulus operation, also known as getting the remainder. In the decimal example, this is finding the 3 by chopping off everything to the left. For binary, they are using AND here.

0x0f in bits is 00001111, so when ANDed with a byte, it will change the left 4 spaces into 0s, leaving only the right 4.

12345678 & 0x0f = 00005678

and again we look up the value in positions 5678 in the hexDigit array. Note that I'm using 1-8 as position markers, the actual data will be all 0s and 1s.

Edit: The second function does basically the same operation, it uses the same >>> and & functions to split the unicode char into bytes. It appears to be assuming the unicode character is 16 bits, so it shifts it 8 places to get the left 8 bits, and uses & 0xff to get the right 8 bits.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM