Character set not being applied correctly

Question

I have a Spark Java web service that receives requests in UTF-8. When the request contains extended characters such as umlauts or letters with tildes, those characters come out wrong after the body is converted to a string. To debug:

1) I receive the request and display its bytes as hex values (this contains the correct characters).

2) I then convert the received bytes to a string (specifying the charset of UTF-8).

3) Finally, I convert the string from step 2 back to bytes and display those as hex values.

Unfortunately, the hex values from step 1 don't match the hex values from step 3. Below is the code I'm using:

    byte[] bytes = request.bodyAsBytes();


    LOGGER.debug( "1 - Body as bytes: " );
    LOGGER.debug( javax.xml.bind.DatatypeConverter.printHexBinary(bytes) );
    LOGGER.debug( "1 - End of body" );

    //  charset hard coded to UTF-8 for testing...
    String charSet = requestHeadersDto.getCharacterSet().equals( "" ) ? DEFAULT_CHAR_SET : requestHeadersDto.getCharacterSet();
    LOGGER.debug( "Charset: " + charSet );
    String xml = new String( bytes , charSet );


    LOGGER.debug( "2 - Body as bytes: " );
    LOGGER.debug( javax.xml.bind.DatatypeConverter.printHexBinary( xml.getBytes() ) );
    LOGGER.debug( "2 - End of body" );

What am I doing wrong? TIA.

Answer 1

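The call

    xml.getBytes()

takes no charset argument, so it encodes the string with the JVM's default charset, which may not be UTF-8. It should be:

    xml.getBytes(charSet)

or

    xml.getBytes(Charset.forName(charSet))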
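
To make the difference visible, here is a minimal sketch, not the asker's actual service code. The class name `CharsetRoundTrip` and the helpers `toHex` and `roundTrip` are made up for illustration, and the sample body is simply the word "für" encoded as UTF-8. A small hand-written hex helper is used instead of `javax.xml.bind.DatatypeConverter` so the example does not depend on that package being available.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetRoundTrip {

    // Hypothetical helper: upper-case hex dump of a byte array.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Decode and re-encode using the same explicit charset.
    static byte[] roundTrip(byte[] body, Charset charset) {
        String text = new String(body, charset);
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        // "für" in UTF-8: the 'ü' is the two bytes C3 BC.
        byte[] body = {0x66, (byte) 0xC3, (byte) 0xBC, 0x72};
        String text = new String(body, StandardCharsets.UTF_8);

        System.out.println("received:       " + toHex(body));
        System.out.println("explicit UTF-8: " + toHex(roundTrip(body, StandardCharsets.UTF_8)));

        // Encoding with a different charset yields different bytes for 'ü'.
        System.out.println("ISO-8859-1:     " + toHex(text.getBytes(StandardCharsets.ISO_8859_1)));

        // The no-argument overload depends on the JVM's default charset.
        System.out.println("default (" + Charset.defaultCharset() + "): " + toHex(text.getBytes()));
    }
}
```

With the explicit charset the hex dump matches the received bytes (`66C3BC72`). The last line shows whatever the default charset of the JVM produces, which is why the two dumps in the question can differ.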

solution1
1 ACCEPTED 2018-06-27 20:23:38
