Character set not being applied correctly

I have a Spark Java web service that receives requests in UTF-8. When extended characters such as umlauts or letters with tildes are received, they no longer come out as the correct characters once the bytes are converted to a string. To debug:

1) I receive the request and display its bytes as Hex values (this contains the correct characters).

2) I then convert the received bytes to a string (specifying the charset of UTF-8).

3) Finally, I again display the string from step 2 as Hex values.

Unfortunately, the hex values from step 1 don't match the hex values from step 3. Below is the code I'm using:

    byte[] bytes = request.bodyAsBytes();


    LOGGER.debug( "1 - Body as bytes: " );
    LOGGER.debug( javax.xml.bind.DatatypeConverter.printHexBinary(bytes) );
    LOGGER.debug( "1 - End of body" );

    //  charset hard coded to UTF-8 for testing...
    String charSet = requestHeadersDto.getCharacterSet().equals( "" ) ? DEFAULT_CHAR_SET : requestHeadersDto.getCharacterSet();
    LOGGER.debug( "Charset: " + charSet );
    String xml = new String( bytes , charSet );


    LOGGER.debug( "2 - Body as bytes: " );
    LOGGER.debug( javax.xml.bind.DatatypeConverter.printHexBinary( xml.getBytes() ) );
    LOGGER.debug( "2 - End of body" );

What am I doing wrong? TIA.

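The decoding in step 2 is fine. The mismatch comes from the no-argument call used in step 3:

xml.getBytes()

With no argument, getBytes() encodes the string using the JVM's default charset, which is not necessarily UTF-8, so the resulting bytes can differ from the ones received. It should be: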

xml.getBytes(charSet)

or

xml.getBytes(Charset.forName(charSet))
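To see the effect in isolation, here is a minimal sketch of the round trip. The class name `CharsetRoundTrip` and its helper methods are made up for illustration, and the sample bytes are simply the UTF-8 encoding of "äñ"; none of this comes from the original service. It avoids `javax.xml.bind.DatatypeConverter` only so that it has no dependencies beyond `java.base`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original service.
public class CharsetRoundTrip {

    // Render bytes as upper-case hex, two digits per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Decode and re-encode with the SAME charset, so the bytes survive.
    static byte[] roundTrip(byte[] original, Charset charset) {
        String text = new String(original, charset);
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        // UTF-8 encoding of "äñ" (U+00E4, U+00F1).
        byte[] original = {(byte) 0xC3, (byte) 0xA4, (byte) 0xC3, (byte) 0xB1};

        System.out.println(toHex(original));                                      // C3A4C3B1
        System.out.println(toHex(roundTrip(original, StandardCharsets.UTF_8)));   // C3A4C3B1

        // Decoding as UTF-8 but encoding with a different charset changes the bytes,
        // which is what happens when the platform default is not UTF-8.
        String text = new String(original, StandardCharsets.UTF_8);
        System.out.println(toHex(text.getBytes(StandardCharsets.ISO_8859_1)));    // E4F1
    }
}
```

Passing a `Charset` constant such as `StandardCharsets.UTF_8` also avoids the checked `UnsupportedEncodingException` that the `String`-name overloads declare. On JDK 18 and later the default charset is UTF-8 unless it has been overridden, so this mismatch shows up mostly on older runtimes or on systems configured with a different `file.encoding`.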
