Java String.getBytes( charsetName ) vs String.getBytes ( Charset object )

Question

I need to encode a String to a byte array using UTF-8. I am using Google Guava, whose Charsets class already defines a Charset instance for UTF-8. I have two ways to do this:

String.getBytes( charsetName )

 try {
     byte[] bytes = my_input.getBytes("UTF-8");
 } catch (UnsupportedEncodingException ex) {
 }

String.getBytes( Charset object )

 // Charsets.UTF_8 is an instance of Charset
 byte[] bytes = my_input.getBytes(Charsets.UTF_8);

My question is: which one should I use? They return the same result. With way 2, I don't have to write a try/catch! I took a look at the Java source code and saw that way 1 and way 2 are implemented differently.

Does anyone have any ideas?

Answer 1

If you are going to use a string literal (e.g. "UTF-8"), you shouldn't. Instead, use the second version and supply the constant from StandardCharsets (specifically StandardCharsets.UTF_8 in this case), which is part of the JDK since Java 7.
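As a minimal sketch of that recommendation (the class and method names here are illustrative, not from the question), encoding with the JDK constant needs no try/catch and no Guava dependency:

```java
import java.nio.charset.StandardCharsets;

public class Utf8Encode {
    // Hypothetical helper: encodes text as UTF-8 via the JDK constant.
    // getBytes(Charset) declares no checked exception.
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "h\u00e9llo" contains one non-ASCII character (e with acute accent),
        // which UTF-8 encodes as two bytes.
        byte[] bytes = toUtf8("h\u00e9llo");
        System.out.println(bytes.length); // prints 6
    }
}
```

Guava's Charsets.UTF_8 is passed to the same getBytes(Charset) overload, so either constant works for the call in the question.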

The first version is for when the charset is dynamic: you don't know it at compile time because it is supplied by an end user, read from a config file or system property, etc.

Internally, both methods call a version of StringCoding.encode(). The version used by the first method simply looks up the Charset by the supplied name first, and throws an exception if that charset is unknown or unavailable.
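The observable behavior can be checked through the public API alone. The sketch below does not touch StringCoding itself; the class and method names are made up for illustration. It shows that both overloads produce identical bytes for a known name, and that only the name-based overload can fail on lookup:

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class NameLookupDemo {
    // Compares the output of the name-based and Charset-based overloads.
    static boolean sameBytes(String text) throws UnsupportedEncodingException {
        byte[] byName = text.getBytes("UTF-8");                   // looks the charset up by name
        byte[] byCharset = text.getBytes(StandardCharsets.UTF_8); // no lookup needed
        return Arrays.equals(byName, byCharset);
    }

    // Returns true if the name-based overload rejects the given charset name.
    static boolean nameIsRejected(String charsetName) {
        try {
            "x".getBytes(charsetName);
            return false;
        } catch (UnsupportedEncodingException e) {
            return true;
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.println(sameBytes("some text"));
        System.out.println(nameIsRejected("no-such-charset"));
    }
}
```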

Answer 2

The first API is for situations when you do not know the charset at compile time; the second one is for situations when you do. Since it appears that your code needs UTF-8 specifically, you should prefer the second API:

byte[] bytes = my_input.getBytes ( Charsets.UTF_8 ); // <<== UTF-8 is known at compile time

The first API is for situations when the charset comes from outside your program - for example, from the configuration file, from user input, as part of a client request to the server, and so on. That is why there is a checked exception thrown from it - for situations when the charset specified in the configuration or through some other means is not available.
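For the dynamic case, one possible shape is sketched below. The system property name app.encoding and the class ConfiguredEncoding are hypothetical, and wrapping the checked exception in an IllegalArgumentException is just one way to report a bad configuration value:

```java
import java.io.UnsupportedEncodingException;

public class ConfiguredEncoding {
    // Encodes text with a charset whose name is only known at run time.
    static byte[] encode(String text, String charsetName) {
        try {
            return text.getBytes(charsetName);
        } catch (UnsupportedEncodingException e) {
            // The configured name is unknown to this JVM: surface it to the caller.
            throw new IllegalArgumentException(
                    "Unsupported charset in configuration: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // "app.encoding" is a hypothetical property; falls back to UTF-8 if unset.
        String name = System.getProperty("app.encoding", "UTF-8");
        System.out.println(encode("abc", name).length);
    }
}
```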

Answer 3

Since they return the same result, you should use method 2, because it is generally safer and more efficient to avoid asking the library to parse, and possibly fail on, a charset name passed as a string. Avoiding the try-catch also makes your own code cleaner.

Charsets.UTF_8 is already a Charset object, so the compiler checks its type and there is no name lookup that could fail at run time, which is most likely the reason you do not need a try-catch.

Answer 4

If you already have a Charset, use the second version, as it is less error-prone.
