
Maximum string length for given number of bytes

I need to validate the maximum length of a String value that is going to be stored in a VARCHAR2(4000 bytes) column in the database. What maximum length should I use? I assumed 2000, because a Java String is encoded in UTF-16, but am I missing something? Is there any case where a 2000-character string can take more than 4000 bytes?

No, a 2000 character String in Java can't take more than 4000 bytes of character data. You may occasionally hear it said that UTF-16 is a variable length encoding because it can take 2 or 4 bytes to represent a Unicode code point. While this is true, it is irrelevant because Java's "character" is not a Unicode code point, but a UTF-16 code unit, which is always 2 bytes. Therefore, a 2000-character String in Java is exactly 4000 bytes of UTF-16 data.
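The difference between code units and code points is easy to see in a small sketch. The class name `Utf16Length` and its helper method are made up for illustration; `String.length()`, `codePointCount` and `StandardCharsets.UTF_16BE` are standard JDK APIs. `UTF_16BE` is used here because it does not prepend a byte-order mark.

```java
import java.nio.charset.StandardCharsets;

public class Utf16Length {
    // Number of bytes the string occupies as UTF-16 without a byte-order mark.
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // U+1F600 lies outside the BMP, so Java stores it as a surrogate pair (two chars).
        String s = "a\uD83D\uDE00";
        System.out.println(s.length());                      // UTF-16 code units
        System.out.println(s.codePointCount(0, s.length())); // Unicode code points
        System.out.println(utf16Bytes(s));                   // 2 bytes per code unit
    }
}
```

For this string, `length()` reports 3 code units while `codePointCount` reports 2 code points, and the UTF-16 byte count is twice the value of `length()`.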

Tangential warning: based on your use of VARCHAR2, it seems to me that you are using an Oracle database. Oracle has two main character set settings, the database character set and national character set. The first is used by VARCHAR2 columns (among others), and the other is used by NVARCHAR2 columns (again, among others). UTF-16 is not supported for use as the database character set, but is for the national character set. I don't know what your data layer looks like so I can't say how this will affect you, but you can read this Oracle document on character sets for more information.

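Taken all together, a 2000-character string in Java can end up being more than 4000 bytes elsewhere if, somewhere along the way, it gets converted to a different encoding. In UTF-8, for example, a single Java char can take up to 3 bytes, so 2000 chars can need as many as 6000 bytes.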
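One way to guard against this is to measure the encoded size against the byte limit instead of counting characters. The sketch below assumes, purely for illustration, that the database character set is UTF-8 (AL32UTF8 in Oracle terms); the class and method names are hypothetical, and the charset should be swapped for whatever your database actually uses.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ByteLengthCheck {
    // Assumption: the column limit from the question, VARCHAR2(4000 bytes).
    static final int MAX_BYTES = 4000;

    // True if the string fits in maxBytes once encoded in the given charset.
    static boolean fits(String s, Charset dbCharset, int maxBytes) {
        return s.getBytes(dbCharset).length <= maxBytes;
    }

    public static void main(String[] args) {
        // 2000 copies of U+20AC (euro sign), which takes 3 bytes in UTF-8.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 2000; i++) {
            sb.append('\u20AC');
        }
        String euros = sb.toString();

        System.out.println(euros.length());                                 // 2000 chars
        System.out.println(euros.getBytes(StandardCharsets.UTF_8).length);  // 6000 bytes
        System.out.println(fits(euros, StandardCharsets.UTF_8, MAX_BYTES)); // false
    }
}
```

Checking the byte length in the target charset avoids having to pick a conservative character limit up front, at the cost of encoding the string once during validation.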

In UTF-16, according to what I've read online, Java represents each Unicode character with either one or two 16-bit values. The best way to check is to take a sample string like the ones you expect to store, encode it, and print out the byte length, then use that as a reference during application development.

Here is sample code you can use to test this out:

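import java.nio.charset.StandardCharsets;

String s = "Hello, world!";
// UTF_16BE writes no byte-order mark; getBytes("UTF-16") would prepend a 2-byte BOM
// and also forces you to handle UnsupportedEncodingException.
int byteCountUTF16 = s.getBytes(StandardCharsets.UTF_16BE).length; // 13 chars * 2 bytes = 26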
