
Maximum string length for given number of bytes

I need to validate the maximum length of a String value that is going to be stored in a VARCHAR2(4000 bytes) column in a database. What maximum length should I use? I assumed 2000, because a Java String is encoded in UTF-16, but am I missing something? Is there any case in which a 2000-character string can take more than 4000 bytes?

No, a 2000-character String in Java can't take more than 4000 bytes of character data. You may occasionally hear it said that UTF-16 is a variable-length encoding because it can take 2 or 4 bytes to represent a Unicode code point. While this is true, it is irrelevant here, because Java's "character" (a char, which is what String.length() counts) is not a Unicode code point but a UTF-16 code unit, which is always 2 bytes. Therefore, a 2000-character String in Java is exactly 4000 bytes of UTF-16 data.
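The distinction between code units and code points can be checked directly. The following is a minimal sketch, not part of the original answer: the class name CodeUnitDemo and the helper utf16Bytes are made up for illustration, and String.repeat assumes Java 11 or later.

```java
import java.nio.charset.StandardCharsets;

class CodeUnitDemo {
    // Hypothetical helper: bytes the string occupies as UTF-16 without a byte-order mark.
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // U+1F600 lies outside the Basic Multilingual Plane, so it needs a surrogate pair.
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(emoji.length());                          // 2 (UTF-16 code units)
        System.out.println(emoji.codePointCount(0, emoji.length())); // 1 (Unicode code point)
        System.out.println(utf16Bytes(emoji));                       // 4 (2 bytes per code unit)

        // Any String whose length() is 2000 occupies exactly 4000 bytes as UTF-16.
        String s = "a".repeat(2000); // String.repeat requires Java 11+
        System.out.println(utf16Bytes(s));                           // 4000
    }
}
```

A code point that needs 4 bytes in UTF-16 already counts as two characters in length(), so the byte count stays at twice the length.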

Tangential warning: based on your use of VARCHAR2, it seems to me that you are using an Oracle database. Oracle has two main character set settings, the database character set and the national character set. The first is used by VARCHAR2 columns (among others), and the second is used by NVARCHAR2 columns (again, among others). UTF-16 is not supported as the database character set, but it is supported as the national character set. I don't know what your data layer looks like, so I can't say how this will affect you, but you can read the Oracle documentation on character sets for more information.

Taken altogether, a 2000-character string in Java can end up being more than 4000 bytes elsewhere if, somewhere along the way, it gets converted to a different encoding.
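To illustrate how a conversion can push the size past the limit, here is a minimal sketch that assumes, purely for the example, that the database character set is UTF-8 (AL32UTF8 in Oracle terms). The class name ByteLengthCheck and the helper fitsInBytes are hypothetical, and the charset passed in should be whatever your database actually uses.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ByteLengthCheck {
    // Hypothetical helper: true if value, encoded in the given charset, fits in maxBytes.
    static boolean fitsInBytes(String value, Charset charset, int maxBytes) {
        return value.getBytes(charset).length <= maxBytes;
    }

    public static void main(String[] args) {
        // The euro sign (U+20AC) is one Java char but three bytes in UTF-8.
        String euros = "\u20AC".repeat(2000); // String.repeat requires Java 11+
        System.out.println(euros.length());                                      // 2000
        System.out.println(euros.getBytes(StandardCharsets.UTF_8).length);       // 6000
        System.out.println(fitsInBytes(euros, StandardCharsets.UTF_8, 4000));    // false
        System.out.println(fitsInBytes(euros, StandardCharsets.UTF_16BE, 4000)); // true
    }
}
```

Under that assumption, validating the encoded byte length rather than the character count is the safer check for a column declared in bytes.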

According to what I've read online, Java represents each Unicode character in UTF-16 with either one or two 16-bit values. The best way to check is to take a sample string of the kind you expect to store, encode it, and print out the byte length, then use that as a reference during your application development.

Here is sample code you can use to test this out:

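String s = "Hello, world!";
// UTF_16BE omits the 2-byte byte-order mark that getBytes("UTF-16") would prepend.
int byteCountUTF16 = s.getBytes(java.nio.charset.StandardCharsets.UTF_16BE).length;
System.out.println(byteCountUTF16); // 26, i.e. 2 * s.length()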
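One detail worth knowing when measuring this way: Java's plain "UTF-16" charset prepends a 2-byte byte-order mark when encoding, so the reported length is two bytes larger than the character data itself. The sketch below compares a few charsets for the same sample string. The class name EncodingSizes and the helper byteCount are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingSizes {
    // Hypothetical helper: number of bytes produced when s is encoded in the given charset.
    static int byteCount(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String s = "Hello, world!"; // 13 chars, all ASCII
        System.out.println(byteCount(s, StandardCharsets.UTF_16));   // 28 (2-byte BOM + 26)
        System.out.println(byteCount(s, StandardCharsets.UTF_16BE)); // 26 (no BOM)
        System.out.println(byteCount(s, StandardCharsets.UTF_8));    // 13 (1 byte per ASCII char)
    }
}
```

Passing a Charset constant rather than a charset name also avoids the checked UnsupportedEncodingException that getBytes(String) declares.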

