
Does Java String.getBytes("UTF-8") preserve lexicographical order?

Original post: 2012-08-15 23:08:48 2 2 Tags: java, sorting, utf-8, arrays, lexicographic

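If I have a lexicographically sorted list of Java strings [s1, s2, s3, s4, ..., sn] and then convert each string into a byte array using UTF-8 encoding, bx = sx.getBytes("UTF-8"), is the list of byte arrays [b1, b2, b3, ..., bn] also lexicographically sorted?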

2 answers

Yes. According to RFC 3629:

The byte-value lexicographic sorting order of UTF-8 strings is the same as if ordered by character numbers. Of course this is of limited interest since a sort order based on character numbers is almost never culturally valid.
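A minimal sketch of what the quoted sentence means in practice. The class and method names here (Utf8OrderDemo, compareByCodePoint, compareUnsignedBytes, ordersAgree) are made up for illustration. Two assumptions are worth stating: the byte arrays are compared as unsigned values (Java's byte is signed, so each byte is masked with 0xFF), and the strings are compared by code point. String.compareTo compares UTF-16 code units instead, which can differ from code point order when supplementary characters are involved.

```java
import java.nio.charset.StandardCharsets;

class Utf8OrderDemo {
    // Encode as standard UTF-8. Assumes well-formed strings: an unpaired
    // surrogate would be replaced by '?' during encoding.
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Compare two strings by Unicode code point ("character number"),
    // not by UTF-16 code unit as String.compareTo does.
    static int compareByCodePoint(String a, String b) {
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) return Integer.compare(ca, cb);
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // One string is a prefix of the other: the shorter one sorts first.
        return Integer.compare(a.length() - i, b.length() - j);
    }

    // Lexicographic comparison of byte arrays, treating bytes as 0..255.
    static int compareUnsignedBytes(byte[] x, byte[] y) {
        int n = Math.min(x.length, y.length);
        for (int k = 0; k < n; k++) {
            int cmp = Integer.compare(x[k] & 0xFF, y[k] & 0xFF);
            if (cmp != 0) return cmp;
        }
        return Integer.compare(x.length, y.length);
    }

    // True if both orderings agree on every pair of sample strings.
    static boolean ordersAgree(String[] samples) {
        for (String a : samples) {
            for (String b : samples) {
                int byCodePoint = Integer.signum(compareByCodePoint(a, b));
                int byBytes = Integer.signum(compareUnsignedBytes(utf8(a), utf8(b)));
                if (byCodePoint != byBytes) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Includes a BMP character (U+FF5E) and a supplementary one (U+1F600).
        String[] samples = {"", "a", "ab", "b", "\u00E9", "\uFF5E", "\uD83D\uDE00"};
        System.out.println("code point order == unsigned UTF-8 byte order: "
                + ordersAgree(samples));
        // String.compareTo works on UTF-16 code units, so its sign can differ
        // from the code point comparison for this pair.
        System.out.println("compareTo sign: "
                + Integer.signum("\uFF5E".compareTo("\uD83D\uDE00")));
        System.out.println("code point sign: "
                + Integer.signum(compareByCodePoint("\uFF5E", "\uD83D\uDE00")));
    }
}
```

If the original list was sorted with String.compareTo and may contain characters outside the Basic Multilingual Plane, it is probably safer to re-check the order with a code point comparison like the one above before relying on the byte order.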

As Ian Roberts pointed out, this applies to "true UTF-8 (such as String.getBytes will give you)", but beware of DataInputStream's fake (modified) UTF-8, which will sort [U+000000] after [U+000001] and [U+00F000] after [U+10FFFF].
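The two orderings mentioned above can be reproduced with a short sketch. It assumes DataOutputStream.writeUTF as the way to obtain the modified encoding (it writes a two-byte length prefix, which is stripped here). The class and helper names (ModifiedUtf8Demo, modifiedUtf8, hex, compareUnsignedBytes) are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

class ModifiedUtf8Demo {
    // Encode with DataOutputStream.writeUTF and drop its 2-byte length prefix,
    // leaving only the modified UTF-8 payload.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // Render bytes as upper-case hex without separators.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Lexicographic comparison of byte arrays, treating bytes as 0..255.
    static int compareUnsignedBytes(byte[] x, byte[] y) {
        int n = Math.min(x.length, y.length);
        for (int k = 0; k < n; k++) {
            int cmp = Integer.compare(x[k] & 0xFF, y[k] & 0xFF);
            if (cmp != 0) return cmp;
        }
        return Integer.compare(x.length, y.length);
    }

    public static void main(String[] args) {
        String nul = "\u0000";            // U+0000
        String one = "\u0001";            // U+0001
        String bmp = "\uF000";            // U+F000
        String max = "\uDBFF\uDFFF";      // U+10FFFF as a surrogate pair

        System.out.println("U+0000   -> " + hex(modifiedUtf8(nul)));
        System.out.println("U+0001   -> " + hex(modifiedUtf8(one)));
        System.out.println("U+F000   -> " + hex(modifiedUtf8(bmp)));
        System.out.println("U+10FFFF -> " + hex(modifiedUtf8(max)));

        // A positive sign means the first argument sorts after the second.
        System.out.println("U+0000 vs U+0001: "
                + Integer.signum(compareUnsignedBytes(modifiedUtf8(nul), modifiedUtf8(one))));
        System.out.println("U+F000 vs U+10FFFF: "
                + Integer.signum(compareUnsignedBytes(modifiedUtf8(bmp), modifiedUtf8(max))));
    }
}
```

The difference comes from two properties of the modified format: U+0000 is written as the two bytes C0 80 rather than a single zero byte, and supplementary characters are written as two separately encoded surrogates rather than one four-byte sequence.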

You get a list/array of objects X, in a given order.

You create a new list/array Y from those objects by applying a method.

Y will have the ordering that you created it with (normally you will have just kept X's order). No reordering happens.

Also, lexicographical ordering for a byte[] is meaningless.





Answer 1: score 5, accepted, 2012-08-15 23:51:11

Answer 2: score -2, 2012-08-15 23:13:41
