
Arrays sort misbehavior for Character array

String s5 = "peek";
int i[] = {12, 25, 7, 3, 45};
Arrays.sort(i);

for(int x : i) {
  System.out.print(x + ",");
}

Arrays.sort(s5.toCharArray());

System.out.println(s5); // expected here eekp, but showing peek

for(char c : s5.toCharArray()) {
  System.out.print(c + ",");  //expected here e,e,k,p , but showing p,e,e,k
}

Output:

3,7,12,25,45,peek
p,e,e,k,

For the line System.out.println(s5) I expected "eekp", but it's showing "peek".

For the line System.out.print(c + ",") I expected "e,e,k,p", but it's showing "p,e,e,k".

Arrays.sort seems to work well for integers but not for a character array, so it seems I am doing something wrong. Could you please tell me what?

Strings are immutable (mostly)

Arrays.sort(s5.toCharArray());

The above sorts a copy of the character array and does not modify the array that holds the string's actual value.

The internal code of toCharArray in JDK 1.8:

  char result[] = new char[value.length];
  System.arraycopy(value, 0, result, 0, value.length);
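
The effect of that copy can be checked directly. The following is a minimal sketch, not code from the original answer; the class name ToCharArrayCopyDemo and the helper returnsFreshCopy are hypothetical names chosen for illustration. It shows that each call to toCharArray() hands back a separate array, so sorting that array leaves the String as it was.

```java
import java.util.Arrays;

final class ToCharArrayCopyDemo {
    // Hypothetical helper: true if two calls to toCharArray() return
    // different array objects that hold the same characters.
    static boolean returnsFreshCopy(String s) {
        char[] first = s.toCharArray();
        char[] second = s.toCharArray();
        return first != second && Arrays.equals(first, second);
    }

    public static void main(String[] args) {
        String s5 = "peek";
        char[] copy = s5.toCharArray();
        Arrays.sort(copy);                          // only the copy is reordered
        System.out.println(Arrays.toString(copy));  // [e, e, k, p]
        System.out.println(s5);                     // peek
        System.out.println(returnsFreshCopy(s5));   // true
    }
}
```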

Store the copy in a variable and sort it (a new object held by a reference)

char[] copy = s5.toCharArray();
Arrays.sort(copy);
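
If a sorted String is wanted rather than a sorted array, the sorted copy can be turned back into a new String. This is a small sketch under that assumption; the class name SortCharsDemo and the helper sortChars are hypothetical and do not appear in the original post.

```java
import java.util.Arrays;

final class SortCharsDemo {
    // Hypothetical helper: returns a new String holding the chars of s
    // in ascending order. The original String is left untouched.
    static String sortChars(String s) {
        char[] copy = s.toCharArray(); // keep a reference to the copy
        Arrays.sort(copy);             // sorts the copy in place
        return new String(copy);       // build a new String from the sorted chars
    }

    public static void main(String[] args) {
        String s5 = "peek";
        String sorted = sortChars(s5);
        System.out.println(s5);      // peek
        System.out.println(sorted);  // eekp
    }
}
```

Reassigning the result (s5 = sortChars(s5);) is the usual way to "change" a String variable, since the String object itself cannot be modified.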

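Do not try this unless it's for fun

This reflection hack depends on String internals. It matches JDK 8 and earlier, where the value field is a char[]; since JDK 9 that field is a byte[], and recent JDKs restrict reflective access to it.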

  Field field = String.class.getDeclaredField("value");
  field.setAccessible(true);
  String s5 = "peek";
  Arrays.sort((char[]) field.get(s5));
  System.out.println(s5);

Change Arrays.sort(s5.toCharArray()); to char[] s6 = s5.toCharArray(); Arrays.sort(s6);

Then print the values of s6.

toCharArray() gives you a new array from the source but doesn't modify the source. When you print, you are printing the source string, not the array containing the sorted chars.

The Answer by Horse is correct. Your code is discarding a newly created array immediately after sorting. You passed a new anonymous array to Arrays.sort, which sorted that new array, and then the new array became unreachable, as you neglected to put the array into a variable.

In addition, I can show code using Unicode code points rather than char.

Unicode code point numbers

The char type in Java is obsolete. A char is a 16-bit value, so a single char is unable to represent even half of the characters defined in Unicode. Instead, you should be working with the code point numbers, one number assigned to each character in the Unicode standard.

You can get a stream (IntStream) of those integer code point numbers by calling String#codePoints.
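
As a quick illustration of the difference, the sketch below counts the same one-emoji string both ways. The class name CodePointCountDemo and its two helpers are hypothetical names used only for this example; the emoji is written with escape sequences so the source file encoding does not matter.

```java
final class CodePointCountDemo {
    // Hypothetical helper: number of char values (UTF-16 code units) in s.
    static int charCount(String s) {
        return s.length();
    }

    // Hypothetical helper: number of Unicode code points in s.
    static long codePointCount(String s) {
        return s.codePoints().count();
    }

    public static void main(String[] args) {
        String wave = "\uD83D\uDC4B";                 // U+1F44B WAVING HAND SIGN
        System.out.println(charCount(wave));          // 2
        System.out.println(codePointCount(wave));     // 1
        System.out.println(wave.codePointAt(0));      // 128075
    }
}
```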

Here is some example code. As input we use a string that contains a couple of emoji characters. Using the char type, we incorrectly see four ? where we expect two characters. Then we use code points, and we correctly see a pair of numbers (128075, 128506) for the pair of emoji characters. Lastly, we use Java streams to sort the code point numbers, and collect the sorted numbers to build a new String object.

String input = "👋🗺 Hello World.";
System.out.println( "input = " + input );

// Wrong way, using obsolete `char` type.
char[] chars = input.toCharArray();
System.out.println( "chars before sort: " + Arrays.toString( chars ) );  // Notice that we get *four* `?` where we expect *two* letters (each of two emoji).
Arrays.sort( chars );
System.out.println( "chars after sort: " + Arrays.toString( chars ) );

// Examining code points.
System.out.println( "Print the code point number for each letter in the string. Notice we get *two* larger numbers, one for each emoji." );
input.codePoints().forEach( System.out :: println );

// Sorting by code point numbers.
String sorted =
        input
                .codePoints()
                .sorted()
                .collect( StringBuilder :: new , StringBuilder :: appendCodePoint , StringBuilder :: append )
                .toString();
System.out.println( "sorted = " + sorted );

When run:

input = 👋🗺 Hello World.
chars before sort: [?, ?, ?, ?,  , H, e, l, l, o,  , W, o, r, l, d, .]
chars after sort: [ ,  , ., H, W, d, e, l, l, l, o, o, r, ?, ?, ?, ?]
Print the code point number for each letter in the string. Notice we get *two* larger numbers, one for each emoji.
128075
128506
32
72
101
108
108
111
32
87
111
114
108
100
46
sorted =   .HWdellloor👋🗺

The first two characters in sorted are actually SPACE characters, originally found between the words in input.

Instead of streams, we could use arrays.

int[] codePoints = input.codePoints().toArray();
System.out.println( "codePoints before sorting = " + Arrays.toString( codePoints ) );
Arrays.sort( codePoints );
System.out.println( "codePoints after sorting = " + Arrays.toString( codePoints ) );

When run. Notice how the number 32 appears twice, representing the two SPACE characters.

codePoints before sorting = [128075, 128506, 32, 72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, 46]
codePoints after sorting = [32, 32, 46, 72, 87, 100, 101, 108, 108, 108, 111, 111, 114, 128075, 128506]
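
The sorted int array can be turned back into text with the String constructor that accepts code points. The following sketch assumes that is the goal; the class name SortCodePointsDemo and the helper sortByCodePoint are hypothetical, and a shorter input is used to keep the output easy to check.

```java
import java.util.Arrays;

final class SortCodePointsDemo {
    // Hypothetical helper: returns a new String whose characters are the
    // code points of s in ascending numeric order.
    static String sortByCodePoint(String s) {
        int[] codePoints = s.codePoints().toArray();
        Arrays.sort(codePoints);
        // String(int[] codePoints, int offset, int count) rebuilds the text,
        // producing surrogate pairs where a code point needs two chars.
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        String input = "b\uD83D\uDC4Ba";  // 'b', U+1F44B WAVING HAND SIGN, 'a'
        String sorted = sortByCodePoint(input);
        // The emoji sorts last because 128075 is larger than 97 and 98.
        System.out.println(sorted.equals("ab\uD83D\uDC4B"));  // true
    }
}
```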

For more info, read The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).
