
Java Unicode value of char

When I do Collections.sort(List), it sorts based on String's compareTo() logic, which compares the two strings char by char.

    List<String> file1 = new ArrayList<String>();
    file1.add("1,7,zz");
    file1.add("11,2,xx");
    file1.add("331,5,yy");
    Collections.sort(file1);
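For anyone who wants to run it, here is the snippet above wrapped in a complete program. The class and method names (`SortDemo`, `sortedFile`) are hypothetical, chosen only for illustration; the list contents and the `Collections.sort` call are exactly the question's.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortDemo {
    // Builds the question's list and sorts it with the natural String
    // ordering, i.e. String.compareTo.
    static List<String> sortedFile() {
        List<String> file1 = new ArrayList<>();
        file1.add("1,7,zz");
        file1.add("11,2,xx");
        file1.add("331,5,yy");
        Collections.sort(file1);
        return file1;
    }

    public static void main(String[] args) {
        System.out.println(sortedFile()); // [1,7,zz, 11,2,xx, 331,5,yy]
    }
}
```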

My understanding is that a char holds a Unicode value. I want to know the Unicode values of characters such as , (comma). How can I find them? Is there a URL that lists the numeric values of these characters?

> My understanding is that a char holds a Unicode value. I want to know the Unicode values of characters such as , (comma).

Well, there's an implicit conversion from char to int, which you can easily print out:

    int value = ',';
    System.out.println(value); // Prints 44

This is the UTF-16 code unit for the char. (As fge notes, a char in Java is a UTF-16 code unit, not a Unicode character. Unicode code points greater than 65535 (U+FFFF) are represented as two UTF-16 code units.)
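Applying the same char-to-int conversion to every char shows why the question's list sorts the way it does. `String.compareTo` returns the difference between the first pair of chars that differ, so for `"1,7,zz"` versus `"11,2,xx"` the comma (44) is compared with `'1'` (49) at index 1. A small sketch; `CodeUnits` and `codeUnits` are hypothetical names:

```java
import java.util.Arrays;

public class CodeUnits {
    // Returns the UTF-16 code unit of every char in s, widened to int.
    static int[] codeUnits(String s) {
        int[] units = new int[s.length()];
        for (int i = 0; i < s.length(); i++) {
            units[i] = s.charAt(i); // implicit char -> int conversion
        }
        return units;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(codeUnits("1,7,zz"))); // [49, 44, 55, 44, 122, 122]
        // First difference is at index 1: ',' (44) - '1' (49) = -5
        System.out.println("1,7,zz".compareTo("11,2,xx"));        // -5
    }
}
```

A negative result means `"1,7,zz"` sorts first, which matches the order `Collections.sort` produces.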

> Is there a URL that lists the numeric values of these characters?

Yes: for more information about Unicode, go to the Unicode web site.
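The Unicode code charts identify characters by the `U+XXXX` hexadecimal form of their code point, so printing values in that form makes them easy to look up. A minimal sketch, assuming Java 8+ (for `String.codePoints()`) and using `Character.getName` (Java 7+) to print the official character name; `UnicodeLookup` and `describe` are hypothetical names:

```java
import java.util.stream.Collectors;

public class UnicodeLookup {
    // Lists each code point of s as "U+XXXX NAME", the form used by the
    // Unicode code charts. Character.getName returns null for unassigned
    // code points, which String.format prints as "null".
    static String describe(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X %s", cp, Character.getName(cp)))
                .collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        System.out.println(describe(",1")); // U+002C COMMA, U+0031 DIGIT ONE
    }
}
```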

Uhm, no: a char is not a "Unicode value" (and the term to use is Unicode code point).

A char is a code unit in the UTF-16 encoding. And it so happens that in Unicode's Basic Multilingual Plane (BMP), i.e. code points from U+0000 to U+FFFF (for the code points actually defined in that range), there is a one-to-one mapping between a char and a Unicode code point.
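Whether a code point falls in the BMP, and therefore fits in a single char, can be checked with `Character.isBmpCodePoint` (Java 7+) and `Character.charCount`. A short sketch; `BmpCheck` and `describe` are hypothetical names, and U+1F600 is used as an example of a code point outside the BMP:

```java
public class BmpCheck {
    // Reports whether a code point is in the BMP and how many chars
    // (UTF-16 code units) it needs.
    static String describe(int cp) {
        return String.format("U+%04X BMP=%b chars=%d",
                cp, Character.isBmpCodePoint(cp), Character.charCount(cp));
    }

    public static void main(String[] args) {
        System.out.println(describe(','));     // U+002C BMP=true chars=1
        System.out.println(describe(0x1F600)); // U+1F600 BMP=false chars=2
    }
}
```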

To get the numeric value of such a code point, you can just do:

    String myString = ","; // any string whose first char is in the BMP
    System.out.println((int) myString.charAt(0)); // prints 44

But this is NOT the case for code points outside the BMP: each of those translates to two chars. See Character.toChars() and, more generally, all the static methods in Character that relate to code points. There are quite a few!
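To see what those two chars are, the sketch below applies the UTF-16 surrogate-pair rule by hand (subtract 0x10000, then split the remaining 20 bits into two 10-bit halves) and compares the result with `Character.toChars`; `Character.toCodePoint` recombines the pair. `SurrogatePair` and `encode` are hypothetical names, and `encode` only handles supplementary code points (U+10000 to U+10FFFF):

```java
import java.util.Arrays;

public class SurrogatePair {
    // Splits a supplementary code point into a UTF-16 surrogate pair.
    static char[] encode(int codePoint) {
        int offset = codePoint - 0x10000;              // 20-bit value
        char high = (char) (0xD800 + (offset >> 10));  // upper 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF)); // lower 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        char[] pair = encode(0x1F600);
        System.out.printf("%04X %04X%n", (int) pair[0], (int) pair[1]);     // D83D DE00
        System.out.println(Arrays.equals(pair, Character.toChars(0x1F600))); // true
        // Character.toCodePoint goes the other way
        System.out.println(Integer.toHexString(Character.toCodePoint(pair[0], pair[1]))); // 1f600
    }
}
```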

This also means that String's .length() is actually misleading, since it returns the number of chars (UTF-16 code units), not the number of graphemes.

A demonstration with one Unicode emoticon (U+1F600, the first one on that page):

    System.out.println(new String(Character.toChars(0x1f600)).length());

prints 2, whereas:

    final String s = new String(Character.toChars(0x1f600));
    System.out.println(s.codePointCount(0, s.length()));

prints 1.
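Note that `codePointCount` counts code points, which is still not the same as graphemes: an `e` followed by U+0301 COMBINING ACUTE ACCENT is two code points but displays as a single `é`. One way to approximate user-perceived characters in the JDK is `java.text.BreakIterator.getCharacterInstance()`. This is a sketch: `GraphemeCount` and `graphemes` are hypothetical names, and how the iterator segments complex sequences (for example emoji joined with zero-width joiners) may differ between JDK versions.

```java
import java.text.BreakIterator;

public class GraphemeCount {
    // Counts the segments between character boundaries found by the
    // JDK's character BreakIterator.
    static int graphemes(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "e\u0301"; // 'e' followed by a combining acute accent
        System.out.println(s.length());                      // 2 chars
        System.out.println(s.codePointCount(0, s.length())); // 2 code points
        System.out.println(graphemes(s));                    // 1 grapheme
    }
}
```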

Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site or the original URL.

 
© 2020-2024 STACKOOM.COM