Using String.matches() in Java

Question

Consider the following expressions using String.matches() in Java.

System.out.println("55CCEE".matches("[0-9A-Za-z]{6}"));  //true
System.out.println("CC77HH".matches("[0-9A-Za-z]{6}"));  //true
System.out.println("CC1156".matches("[0-9A-Za-z]{6}"));  //true

System.out.println("С".matches("[0-9A-Za-z]{1}"));       //false
System.out.println("СС".matches("[0-9A-Za-z]{2}"));      //false
System.out.println("СС5588".matches("[0-9A-Za-z]{6}"));  //false
System.out.println("СС5589".matches("[0-9A-Za-z]{6}"));  //false

The first three cases work as expected. The remaining cases, however, return false, which was not expected. Why does this happen?

Answer 1

In the second set, the strings do not contain the Latin letter C. They contain С (U+0421), which is encoded as 0xd0 0xa1 in UTF-8.

That character is Cyrillic: CYRILLIC CAPITAL LETTER ES. See the Cyrillic code chart (PDF) at unicode.org.
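One way to check this kind of thing yourself is to print each character's code point, Unicode name, script, and UTF-8 bytes. The following is only a minimal sketch, assuming Java 8 or later; the class name CodePointInspector and the method describe are hypothetical names chosen for illustration. The Cyrillic letter is written as the escape \u0421 so the example does not depend on the source file's encoding.

```java
import java.nio.charset.StandardCharsets;

public class CodePointInspector {

    // Builds a one-line description of a code point:
    // hex value, Unicode name, script, and UTF-8 byte sequence.
    public static String describe(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        StringBuilder utf8 = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            utf8.append(String.format("%02x ", b & 0xff));
        }
        return String.format("U+%04X %s [%s] utf8=%s",
                codePoint,
                Character.getName(codePoint),
                Character.UnicodeScript.of(codePoint),
                utf8.toString().trim());
    }

    public static void main(String[] args) {
        // Latin C followed by Cyrillic ES (U+0421)
        "C\u0421".codePoints().forEach(cp -> System.out.println(describe(cp)));
        // U+0043 LATIN CAPITAL LETTER C [LATIN] utf8=43
        // U+0421 CYRILLIC CAPITAL LETTER ES [CYRILLIC] utf8=d0 a1
    }
}
```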

Answer 2

The 'C' character in the failing lines is a different Unicode character from the C in the [A-Z] character class.

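    int unicodeFrom3rdLine = 'C';   // Latin C, as in the third (passing) line
    int unicodeFrom4thLine = 'С';   // Cyrillic ES, as in the fourth (failing) line
    // Print each code point in hexadecimal
    System.out.println(Integer.toHexString(unicodeFrom3rdLine));
    System.out.println(Integer.toHexString(unicodeFrom4thLine));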

produces

    43
    421

Even though they look the same, they are different characters: the former is the ordinary Latin C and the latter is the Cyrillic С.
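When a string that looks valid fails a pattern like [0-9A-Za-z]{6}, it can help to list exactly which characters fall outside the class. Below is a rough sketch of that idea, assuming Java 8 or later; NonAsciiFinder and findUnexpected are hypothetical names, and the allowed ranges are hard-coded to mirror the character class from the question.

```java
import java.util.ArrayList;
import java.util.List;

public class NonAsciiFinder {

    // Returns one entry per code point that is outside [0-9A-Za-z],
    // giving its char index in the string and its code point in hex.
    public static List<String> findUnexpected(String input) {
        List<String> findings = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int cp = input.codePointAt(i);
            boolean allowed = (cp >= '0' && cp <= '9')
                    || (cp >= 'A' && cp <= 'Z')
                    || (cp >= 'a' && cp <= 'z');
            if (!allowed) {
                findings.add(String.format("index %d: U+%04X", i, cp));
            }
            // Advance by 1 or 2 chars depending on the code point
            i += Character.charCount(cp);
        }
        return findings;
    }

    public static void main(String[] args) {
        // Two Cyrillic ES letters (U+0421) followed by digits
        System.out.println(findUnexpected("\u0421\u04215588"));
        // [index 0: U+0421, index 1: U+0421]
        System.out.println(findUnexpected("CC1156"));
        // []
    }
}
```

Running this on the failing inputs from the question would point straight at the first two characters.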

Answer 1: score 10, accepted, 2012-01-14 09:06:51

Answer 2: score 2, 2012-01-14 09:23:53
