
Greek String doesn't match regex when read from keyboard

public static void main(String[] args) throws IOException {
   String str1 = "ΔΞ123456";
   System.out.println(str1+"-"+str1.matches("^\\p{InGreek}{2}\\d{6}")); //ΔΞ123456-true

   BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
   String str2 = br.readLine(); //ΔΞ123456 same as str1.
   System.out.println(str2+"-"+str2.matches("^\\p{InGreek}{2}\\d{6}")); //Δ�123456-false

   System.out.println(str1.equals(str2)); //false
}

The same String doesn't match the regex when it is read from the keyboard.
What causes this problem, and how can we solve it?
Thanks in advance.

EDIT: I used System.console() for input and output.

public static void main(String[] args) throws IOException {
        PrintWriter pr = System.console().writer();

        String str1 = "ΔΞ123456";
        pr.println(str1+"-"+str1.matches("^\\p{InGreek}{2}\\d{6}")+"-"+str1.length());

        String str2 = System.console().readLine();
        pr.println(str2+"-"+str2.matches("^\\p{InGreek}{2}\\d{6}")+"-"+str2.length());

        pr.println("str1.equals(str2)="+str1.equals(str2));
}

Output:

ΔΞ123456-true-8
ΔΞ123456
ΔΞ123456-true-8
str1.equals(str2)=true

There are multiple places where transcoding errors can occur here.

  1. Ensure that your class is being compiled correctly (unlikely to be an issue in an IDE):
    • Ensure that the compiler is using the same encoding as your editor (i.e. if you save as UTF-8, set your compiler to use that encoding)
    • Or switch to escaping to the ASCII subset that most encodings are a superset of (i.e. change the string literal to "\u0394\u039E123456")
  2. Ensure you are reading input using the correct encoding:
    • Use the Console to read input; this class will detect the console encoding
    • Or configure your Reader to use the correct encoding (probably windows-1253), or set the console to Java's default encoding
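
As a minimal sketch of the escaping option in step 1, the class below writes the literal with ASCII-only Unicode escapes, so the compiler's source encoding cannot alter it. The class name EscapedLiteralDemo and the helper matchesPattern are hypothetical names chosen for illustration; the regex is the one from the question.

```java
public class EscapedLiteralDemo {
    // Greek capital Delta (U+0394) and Xi (U+039E), written as ASCII-only escapes.
    static final String ESCAPED = "\u0394\u039E123456";

    // Same pattern as in the question: two Greek-block characters, then six digits.
    static final String PATTERN = "^\\p{InGreek}{2}\\d{6}";

    // Hypothetical helper, added only for this illustration.
    static boolean matchesPattern(String s) {
        return s.matches(PATTERN);
    }

    public static void main(String[] args) {
        System.out.println(ESCAPED.length());                            // 8
        System.out.println(Integer.toHexString(ESCAPED.codePointAt(0))); // 394
        System.out.println(matchesPattern(ESCAPED));                     // true
    }
}
```

This only protects the literal in the source file. Input read at run time still has to be decoded with the right charset.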

Note that System.console() returns null in an IDE, but there are things you can do about that.

If you use Windows, it may be caused by the fact that the console character encoding ("OEM code page") is not the same as the system encoding ("ANSI code page").

An InputStreamReader without an explicit encoding parameter assumes the input data is in the system default encoding, so characters read from the console are decoded incorrectly.
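
The effect can be reproduced without a console. The sketch below is an illustration under stated assumptions, not the asker's exact situation: it uses UTF-8 and ISO-8859-1 as stand-ins for the OEM and ANSI code pages, because those two charsets are guaranteed to exist on every JVM. The class name CharsetMismatchDemo and the helper decode are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {
    static final String PATTERN = "^\\p{InGreek}{2}\\d{6}";

    // Decode raw bytes with a caller-chosen charset, as an InputStreamReader would.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // Greek Delta and Xi followed by six digits, written with escapes.
        String original = "\u0394\u039E123456";

        // Stand-in for the bytes the console delivers (assumed UTF-8 here).
        byte[] raw = original.getBytes(StandardCharsets.UTF_8);

        String right = decode(raw, StandardCharsets.UTF_8);      // matching charset
        String wrong = decode(raw, StandardCharsets.ISO_8859_1); // mismatched charset

        System.out.println(right.matches(PATTERN) + " " + right.length()); // true 8
        System.out.println(wrong.matches(PATTERN) + " " + wrong.length()); // false 10
        System.out.println(original.equals(wrong));                        // false
    }
}
```

The bytes are identical in both cases; only the charset used for decoding differs, which is enough to make both the regex match and equals() fail.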

To read non-US-ASCII characters correctly in the Windows console, you need to specify the console encoding explicitly when constructing the InputStreamReader (the required code page number can be found by executing mode con cp on the command line):

BufferedReader br = new BufferedReader(
    new InputStreamReader(System.in, "CP737")); 

The same problem applies to the output: you need to construct the PrintWriter with the proper encoding:

PrintWriter out = new PrintWriter(new OutputStreamWriter(System.out, "CP737"));

Note that since Java 1.6 you can avoid these workarounds by using the Console object obtained from System.console(). It provides a Reader and a Writer with correctly configured encoding, as well as some utility methods.

However, System.console() returns null when streams are redirected (for example, when running from an IDE). A workaround for this problem can be found in McDowell's answer.
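
One way to combine the two approaches is sketched below: prefer the Console when it is available, and otherwise fall back to a reader with an explicitly chosen charset. This is a minimal sketch and not the workaround from McDowell's answer. The class name ConsoleOrStream, the helper readLineFrom, and the convention of passing the charset name as the first program argument are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.Console;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

public class ConsoleOrStream {
    // Hypothetical helper: read one line from a stream, decoding with an explicit charset.
    static String readLineFrom(InputStream in, Charset charset) {
        try {
            BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset));
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the caller passes the console code page name (e.g. the one
        // reported by "mode con cp") as the first argument; otherwise use the default.
        Charset fallback = args.length > 0
                ? Charset.forName(args[0])
                : Charset.defaultCharset();

        Console console = System.console();
        String line = (console != null)
                ? console.readLine()                  // Console handles the encoding itself
                : readLineFrom(System.in, fallback);  // redirected streams or IDE

        if (line != null) {
            System.out.println(line.matches("^\\p{InGreek}{2}\\d{6}"));
        }
    }
}
```

Keeping the stream path in its own method means it can be exercised with a ByteArrayInputStream, independent of whether a real console is attached.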


I get true in both cases with nothing changed in your code. (I tested with a Greek layout keyboard; I'm from Greece :])
Your keyboard input is probably arriving encoded as ISO 8859-7 rather than UTF-8. Mine sends UTF-8.

EDIT: I still get true with the addition of the equals call:

System.out.println(str1.equals(str2));


Check whether you can get it working by changing everything to Greek in the regional options (if you are using Windows).

Rundll32 Shell32.dll,Control_RunDLL Intl.cpl,,0

If this is the case, then you can act accordingly, as 'axtavt' said.

The keyboard is likely not sending the characters as UTF-8, but in the operating system's default character encoding.
