
Search a Unicode string in a file using Java

How can I search for a Unicode string in a file using Java? Below is the code I have tried. It works for strings that contain no Unicode (non-ASCII) characters.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import java.io.*;
    import java.util.*;

    class file1 {
        public static void main(String[] arg) throws Exception {
            BufferedReader bfr1 = new BufferedReader(new InputStreamReader(System.in));
            System.out.println("Enter File name:");
            String str = bfr1.readLine();
            BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
            String s;
            int count = 0;
            int flag = 0;

            System.out.println("Enter the string to be found");
            s = br.readLine();
            BufferedReader bfr = new BufferedReader(new FileReader(str));
            String bfr2 = bfr.readLine();
            Pattern p = Pattern.compile(s);
            Matcher matcher = p.matcher(bfr2);
            while (matcher.find()) {
                count++;
            }
            System.out.println(count);
        }
    }

Well, there are three potential sources of problems I can see:

  • The regular expression may be incorrect. Do you really need to use a regular expression? Are you trying to match a pattern, or just a simple string?
  • You may be failing to get non-ASCII input from the command line. You should dump out the input string in terms of its Unicode characters (see code later).
  • You may well be reading the file in the wrong encoding. Currently you're using FileReader, which (with the constructor you're calling) always uses the platform default encoding. What's the encoding of the file you're trying to read? I would recommend using FileInputStream wrapped in an InputStreamReader with an explicit encoding (e.g. UTF-8) which matches the file.
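The first and third points can be sketched together. This is only a minimal sketch, assuming the file really is UTF-8 and that a literal (non-regex) match is what's wanted; the class name `UnicodeSearch` and its method names are illustrative and not from the question. Unlike the original code, it reads every line rather than just the first.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class UnicodeSearch {

    // Counts non-overlapping occurrences of a literal string in one line.
    static int countOccurrences(String line, String needle) {
        if (needle.isEmpty()) {
            return 0; // an empty search string would otherwise loop forever
        }
        int count = 0;
        int from = 0;
        while ((from = line.indexOf(needle, from)) != -1) {
            count++;
            from += needle.length();
        }
        return count;
    }

    // Reads the file with an explicit encoding (UTF-8 is an assumption here)
    // and sums the matches over all lines.
    static int countInFile(String fileName, String needle) throws IOException {
        int total = 0;
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new FileInputStream(fileName), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                total += countOccurrences(line, needle);
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java UnicodeSearch <file> <text>");
            return;
        }
        System.out.println(countInFile(args[0], args[1]));
    }
}
```

If a regular expression is still needed, passing user input through `Pattern.quote(s)` before `Pattern.compile` makes it match literally, so characters such as `.` or `(` in the search text are not interpreted as regex syntax.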

To debug the real values in strings, I would usually use something like this:

    private static void dumpString(String text) {
        // Print each char's index, its UTF-16 value in hex, and the char itself.
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            System.out.printf("%d: %04x (%c)%n", i, (int) c, c);
        }
    }

That way you can see the exact UTF-16 code unit in each char in the string.
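Characters outside the Basic Multilingual Plane take up two chars (a surrogate pair), so a per-char dump shows them as two separate values. If whole code points are more useful, a variant along these lines could work; `CodePointDump` and `dumpCodePoints` are hypothetical names, and the method returns the text instead of printing it so it is easier to check.

```java
class CodePointDump {

    // Builds one line per Unicode code point, in the form "index: U+XXXX (char)",
    // where index is the position of the code point's first char in the string.
    static String dumpCodePoints(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            String shown = new String(Character.toChars(cp));
            out.append(String.format("%d: U+%04X (%s)%n", i, cp, shown));
            i += Character.charCount(cp); // 2 for a surrogate pair, otherwise 1
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "A", e with acute accent, and one character encoded as a surrogate pair
        System.out.print(dumpCodePoints("A\u00e9\uD83D\uDE00"));
    }
}
```

Comparing this dump for the search text typed at the console with the dump for a line read from the file should show whether the two really contain the same characters.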
