UTF-8 to String in Java

I am having a little problem with the UTF-8 charset. I have a UTF-8 encoded file which I want to load and analyze. I am using BufferedReader to read the file line by line.

BufferedReader buffReader = new BufferedReader(
        new InputStreamReader(new FileInputStream(file), "UTF-8"));

My problem is that the normal String methods (trim() and equals(), for example) do not behave as expected on the lines read from the BufferedReader in each iteration of the loop I wrote to read its content. For example, the encoded file contains < menu >, which I want my program to treat as is; however, for now it is seen as ?? < menu >, mixed with some other strange characters. I want to know if there is a way to remove all the charset encoding artifacts and keep just the plain text, so I can use all the methods of the String class without complications. Thank you.

If your JDK is not too old (1.5 or later), you can do it like this:

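// Needs java.io.FileInputStream, java.util.Locale and java.util.Scanner.
Locale frLocale = new Locale("fr", "FR");
// The second argument tells the Scanner to decode the bytes as UTF-8.
Scanner scanner = new Scanner(new FileInputStream(file), "UTF-8");
// The locale affects how numbers are parsed, not the character decoding.
scanner.useLocale(frLocale);

int numLine = 0;
String line;
for (; scanner.hasNextLine(); numLine++) {
    line = scanner.nextLine();
}
scanner.close();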
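For completeness, here is a self-contained sketch of the same idea. The class name Utf8LineReader, the method readLines and the temporary file are made up for illustration. The byte order mark (BOM) handling is only a guess: the question does not say the file starts with a BOM, but a leading U+FEFF is one possible cause of stray characters in front of < menu >, and trim() does not remove it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class Utf8LineReader {

    // Reads all lines of a UTF-8 file into ordinary Java Strings.
    // ASSUMPTION: the file may start with a BOM (U+FEFF); if so, drop it.
    public static List<String> readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (InputStream in = Files.newInputStream(file);
             Scanner scanner = new Scanner(in, "UTF-8")) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine();
                // Only the very first line can carry the BOM.
                if (lines.isEmpty() && line.startsWith("\uFEFF")) {
                    line = line.substring(1);
                }
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Build a small UTF-8 test file that starts with a BOM.
        byte[] bom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
        byte[] text = "< menu >\ncaf\u00e9\n".getBytes(StandardCharsets.UTF_8);
        byte[] content = new byte[bom.length + text.length];
        System.arraycopy(bom, 0, content, 0, bom.length);
        System.arraycopy(text, 0, content, bom.length, text.length);

        Path file = Files.createTempFile("menu", ".txt");
        Files.write(file, content);

        List<String> lines = readLines(file);
        // Once decoded, the usual String methods apply.
        System.out.println(lines.get(0).trim().equals("< menu >")); // prints true

        Files.delete(file);
    }
}
```

Once the bytes have been decoded with the right charset, the result is a normal String with no encoding attached to it, so trim() and equals() need no special treatment. If the stray characters remain, the file is probably not really UTF-8, or it contains something other than a BOM.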

The scanner can also use delimiters other than whitespace. This example reads several items from a string:

         String input = "1 fish 2 fish red fish blue fish";
         Scanner s = new Scanner(input).useDelimiter("\\s*fish\\s*");
         System.out.println(s.nextInt());
         System.out.println(s.nextInt());
         System.out.println(s.next());
         System.out.println(s.next());
         s.close(); 

prints the following output:

         1
         2
         red
         blue 

See the Javadoc for java.util.Scanner.
