Java cannot read a line from a file
I'm reading a file with the following piece of code:
Scanner in = new Scanner(new File(fileName));
while (in.hasNextLine()) {
    String[] line = in.nextLine().trim().split("[ \t]");
    // ...
}
When I open the file in vim, some lines begin with the following special character:
but the Java code can't read these lines. When it reaches them, it thinks it has hit the end of the file, and hasNextLine() returns false!
EDIT: this is the hex dump of the mentioned (problematic) line:
0000000: e280 9c20 302e 3230 3133 3220 302e 3231  ... 0.20132 0.21
0000010: 3431 392d 302e 3034 0a                   419-0.04.
@VGR got it right.
tl;dr: Use Scanner in = new Scanner(new File(fileName), "ISO-8859-1");
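What appears to be happening is that the underlying library throws a MalformedInputException while decoding the file. The Scanner catches and hides that exception, then simply reports that there are no more lines. You won't see the error unless you ask for it with ioException().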
Here's an MCVE:
import java.io.*;
import java.util.*;

class Test {
    public static void main(String[] args) throws Exception {
        // args[0] is the file name, args[1] is the charset name
        Scanner in = new Scanner(new File(args[0]), args[1]);
        while (in.hasNextLine()) {
            String line = in.nextLine();
            System.out.println("Line: " + line);
        }
        // Scanner swallows I/O errors; this retrieves the last one, if any
        System.out.println("Exception if any: " + in.ioException());
    }
}
Here's an example of a normal invocation:
$ printf 'Hello\nWorld\n' > myfile && java Test myfile UTF-8
Line: Hello
Line: World
Exception if any: null
Here's what you're seeing (except that you don't retrieve and show the hidden exception). Notice in particular that no lines are shown:
$ printf 'Hello\nWorld \234\n' > myfile && java Test myfile UTF-8
Exception if any: java.nio.charset.MalformedInputException: Input length = 1
And here it is when decoded as ISO-8859-1, a decoding in which all byte sequences are valid (even though 0x9C decodes to a non-printing control character and therefore doesn't show up in a terminal):
$ printf 'Hello\nWorld \234\n' > myfile && java Test myfile ISO-8859-1
Line: Hello
Line: World
Exception if any: null
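The claim that every byte sequence is valid ISO-8859-1 can be checked in memory, without a file. The following is a minimal sketch; the class name Latin1RoundTrip and the method roundTripsAllBytes are made up for illustration. It decodes all 256 byte values and encodes them back, then prints the code point that byte 0x9C decodes to.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Latin1RoundTrip {
    // Hypothetical helper: true if every byte value survives decode + encode unchanged.
    static boolean roundTripsAllBytes() {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        String decoded = new String(all, StandardCharsets.ISO_8859_1);
        byte[] encoded = decoded.getBytes(StandardCharsets.ISO_8859_1);
        return decoded.length() == 256 && Arrays.equals(all, encoded);
    }

    public static void main(String[] args) {
        System.out.println(roundTripsAllBytes()); // prints "true"

        // Byte 0x9C becomes the single char U+009C
        String s = new String(new byte[] {(byte) 0x9C}, StandardCharsets.ISO_8859_1);
        System.out.println(Integer.toHexString(s.charAt(0))); // prints "9c"
    }
}
```

Because nothing is rejected, decoding as ISO-8859-1 never fails, but any bytes that really were multi-byte UTF-8 will come out as several unrelated characters.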
If you're only interested in ASCII data and don't have any UTF-8 strings, you can simply ask the scanner to use ISO-8859-1 by passing it as a second parameter to the Scanner constructor:
Scanner in = new Scanner(new File(fileName), "ISO-8859-1");
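If you would rather have a decoding failure be loud than silently truncate the input, one option is to check ioException() after the loop and rethrow whatever the Scanner swallowed. This is only a sketch: the class StrictScannerRead and the helper readAllLines are hypothetical names, not part of any library.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class StrictScannerRead {
    // Hypothetical helper: reads every line, then rethrows any
    // IOException that the Scanner caught and hid while reading.
    static List<String> readAllLines(File file, String charsetName) throws IOException {
        List<String> lines = new ArrayList<>();
        try (Scanner in = new Scanner(file, charsetName)) {
            while (in.hasNextLine()) {
                lines.add(in.nextLine());
            }
            IOException hidden = in.ioException();
            if (hidden != null) {
                throw hidden; // e.g. MalformedInputException
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the file name, args[1] is the charset name
        for (String line : readAllLines(new File(args[0]), args[1])) {
            System.out.println("Line: " + line);
        }
    }
}
```

With the malformed file from the examples above, calling this with UTF-8 should fail with the MalformedInputException instead of quietly returning no lines, while ISO-8859-1 should return both lines.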