
Java cannot read a line from a file

I'm reading a file with the following piece of code:

Scanner in = new Scanner(new File(fileName));
while (in.hasNextLine()) {
    String[] line = in.nextLine().trim().split("[ \t]");
    // ...
}

When I open the file in vim, some lines begin with the following special character:

[image: the special character as displayed in vim]

But the Java code can't read these lines. When it reaches one of them, it behaves as if it had hit the end of the file, and hasNextLine() returns false!

EDIT: this is the hex dump of the problematic line:

0000000: e280 9c20 302e 3230 3133 3220 302e 3231  ... 0.20132 0.21
0000010: 3431 392d 302e 3034 0a                   419-0.04.

@VGR got it right.

tl;dr: Use Scanner in = new Scanner(new File(fileName), "ISO-8859-1");

What appears to be happening is this:

  • Your file is not valid UTF-8 because of that lone 0x9C byte.
  • The Scanner reads the file as UTF-8, since that is the system default.
  • The underlying libraries throw a MalformedInputException.
  • The Scanner catches and hides it (a well-meaning but misguided design decision).
  • It then starts reporting that it has no more lines.
  • You won't know anything has gone wrong unless you actually ask the Scanner, by calling ioException().
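For contrast with the Scanner behaviour described above, here is a minimal sketch of a reader that surfaces the decoding error instead of hiding it. It assumes Java 7 or later and uses Files.newBufferedReader, which reports malformed input by throwing. The class name StrictReadDemo and the helpers countLines and writeSample are hypothetical names made up for this illustration; the sample bytes mimic a line containing a lone 0x9C.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class StrictReadDemo {
    // Counts the lines in the file. Any decoding error propagates to the
    // caller instead of being swallowed.
    static int countLines(Path file, Charset charset) throws IOException {
        int count = 0;
        try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
            while (reader.readLine() != null) {
                count++;
            }
        }
        return count;
    }

    // Hypothetical helper: writes "Hello\nWorld <0x9C>\n" to a temp file.
    // Encoding U+009C as ISO-8859-1 yields the single byte 0x9C.
    static Path writeSample() throws IOException {
        Path file = Files.createTempFile("strict-read-demo", ".txt");
        Files.write(file, "Hello\nWorld \u009C\n".getBytes(StandardCharsets.ISO_8859_1));
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path file = writeSample();
        try {
            System.out.println("Lines: " + countLines(file, StandardCharsets.UTF_8));
        } catch (MalformedInputException e) {
            // The lone 0x9C byte is not valid UTF-8, so we end up here.
            System.out.println("Decoding failed: " + e);
        }
        System.out.println("Lines as ISO-8859-1: " + countLines(file, StandardCharsets.ISO_8859_1));
        Files.delete(file);
    }
}
```

With a reader like this, a bad byte stops the program with a visible exception rather than a silently truncated read.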

Here's an MCVE:

import java.io.*;
import java.util.*;

class Test {
  public static void main(String[] args) throws Exception {
    Scanner in = new Scanner(new File(args[0]), args[1]);
    while (in.hasNextLine()) {
      String line = in.nextLine();
      System.out.println("Line: " + line);
    }
    System.out.println("Exception if any: " + in.ioException());
  }
}

Here's an example of a normal invocation:

$ printf 'Hello\nWorld\n' > myfile && java Test myfile UTF-8
Line: Hello
Line: World
Exception if any: null

Here's what you're seeing (except that you don't retrieve and show the hidden exception). Notice in particular that no lines are shown at all:

$ printf 'Hello\nWorld \234\n' > myfile && java Test myfile UTF-8
Exception if any: java.nio.charset.MalformedInputException: Input length = 1

And here it is when decoded as ISO-8859-1, an encoding in which every byte sequence is valid (even though 0x9C has no printable character assigned and therefore doesn't show up in a terminal):

$ printf 'Hello\nWorld \234\n' > myfile && java Test myfile ISO-8859-1
Line: Hello
Line: World
Exception if any: null
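The claim that every byte sequence is valid ISO-8859-1 can be checked with a small sketch: decode all 256 possible byte values and encode them back. The class name Latin1RoundTrip and the method roundTrips are hypothetical names chosen for this illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Latin1RoundTrip {
    // Returns true if all 256 byte values survive a decode/encode round trip
    // through ISO-8859-1, with one char produced per byte.
    static boolean roundTrips() {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        String decoded = new String(all, StandardCharsets.ISO_8859_1);
        byte[] encoded = decoded.getBytes(StandardCharsets.ISO_8859_1);
        return decoded.length() == 256 && Arrays.equals(all, encoded);
    }

    public static void main(String[] args) {
        System.out.println("All bytes round-trip: " + roundTrips());
    }
}
```

Because each byte maps to exactly one char, the decoder never has a reason to report malformed input, which is why the Scanner gets through the whole file.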

If you're only interested in ASCII data and don't have any UTF-8 strings, you can simply ask the Scanner to use ISO-8859-1 by passing it as the second parameter to the Scanner constructor:

Scanner in = new Scanner(new File(fileName), "ISO-8859-1");
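If the file does contain genuine UTF-8 text elsewhere, one possible alternative (not part of the original answer) is to keep UTF-8 but hand the Scanner a decoder configured to replace bad bytes instead of reporting them. The sketch below assumes that approach; LenientUtf8Demo and readLines are hypothetical names, and an in-memory stream stands in for the file.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class LenientUtf8Demo {
    // Reads all lines as UTF-8, substituting U+FFFD for undecodable bytes
    // rather than failing on them.
    static List<String> readLines(InputStream source) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        List<String> lines = new ArrayList<>();
        try (Scanner in = new Scanner(new InputStreamReader(source, decoder))) {
            while (in.hasNextLine()) {
                lines.add(in.nextLine());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Same sample as in the answer: "World " followed by a lone 0x9C byte.
        byte[] bytes = "Hello\nWorld \u009C\n".getBytes(StandardCharsets.ISO_8859_1);
        for (String line : readLines(new ByteArrayInputStream(bytes))) {
            System.out.println("Line: " + line);
        }
    }
}
```

The trade-off is that the bad byte is lost and shows up as the replacement character, so this only suits cases where the stray bytes carry no meaning.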


 