
Scanner stops reading randomly when run from exported .jar but not when run from Eclipse

Today I ran into some trouble while using the Java Scanner. I use the Scanner class in many places in my project and had never run into any problems.

Basically, I always do something like this:

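try (Scanner scanner = new Scanner(file)) {
    while (scanner.hasNextLine()) {
        String line = scanner.nextLine();
        // ... process the line
    }
} catch (FileNotFoundException e) {
    // ... handle the error (Scanner(File) declares FileNotFoundException)
} finally {
    // ... clean up
}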

and the Scanner works just fine because it's just some simple code. Today, however, I used the code above to read text files with about 17000 lines.

At first the code worked as expected when I ran it from Eclipse, but after I exported the project to a .jar, the Scanner would stop reading after about 400 lines.

I googled a bit and in the end solved the problem thanks to some answers I found.

All I had to do was change the constructor from

Scanner scanner = new Scanner(file)

to

Scanner scanner = new Scanner(new FileInputStream(file))

It is some weird encoding problem, I get it. But why did the code work flawlessly when I ran it from Eclipse, while the Scanner stopped after reading about 400 lines when I ran it from my exported jar?

The code does exactly the same thing in both cases, because I set up Eclipse to use the same working directory as the exported .jar archive (which has some data subdirectories):

  1. Takes the same .gz archive
  2. Extracts a file from the .gz
  3. Reads the file as shown above

Not sure if it helps but Eclipse is set up to save source files in UTF-8 format.

Thanks in advance

But why did the code work flawlessly when I ran it from Eclipse, while the Scanner stopped after reading about 400 lines when I ran it from my exported jar?

Scanner has two different constructors that accept a File as an argument. From the docs:

Scanner(File source)

Bytes from the file are converted into characters using the underlying platform's default charset.

and

Scanner(File source, String charsetName)

Bytes from the file are converted into characters using the specified charset.

So if you do not specify charsetName, the Scanner will use the environment's default charset.

The default charset when you run your project outside Eclipse is probably something other than UTF-8. To check whether this is the case, you can write a simple program like this:

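import java.nio.charset.Charset;

class CheckDefaultCharset {
    public static void main(String... args) {
        // Prints the charset the JVM uses when none is specified explicitly
        System.out.println(Charset.defaultCharset());
    }
}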

And run it in both environments.

For example, when running the above code from Eclipse I get:

UTF-8

And when running it in PowerShell (Windows 7), I get:

windows-1252
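
A mismatch like this can also explain why the Scanner stops quietly instead of throwing. According to the Scanner documentation, when the underlying source throws an IOException the scanner assumes the end of the input has been reached, and the most recent exception can be retrieved with ioException(). The following is only a sketch: the class name, method name, and sample data are made up for illustration, and the exact point at which reading stops may differ between JDK versions. It reads the same bytes with two charsets and reports what the Scanner swallowed.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Scanner;

class SilentStopDemo {

    // Counts the lines a Scanner yields for the file under the given charset
    // and names any I/O error the Scanner swallowed along the way.
    static String describeRead(File file, String charsetName) throws IOException {
        int count = 0;
        try (Scanner scanner = new Scanner(file, charsetName)) {
            while (scanner.hasNextLine()) {
                scanner.nextLine();
                count++;
            }
            // Must be queried before the scanner is closed by try-with-resources.
            IOException swallowed = scanner.ioException();
            return count + " lines, swallowed error: "
                    + (swallowed == null ? "none" : swallowed.getClass().getSimpleName());
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("scanner-demo", ".txt");
        file.deleteOnExit();
        // Hypothetical sample data: in ISO-8859-1 (as in windows-1252) the
        // character \u00e9 is the single byte 0xE9, which is not valid UTF-8 here.
        Files.write(file.toPath(),
                "one\ncaf\u00e9\nthree\n".getBytes(StandardCharsets.ISO_8859_1));
        System.out.println("ISO-8859-1 -> " + describeRead(file, "ISO-8859-1"));
        System.out.println("UTF-8      -> " + describeRead(file, "UTF-8"));
    }
}
```

If the second call reports fewer lines together with a swallowed error, the file's bytes do not match the charset being used, which would be consistent with the Scanner giving up partway through a large file.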

To avoid this type of problem, it is better to always specify the encoding of the files you read when using Scanner.
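
As a minimal sketch of that advice, the two-argument constructor quoted above can be used as follows. The class name, method name, and temporary file are hypothetical, and the example assumes the data really is UTF-8; substitute the actual encoding of your files.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ReadWithExplicitCharset {

    // Reads every line of the file, decoding bytes with the named charset
    // rather than the platform default.
    static List<String> readLines(File file, String charsetName) throws IOException {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(file, charsetName)) {
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine());
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample file written as UTF-8 so the demo is self-contained.
        File file = File.createTempFile("scanner-demo", ".txt");
        file.deleteOnExit();
        Files.write(file.toPath(),
                "first\ncaf\u00e9\nthird\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readLines(file, "UTF-8").size() + " lines read");
    }
}
```

Note that new Scanner(new FileInputStream(file)) is also documented to use the platform's default charset, so the InputStream constructor benefits from an explicit charsetName argument in the same way.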
