Java IO fails to read text file

Question

When I try to read thesaurus.txt, the first thing my program reads is "ÿþ ", although the first entry in the file is " <pat>a cappella ". What could be causing this?

    File file = new File("thesaurus.txt");
    Scanner scan;
    try {
        scan = new Scanner(file);
    } catch (FileNotFoundException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
        scan = null;
    }
    String entry;
    ArrayList<String> thes = new ArrayList<String>();
    while(scan.hasNextLine())
    {
        entry = scan.nextLine();
        // compare string contents, not references
        if(!entry.isEmpty())
        {
             thes.add(entry);
        }
    }
    return thes;

Answer 1

Your input file is probably a UTF-16 (LE) file that starts with a byte order mark (BOM).

If you read this file as if it were ISO 8859-1, you'll see those two characters: ÿþ. They have the codes FF and FE in that character encoding, and the byte sequence FF FE is exactly what you would expect at the start of a file with a UTF-16 little-endian BOM.
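To make the byte-level explanation concrete, here is a minimal sketch that decodes the two BOM bytes as ISO 8859-1. The class name `BomDemo` and the method `decodeAsLatin1` are made up for illustration; only the standard `String` and `StandardCharsets` APIs are used.

```java
import java.nio.charset.StandardCharsets;

public class BomDemo {
    // Decode raw bytes as ISO 8859-1, where each byte maps to the
    // Unicode code point with the same numeric value.
    static String decodeAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The UTF-16 little-endian byte order mark: FF FE
        byte[] utf16LeBom = {(byte) 0xFF, (byte) 0xFE};
        String decoded = decodeAsLatin1(utf16LeBom);
        // U+00FF is "ÿ" and U+00FE is "þ"
        System.out.println(decoded.equals("\u00FF\u00FE"));
    }
}
```

The comparison uses Unicode escapes so that the check does not depend on the encoding of the source file or the console.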

You should explicitly specify the character encoding when reading the file, instead of relying on the default character encoding of your system:

    scan = new Scanner(file, "UTF-16");
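Putting it together, the reading loop from the question could look roughly like the sketch below. It assumes the file really is UTF-16 with a BOM, which is a guess based on the symptoms. The class name `ThesaurusReader` and the method `readEntries` are hypothetical. The sketch also uses `hasNextLine()` to match `nextLine()`, compares strings with `isEmpty()` rather than `!=`, and closes the scanner with try-with-resources.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ThesaurusReader {
    // Read all non-empty lines of the file using the given charset name.
    static List<String> readEntries(File file, String charsetName)
            throws FileNotFoundException {
        List<String> entries = new ArrayList<>();
        // try-with-resources closes the scanner (and the file) when done
        try (Scanner scan = new Scanner(file, charsetName)) {
            while (scan.hasNextLine()) {
                String entry = scan.nextLine();
                // compare contents; != would compare object references
                if (!entry.isEmpty()) {
                    entries.add(entry);
                }
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        String path = args.length > 0 ? args[0] : "thesaurus.txt";
        try {
            // "UTF-16" lets the decoder use the BOM to pick the byte order
            List<String> thes = readEntries(new File(path), "UTF-16");
            System.out.println(thes.size() + " entries read");
        } catch (FileNotFoundException e) {
            System.err.println("File not found: " + path);
        }
    }
}
```

With the charset name "UTF-16", Java's decoder consumes the BOM and uses it to determine the byte order, so the mark does not show up in the first entry.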

solution1
3 2015-02-20 22:55:49