
Java IO fails to read text file

When I try to read thesaurus.txt, the first entry comes back as "ÿþ ", although the first entry in the file is " <pat>a cappella ". What could be causing this?

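    File file = new File("thesaurus.txt");
    ArrayList<String> thes = new ArrayList<String>();
    Scanner scan;
    try {
        // No charset is given, so Scanner uses the platform default encoding.
        scan = new Scanner(file);
    } catch (FileNotFoundException e) {
        e.printStackTrace();
        return thes; // nothing to read; avoids calling methods on a null Scanner
    }
    String entry;
    while(scan.hasNextLine())
    {
        entry = scan.nextLine();
        // isEmpty() compares contents; != would only compare object references
        if(!entry.isEmpty())
        {
             thes.add(entry);
        }
    }
    scan.close();
    return thes;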

Your input file is probably a UTF-16 (LE) file that starts with a byte order mark (BOM).

If you view the file as if it were ISO 8859-1, you'll see those two characters, ÿþ, which have the codes FF and FE in that encoding. Those are exactly the two bytes you would expect at the start of a file with a UTF-16 LE BOM.
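To confirm this diagnosis, you can inspect the first bytes of the file yourself. The following is a minimal sketch, not a complete encoding detector: the class name `BomCheck` and the helper `detectBom` are made up for illustration, and only the UTF-8 and UTF-16 BOMs are recognised.

```java
import java.nio.charset.StandardCharsets;

public class BomCheck {
    // Hypothetical helper: returns a label for the BOM at the start of the
    // data, or "none". UTF-32 BOMs are deliberately not handled here.
    static String detectBom(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF
                && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        return "none";
    }

    // Decodes raw bytes as ISO 8859-1, where every byte maps to one character.
    static String asLatin1(byte[] b) {
        return new String(b, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The first four bytes of a UTF-16 LE file whose text starts with '<'.
        byte[] start = {(byte) 0xFF, (byte) 0xFE, 0x3C, 0x00};
        System.out.println(detectBom(start)); // prints UTF-16LE
        // Misreading the two BOM bytes as ISO 8859-1 yields U+00FF U+00FE, i.e. "ÿþ".
        System.out.println(asLatin1(new byte[] {(byte) 0xFF, (byte) 0xFE}).equals("\u00FF\u00FE")); // prints true
    }
}
```

To check a real file, pass the result of `java.nio.file.Files.readAllBytes(...)` to `detectBom`.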

You should explicitly specify the character encoding when reading the file, instead of relying on the default character encoding of your system:

    scan = new Scanner(file, "UTF-16");
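Put together, the reading loop from the question might look like the sketch below. The class name `ThesaurusReader` and the method `readEntries` are made up for illustration. It relies on Java's `"UTF-16"` charset reading the BOM to pick the byte order and not passing the BOM on as text.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ThesaurusReader {
    // Hypothetical helper: reads all non-empty lines of the file,
    // decoding it with the named charset instead of the platform default.
    static List<String> readEntries(File file, String charsetName)
            throws FileNotFoundException {
        List<String> entries = new ArrayList<>();
        // try-with-resources closes the Scanner (and the file) when done.
        try (Scanner scan = new Scanner(file, charsetName)) {
            while (scan.hasNextLine()) {
                String entry = scan.nextLine();
                if (!entry.isEmpty()) {
                    entries.add(entry);
                }
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        File file = new File(args.length > 0 ? args[0] : "thesaurus.txt");
        try {
            List<String> thes = readEntries(file, "UTF-16");
            System.out.println(thes.size() + " entries read");
        } catch (FileNotFoundException e) {
            System.err.println("File not found: " + file);
        }
    }
}
```

If the file had no BOM, `"UTF-16"` would assume big-endian byte order, so in that case `"UTF-16LE"` would be the name to pass instead.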

