
Java: reading text from a file results in strange formatting

Usually, when I read text files, I do it like this:

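 import java.io.File;
 import java.io.FileInputStream;
 import java.util.Scanner;

 File file = new File("some_text_file.txt");
 // No charset is passed, so Scanner decodes the bytes with a JVM default charset
 Scanner scanner = new Scanner(new FileInputStream(file));
 StringBuilder builder = new StringBuilder();
 while(scanner.hasNextLine()) {
     builder.append(scanner.nextLine());
     builder.append('\n');
 }
 scanner.close();
 String text = builder.toString();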

There may be better ways, but this method has always worked for me perfectly.

For what I am working on right now, I need to read a large text file (over 700 kilobytes). Here is a sample of the text as it appears in Notepad (the editor that ships with Windows):

"lang"
{
    "Language"      "English"
    "Tokens"
    {
        "DOTA_WearableType_Daggers"     "Daggers"
        "DOTA_WearableType_Glaive"      "Glaive"
        "DOTA_WearableType_Weapon"      "Weapon"
        "DOTA_WearableType_Armor"       "Armor"

However, when I read the text from the file using the method that I provided above, the output is:

[Image: sample output]

I could not paste the output for some reason. I have also tried to read the file like so:

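 import java.io.File;
 import java.nio.file.Files;
 import java.nio.file.Path;
 File file = new File("some_text_file.txt");
 Path path = file.toPath();
 String text = new String(Files.readAllBytes(path)); // no charset given: default charset used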

... with no change in result.
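Both attempts have the same weakness: neither `Scanner(InputStream)` nor `new String(byte[])` is told which charset the file uses, so both fall back to a default. The following is a minimal sketch of the `Files.readAllBytes` variant with the charset passed explicitly. The class name `ReadWithCharset` and the helper `readAll` are hypothetical, and choosing `StandardCharsets.UTF_16` assumes the file really is UTF-16 (the guess made in the answer below).

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadWithCharset {
    // Reads the whole file and decodes it with the charset the caller names,
    // instead of relying on new String(byte[]) and its default charset.
    static String readAll(Path path, Charset charset) throws IOException {
        byte[] bytes = Files.readAllBytes(path);
        return new String(bytes, charset);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the file is UTF-16. StandardCharsets.UTF_16 uses a
        // byte-order mark, if present, to pick the byte order and drops it.
        Path path = Paths.get("some_text_file.txt");
        System.out.println(readAll(path, StandardCharsets.UTF_16));
    }
}
```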

How come the output is not as expected? I also tried reading a text file that I wrote myself, and that worked perfectly fine.

It looks like an encoding problem. Open the file in a tool that can detect its encoding (such as Notepad++) to find out how it is encoded. Then use the Scanner constructor that takes a charset name:

Scanner scanner = new Scanner(new FileInputStream(file), encoding);
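If you would rather check in code than in an editor, one rough heuristic is to look for a byte-order mark (BOM) at the start of the file. The sketch below does that. It only recognizes the UTF-8 and UTF-16 marks, and it returns a caller-supplied fallback when there is no BOM, because BOM-less files cannot be identified this way. The class `BomSniffer`, the method `detect`, and the UTF-8 fallback in `main` are all hypothetical choices, not a standard API.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class BomSniffer {
    // Guesses the charset from a byte-order mark (BOM) at the start of the data.
    // Only the UTF-8 and UTF-16 marks are checked; without a BOM the caller's
    // fallback is returned, since BOM-less files cannot be identified this way.
    static Charset detect(byte[] head, Charset fallback) {
        if (head.length >= 3 && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
            // Note: Java's UTF-8 decoder keeps the BOM as U+FEFF in the result.
            return StandardCharsets.UTF_8;
        }
        if (head.length >= 2) {
            int b0 = head[0] & 0xFF;
            int b1 = head[1] & 0xFF;
            if ((b0 == 0xFF && b1 == 0xFE) || (b0 == 0xFE && b1 == 0xFF)) {
                // The generic UTF-16 decoder reads the BOM to choose the byte
                // order and drops it from the decoded text.
                return StandardCharsets.UTF_16;
            }
        }
        return fallback;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical fallback: assume UTF-8 when the file has no BOM.
        byte[] bytes = Files.readAllBytes(Paths.get("some_text_file.txt"));
        System.out.println(detect(bytes, StandardCharsets.UTF_8));
    }
}
```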

Or you can simply experiment, trying different encodings until the text comes out right. It looks like UTF-16 to me.
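The output image is not available, so the following is an assumption about what went wrong. If the file is UTF-16 and it is decoded with a single-byte default charset, every ASCII character turns into two characters: the letter followed by a NUL (U+0000). Editors and consoles may show those NULs as spaces, boxes, or nothing at all. A UTF-16LE BOM (bytes FF FE) would also appear at the very start as `ÿþ`. The sketch below reproduces the effect with ISO-8859-1 standing in for the default charset. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class Utf16Misread {
    // Encodes text as UTF-16LE (two bytes per character for these letters),
    // then decodes those bytes with ISO-8859-1, a single-byte charset that
    // turns every byte into one char. Each letter is followed by U+0000 (NUL).
    static String misread(String text) {
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16LE);
        return new String(utf16, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String garbled = misread("\"lang\"");
        // Make the NULs visible; a console or text area may show them as
        // spaces, boxes, or nothing at all.
        System.out.println(garbled.replace("\0", "\\0"));
        System.out.println(garbled.length() + " chars instead of 6");
    }
}
```

Naming the charset explicitly avoids this. In the answer's code, that means constructing the Scanner like this: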

 final Scanner scanner = new Scanner(new FileInputStream(file), "UTF-16");
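Putting this together, here is a minimal runnable sketch of the question's loop with the charset passed to the Scanner. The class `ReadUtf16` and the method `read` are hypothetical, and `"UTF-16"` is the answer's guess rather than a confirmed encoding for this file. The sketch also uses try-with-resources, so the Scanner and its underlying stream are closed even if reading fails.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class ReadUtf16 {
    // Same loop as in the question, but the Scanner is given the charset
    // explicitly and is closed automatically by try-with-resources.
    static String read(File file, String charsetName) throws FileNotFoundException {
        StringBuilder builder = new StringBuilder();
        try (Scanner scanner = new Scanner(new FileInputStream(file), charsetName)) {
            while (scanner.hasNextLine()) {
                builder.append(scanner.nextLine()).append('\n');
            }
        }
        return builder.toString();
    }

    public static void main(String[] args) throws FileNotFoundException {
        // Assumption: "UTF-16" is the right charset for this file.
        System.out.println(read(new File("some_text_file.txt"), "UTF-16"));
    }
}
```

Because `nextLine()` strips the line terminator, Windows `\r\n` line endings in the file come back as plain `\n` in the result.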
