
Java UTF-8 text file reading bug?

I have a simple text file filled with my textual data, which needs to be saved as UTF-8 since it contains some Unicode symbols.

I just wrote a normal text file with Notepad and saved it as .txt with UTF-8 encoding.

But when I read the file, I seem to be getting some kind of weird character at the front of the first line.

It's some kind of weird dot which can't even be pasted anywhere else normally. I could try removing the first character, but I don't think that's a real solution, and besides, I'm not sure whether it will always show up.

This is the code part:

FileInputStream fstream = new FileInputStream(fileName);
// Get the object of DataInputStream
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String values;

// Read the file line by line
System.out.println("Generating queries from: " + fileName);
String fields = br.readLine();
System.out.println("The fields are: " + fields);

Has anyone come across this and know a solution?

Thanks in advance.

It is probably a Unicode byte order mark (BOM). Some text editors (on Windows) start a UTF-8 text file with a BOM to flag that it is Unicode. In UTF-8 the BOM is the three bytes EF BB BF, which decode to the single code point U+FEFF.

If you need to deal with this in Java, test whether the first Unicode code point you read from the file is 0xFEFF (U+FEFF), and if it is, remove it.
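As a rough sketch of that check (not the only way to do it), something like the following should work. The class name BomExample and the helper readFirstLine are made up for illustration, and the sample data in main stands in for the real file. Note that it passes UTF-8 explicitly to InputStreamReader; the code in the question relies on the platform default charset, under which the BOM bytes may decode to something other than U+FEFF.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class BomExample {
    // U+FEFF is the byte order mark; in UTF-8 it is encoded as EF BB BF.
    private static final char BOM = '\uFEFF';

    // Hypothetical helper: reads the first line as UTF-8 and drops a leading BOM if present.
    static String readFirstLine(InputStream in) throws IOException {
        BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8));
        String line = br.readLine();
        if (line != null && !line.isEmpty() && line.charAt(0) == BOM) {
            line = line.substring(1);
        }
        return line;
    }

    public static void main(String[] args) throws IOException {
        // Simulated file contents: encoding U+FEFF as UTF-8 yields the BOM bytes EF BB BF.
        byte[] data = "\uFEFFid,name\n".getBytes(StandardCharsets.UTF_8);
        String fields = readFirstLine(new ByteArrayInputStream(data));
        System.out.println("The fields are: " + fields); // prints "The fields are: id,name"
    }
}
```

With a real file you would pass new FileInputStream(fileName) instead of the ByteArrayInputStream. Since the BOM can only appear at the very start of the file, the check is needed on the first line only, and files saved without a BOM pass through unchanged.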
