
Reading from a file containing unmappable characters

I am attempting to use File and Scanner to read through a .txt file and copy the useful information within it into a separate file. Some of these files contain Chinese characters, and it's causing my Scanner to throw the following error: "java.nio.charset.UnmappableCharacterException:". The Chinese characters are of no importance, so how do I make the Scanner ignore them and keep searching the rest of the file for useful information?

Here is the code:

            try {
                File source = new File(this.parentDirectory + File.separator + this.fileName.getText());
                Scanner reader = new Scanner(source);
                StringBuilder str = new StringBuilder();
                while (reader.hasNextLine()) {
                    str.append(reader.nextLine());
                    str.append("\n");
                }
                if (reader.ioException() != null) {
                    throw reader.ioException();
                }
                reader.close();
                this.input.setText(str.toString());
            } catch (FileNotFoundException e1) {
                JOptionPane.showMessageDialog(this, "File not found!");
                return;
            } catch (IOException e1) {
                // TODO Auto-generated catch block
                e1.printStackTrace();
            }

A Scanner implicitly converts between an external sequence of bytes and the 16-bit Unicode characters (UTF-16 code units) used by all Java Strings.

You need to know the actual encoding used for the external data (i.e., the file content). Then you declare your Scanner as

  // 'file' is your File object; 'charset' names the file's actual encoding, e.g. "UTF-8"
  Scanner reader = new Scanner(file, charset);

Once that is done correctly, there should be no 'unmappable' characters.
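To make that concrete, here is a minimal sketch of the read loop from the question with the charset passed explicitly. The class name ExplicitCharsetRead and the helper readAll are made up for illustration, and "UTF-8" in main is only an assumption; substitute whatever encoding your files really use.

```java
import java.io.File;
import java.io.IOException;
import java.util.Scanner;

public class ExplicitCharsetRead {

    // Reads the whole file line by line, decoding bytes with the named charset.
    // Hypothetical helper; charsetName must match the file's real encoding.
    static String readAll(File source, String charsetName) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Scanner reader = new Scanner(source, charsetName)) {
            while (reader.hasNextLine()) {
                text.append(reader.nextLine()).append('\n');
            }
            // Scanner swallows read errors and just stops; rethrow them here.
            IOException failure = reader.ioException();
            if (failure != null) {
                throw failure;
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java ExplicitCharsetRead <file>");
            return;
        }
        // Assumption: the file is UTF-8 encoded.
        System.out.print(readAll(new File(args[0]), "UTF-8"));
    }
}
```

The try-with-resources block closes the Scanner even when an exception is thrown, which the original loop does not guarantee.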

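If you don't specify the charset explicitly, then the platform default is used. On Java 18 and later that default is UTF-8; on earlier versions it depends on the operating system and locale, so it is often UTF-8 on Linux and macOS but may be a legacy code page on Windows.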

Alternatively, it seems that you're not really using the Scanner to any significant degree; you're just using it to collect lines. You could drop down a level and use a FileInputStream to read the file as a sequence of bytes, and use whatever heuristics you think appropriate to determine the 'useful' parts of the file.
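One possible way to do that, sketched under assumptions: wrap the FileInputStream in a reader whose CharsetDecoder is told to skip bytes it cannot decode instead of throwing. This is a different technique from picking the correct charset. It silently discards data, so it only makes sense if the useful information is plain ASCII (an assumption here). The class name LenientRead and the helper readIgnoringBadBytes are hypothetical.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class LenientRead {

    // Reads the file as text, dropping any byte sequence the decoder
    // cannot handle rather than failing with a CharacterCodingException.
    static String readIgnoringBadBytes(File source, Charset charset) throws IOException {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.IGNORE)
                .onUnmappableCharacter(CodingErrorAction.IGNORE);
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(source), decoder))) {
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java LenientRead <file>");
            return;
        }
        // Assumption: only the ASCII content matters, so every byte
        // above 0x7F is treated as undecodable and skipped.
        System.out.print(readIgnoringBadBytes(new File(args[0]), StandardCharsets.US_ASCII));
    }
}
```

Decoding as US-ASCII drops every byte above 0x7F, which removes the Chinese text whatever encoding it was written in. Use CodingErrorAction.REPLACE instead of IGNORE if you would rather see a placeholder character where bytes were skipped.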

