
SAXException iso-8859-2

I have an XML file which starts with <?xml version="1.0" encoding="iso-8859-2"?> . I read it the following way:

SAXParserFactory.newInstance().newSAXParser().parse(is, handler);

where the variable is is an InputStream and handler is an arbitrary handler. The parser then throws this exception:

org.apache.harmony.xml.ExpatParser$ParseException: At line 41152, column 17: not well-formed (invalid token)

Actually there is a degree sign at that position, enclosed in a CDATA like this:

<![CDATA[something °]]>

Since the declared charset is iso-8859-2, which includes the degree sign, I would expect the parser to accept this character. It does not. What am I doing wrong?

EDIT

I'm doing all this on Android.

Strangely, the parser seems to ignore the encoding attribute completely. I converted the file to UTF-8 while leaving the header as is, and now my program reads it without error. Why is that?

(I create the InputStream like this: new BufferedInputStream(new FileInputStream(filename)), i.e. without a Reader, so that cannot be the cause.)
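One possible reading of this observation is that the parser decodes the bytes as UTF-8 regardless of the declaration. The sketch below is only an illustration of why that would produce an "invalid token" error: it encodes the CDATA text in both charsets and checks whether the result is valid UTF-8. The class name DegreeSignBytes and the helper isValidUtf8 are made up for this example, and it assumes the runtime ships the ISO-8859-2 charset.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class DegreeSignBytes {
    // Strictly decodes the bytes as UTF-8; returns false on malformed input.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "something \u00B0"; // the CDATA content from the question

        // In ISO-8859-2 the degree sign is the single byte 0xB0.
        byte[] latin2 = text.getBytes(Charset.forName("ISO-8859-2"));
        // In UTF-8 the same character is the two-byte sequence 0xC2 0xB0.
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        // A lone 0xB0 is a UTF-8 continuation byte with no lead byte,
        // so a decoder that assumes UTF-8 has to reject it.
        System.out.println("ISO-8859-2 bytes valid as UTF-8: " + isValidUtf8(latin2));
        System.out.println("UTF-8 bytes valid as UTF-8:      " + isValidUtf8(utf8));
    }
}
```

This does not prove what the Android parser does internally; it only shows that the symptoms are consistent with the file being read as UTF-8.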

I worked around the error by detecting the encoding manually. I peeked at the XML header, looked for the encoding attribute (if present) and extracted it as a String, then created a Java Charset object from it with Charset.forName(). Finally, I created a Reader with that charset and an InputSource over that Reader, like this:

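String encoding;   // encoding name extracted from the XML header
Charset charset;   // Charset.forName(encoding)
[...]
// Decode the bytes ourselves so the parser receives characters, not raw bytes.
Reader reader = new BufferedReader(new InputStreamReader(inputStream, charset));
InputSource inputSource = new InputSource(reader);
inputSource.setEncoding(encoding);
SAXParserFactory.newInstance().newSAXParser().parse(inputSource, myHandler);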
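The header-peeking step is elided above. A minimal sketch of how it could be done follows; it is not the code actually used. The class XmlEncodingSniffer, the method detectEncoding, the 256-byte peek limit and the regular expression are all assumptions made for this example. It also assumes an ASCII-compatible encoding (so not UTF-16) and a stream that supports mark/reset, such as a BufferedInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class XmlEncodingSniffer {
    // Assumption: the XML declaration fits in the first 256 bytes.
    private static final int PEEK_LIMIT = 256;

    // Matches encoding="..." or encoding='...' inside the XML declaration.
    private static final Pattern ENCODING = Pattern.compile(
            "<\\?xml[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    // Returns the declared encoding name, or the fallback if none is declared.
    // The stream is reset afterwards, so the parser still sees every byte.
    static String detectEncoding(InputStream in, String fallback) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(PEEK_LIMIT);
        byte[] head = new byte[PEEK_LIMIT];
        int total = 0;
        int n;
        while (total < head.length
                && (n = in.read(head, total, head.length - total)) > 0) {
            total += n;
        }
        in.reset();

        // ISO-8859-1 maps every byte to one char, so decoding cannot fail
        // and the ASCII declaration is preserved as-is.
        String prefix = new String(head, 0, total, StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING.matcher(prefix);
        return m.find() ? m.group(1) : fallback;
    }

    public static void main(String[] args) throws IOException {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"iso-8859-2\"?><root/>"
                .getBytes(StandardCharsets.ISO_8859_1);
        InputStream in = new ByteArrayInputStream(xml);
        System.out.println(detectEncoding(in, "UTF-8"));
    }
}
```

The returned name can be passed to Charset.forName() to build the InputStreamReader over the same stream. The fallback value is up to the caller; UTF-8 is the default the XML specification prescribes when no encoding is declared and there is no byte order mark.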

Unfortunately, I still don't know why the parser could not recognize the encoding automatically.

