SAXException iso-8859-2

Question

I have an XML file which starts with <?xml version="1.0" encoding="iso-8859-2"?>. I read it the following way:

SAXParserFactory.newInstance().newSAXParser().parse(is, handler);

Here, is is an InputStream and handler is some arbitrary handler. Then I get this exception:

org.apache.harmony.xml.ExpatParser$ParseException: At line 41152, column 17: not well-formed (invalid token)

There is in fact a degree sign at that position, enclosed in a CDATA section like this:

<![CDATA[something °]]>

With the charset iso-8859-2, the parser should accept almost any character, including this one. That does not seem to be the case. What am I doing wrong?

EDIT

I'm doing all this on Android.

Weird: it seems that the parser completely ignores the encoding attribute. I converted the file to UTF-8 while leaving the header as is, and now my program can read it without error. Why is that?

(I create the InputStream like this: new BufferedInputStream(new FileInputStream(filename)), i.e. without a Reader, so that cannot be the source of the error.)
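The observation in the edit is consistent with the bytes being decoded as UTF-8 regardless of the declared encoding, although the question itself does not confirm the cause. As a minimal sketch (the class and method names below are made up for illustration), the degree sign is the single byte 0xB0 in ISO-8859-2, and that same byte on its own is not well-formed UTF-8:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

// Hypothetical demo class; not part of the original question.
public class DegreeSignDemo {
    // The degree sign as it is stored in an ISO-8859-2 file: one byte, 0xB0.
    static final byte[] DEGREE_LATIN2 = { (byte) 0xB0 };

    // Decode the bytes using the charset the XML header declares.
    static String decodeLatin2(byte[] bytes) {
        return new String(bytes, Charset.forName("ISO-8859-2"));
    }

    // Returns true only if the bytes are well-formed UTF-8 under strict decoding.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            Charset.forName("UTF-8").newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Does 0xB0 decode to U+00B0 (degree sign) as ISO-8859-2?
        System.out.println(decodeLatin2(DEGREE_LATIN2).equals("\u00B0"));
        // Is the lone byte 0xB0 acceptable as UTF-8?
        System.out.println(isValidUtf8(DEGREE_LATIN2));
    }
}
```

A lone 0xB0 is a UTF-8 continuation byte with no lead byte, which would explain an "invalid token" error at exactly that character if the parser assumed UTF-8.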

Answer 1

I worked around the error by detecting the encoding manually. I peeked at the XML header, looked for the encoding attribute (if present), extracted its value as a String, and created a Java Charset object from it with Charset.forName(). I then created a Reader with that charset and wrapped it in an InputSource like this:

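String encoding;   // value of the encoding attribute taken from the XML header
Charset charset;   // obtained via Charset.forName(encoding)
[...]
    // Decode the bytes ourselves so the parser receives characters, not raw bytes.
    Reader reader = new BufferedReader(new InputStreamReader(inputStream, charset));
    InputSource inputSource = new InputSource(reader);
    inputSource.setEncoding(encoding);
    SAXParserFactory.newInstance().newSAXParser().parse(inputSource, myHandler);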
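The snippet above elides the step that peeks at the header. One possible way to fill it in is sketched below; this is not the answerer's actual code. The class name XmlEncodingSniffer, the 256-byte peek limit, the regular expression, and the UTF-8 fallback are all assumptions made for illustration, and the approach only works for ASCII-compatible encodings (it would not detect UTF-16).

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper; names and limits are illustrative assumptions.
public class XmlEncodingSniffer {
    // Assumption: the XML declaration fits within the first 256 bytes.
    private static final int PEEK_LIMIT = 256;
    private static final Pattern ENCODING = Pattern.compile(
            "<\\?xml[^>]*\\bencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    // Reads the start of the stream, extracts the declared encoding, and
    // rewinds the stream. Falls back to UTF-8 when nothing is declared.
    static Charset sniffCharset(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(PEEK_LIMIT);
        byte[] head = new byte[PEEK_LIMIT];
        int total = 0;
        int n;
        while (total < head.length
                && (n = in.read(head, total, head.length - total)) > 0) {
            total += n;
        }
        in.reset();
        // ISO-8859-1 maps every byte to one char, so the ASCII declaration survives.
        String prolog = new String(head, 0, total, Charset.forName("ISO-8859-1"));
        Matcher m = ENCODING.matcher(prolog);
        return m.find() ? Charset.forName(m.group(1)) : Charset.forName("UTF-8");
    }

    // Wraps the raw stream in a Reader that uses the declared charset.
    static Reader openReader(InputStream raw) throws IOException {
        InputStream in = raw.markSupported() ? raw : new BufferedInputStream(raw);
        return new InputStreamReader(in, sniffCharset(in));
    }

    // Parses the stream through a character-based InputSource, as in the answer.
    static void parse(InputStream raw, DefaultHandler handler) throws Exception {
        InputSource source = new InputSource(openReader(raw));
        SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
    }
}
```

Because the parser is handed a character stream here, decoding happens in InputStreamReader and the parser no longer has to interpret the raw bytes itself. Charset.forName() throws for an unknown or malformed encoding name, so a caller may want to catch that and fall back to a default.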

Unfortunately, I still don't know why the parser could not recognize the encoding automatically.

Answer 1
0 Accepted 2013-03-27 10:39:15
