SAXException iso-8859-2
I have an XML file which starts with <?xml version="1.0" encoding="iso-8859-2"?>. I read it the following way:
SAXParserFactory.newInstance().newSAXParser().parse(is, handler);
where "is" is an InputStream and "handler" is some arbitrary handler. Then I get this exception:
org.apache.harmony.xml.ExpatParser$ParseException: At line 41152, column 17: not well-formed (invalid token)
Actually, there is a degree sign at that position, enclosed in a CDATA section like this:
<![CDATA[something °]]>
With the charset iso-8859-2, the parser should accept almost any character, including this one, but that does not seem to be the case. What am I doing wrong?
EDIT
I'm doing all this on Android.
Weird: it seems that the parser completely ignores the encoding attribute. I converted the file to UTF-8 while leaving the header as is, and now my program can read it without error. Why is that?
(I'm creating the InputStream like this: new BufferedInputStream(new FileInputStream(filename)), i.e. without a Reader, so that cannot be the source of the error.)
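One reading that is consistent with these observations, though not confirmed anywhere in this thread, is that the bytes were being decoded as UTF-8 regardless of the declaration. In ISO-8859-2 the degree sign is the single byte 0xB0, and a lone 0xB0 is not a valid UTF-8 sequence. The following is a minimal sketch of that byte-level fact only; the class and method names (DegreeSignDemo, strictDecode) are hypothetical and it says nothing about the internals of the Android parser.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

// Hypothetical demo class, not part of the original question.
final class DegreeSignDemo {

    /** Strictly decodes the bytes; returns null if they are malformed in that charset. */
    static String strictDecode(byte[] bytes, String charsetName) {
        try {
            return Charset.forName(charsetName)
                    .newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] degree = { (byte) 0xB0 }; // the degree sign as stored in an ISO-8859-2 file

        // Decodes to the single character U+00B0.
        System.out.println(strictDecode(degree, "ISO-8859-2") != null);

        // A lone 0xB0 is a UTF-8 continuation byte, so strict decoding fails.
        System.out.println(strictDecode(degree, "UTF-8") == null);
    }
}
```

This would also fit the fact that the file became readable after it was converted to UTF-8 with the header left unchanged.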
I worked around the error by detecting the encoding manually. I peeked at the XML header, looked for the encoding attribute (if present), extracted its value as a String, created a Java Charset object from it with Charset.forName(), then made a Reader with that charset and an InputSource over that Reader, like this:
String encoding;
Charset charset;
[...]
Reader reader = new BufferedReader(new InputStreamReader(inputStream, charset));
InputSource inputSource = new InputSource(reader);
inputSource.setEncoding(encoding);
SAXParserFactory.newInstance().newSAXParser().parse(inputSource, myHandler);
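The elided part of the snippet above (how the header is peeked) is not shown in the original. A minimal sketch of one way to do it follows. Everything in it is an assumption for illustration: the names XmlEncodingSniffer, detectCharset and open, the 256-byte peek limit, the regular expression, and the fallback to UTF-8 when no encoding attribute is found. It also assumes an ASCII-compatible encoding with no byte order mark, so it would not handle UTF-16 files.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.xml.sax.InputSource;

// Hypothetical helper, not taken from the original answer.
final class XmlEncodingSniffer {

    // Assumed upper bound on the length of the XML declaration.
    private static final int PEEK_LIMIT = 256;

    // Matches encoding="..." (or '...') inside the <?xml ... ?> declaration.
    private static final Pattern ENCODING = Pattern.compile(
            "<\\?xml[^>]*?\\bencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /** Peeks at the start of the stream without consuming it and returns the declared charset. */
    public static Charset detectCharset(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(PEEK_LIMIT);
        byte[] head = new byte[PEEK_LIMIT];
        int total = 0;
        int n;
        while (total < PEEK_LIMIT && (n = in.read(head, total, PEEK_LIMIT - total)) > 0) {
            total += n;
        }
        in.reset(); // rewind so the parser later sees the whole document

        // ISO-8859-1 maps every byte to one char, which is enough to read the ASCII header.
        String prolog = new String(head, 0, total, Charset.forName("ISO-8859-1"));
        Matcher m = ENCODING.matcher(prolog);
        // Assumption: fall back to UTF-8 when no encoding is declared.
        return m.find() ? Charset.forName(m.group(1)) : Charset.forName("UTF-8");
    }

    /** Wraps the stream in a Reader that uses the declared charset. */
    public static InputSource open(InputStream raw) throws IOException {
        InputStream in = raw.markSupported() ? raw : new BufferedInputStream(raw);
        Charset charset = detectCharset(in);
        Reader reader = new InputStreamReader(in, charset);
        InputSource source = new InputSource(reader);
        // Kept to mirror the workaround above; the Reader is what actually decodes the bytes.
        source.setEncoding(charset.name());
        return source;
    }
}
```

Under these assumptions, the call from the question would become SAXParserFactory.newInstance().newSAXParser().parse(XmlEncodingSniffer.open(new FileInputStream(filename)), myHandler). Note that Charset.forName() throws an unchecked exception for an unknown or illegal charset name, so a malformed header would surface there rather than in the parser.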
Unfortunately, I still don't know why the parser could not recognize the encoding automatically.