
Java XMLStreamReader.getText() chokes on XML encoded characters?

I am trying to parse a giant (> 1GB) XML file using Java's XMLStreamReader. I use the getText() method to pull the contents of a node. The XML file I have is encoded as ISO-8859-1, and some characters have special encoding; for example, & is encoded as &amp; in the file.

So if the file contains, for example:

<person>Jack</person>
<person>Jill</person>
<persons>Jack &amp; Jill</persons>

and I try to get the contents of each node using getText(), the 3rd node only returns Jack. Any time a &xxx; character is encountered, no characters after it (in the same node) are parsed or returned.

Where is the problem? Is the XML file encoded correctly? Am I using the Java parser correctly?

Thanks!

I suspect that the problem is that the parser has split the contents of the third element (persons) into multiple processing events. (This behaviour of next() is documented.) Calling getText() only gives you the text for the current event.
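If you want to keep calling getText(), one way to cope with split events is to append the text of every CHARACTERS event until the element ends. The following is only a minimal sketch of that idea: the class name AccumulateText, the method readTexts, and the wrapping <people> root element are assumptions added for illustration, and it assumes elements that contain text only (no mixed content).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of the original question.
class AccumulateText {

    // Returns the text of each text-only element, joining all text events.
    static List<String> readTexts(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> texts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        while (reader.hasNext()) {
            switch (reader.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    current.setLength(0);             // start collecting afresh
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                    current.append(reader.getText()); // may be only a fragment
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    if (!current.toString().trim().isEmpty()) {
                        texts.add(current.toString());
                    }
                    current.setLength(0);
                    break;
                default:
                    break;
            }
        }
        reader.close();
        return texts;
    }

    public static void main(String[] args) throws XMLStreamException {
        // <people> is an assumed root element so the sample is well-formed.
        String xml = "<people><person>Jack</person><person>Jill</person>"
                + "<persons>Jack &amp; Jill</persons></people>";
        System.out.println(readTexts(xml));
    }
}
```

As far as I know, setting the XMLInputFactory.IS_COALESCING property to Boolean.TRUE on the factory also asks the parser to merge adjacent character data into a single event, which may be simpler if you control how the reader is created.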

Try using getElementText() instead.
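A rough sketch of what that could look like, assuming the sample from the question is wrapped in a root element (the <people> root, the class name ElementTextDemo, and the method readElementTexts are made up for this example). getElementText() has to be called while the reader is positioned on a START_ELEMENT, and it only works for elements that contain text and no child elements:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of the original answer.
class ElementTextDemo {

    // Reads the complete text of every <person> and <persons> element.
    static List<String> readElementTexts(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> texts = new ArrayList<>();
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                String name = reader.getLocalName();
                if (name.equals("person") || name.equals("persons")) {
                    // Consumes all text events up to the matching END_ELEMENT.
                    texts.add(reader.getElementText());
                }
            }
        }
        reader.close();
        return texts;
    }

    public static void main(String[] args) throws XMLStreamException {
        // <people> is an assumed root element so the sample is well-formed.
        String xml = "<people><person>Jack</person><person>Jill</person>"
                + "<persons>Jack &amp; Jill</persons></people>";
        System.out.println(readElementTexts(xml));
    }
}
```

For the real 1GB file you would create the reader from an InputStream rather than a String, so the parser can pick up the ISO-8859-1 encoding from the XML declaration.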
