Issue Parsing XML Document using SAXParser - 2047 character limit?

I have created a class that extends the SAX DefaultHandler class. My intent is to store the XML input in a series of objects while preserving the integrity of the original XML data. During testing, I noticed that some of the node data was being truncated arbitrarily on input.

For example:

Input: <temperature>-125</temperature>  Output: <sensitivity>5</sensitivity>
Input: <address>101_State</address>  Output: <address>te</address>

To further complicate things, the above errors occur "randomly", for about 1 out of every ~100 instances of the same XML tag. That is, the input XML file has roughly 100 tags that contain <temperature>-125</temperature>, but only one of them produces an output of <sensitivity>5</sensitivity>. The other tags correctly produce <sensitivity>-125</sensitivity>.

I have overridden the characters(char[] ch, int start, int length) callback to simply grab the character content between XML tags:

public void characters(char[] ch, int start, int length)
        throws SAXException {

    value = new String(ch, start, length);

    //debug
    System.out.println("'" + value + "'" + "start: " + start + "length: " + length);
}

For the specific temperature tag that ends up with the wrong value, my println statement produces the following output:

'-12'start: 2045length: 3
'5'start: 0length: 1

This tells me that the characters method is being called twice for this specific XML element, while it is called only once for every other XML tag. The start value on the second line (0) suggests to me that the char[] ch array is being refilled in the middle of this XML element, and the characters method is then called again with the new contents.

Is anyone familiar with this issue? I wondered whether I was hitting the capacity limit of a char[], but a quick search suggests that is unlikely. My char[] seems to be reset at about 2047 characters.

Thanks,

LB

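The SAX parser is not required to deliver a text node's character data in a single characters() call. It may call characters() several times for the same element, passing one chunk at a time. In your debug output the first chunk ends at offset 2045 + 3 = 2048, which suggests the split fell at the end of the parser's internal read buffer rather than at any limit of char[] itself; because your handler assigns value on every call, only the last chunk ("5") survives.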
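
To see why only "5" survives, the sketch below feeds the two chunks from the debug output straight into two handlers: one that overwrites value as the question's code does, and one that appends each chunk to a StringBuilder. No parser is involved; the calls are made by hand, the class names ChunkDemo, OverwritingHandler and AccumulatingHandler are hypothetical, and the chunk layout is simply copied from the printed start/length values.

```java
import org.xml.sax.helpers.DefaultHandler;

public class ChunkDemo {

    // Mirrors the question's handler: each characters() call overwrites value.
    static class OverwritingHandler extends DefaultHandler {
        String value;

        @Override
        public void characters(char[] ch, int start, int length) {
            value = new String(ch, start, length);
        }
    }

    // Appends every chunk instead of keeping only the last one.
    static class AccumulatingHandler extends DefaultHandler {
        final StringBuilder value = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            value.append(ch, start, length);
        }
    }

    public static void main(String[] args) {
        // Chunk 1: "-12" sits at the very end of a 2048-char buffer (start 2045, length 3).
        char[] first = new char[2048];
        "-12".getChars(0, 3, first, 2045);
        // Chunk 2: "5" at the start of the refilled buffer (start 0, length 1).
        char[] second = "5".toCharArray();

        OverwritingHandler overwrite = new OverwritingHandler();
        overwrite.characters(first, 2045, 3);
        overwrite.characters(second, 0, 1);

        AccumulatingHandler accumulate = new AccumulatingHandler();
        accumulate.characters(first, 2045, 3);
        accumulate.characters(second, 0, 1);

        System.out.println(overwrite.value);             // 5
        System.out.println(accumulate.value.toString()); // -125
    }
}
```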

The fix is to accumulate all the data in a buffer until the parser makes its next call to a method other than characters() (for example, endElement()), and only then use the collected text.
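
A minimal sketch of that approach, assuming you only need each element's text once its end tag arrives: the hypothetical BufferingHandler appends every chunk to a StringBuilder, clears it in startElement() and endElement(), and records "name=text" pairs (an illustrative output format, not anything from the question). The entity reference in the checks is a case where a parser may split text across calls, but the result does not depend on how the parser chunks it.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class BufferingHandler extends DefaultHandler {

    // Collects character data until the next non-characters() event.
    private final StringBuilder text = new StringBuilder();

    // "element=text" pairs, one per end tag (hypothetical output format).
    final List<String> values = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // A new element starts: drop anything buffered so far (e.g. indentation).
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per text node; just append each chunk.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only now is the element's text complete.
        values.add(qName + "=" + text);
        text.setLength(0);
    }

    static List<String> parse(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            BufferingHandler handler = new BufferingHandler();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.values;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("<temperature>-125</temperature>")); // [temperature=-125]
    }
}
```

Clearing the buffer in startElement() as well as endElement() keeps whitespace between tags out of the next element's text. For a parent element with mixed content, this sketch only records the text that follows its last child, so adapt it if you need that text too.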

I spent 2 whole days looking for the solution.

Change your characters method to this:

public void characters(char[] ch, int start, int length) throws SAXException {

  // characters() may deliver one text node in several chunks,
  // so append each chunk instead of overwriting the previous one
  if(value == null)
    value = new String(ch, start, length);
  else
    value += new String(ch, start, length);

  //debug
  System.out.println("'" + value + "'" + "start: " + start + "length: " + length);

}

And it's done!

Make sure you add value = ""; at the end of your endElement method, so that the next element starts with an empty value:

public void endElement( String uri, String localName, String qName ) throws SAXException 
{
    ...
    value = "";
}
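
One caveat with resetting value only at the end of endElement(): whitespace between tags (newlines, indentation) is typically reported through characters() as well, at least when no DTD declares element-only content, so it gets appended to value before the next element's own text. The hypothetical ResetDemo below runs the append-and-reset logic from this answer on an indented document, with and without an extra reset in startElement(); the sample XML and the Map-based output are assumptions for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class ResetDemo {

    // Hypothetical sample: indentation whitespace sits between the tags.
    static final String SAMPLE = "<reading>\n  <temperature>-125</temperature>\n</reading>";

    static class ValueHandler extends DefaultHandler {
        private final boolean resetOnStart;
        private String value = "";
        // Last value seen for each element name.
        final Map<String, String> values = new LinkedHashMap<>();

        ValueHandler(boolean resetOnStart) {
            this.resetOnStart = resetOnStart;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            if (resetOnStart) {
                value = ""; // discard whitespace collected before this tag
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            value += new String(ch, start, length); // append, as in the answer
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            values.put(qName, value);
            value = ""; // the answer's reset
        }
    }

    static Map<String, String> parse(String xml, boolean resetOnStart) {
        try {
            ValueHandler handler = new ValueHandler(resetOnStart);
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.values;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("[" + parse(SAMPLE, false).get("temperature") + "]");
        System.out.println("[" + parse(SAMPLE, true).get("temperature") + "]");
    }
}
```

Without the startElement() reset, the recorded temperature is "\n  -125" (the newline and indentation before the tag are included); with it, the value is "-125". Calling trim() on the value is another common workaround, though it also strips whitespace that may be significant in the data.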
