
SAX parser to skip some elements which are not to be parsed?

So, I have a file like this:

<root>
  <transaction ts="1">
    <abc><def></def></abc>
  </transaction>
  <transaction ts="2">
    <abc><def></def></abc>
  </transaction>
</root>

So, I have a condition which says: if ts="2", then do something. The problem is that when the parser finds ts="1", it still scans through the <abc> and <def> tags before it reaches <transaction ts="2">.

Is there a way, when the condition doesn't match, to break off parsing and go directly to the next transaction tag?

No. You'll have to write your SAX handler so that it knows when to skip the tags in the unwanted transaction block. That said, you'll probably find this kind of thing easier to do with StAX than with SAX.
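To illustrate the StAX suggestion, here is a minimal sketch using the standard javax.xml.stream pull API. The class name StaxSkipDemo and the helpers skipElement, collectDescendants and childrenOf are hypothetical names made up for this example. Note that the parser still reads every byte of the skipped block; the difference is only that your code drives the loop and can fast-forward past a transaction in one place.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; not part of the original question or answers.
class StaxSkipDemo {

    /** Fast-forwards past the end tag that matches the current start tag. */
    static void skipElement(XMLStreamReader reader) throws XMLStreamException {
        int depth = 1;
        while (depth > 0 && reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
    }

    /** Records every descendant element name of the current start tag. */
    static void collectDescendants(XMLStreamReader reader, List<String> names)
            throws XMLStreamException {
        int depth = 1;
        while (depth > 0 && reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
                names.add(reader.getLocalName());
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
    }

    /** Returns the element names found inside transactions whose ts equals wantedTs. */
    static List<String> childrenOf(String xml, String wantedTs) throws XMLStreamException {
        List<String> names = new ArrayList<>();
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT
                        || !"transaction".equals(reader.getLocalName())) {
                    continue;
                }
                if (wantedTs.equals(reader.getAttributeValue(null, "ts"))) {
                    collectDescendants(reader, names);
                } else {
                    skipElement(reader); // no per-element handling for this block
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<root><transaction ts=\"1\"><abc><def></def></abc></transaction>"
                + "<transaction ts=\"2\"><abc><def></def></abc></transaction></root>";
        System.out.println(childrenOf(xml, "2"));
    }
}
```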

The SAX parser always calls your callbacks for every XML element.
You can solve this by setting a field isIgnoreCurrentTransaction once you detect the condition to ignore. Then, in your other SAX callbacks, check isIgnoreCurrentTransaction and simply do nothing in that case.

You can use a control flag in your SAX implementation: raise it when you detect your condition on a certain tag, and lower it again once you exit that tag. Use the flag to skip any processing while the parser runs through the children of the tag you are not interested in.
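The flag approach could be sketched roughly as follows. The class name FlagHandler, the extra insideTransaction field and the processed list are assumptions made for illustration; only the idea of an ignoreCurrentTransaction flag comes from the answers above. The sketch also assumes that transaction elements are never nested, and it compares qName because the default JAXP parser is not namespace-aware.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler; assumes <transaction> elements are not nested.
class FlagHandler extends DefaultHandler {
    private final String wantedTs;
    private boolean insideTransaction = false;
    private boolean ignoreCurrentTransaction = false;
    final List<String> processed = new ArrayList<>();

    FlagHandler(String wantedTs) {
        this.wantedTs = wantedTs;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("transaction".equals(qName)) {
            insideTransaction = true;
            // Raise the flag when this transaction does not match the condition.
            ignoreCurrentTransaction = !wantedTs.equals(atts.getValue("ts"));
            return;
        }
        if (!insideTransaction || ignoreCurrentTransaction) {
            return; // the callback still fires, but we do nothing
        }
        processed.add(qName); // stand-in for the real per-element work
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("transaction".equals(qName)) {
            // Lower the flag again when leaving the transaction.
            insideTransaction = false;
            ignoreCurrentTransaction = false;
        }
    }

    /** Parses xml and returns the element names handled inside matching transactions. */
    static List<String> parse(String xml, String wantedTs) throws Exception {
        FlagHandler handler = new FlagHandler(wantedTs);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.processed;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><transaction ts=\"1\"><abc><def></def></abc></transaction>"
                + "<transaction ts=\"2\"><abc><def></def></abc></transaction></root>";
        System.out.println(parse(xml, "2"));
    }
}
```

The same check would go into characters() and any other callback that does real work, so that nothing inside an ignored transaction is processed.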

Note, however, that your example XML is not valid. As stated in the comments, you need to nest your tags properly before you can process it with a SAX implementation.

A SAX parser must scan through all subtrees (like your <abc><def></def></abc>) to know where the next element starts. There is no way around it, which is also the reason why you cannot parallelize an XML parser for a single XML document.

The only two ways of tuning I can think of in your case:

1) If you have many XML documents to parse, you can run one parser for each document in its own thread. This would at least parallelize the overall work and utilize all the CPUs and cores you have available.

2) If you just need to read up to a certain condition (like the <transaction ts="2"> you mentioned), you can stop parsing as soon as that condition is reached. If stopping the parser would help, the way to do this is by throwing an exception.

Your implementation of startElement within the ContentHandler would look like this:

@Override
public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException {
    if (atts == null) return;
    // localName is only populated when the parser is namespace-aware;
    // with a non-namespace-aware parser, compare qName instead.
    if ("transaction".equals(localName) && "2".equals(atts.getValue("ts"))) {
        // TODO: Whatever should happen when condition is reached
        throw new SAXException("Condition reached. Just skip rest of parsing");
    }
}
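The caller then has to catch that exception and tell it apart from a genuine parse error. One way to do that, sketched here under assumptions, is a dedicated exception subclass. StopParsingException, AbortDemo and elementsSeenUntil are hypothetical names, and the sketch relies on the JDK's default JAXP parser rethrowing a SAXException raised inside a handler unchanged. It compares qName because that default parser is not namespace-aware.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical marker exception: signals "stop early", not a parse error.
class StopParsingException extends SAXException {
    StopParsingException(String message) {
        super(message);
    }
}

class AbortDemo {

    /**
     * Returns the names of all elements whose start tag was seen, up to and
     * including the first transaction with the given ts value.
     */
    static List<String> elementsSeenUntil(String xml, String ts) throws Exception {
        List<String> seen = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes atts) throws SAXException {
                seen.add(qName);
                if ("transaction".equals(qName) && ts.equals(atts.getValue("ts"))) {
                    throw new StopParsingException("Condition reached");
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (StopParsingException expected) {
            // Deliberate early exit; any other SAXException still propagates.
        }
        return seen;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><transaction ts=\"1\"><abc><def></def></abc></transaction>"
                + "<transaction ts=\"2\"><abc><def></def></abc></transaction></root>";
        System.out.println(elementsSeenUntil(xml, "2"));
    }
}
```

Note that this only avoids parsing what comes after the matching tag. Everything before it, including the children of earlier transactions, is still scanned.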

