
Woodstox skip part of XML

Java: 1.6
Woodstox: 4.1.4

I just want to skip part of an XML file while parsing. Let's look at this simple XML:

<family>
    <mom>
        <data height="160"/>
    </mom>
    <dad>
        <data height="175"/>
    </dad>
</family>

I just want to skip the dad element. So it looks like using the skipElement method, as shown below, is a good idea:

FileInputStream fis = ...;
XMLStreamReader2 xmlsr = (XMLStreamReader2) xmlif.createXMLStreamReader(fis);

String currentElementName = null;
while(xmlsr.hasNext()){
            
    int eventType = xmlsr.next();
                        
    switch(eventType){
            
        case (XMLEvent2.START_ELEMENT):
            currentElementName = xmlsr.getName().toString();
                    
            if("dad".equals(currentElementName) == true){
                logger.info("isStartElement: " + xmlsr.isStartElement());
                logger.info("Element BEGIN: " + currentElementName);
                xmlsr.skipElement();
            }

                    ...
    }
}

We just find the start of the dad element and skip it. But not so fast, because an exception will be thrown. This is the output:

isStartElement: true
Element BEGIN: dad
Exception in thread "main" java.lang.IllegalStateException: Current state not START_ELEMENT

That is not what I expected. It is indeed very surprising, because skipElement is executed in the START_ELEMENT state. What is going on?

I tried this in Java 1.6 (jdk1.6.0_30) with woodstox-core-lgpl-4.1.4.jar and stax2-api-3.1.1.jar on the library path. My Java file is this:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;

import org.codehaus.stax2.XMLStreamReader2;
import org.codehaus.stax2.evt.XMLEvent2;

public class Skip {

    public static void main(String[] args) throws FileNotFoundException,
            XMLStreamException {
        System.setProperty("javax.xml.stream.XMLInputFactory",
                "com.ctc.wstx.stax.WstxInputFactory");
        System.setProperty("javax.xml.stream.XMLOutputFactory",
                "com.ctc.wstx.stax.WstxOutputFactory");
        System.setProperty("javax.xml.stream.XMLEventFactory",
                "com.ctc.wstx.stax.WstxEventFactory");

        FileInputStream fis = new FileInputStream(new File("family.xml"));
        XMLInputFactory xmlif = XMLInputFactory.newFactory();
        XMLStreamReader2 xmlsr = (XMLStreamReader2) xmlif
                .createXMLStreamReader(fis);

        String currentElementName = null;
        while (xmlsr.hasNext()) {

            int eventType = xmlsr.next();

            switch (eventType) {

            case (XMLEvent2.START_ELEMENT):
                currentElementName = xmlsr.getName().toString();

                if ("dad".equals(currentElementName) == true) {
                    System.out.println("isStartElement: "
                            + xmlsr.isStartElement());
                    System.out.println("Element BEGIN: " + currentElementName);
                    xmlsr.skipElement();
                }
                else {
                    System.out.println(currentElementName);
                }

            }
        }
    }
}

Works like a charm. The output is:

family
mom
data
isStartElement: true
Element BEGIN: dad

Since Woodstox is a StAX (JSR-173) compliant parser, you could use a StAX StreamFilter to exclude events corresponding to certain elements. I prefer this approach so that you can keep the filtering logic separate from your application logic.

Demo

import javax.xml.stream.*;
import javax.xml.transform.stream.StreamSource;

public class Demo {

    public static void main(String[] args) throws Exception {
        XMLInputFactory xif = XMLInputFactory.newFactory();
        StreamSource xml = new StreamSource("src/forum14326598/input.xml");
        XMLStreamReader xsr = xif.createXMLStreamReader(xml);
        xsr = xif.createFilteredReader(xsr, new StreamFilter() {

            private boolean accept = true;

            @Override
            public boolean accept(XMLStreamReader reader) {
                if((reader.isStartElement() || reader.isEndElement()) && "dad".equals(reader.getLocalName())) {
                    accept = !accept;
                    return false;
                } else {
                    return accept;
                }
            }

        });

        while(xsr.hasNext()) {
            if(xsr.isStartElement()) {
                System.out.println("start: " + xsr.getLocalName());
            } else if(xsr.isCharacters()) {
                if(xsr.getText().trim().length() > 0) {
                    System.out.println("chars: " + xsr.getText());
                }
            } else if(xsr.isEndElement()) {
                System.out.println("end: " + xsr.getLocalName());
            }
            xsr.next();
        }
    }

}

Output

start: family
start: mom
start: data
end: data
end: mom
end: family

I've found the reason why I was getting the IllegalStateException. flup's answer was very useful. Thanks a lot.
The answer given by Blaise is worth reading too.

But to get to the heart of the matter: the problem was not the skipElement() method itself. The problem was caused by the methods used to read attributes. There are three dots (...) in my question. So let's look at what was there:

switch(eventType){

case (XMLEvent2.START_ELEMENT):
    currentElementName = xmlsr.getName().toString();
    logger.info("currentElementName: " + currentElementName);


    if("dad".equals(currentElementName) == true){
        logger.info("isStartElement: " + xmlsr.isStartElement());
        logger.info("Element BEGIN: " + currentElementName);
        xmlsr.skipElement();
    }


    case (XMLEvent2.ATTRIBUTE):
        int attributeCount = xmlsr.getAttributeCount(); 
        ...
        break;


}

An important thing: there is no break statement for START_ELEMENT. So every time a START_ELEMENT event occurs, the code for the ATTRIBUTE event is also executed. That looks OK according to the Javadoc, because methods such as getAttributeCount() and getAttributeValue() can be called for both START_ELEMENT and ATTRIBUTE.

But after skipElement() is called, the current event changes from START_ELEMENT to END_ELEMENT. So calling getAttributeCount() is no longer allowed. This call is the reason the IllegalStateException is thrown.
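To see that state change in isolation, here is a minimal sketch using only the JDK's built-in StAX API. Stax2's skipElement() is not part of the JDK, so the helper skipToMatchingEnd below is a hypothetical stand-in that walks to the matching end tag by counting depth; the class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class SkipStateDemo {

    // Hypothetical stand-in for Stax2's skipElement(): moves the reader from a
    // START_ELEMENT to the END_ELEMENT that closes it, counting nested elements.
    public static void skipToMatchingEnd(XMLStreamReader reader) throws XMLStreamException {
        if (!reader.isStartElement()) {
            throw new IllegalStateException("Reader is not on START_ELEMENT");
        }
        int depth = 1;
        while (depth > 0) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
    }

    // Skips the root element of the given XML, then tries to read its attributes,
    // and reports which event the reader was on and whether the read was allowed.
    public static String readAttributesAfterSkip(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            reader.nextTag();            // reader is on the root START_ELEMENT
            skipToMatchingEnd(reader);   // reader is on the matching END_ELEMENT
            boolean onEnd = reader.getEventType() == XMLStreamConstants.END_ELEMENT;
            try {
                reader.getAttributeCount(); // only valid on START_ELEMENT or ATTRIBUTE
                return "onEnd=" + onEnd + ", no exception";
            } catch (IllegalStateException e) {
                return "onEnd=" + onEnd + ", IllegalStateException";
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readAttributesAfterSkip("<dad><data height=\"175\"/></dad>"));
    }
}
```

With the JDK's default parser this should report that the reader sits on END_ELEMENT and that getAttributeCount() is rejected with an IllegalStateException, which is the situation the fall-through code above runs into.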

The simplest way to avoid that exception is to add a break statement right after the skipElement() call. In that case the code for getting attributes is not executed, so the exception is not thrown.

        if("dad".equals(currentElementName) == true){
            logger.info("isStartElement: " + xmlsr.isStartElement());
            logger.info("Element BEGIN: " + currentElementName);
            xmlsr.skipElement();
            break;                  //the cure for IllegalStateException
        }
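For completeness, a self-contained version of the loop with the break in place might look like the sketch below. It assumes plain StAX from the JDK rather than Woodstox, so skipElement here is a hand-written, hypothetical substitute for XMLStreamReader2.skipElement(), and SkipWithBreak and visit are illustrative names only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class SkipWithBreak {

    // Hypothetical plain-StAX substitute for XMLStreamReader2.skipElement():
    // consumes events until the END_ELEMENT matching the current START_ELEMENT.
    public static void skipElement(XMLStreamReader reader) throws XMLStreamException {
        int depth = 1;
        while (depth > 0) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
    }

    // Returns one line per visited element; the element named skippedName and
    // everything inside it are skipped and never have their attributes read.
    public static List<String> visit(String xml, String skippedName) {
        List<String> seen = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                switch (reader.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    if (skippedName.equals(reader.getLocalName())) {
                        skipElement(reader);
                        break; // reader is on END_ELEMENT now, so skip the attribute code
                    }
                    seen.add(reader.getLocalName()
                            + " attributes=" + reader.getAttributeCount());
                    break;
                default:
                    break; // other events are ignored in this sketch
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return seen;
    }

    public static void main(String[] args) {
        String xml = "<family><mom><data height=\"160\"/></mom>"
                + "<dad><data height=\"175\"/></dad></family>";
        for (String line : visit(xml, "dad")) {
            System.out.println(line);
        }
    }
}
```

Keeping the attribute handling inside the START_ELEMENT case, instead of relying on fall-through into a separate ATTRIBUTE case, is a design choice of this sketch: it makes it harder to read attributes by accident after the reader has moved on.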

I'm sorry I gave you no chance to answer my original question because too much code was hidden.

It looks like the method xmlsr.skipElement() is the one that must consume the XMLEvent2.START_ELEMENT event. And since you already consumed it (xmlsr.next()), that method throws an error.
