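Using StAX to read all text elements
I need to parse an XML file, regardless of which tags it contains, and read the text of all its leaves (text-only elements). I am using StAX, but there seems to be no way to know in advance that an element is text-only (so getElementText throws an exception saying it is not on a leaf element). So I decided to use a filter that keeps only the tag elements, and to iterate through the document this way: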
InputStream in = null;
try {
    in = new FileInputStream("file.xml");
    DatiEstratti de = DatiEstratti.getInstance();
    // Event-based processing
    XMLInputFactory factory = (XMLInputFactory) XMLInputFactory.newInstance();
    XMLEventReader eventReader = factory.createXMLEventReader(in);
    // use the filter to keep only the element tags
    eventReader = factory.createFilteredReader(eventReader, new ElementOnlyFilter());
    while (eventReader.hasNext()) {
        XMLEvent event = eventReader.nextEvent();
        if (event.getEventType() == XMLStreamConstants.START_ELEMENT) {
            StartElement startElement = event.asStartElement();
            XMLEvent peekEvent = eventReader.peek();
            if (peekEvent.isEndElement()) {
                // this is the first time a pop is about to happen,
                // so this is a leaf.
                // retrieve the data.
                String value = eventReader.getElementText();
                logger.info("dato : " + value);
            }
            String nome = startElement.getName().getLocalPart();
            String prefix = startElement.getName().getPrefix();
            if (prefix != null) {
                nome = prefix + ":" + nome;
            }
            de.push(nome);
            logger.info("push : " + de.stampaPercorso());
        } else if ((event.getEventType() == XMLStreamConstants.END_ELEMENT)) {
            de.pop();
            logger.info("pop : " + de.stampaPercorso());
            if (0 > de.nLivelliPercorso()) {
                break;
            }
        }
        //handle more event types here...
    }
...where the filter is:
public class ElementOnlyFilter implements EventFilter, StreamFilter {
    /* implementation of EventFilter interface */
    @Override
    public boolean accept(XMLEvent event) {
        return acceptInternal(event.getEventType());
    }
    /* implementation of StreamFilter interface */
    @Override
    public boolean accept(XMLStreamReader reader) {
        return acceptInternal(reader.getEventType());
    }
    /* internal utility method */
    private boolean acceptInternal(int eventType) {
        return eventType == XMLStreamConstants.START_ELEMENT
            || eventType == XMLStreamConstants.END_ELEMENT;
    }
}
The problem is that when I reach a leaf I get the following exception:
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[3,42]
Message: parser must be on START_ELEMENT to read next text
at com.sun.xml.internal.stream.XMLEventReaderImpl.getElementText(XMLEventReaderImpl.java:114)
at javax.xml.stream.util.EventReaderDelegate.getElementText(EventReaderDelegate.java:88)
at xmlparser.XmlParser.main(XmlParser.java:63)
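I wonder why. Is there something wrong with this code? I thought peek() does not alter the reader, so getElementText() should be called from the start element. Is there another way to achieve my goal?
First, if you filter to include only start and end element events, you will never see the text contained inside the leaf nodes at all. I would use a different approach with the unfiltered stream, as follows: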
XMLEventReader eventReader = factory.createXMLEventReader(in);
StringBuilder content = null;
while(eventReader.hasNext()) {
    XMLEvent event = eventReader.nextEvent();
    if(event.isStartElement()) {
        // other start element processing here
        content = new StringBuilder();
    } else if(event.isEndElement()) {
        if(content != null) {
            // this was a leaf element
            String leafText = content.toString();
            // do something with the leaf node
        } else {
            // not a leaf
        }
        // in all cases, discard content
        content = null;
    } else if(event.isCharacters()) {
        if(content != null) {
            content.append(event.asCharacters().getData());
        }
    }
    // other event types here
}
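The trick is the content = null at the end of the if(event.isEndElement()) block. When you reach an end element event, if content is non-null then you know there has been no intervening end element event between this one and its corresponding start tag, i.e. it is a leaf node.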
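To try the approach end to end, here is a minimal self-contained sketch. It assumes the XML arrives as a String rather than from file.xml, and the class name LeafTextDemo and the helper leafTexts are hypothetical names introduced only for this illustration. It collects the text of every leaf into a list instead of logging it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class LeafTextDemo {

    // Hypothetical helper: returns the text of every leaf element, in document order.
    public static List<String> leafTexts(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Unfiltered reader, so Characters events are still delivered.
        XMLEventReader eventReader = factory.createXMLEventReader(new StringReader(xml));
        List<String> leaves = new ArrayList<>();
        StringBuilder content = null;
        try {
            while (eventReader.hasNext()) {
                XMLEvent event = eventReader.nextEvent();
                if (event.isStartElement()) {
                    // Every start tag opens a fresh buffer.
                    content = new StringBuilder();
                } else if (event.isEndElement()) {
                    if (content != null) {
                        // No child element closed since the matching start tag: a leaf.
                        leaves.add(content.toString());
                    }
                    // In all cases, discard the buffer.
                    content = null;
                } else if (event.isCharacters() && content != null) {
                    // Text may arrive in several chunks, so append rather than assign.
                    content.append(event.asCharacters().getData());
                }
            }
        } finally {
            eventReader.close();
        }
        return leaves;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<root><a>one</a><b><c>two</c><d/></b></root>";
        System.out.println(leafTexts(xml)); // prints [one, two, ]
    }
}
```

Note that an empty element such as <d/> also counts as a leaf here and yields an empty string; skip empty results if only elements that actually contain text are wanted.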