StAX - get XML node as string

The XML looks like this:

<statements>
   <statement account="123">
      ...stuff...
   </statement>
   <statement account="456">
      ...stuff...
   </statement>
</statements>

I'm using StAX to process one <statement> at a time, and I have that working. I need to get the entire statement node as a string so I can create "123.xml" and "456.xml", or maybe even load it into a database table indexed by account.

I'm using this approach: http://www.devx.com/Java/Article/30298/1954

I'm looking to do something like this:

String statementXml = staxXmlReader.getNodeByName("statement");

//load statementXml into database

I had a similar task, and although the original question is more than a year old, I couldn't find a satisfying answer. The most interesting answer so far was Blaise Doughan's, but I couldn't get it running on the XML I am expecting (maybe some parameters for the underlying parser could change that?). Here is the XML, very simplified:

<many-many-tags>
    <description>
        ...
        <p>Lorem ipsum...</p>
        Devils inside...
        ...
    </description>
</many-many-tags>

My solution:

public static String readElementBody(XMLEventReader eventReader)
    throws XMLStreamException {
    StringWriter buf = new StringWriter(1024);

    int depth = 0;
    while (eventReader.hasNext()) {
        // peek event
        XMLEvent xmlEvent = eventReader.peek();

        if (xmlEvent.isStartElement()) {
            ++depth;
        }
        else if (xmlEvent.isEndElement()) {
            --depth;

            // reached END_ELEMENT tag?
            // break loop, leave event in stream
            if (depth < 0)
                break;
        }

        // consume event
        xmlEvent = eventReader.nextEvent();

        // print out event
        xmlEvent.writeAsEncodedUnicode(buf);
    }

    return buf.getBuffer().toString();
}

Usage example:

XMLEventReader eventReader = ...;
while (eventReader.hasNext()) {
    XMLEvent xmlEvent = eventReader.nextEvent();
    if (xmlEvent.isStartElement()) {
        StartElement elem = xmlEvent.asStartElement();
        String name = elem.getName().getLocalPart();

        // local names are case-sensitive, so match the sample XML exactly
        if ("description".equals(name)) {
            String xmlFragment = readElementBody(eventReader);
            // do something with it...
            System.out.println("'" + xmlFragment + "'");
        }
    }
    else if (xmlEvent.isEndElement()) {
        // ...
    }
}

Note that the extracted XML fragment contains the complete body content, including whitespace and comments. Filtering these on demand and making the buffer size configurable have been left out for brevity:

'
    <description>
        ...
        <p>Lorem ipsum...</p>
        Devils inside...
        ...
    </description>
    '
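
The filtering mentioned above could look roughly like the following sketch. It is a variant of readElementBody and not part of the original answer; the names FilteredBodyReader, readElementBodyFiltered and bodyOf are made up for illustration. It assumes the reader is positioned just after the start tag of the element, and that the elements carry no namespace prefixes, because the text that writeAsEncodedUnicode produces for start elements depends on the StAX implementation.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class FilteredBodyReader {

    // Hypothetical variant of readElementBody: same depth tracking,
    // but comments and whitespace-only text are not copied to the buffer.
    static String readElementBodyFiltered(XMLEventReader eventReader)
            throws XMLStreamException {
        StringWriter buf = new StringWriter(1024);
        int depth = 0;
        while (eventReader.hasNext()) {
            XMLEvent xmlEvent = eventReader.peek();
            if (xmlEvent.isStartElement()) {
                ++depth;
            } else if (xmlEvent.isEndElement()) {
                // closing tag of the enclosing element: leave it in the stream
                if (--depth < 0) {
                    break;
                }
            }
            xmlEvent = eventReader.nextEvent();

            boolean isComment =
                    xmlEvent.getEventType() == XMLStreamConstants.COMMENT;
            boolean isBlankText = xmlEvent.isCharacters()
                    && xmlEvent.asCharacters().isWhiteSpace();
            if (!isComment && !isBlankText) {
                xmlEvent.writeAsEncodedUnicode(buf);
            }
        }
        return buf.toString();
    }

    // Helper for trying it out: returns the filtered body of the first
    // element with the given local name, or null if there is none.
    static String bodyOf(String xml, String elementName)
            throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newFactory()
                .createXMLEventReader(new StringReader(xml));
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement() && elementName.equals(
                    event.asStartElement().getName().getLocalPart())) {
                return readElementBodyFiltered(reader);
            }
        }
        return null;
    }
}
```

Dropping whitespace-only text changes mixed content slightly, so this is probably only appropriate when the fragment is parsed again afterwards rather than shown verbatim.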

You can use StAX for this. You just need to advance the XMLStreamReader to the start element for statement and check the account attribute to get the file name. Then use the javax.xml.transform APIs to transform a StAXSource into a StreamResult wrapping a File. The transform advances the XMLStreamReader, so you can simply repeat the process for each statement.

import java.io.File;
import java.io.FileReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

public class Demo {

    public static void main(String[] args) throws Exception  {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        XMLStreamReader xsr = xif.createXMLStreamReader(new FileReader("input.xml"));
        xsr.nextTag(); // Advance to statements element

        while(xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
            TransformerFactory tf = TransformerFactory.newInstance();
            Transformer t = tf.newTransformer();
            File file = new File("out" + xsr.getAttributeValue(null, "account") + ".xml");
            t.transform(new StAXSource(xsr), new StreamResult(file));
        }
    }

}
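
Since the question asks for a String rather than a file, the same idea can be sketched with a StreamResult that wraps a StringWriter. The class name StatementStrings and the method byAccount are hypothetical, and the input is taken as a String only to keep the example self-contained. The loop checks the current event instead of calling nextTag(), on the assumption that some transformer implementations leave the reader one event past the closing tag, which could otherwise skip a statement when there is no whitespace between siblings.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

class StatementStrings {

    // Returns each <statement> subtree as a String, keyed by its account attribute.
    static Map<String, String> byAccount(String xml) throws Exception {
        XMLStreamReader xsr = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));

        // Identity transformer; omit the XML declaration in each fragment.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        Map<String, String> result = new LinkedHashMap<>();
        while (xsr.hasNext()) {
            if (xsr.getEventType() == XMLStreamConstants.START_ELEMENT
                    && "statement".equals(xsr.getLocalName())) {
                // Read the attribute before the transform moves the reader on.
                String account = xsr.getAttributeValue(null, "account");
                StringWriter out = new StringWriter();
                t.transform(new StAXSource(xsr), new StreamResult(out));
                result.put(account, out.toString());
                // No next() here: the transform has already advanced the reader.
            } else {
                xsr.next();
            }
        }
        return result;
    }
}
```

Each map value can then be written to "123.xml" or stored in a database column, as the question describes.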

StAX is a low-level access API, and it has neither lookups nor methods that access content recursively. But what are you actually trying to do? And why are you considering StAX?

Beyond using a tree model (DOM, XOM, JDOM, dom4j), which would work well with XPath, the best choice when dealing with data is usually a data binding library like JAXB. You can pass it a StAX or SAX reader and ask it to bind the XML data into Java beans, so that you process Java objects instead of messing with XML. This is often more convenient, and it usually performs quite well. The only trick with larger files is that you do not want to bind the whole thing at once, but rather bind each sub-tree (in your case, one 'statement' at a time). This is most easily done by iterating with a StAX XMLStreamReader and then using JAXB to bind.

I've been googling and this seems painfully difficult.

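Given my XML, I think it might just be simpler to do this:

StringBuilder buffer = new StringBuilder();
// xmlFile is a placeholder for the input path; STMT_END_TAG is presumably "</statement>"
try (BufferedReader reader = new BufferedReader(new FileReader(xmlFile))) {
    String line;
    while ((line = reader.readLine()) != null) {
        buffer.append(line).append('\n');
        // only works if every closing tag sits on a line of its own
        if (line.trim().equals(STMT_END_TAG)) {
            parse(buffer.toString());
            buffer.setLength(0);
        }
    }
}

private void parse(String statement) {
    // saxParser.parse(new InputSource(new StringReader(statement)));
    // do stuff
    // save string
}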

Why not just use xpath for this?

You could have a fairly simple xpath to get all 'statement' nodes.

Like so:

//statement

EDIT #1: If possible, take a look at dom4j. You could read the String and get all 'statement' nodes fairly simply.

EDIT #2: Using dom4j, this is how you would do it: (from their cookbook)

String text = "your xml here";
Document document = DocumentHelper.parseText(text);

public void bar(Document document) {
   List list = document.selectNodes( "//statement" );
   // loop through node data
}
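
dom4j is a third-party library. As a rough sketch, the same XPath lookup can also be done with the XML APIs that ship with the JDK, serializing each matched node back to a String with an identity Transformer. The class name XPathStatements and the method byAccount are hypothetical. Note that this loads the whole document into memory, so it is probably only suitable when the file is not too large.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathStatements {

    // Selects every <statement> element and returns its XML text,
    // keyed by the account attribute.
    static Map<String, String> byAccount(String xml) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//statement", document, XPathConstants.NODESET);

        // Identity transformer used purely as a serializer.
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element statement = (Element) nodes.item(i);
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(statement), new StreamResult(out));
            result.put(statement.getAttribute("account"), out.toString());
        }
        return result;
    }
}
```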

I had a similar problem and found a solution. I used the approach proposed by @t0r0X, but it does not work well with the current implementation in Java 11: xmlEvent.writeAsEncodedUnicode produces an invalid string representation of the start element (in the StartElementEvent class) in the resulting XML fragment. I had to replace that part, and after that it seems to work well, which I could verify immediately by parsing the fragment with DOM and unmarshalling it with JAXB into specific data containers.

In my case I had a huge structure

<Orders>
   <ns2:SyncOrder xmlns:ns2="..." xmlns:ns3="....." ....>
      .....
   </ns2:SyncOrder>
   <ns2:SyncOrder xmlns:ns2="..." xmlns:ns3="....." ....>
      .....
   </ns2:SyncOrder>
   ...
</Orders>

in a file of several hundred megabytes (many repeating "SyncOrder" structures), so using DOM would lead to large memory consumption and slow evaluation. Therefore I used StAX to split the huge XML into smaller pieces, which I then parsed with DOM and unmarshalled into the JAXB elements generated from the XSD definition of the SyncOrder element (I had this infrastructure from a web service that uses the same structure, but that is not important here).

The code below shows where the XML fragment is created and could be used; I passed it directly on to further processing:

private static <T> List<T> unmarshallMultipleSyncOrderXmlData(
        InputStream aOrdersXmlContainingSyncOrderItems,
        Function<SyncOrderType, T> aConversionFunction) throws XMLStreamException, ParserConfigurationException, IOException, SAXException {

    DocumentBuilderFactory locDocumentBuilderFactory = DocumentBuilderFactory.newInstance();
    locDocumentBuilderFactory.setNamespaceAware(true);
    DocumentBuilder locDocBuilder = locDocumentBuilderFactory.newDocumentBuilder();

    List<T> locResult = new ArrayList<>();
    XMLInputFactory locFactory = XMLInputFactory.newFactory();
    XMLEventReader locReader = locFactory.createXMLEventReader(aOrdersXmlContainingSyncOrderItems);

    boolean locIsInSyncOrder = false;
    QName locSyncOrderElementQName = null;
    StringWriter locXmlTextBuffer = new StringWriter();
    int locDepth = 0;
    while (locReader.hasNext()) {

        XMLEvent locEvent = locReader.nextEvent();

        if (locEvent.isStartElement()) {
            if (locDepth == 0 && Objects.equals(locEvent.asStartElement().getName().getLocalPart(), "Orders")) {
                locDepth++;
            } else {
                if (locDepth <= 0)
                    throw new IllegalStateException("An invalid XML stream has been passed into the function. "
                            + "Expecting the element 'Orders' as the root element of the document, but found '"
                            + locEvent.asStartElement().getName().getLocalPart() + "'.");
                locDepth++;
                if (locSyncOrderElementQName == null) {
                    /* First element after the "Orders" has passed, so we retrieve
                     * the name of the element with the namespace prefix: */
                    locSyncOrderElementQName = locEvent.asStartElement().getName();
                }
                if(Objects.equals(locEvent.asStartElement().getName(), locSyncOrderElementQName)) {
                    locIsInSyncOrder = true;
                }
            }
        } else if (locEvent.isEndElement()) {
            locDepth--;
            if(locDepth == 1 && Objects.equals(locEvent.asEndElement().getName(), locSyncOrderElementQName)) {
                locEvent.writeAsEncodedUnicode(locXmlTextBuffer);
                /* At this point locXmlTextBuffer.toString() returns the complete XML fragment
                 * containing a valid SyncOrder element. I continue straight to further processing,
                 * which immediately checks that the fragment is valid XML and passes the values
                 * to the communication object. The fragment has no XML declaration, so the bytes
                 * are encoded explicitly as UTF-8 instead of the platform default: */
                Document locDocument = locDocBuilder.parse(new ByteArrayInputStream(
                        locXmlTextBuffer.toString().getBytes(java.nio.charset.StandardCharsets.UTF_8)));
                SyncOrderType locItem = unmarshallSyncOrderDomNodeToCo(locDocument);
                locResult.add(aConversionFunction.apply(locItem));
                locXmlTextBuffer = new StringWriter();
                locIsInSyncOrder = false;
            }
        }
        if (locIsInSyncOrder) {
            if (locEvent.isStartElement()) {
                /* replaces the standard StartElement implementation of writeAsEncodedUnicode: */
                locXmlTextBuffer.write(startElementToStrng(locEvent.asStartElement()));
            } else {
                locEvent.writeAsEncodedUnicode(locXmlTextBuffer);
            }
        }
    }
    return locResult;
}

private static String startElementToStrng(StartElement aStartElement) {

    StringBuilder locStartElementBuffer = new StringBuilder();

    // open element
    locStartElementBuffer.append("<");
    String locNameAsString = null;
    if ("".equals(aStartElement.getName().getNamespaceURI())) {
        locNameAsString = aStartElement.getName().getLocalPart();
    } else if (aStartElement.getName().getPrefix() != null
            && !"".equals(aStartElement.getName().getPrefix())) {
        locNameAsString = aStartElement.getName().getPrefix()
                + ":" + aStartElement.getName().getLocalPart();
    } else {
        locNameAsString = aStartElement.getName().getLocalPart();
    }

    locStartElementBuffer.append(locNameAsString);

    // add any attributes
    Iterator<Attribute> locAttributeIterator = aStartElement.getAttributes();
    Attribute attr;
    while (locAttributeIterator.hasNext()) {
        attr = locAttributeIterator.next();
        locStartElementBuffer.append(" ");
        locStartElementBuffer.append(attr.toString());
    }

    // add any namespaces
    Iterator<Namespace> locNamespaceIterator = aStartElement.getNamespaces();
    Namespace locNamespace;
    while (locNamespaceIterator.hasNext()) {
        locNamespace = locNamespaceIterator.next();
        locStartElementBuffer.append(" ");
        locStartElementBuffer.append(locNamespace.toString());
    }

    // close start tag
    locStartElementBuffer.append(">");

    // return StartElement as a String
    return locStartElementBuffer.toString();
}

public static SyncOrderType unmarshallSyncOrderDomNodeToCo(
        Node aSyncOrderItemNode) {
    Source locSource = new DOMSource(aSyncOrderItemNode);
    Object locUnmarshalledObject = getMarshallerAndUnmarshaller().unmarshal(locSource);
    SyncOrderType locCo = ((JAXBElement<SyncOrderType>) locUnmarshalledObject).getValue();
    return locCo;
}
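
As a possible alternative to hand-building the start tag, the events of one element can be copied into an XMLEventWriter, which serializes prefixes, namespace declarations and attributes itself. The following is only a sketch under stated assumptions: the names EventWriterSplitter and split are hypothetical, the input is taken as a String to keep it self-contained, and it assumes that every namespace used inside a fragment is declared on or below the fragment's own start element, as in the SyncOrder sample above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class EventWriterSplitter {

    // Returns the XML text of every element with the given local name.
    static List<String> split(String xml, String localName)
            throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newFactory()
                .createXMLEventReader(new StringReader(xml));
        XMLOutputFactory outputFactory = XMLOutputFactory.newFactory();

        List<String> fragments = new ArrayList<>();
        StringWriter buffer = null;
        XMLEventWriter writer = null;   // non-null while inside a fragment
        int depth = 0;                  // nesting depth inside the fragment

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();

            // Start a new fragment at a matching start element.
            if (writer == null && event.isStartElement()
                    && localName.equals(
                            event.asStartElement().getName().getLocalPart())) {
                buffer = new StringWriter();
                writer = outputFactory.createXMLEventWriter(buffer);
            }

            if (writer != null) {
                writer.add(event);
                if (event.isStartElement()) {
                    depth++;
                } else if (event.isEndElement() && --depth == 0) {
                    // Matching end tag reached: finish this fragment.
                    writer.flush();
                    writer.close();
                    fragments.add(buffer.toString());
                    writer = null;
                }
            }
        }
        return fragments;
    }
}
```

Each returned String could then be handed to the DOM parser and the JAXB unmarshalling step in the same way as locXmlTextBuffer.toString() is above.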
