
stax - get xml node as string

The XML looks like this:

<statements>
   <statement account="123">
      ...stuff...
   </statement>
   <statement account="456">
      ...stuff...
   </statement>
</statements>

I'm using StAX to process one "<statement>" at a time, and I have that working. I need to get the entire statement node as a string so that I can create "123.xml" and "456.xml", or perhaps load it into a database table indexed by account.

I'm using this approach: http://www.devx.com/Java/Article/30298/1954

I'm looking to do something like this:

String statementXml = staxXmlReader.getNodeByName("statement");

//load statementXml into database

I had a similar task, and although the original question is more than a year old, I couldn't find a satisfying answer. The most interesting answer so far was Blaise Doughan's, but I couldn't get it running on the XML I am expecting (maybe some parameters for the underlying parser could change that?). Here is the XML, very simplified:

<many-many-tags>
    <description>
        ...
        <p>Lorem ipsum...</p>
        Devils inside...
        ...
    </description>
</many-many-tags>

My solution:

public static String readElementBody(XMLEventReader eventReader)
    throws XMLStreamException {
    StringWriter buf = new StringWriter(1024);

    int depth = 0;
    while (eventReader.hasNext()) {
        // peek event
        XMLEvent xmlEvent = eventReader.peek();

        if (xmlEvent.isStartElement()) {
            ++depth;
        }
        else if (xmlEvent.isEndElement()) {
            --depth;

            // reached END_ELEMENT tag?
            // break loop, leave event in stream
            if (depth < 0)
                break;
        }

        // consume event
        xmlEvent = eventReader.nextEvent();

        // print out event
        xmlEvent.writeAsEncodedUnicode(buf);
    }

    return buf.getBuffer().toString();
}

Usage example:

XMLEventReader eventReader = ...;
while (eventReader.hasNext()) {
    XMLEvent xmlEvent = eventReader.nextEvent();
    if (xmlEvent.isStartElement()) {
        StartElement elem = xmlEvent.asStartElement();
        String name = elem.getName().getLocalPart();

        // element names are case-sensitive: use the exact name from your document
        if ("DESCRIPTION".equals(name)) {
            String xmlFragment = readElementBody(eventReader);
            // do something with it...
            System.out.println("'" + xmlFragment + "'");
        }
    }
    else if (xmlEvent.isEndElement()) {
        // ...
    }
}

Note that the extracted XML fragment will contain the complete extracted body content, including white space and comments. Filtering those on demand and making the buffer size configurable have been left out for brevity:

'
    <description>
        ...
        <p>Lorem ipsum...</p>
        Devils inside...
        ...
    </description>
    '
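A variant of the same idea can be sketched with an XMLEventWriter instead of writeAsEncodedUnicode. This is not the answer's code: the class and method names (ElementExtractor, readElement, extractAll) are made up for illustration. It keeps the enclosing element's own tags and skips comments and whitespace-only text, which is the filtering left out above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper, not part of any library.
final class ElementExtractor {

    /** Serializes the element whose start tag is the next event, including its own tags. */
    static String readElement(XMLEventReader reader) throws XMLStreamException {
        if (!reader.hasNext() || !reader.peek().isStartElement()) {
            throw new IllegalArgumentException("reader must be positioned before a start element");
        }
        StringWriter buf = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(buf);
        int depth = 0;
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                depth++;
            }
            // drop comments and whitespace-only text; everything else is copied
            boolean skip = event.getEventType() == XMLStreamConstants.COMMENT
                    || (event.isCharacters() && event.asCharacters().getData().trim().isEmpty());
            if (!skip) {
                writer.add(event);
            }
            if (event.isEndElement() && --depth == 0) {
                break; // matching end tag of the element we started with
            }
        }
        writer.flush();
        writer.close();
        return buf.toString();
    }

    /** Collects every element with the given local name as a separate string. */
    static List<String> extractAll(String xml, String localName) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        List<String> fragments = new ArrayList<>();
        while (reader.hasNext()) {
            XMLEvent next = reader.peek();
            if (next.isStartElement()
                    && localName.equals(next.asStartElement().getName().getLocalPart())) {
                fragments.add(readElement(reader));
            } else {
                reader.nextEvent();
            }
        }
        return fragments;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<many-many-tags><description><!-- c --><p>Lorem ipsum...</p>"
                + "Devils inside...</description></many-many-tags>";
        extractAll(xml, "description").forEach(System.out::println);
    }
}
```

Dropping whitespace-only text changes mixed content slightly, so remove that condition if exact whitespace matters. Namespace declarations made on an ancestor outside the extracted element are not copied into the fragment.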

You can use StAX for this. You just need to advance the XMLStreamReader to the start element for statement. Check the account attribute to get the file name. Then use the javax.xml.transform APIs to transform the StAXSource to a StreamResult wrapping a File. This will advance the XMLStreamReader, and then you just repeat the process.

import java.io.File;
import java.io.FileReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

public class Demo {

    public static void main(String[] args) throws Exception  {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        XMLStreamReader xsr = xif.createXMLStreamReader(new FileReader("input.xml"));
        xsr.nextTag(); // Advance to statements element

        while(xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
            TransformerFactory tf = TransformerFactory.newInstance();
            Transformer t = tf.newTransformer();
            File file = new File("out" + xsr.getAttributeValue(null, "account") + ".xml");
            t.transform(new StAXSource(xsr), new StreamResult(file));
        }
    }

}
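Since the question asks for the node as a string rather than a file, the same transform can target a StringWriter. The following is a minimal sketch under that assumption; the class name StatementSplitter, the split method and the amount child element are invented for the example. Instead of relying on nextTag(), the loop checks the current event, so it does not depend on where exactly the transformer leaves the reader after each statement.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

// Hypothetical helper, not part of any library.
final class StatementSplitter {

    /** Maps each account attribute value to the serialized statement element. */
    static Map<String, String> split(String xml) throws Exception {
        XMLStreamReader xsr = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        // one identity transformer, reused for every statement
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        Map<String, String> byAccount = new LinkedHashMap<>();
        while (xsr.hasNext()) {
            if (xsr.getEventType() == XMLStreamConstants.START_ELEMENT
                    && "statement".equals(xsr.getLocalName())) {
                String account = xsr.getAttributeValue(null, "account");
                StringWriter out = new StringWriter();
                // consumes the reader up to the end of this statement element
                t.transform(new StAXSource(xsr), new StreamResult(out));
                byAccount.put(account, out.toString());
            } else {
                xsr.next();
            }
        }
        return byAccount;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<statements>"
                + "<statement account=\"123\"><amount>10</amount></statement>"
                + "<statement account=\"456\"><amount>20</amount></statement>"
                + "</statements>";
        split(xml).forEach((account, fragment) ->
                System.out.println(account + ".xml -> " + fragment));
    }
}
```

Each map value can then be written to "123.xml" or "456.xml", or stored in a table keyed by account.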

StAX is a low-level access API, and it has neither lookups nor methods that access content recursively. But what are you actually trying to do? And why are you considering StAX?

Beyond using a tree model (DOM, XOM, JDOM, Dom4j), which would work well with XPath, the best choice when dealing with data is usually a data binding library like JAXB. With it you can pass a StAX or SAX reader and ask it to bind the XML data into Java beans, so that you process Java objects instead of messing with XML. This is often more convenient, and it usually performs quite well. The only trick with larger files is that you do not want to bind the whole thing at once, but rather bind each sub-tree (in your case, one 'statement' at a time). This is most easily done by iterating with a StAX XMLStreamReader, then using JAXB to bind.
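The "one sub-tree at a time" pattern can be sketched without extra dependencies by building a small DOM per statement and querying it with XPath. Note that this swaps in a tree model for JAXB, so it illustrates the iteration pattern rather than data binding itself. The class name StatementDomReader and the amount child element are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stax.StAXSource;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;

// Hypothetical helper, not part of any library.
final class StatementDomReader {

    /** Returns one "account=amount" entry per statement, holding only one sub-tree in memory at a time. */
    static List<String> readAmounts(String xml) throws Exception {
        XMLStreamReader xsr = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        Transformer t = TransformerFactory.newInstance().newTransformer();
        XPath xpath = XPathFactory.newInstance().newXPath();

        List<String> rows = new ArrayList<>();
        while (xsr.hasNext()) {
            if (xsr.getEventType() == XMLStreamConstants.START_ELEMENT
                    && "statement".equals(xsr.getLocalName())) {
                // build a small DOM document containing just this statement
                DOMResult subtree = new DOMResult();
                t.transform(new StAXSource(xsr), subtree);
                Node doc = subtree.getNode();
                String account = xpath.evaluate("/statement/@account", doc);
                String amount = xpath.evaluate("/statement/amount", doc);
                rows.add(account + "=" + amount);
            } else {
                xsr.next();
            }
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<statements>"
                + "<statement account=\"123\"><amount>10</amount></statement>"
                + "<statement account=\"456\"><amount>20</amount></statement>"
                + "</statements>";
        readAmounts(xml).forEach(System.out::println);
    }
}
```

With JAXB on the classpath, the DOMResult step would presumably be replaced by Unmarshaller.unmarshal(xsr, SomeStatementClass.class), where SomeStatementClass is a bean you define or generate.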

I've been googling and this seems painfully difficult.

Given my XML, I think it might just be simpler to:

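// Needs java.io.BufferedReader, java.io.IOException, java.nio.file.Files, java.nio.file.Path.
// Assumes every opening <statement ...> and closing </statement> tag sits on its own line.
private static final String STMT_END_TAG = "</statement>";

private void splitStatements(Path file) throws IOException {
    StringBuilder buffer = new StringBuilder();
    try (BufferedReader reader = Files.newBufferedReader(file)) {
        String line;
        while ((line = reader.readLine()) != null) {
            String trimmed = line.trim();
            // skip the root tags and anything else outside a statement
            if (buffer.length() == 0 && !trimmed.startsWith("<statement ")) {
                continue;
            }
            buffer.append(line).append('\n');
            if (trimmed.equals(STMT_END_TAG)) {
                parse(buffer.toString());
                buffer.setLength(0);
            }
        }
    }
}

private void parse(String statement) {
    // saxParser.parse(new InputSource(new StringReader(statement)), handler);
    // do stuff
    // save string
}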

Why not just use XPath for this?

You could use a fairly simple XPath expression to get all 'statement' nodes.

Like so:

//statement

EDIT #1: If possible, take a look at dom4j. You could read the String and get all 'statement' nodes fairly simply.

EDIT #2: Using dom4j, this is how you would do it (from their cookbook):

String text = "your xml here";
Document document = DocumentHelper.parseText(text);

public void bar(Document document) {
   List list = document.selectNodes( "//statement" );
   // loop through node data
}
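The same "//statement" query can also be sketched with the XPath API that ships with the JDK, in case adding dom4j is not an option. This is a swapped-in alternative, not the dom4j cookbook code. The class name XPathStatements and the amount element are made up for the example. Like any DOM-based approach, it loads the whole document into memory, so it only suits inputs that fit there.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
final class XPathStatements {

    /** Parses the whole document, selects //statement and serializes each match. */
    static List<String> select(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//statement", doc, XPathConstants.NODESET);

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        List<String> fragments = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(nodes.item(i)), new StreamResult(out));
            fragments.add(out.toString());
        }
        return fragments;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<statements>"
                + "<statement account=\"123\"><amount>10</amount></statement>"
                + "<statement account=\"456\"><amount>20</amount></statement>"
                + "</statements>";
        select(xml).forEach(System.out::println);
    }
}
```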

I had a similar problem and found a solution. I used the solution proposed by @t0r0X, but it does not work well with the current implementation in Java 11: the method xmlEvent.writeAsEncodedUnicode creates an invalid string representation of the start element (in the StartElementEvent class) in the resulting XML fragment. I had to modify it, and it now seems to work well, which I could immediately verify by parsing the fragment with DOM and unmarshalling it with JAXB into specific data containers.

In my case I had a huge structure

<Orders>
   <ns2:SyncOrder xmlns:ns2="..." xmlns:ns3="....." ....>
      .....
   </ns2:SyncOrder>
   <ns2:SyncOrder xmlns:ns2="..." xmlns:ns3="....." ....>
      .....
   </ns2:SyncOrder>
   ...
</Orders>

in a file of several hundred megabytes (many repeated "SyncOrder" structures), so using DOM would lead to large memory consumption and slow evaluation. Therefore I used StAX to split the huge XML into smaller XML pieces, which I analyzed with DOM, using the JAXB elements generated from the XSD definition of the element SyncOrder (I had this infrastructure from a web service that uses the same structure, but that is not important).

The code below shows where the XML fragment is created and could be used; I used it directly in further processing...

private static <T> List<T> unmarshallMultipleSyncOrderXmlData(
        InputStream aOrdersXmlContainingSyncOrderItems,
        Function<SyncOrderType, T> aConversionFunction) throws XMLStreamException, ParserConfigurationException, IOException, SAXException {

    DocumentBuilderFactory locDocumentBuilderFactory = DocumentBuilderFactory.newInstance();
    locDocumentBuilderFactory.setNamespaceAware(true);
    DocumentBuilder locDocBuilder = locDocumentBuilderFactory.newDocumentBuilder();

    List<T> locResult = new ArrayList<>();
    XMLInputFactory locFactory = XMLInputFactory.newFactory();
    XMLEventReader locReader = locFactory.createXMLEventReader(aOrdersXmlContainingSyncOrderItems);

    boolean locIsInSyncOrder = false;
    QName locSyncOrderElementQName = null;
    StringWriter locXmlTextBuffer = new StringWriter();
    int locDepth = 0;
    while (locReader.hasNext()) {

        XMLEvent locEvent = locReader.nextEvent();

        if (locEvent.isStartElement()) {
            if (locDepth == 0 && Objects.equals(locEvent.asStartElement().getName().getLocalPart(), "Orders")) {
                locDepth++;
            } else {
                if (locDepth <= 0)
                    throw new IllegalStateException("An invalid XML stream was passed into the function. "
                            + "Expected the element 'Orders' as the root element of the document, but found '"
                            + locEvent.asStartElement().getName().getLocalPart() + "'.");
                locDepth++;
                if (locSyncOrderElementQName == null) {
                    /* First element after the "Orders" has passed, so we retrieve
                     * the name of the element with the namespace prefix: */
                    locSyncOrderElementQName = locEvent.asStartElement().getName();
                }
                if(Objects.equals(locEvent.asStartElement().getName(), locSyncOrderElementQName)) {
                    locIsInSyncOrder = true;
                }
            }
        } else if (locEvent.isEndElement()) {
            locDepth--;
            if(locDepth == 1 && Objects.equals(locEvent.asEndElement().getName(), locSyncOrderElementQName)) {
                locEvent.writeAsEncodedUnicode(locXmlTextBuffer);
                /* At this point locXmlTextBuffer.toString() returns the complete XML fragment
                 * containing the SyncOrder element. I continue with further processing, which
                 * immediately verifies that the fragment is valid XML and passes the values
                 * to the communication object. Parsing from a Reader avoids a round trip
                 * through the platform default charset: */
                Document locDocument = locDocBuilder.parse(new InputSource(new StringReader(locXmlTextBuffer.toString())));
                SyncOrderType locItem = unmarshallSyncOrderDomNodeToCo(locDocument);
                locResult.add(aConversionFunction.apply(locItem));
                locXmlTextBuffer = new StringWriter();
                locIsInSyncOrder = false;
            }
        }
        if (locIsInSyncOrder) {
            if (locEvent.isStartElement()) {
                /* this replaces the standard writeAsEncodedUnicode implementation for start elements: */
                locXmlTextBuffer.write(startElementToStrng(locEvent.asStartElement()));
            } else {
                locEvent.writeAsEncodedUnicode(locXmlTextBuffer);
            }
        }
    }
    return locResult;
}

private static String startElementToStrng(StartElement aStartElement) {

    StringBuilder locStartElementBuffer = new StringBuilder();

    // open element
    locStartElementBuffer.append("<");
    String locNameAsString = null;
    if ("".equals(aStartElement.getName().getNamespaceURI())) {
        locNameAsString = aStartElement.getName().getLocalPart();
    } else if (aStartElement.getName().getPrefix() != null
            && !"".equals(aStartElement.getName().getPrefix())) {
        locNameAsString = aStartElement.getName().getPrefix()
                + ":" + aStartElement.getName().getLocalPart();
    } else {
        locNameAsString = aStartElement.getName().getLocalPart();
    }

    locStartElementBuffer.append(locNameAsString);

    // add any attributes
    Iterator<Attribute> locAttributeIterator = aStartElement.getAttributes();
    Attribute attr;
    while (locAttributeIterator.hasNext()) {
        attr = locAttributeIterator.next();
        locStartElementBuffer.append(" ");
        locStartElementBuffer.append(attr.toString());
    }

    // add any namespaces
    Iterator<Namespace> locNamespaceIterator = aStartElement.getNamespaces();
    Namespace locNamespace;
    while (locNamespaceIterator.hasNext()) {
        locNamespace = locNamespaceIterator.next();
        locStartElementBuffer.append(" ");
        locStartElementBuffer.append(locNamespace.toString());
    }

    // close start tag
    locStartElementBuffer.append(">");

    // return StartElement as a String
    return locStartElementBuffer.toString();
}

public static SyncOrderType unmarshallSyncOrderDomNodeToCo(
        Node aSyncOrderItemNode) {
    Source locSource = new DOMSource(aSyncOrderItemNode);
    Object locUnmarshalledObject = getMarshallerAndUnmarshaller().unmarshal(locSource);
    SyncOrderType locCo = ((JAXBElement<SyncOrderType>) locUnmarshalledObject).getValue();
    return locCo;
}
