How to skip certain elements when parsing an XML file to DOM with Java

I am attempting to parse some XML documents into DOM so that I can run XPath queries against them. My code is in Java, and I have been using the Xerces org.apache.xerces.parsers.DOMParser implementation.

I am only interested in certain portions of the XML, those under the element elementICareAbout, and can ignore the other elements.

<top>
   <elementICareAbout>...</elementICareAbout>
   <elementToIgnore>...</elementToIgnore>
</top>

The XML files can be quite large, and I would rather not hold elements in memory that I do not need for processing. I would expect an XPath query for /top/elementICareAbout to return data, while /top/elementToIgnore would simply return nothing (as I don't need it).

Looking over the Xerces DOMParser and the JAXP APIs, I don't see any way to explicitly ignore certain elements so that they are not part of the in-memory DOM tree after parsing.

Is there a good way to construct a partial DOM Document from an XML file, tailored to the parts that I need?

You could write a SAX filter and insert it into the processing pipeline between the (SAX) parser and the document builder. Or, with rather less coding, you could write an XSLT 3.0 streaming transformation. Or you could write an XQuery to select the parts of the document you want, and run it using a query processor that supports document projection. It all depends on how wedded you are to Java/DOM coding; my preference would be for higher-level languages than that.
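The SAX filter option could look roughly like the sketch below. It is not code from the answer: the names PartialDomBuilder and SkipElementsFilter are hypothetical, and it assumes the JDK's default TransformerFactory, which is a SAXTransformerFactory. The filter extends org.xml.sax.helpers.XMLFilterImpl and swallows the start, end, and character events of any element whose qualified name is in the skip set, so those nodes never reach the identity TransformerHandler that builds the DOM. It does not attempt to filter processing instructions or namespace prefix mappings inside skipped elements.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

class PartialDomBuilder {

    /** Hypothetical filter: swallows every element named in skip, together with its subtree. */
    static class SkipElementsFilter extends XMLFilterImpl {
        private final Set<String> skip;
        private int skipDepth = 0; // greater than 0 while inside a skipped subtree

        SkipElementsFilter(XMLReader parent, Set<String> skip) {
            super(parent);
            this.skip = skip;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            if (skipDepth > 0 || skip.contains(qName)) {
                skipDepth++; // do not forward the event downstream
            } else {
                super.startElement(uri, localName, qName, atts);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if (skipDepth > 0) {
                skipDepth--;
            } else {
                super.endElement(uri, localName, qName);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (skipDepth == 0) {
                super.characters(ch, start, length);
            }
        }
    }

    /** Parses the input, dropping the named elements before they reach the DOM builder. */
    static Document parse(InputSource in, Set<String> skip) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader filter = new SkipElementsFilter(spf.newSAXParser().getXMLReader(), skip);

        // An identity TransformerHandler turns the filtered SAX events into DOM nodes.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler toDom = stf.newTransformerHandler();
        toDom.setResult(new DOMResult(doc));

        filter.setContentHandler(toDom);
        filter.parse(in);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<top><elementICareAbout>keep</elementICareAbout>"
                   + "<elementToIgnore>drop</elementToIgnore></top>";
        Document doc = parse(new InputSource(new StringReader(xml)),
                             Collections.singleton("elementToIgnore"));
        XPath xpath = XPathFactory.newInstance().newXPath();
        System.out.println(xpath.evaluate("/top/elementICareAbout", doc));
        System.out.println(xpath.evaluate("count(/top/elementToIgnore)", doc));
    }
}
```

Because the ignored subtrees are discarded while the parser streams through the file, only the retained elements are ever materialized as DOM nodes, and ordinary XPath queries can then run against the smaller tree.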

You can also get the elements by tag name. For example, if you have an XML file called Question.xml, you can do the following in the Java file:

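    import java.io.StringReader;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;
    import org.xml.sax.InputSource;

    // responseString is assumed to hold the XML text
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
    InputSource is = new InputSource(new StringReader(responseString));
    Document doc = dBuilder.parse(is); // parses the whole document into a DOM

    doc.getDocumentElement().normalize();

    NodeList nList = doc.getElementsByTagName("Question");

    // Visit every <Question> element
    for (int i = 0; i < nList.getLength(); i++) {
        Node nNode = nList.item(i);
        if (nNode.getNodeType() == Node.ELEMENT_NODE) {
            Element eElement = (Element) nNode;
            // Read the text of the first <q1> descendant
            String q1 = eElement.getElementsByTagName("q1").item(0).getTextContent();
        }
    }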
