
xpath in java to extract all the xml elements

Could anyone provide an example of extracting all the elements, with their attributes and values, from an XML file using XPath in Java?

Thanks

I wrote this a few years back for my team. It may be helpful.

What is XPath?

  1. XPath is a language for finding information in an XML document.
  2. XPath is a syntax for defining parts of an XML document.
  3. XPath uses path expressions to navigate in XML documents.
  4. XPath contains a library of standard functions.
  5. XPath is a major element in XSLT.
  6. XPath is a W3C recommendation.

In XPath, there are seven kinds of nodes: element, attribute, text, namespace, processing-instruction, comment, and document (root) nodes. XML documents are treated as trees of nodes. The root of the tree is called the document node (or root node).
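To see some of these node kinds from Java, here is a minimal sketch that parses a small, made-up document from a string (rather than from a file) and reports the DOM node type of the node each XPath expression selects. The class and method names (NodeKinds, parse, nodeType) are hypothetical, and namespace nodes are left out of this sketch.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NodeKinds {
    // Parses XML held in a string so the example needs no file on disk.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Evaluates an expression that selects a single node and returns its DOM node type.
    static short nodeType(Document doc, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Node node = (Node) xpath.evaluate(expression, doc, XPathConstants.NODE);
        return node.getNodeType();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample: a processing instruction and a comment before the root element.
        Document doc = parse("<?xml version=\"1.0\"?><?note hello?><!-- people -->"
                + "<information><person id=\"1\"><name>Tito George</name></person></information>");
        System.out.println(nodeType(doc, "/") == Node.DOCUMENT_NODE);                 // document (root) node
        System.out.println(nodeType(doc, "/information") == Node.ELEMENT_NODE);       // element node
        System.out.println(nodeType(doc, "//@id") == Node.ATTRIBUTE_NODE);            // attribute node
        System.out.println(nodeType(doc, "//name/text()") == Node.TEXT_NODE);         // text node
        System.out.println(nodeType(doc, "/comment()") == Node.COMMENT_NODE);         // comment node
        System.out.println(nodeType(doc, "/processing-instruction()")
                == Node.PROCESSING_INSTRUCTION_NODE);                                 // processing instruction
    }
}
```

Each line should print true; the comparison against the org.w3c.dom.Node constants shows which kind of node the expression reached.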

Consider the following XML document:

<information>
    <person id="1">
        <name>Tito George</name>
        <age>25</age>
        <gender>Male</gender>
        <dob>
             <date>25</date>
             <month>october</month>
             <year>1983</year>
        </dob>
    </person>


    <person id="2">
        <name>Kumar</name>
        <age>32</age>
        <gender>Male</gender>
        <dob>
             <date>28</date>
             <month>january</month>
             <year>1975</year>
        </dob>
    </person>


    <person id="3">
        <name>Deepali</name>
        <age>25</age>
        <gender>Female</gender>
        <dob>
             <date>17</date>
             <month>january</month>
             <year>1988</year>
        </dob>
    </person>

</information>

Getting information from the Document

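import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class PersonIds {
    public static void main(String[] args) throws Exception {
        // Getting the instance of DocumentBuilderFactory
        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        // true: the parser produced will provide support for XML namespaces
        domFactory.setNamespaceAware(true);
        // Creating the document builder and parsing the file
        DocumentBuilder builder = domFactory.newDocumentBuilder();
        Document doc = builder.parse(new File("C:\\JavaTestFiles\\persons.xml"));
        // Getting an instance of XPath
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Compiling the expression, then evaluating it as a node set
        XPathExpression expr = xpath.compile("//@id");
        Object result = expr.evaluate(doc, XPathConstants.NODESET);
        NodeList nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i).getNodeValue());
        }
    }
}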

The line expr = xpath.compile("//@id"); is the one that compiles the XPath expression, and //@id is the actual expression. The expression //@id returns the values of all id attributes in the document, i.e. the output of the program will be 1, 2 and 3. The list below shows various expressions that can be used on this document.

The two important statements in the code snippet above are:

  • expr = xpath.compile("//@id"); --> This compiles the expression. If the expression cannot be compiled, this method throws XPathExpressionException.
  • expr.evaluate(doc, XPathConstants.NODESET); --> Evaluates the XPath expression in the specified context (here, the document) and returns the result as the specified type. The second argument (returnType) defines what the method returns. If returnType is not one of the types defined in XPathConstants (NUMBER, STRING, BOOLEAN, NODE or NODESET), an IllegalArgumentException is thrown.
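The effect of the returnType argument can be sketched as follows, assuming a trimmed copy of the persons document held in a string. The class and method names (ReturnTypes, parse, eval) are hypothetical; the casts reflect the Java types JAXP uses for each XPathConstants value (Double, String, Boolean).

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ReturnTypes {
    // Trimmed copy of the persons document (assumption: only the fields used below).
    static final String XML = "<information>"
            + "<person id=\"1\"><name>Tito George</name><age>25</age></person>"
            + "<person id=\"2\"><name>Kumar</name><age>32</age></person>"
            + "<person id=\"3\"><name>Deepali</name><age>25</age></person>"
            + "</information>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Compiles the expression, then evaluates it against the document with the requested return type.
    static Object eval(String expression, QName returnType) throws Exception {
        XPathExpression expr = XPathFactory.newInstance().newXPath().compile(expression);
        return expr.evaluate(parse(XML), returnType);
    }

    public static void main(String[] args) throws Exception {
        Double count = (Double) eval("count(//person)", XPathConstants.NUMBER);       // NUMBER -> Double
        String name = (String) eval("//person[@id='2']/name", XPathConstants.STRING); // STRING -> String
        Boolean hasId1 = (Boolean) eval("//@id='1'", XPathConstants.BOOLEAN);         // BOOLEAN -> Boolean
        System.out.println(count + " " + name + " " + hasId1);

        try {
            eval("//person[", XPathConstants.NODESET); // unbalanced bracket: compile() fails
        } catch (XPathExpressionException e) {
            System.out.println("compile failed");
        }
        try {
            eval("//person", new QName("unsupported")); // not one of the XPathConstants types
        } catch (IllegalArgumentException e) {
            System.out.println("evaluate rejected the return type");
        }
    }
}
```

The first line printed is 3.0 Kumar true; the two catch blocks show the XPathExpressionException and IllegalArgumentException cases described above.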

Basically, an XML document is a tree-structured (hierarchical) collection of nodes. As with a hierarchical directory structure, it is useful to specify a path that points to a particular node in the hierarchy (hence the name of the specification: XPath).

In fact, much of the notation of directory paths is carried over intact:

  • The forward slash (/) is used as a path separator.
  • An absolute path from the root of the document starts with a /.
  • A relative path from a given location starts with anything else.
  • A double period (..) indicates the parent of the current node.
  • A single period (.) indicates the current node.
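The difference between absolute and relative paths (and the meaning of . and ..) shows up when an expression is evaluated against a node other than the document. This is a minimal sketch assuming a cut-down persons document in a string; the class and method names (RelativePaths, parse, evalFrom) are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class RelativePaths {
    // Cut-down persons document (assumption: only name and dob/year kept).
    static final String XML = "<information>"
            + "<person id=\"1\"><name>Tito George</name><dob><year>1983</year></dob></person>"
            + "<person id=\"2\"><name>Kumar</name><dob><year>1975</year></dob></person>"
            + "</information>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Selects a context node with an absolute path, then evaluates 'path' from that node as a string.
    static String evalFrom(String contextPath, String path) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Node context = (Node) xpath.evaluate(contextPath, parse(XML), XPathConstants.NODE);
        return xpath.evaluate(path, context);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(evalFrom("/information/person[2]", "name"));         // relative: child of the context
        System.out.println(evalFrom("/information/person[2]/dob", "../@id"));   // '..' goes up to person
        System.out.println(evalFrom("/information/person[1]/dob/year", "."));   // '.' is the context node itself
        System.out.println(evalFrom("/information/person[2]/dob",
                "/information/person[1]/name"));                                 // absolute: context is ignored
    }
}
```

This prints Kumar, 2, 1983 and Tito George: the leading / makes the last expression start from the document root no matter which node is the context.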

Information

  • //@id --> Selects all attributes that are named id
  • //@* --> Selects all attribute nodes in the document
  • //@id='1' --> Tests whether an id attribute with the value '1' is present in the document. If it is, the expression evaluates to true. In this case, use XPathConstants.BOOLEAN as the return type in the evaluate method.
  • /information/person[age='32']/name/text() or
    //person[age='32']/name/text() --> Returns 'Kumar'. Let us split the query /information/person[age='32']/name/text():
    Part 1: /information/person[age='32'] searches for the 'person' node whose child element 'age' = 32.
    Part 2: /name gets the element 'name' of that node.
    Part 3: text() is a node test that selects the text node of the element 'name'.
    Note: Here, information is the root element. A path that starts with a single slash is an absolute path from the root, so every step down to 'person' has to be written out. A leading double slash '//' selects matching nodes at any depth in the document, so //person finds every person element without naming its ancestors.
  • //person/dob[year>1978][year<1985]/../name/text() --> Searches for persons whose year of birth is between 1978 and 1985 (exclusive) and returns 'Tito George'. Note the /.. step: the element year is not a direct child of person; its direct parent is dob. So we need to go one level up, from dob to person, to get the element 'name'.
  • //person/dob[year>1978][year<1985]/../@id --> Returns the id of the person that satisfies the above condition ('1'). Note: there is no need for text() when getting attribute values.
  • //person[age='25']//dob[date=25]/../name/text() --> Returns the name of the person whose age = 25 and whose dob/date = 25 ('Tito George').
  • /information/person[1]/name/text() --> Returns the name of the first person node ('Tito George').
  • /information/person/dob/child::*/text() --> Returns the text of all child elements of dob. With every step written as an explicit axis, this is child::information/child::person/child::dob/child::*/text()
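The node-set expressions in this list can be tried against the persons document with a small harness like the one below. It is a sketch: the document is embedded as a string (same data as above, without whitespace), the names ExpressionList and values are hypothetical, and the boolean test //@id='1' is left out because it has to be evaluated with XPathConstants.BOOLEAN rather than NODESET.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ExpressionList {
    // The persons document from above, as a single string.
    static final String XML = "<information>"
            + "<person id=\"1\"><name>Tito George</name><age>25</age><gender>Male</gender>"
            + "<dob><date>25</date><month>october</month><year>1983</year></dob></person>"
            + "<person id=\"2\"><name>Kumar</name><age>32</age><gender>Male</gender>"
            + "<dob><date>28</date><month>january</month><year>1975</year></dob></person>"
            + "<person id=\"3\"><name>Deepali</name><age>25</age><gender>Female</gender>"
            + "<dob><date>17</date><month>january</month><year>1988</year></dob></person>"
            + "</information>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Evaluates the expression as a NODESET and collects the value of each attribute or text node.
    static List<String> values(String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, parse(XML), XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getNodeValue());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        String[] expressions = {
            "//@id",
            "//person[age='32']/name/text()",
            "//person/dob[year>1978][year<1985]/../name/text()",
            "//person/dob[year>1978][year<1985]/../@id",
            "//person[age='25']//dob[date=25]/../name/text()",
            "/information/person[1]/name/text()",
            "/information/person/dob/child::*/text()"
        };
        for (String e : expressions) {
            System.out.println(e + " -> " + values(e));
        }
    }
}
```

Node sets come back in document order, so the last expression lists the date, month and year of each person in turn.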

Use the XPath expression "//*" like this:

Document doc = ... // the document to apply XPath to
XPathExpression xp = XPathFactory.newInstance().newXPath().compile("//*");
NodeList elements = (NodeList) xp.evaluate(doc, XPathConstants.NODESET);

It returns all the elements at any level.
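To answer the original question fully (every element together with its attributes and values), the //* node set can be combined with the relative expressions @* and text() evaluated on each element. This is a sketch under assumptions: the input is a string, the output format "name attr=\"value\" : text" is made up for illustration, and the names AllElements and describe are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AllElements {
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // One line per element: its name, its attributes, and its own non-blank text.
    static List<String> describe(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        XPathExpression allElements = xpath.compile("//*");  // every element, at any level
        XPathExpression attributes = xpath.compile("@*");    // attributes of the context element
        XPathExpression ownText = xpath.compile("text()");   // direct text children only

        NodeList elements = (NodeList) allElements.evaluate(parse(xml), XPathConstants.NODESET);
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < elements.getLength(); i++) {
            Node element = elements.item(i);
            StringBuilder line = new StringBuilder(element.getNodeName());

            NodeList attrs = (NodeList) attributes.evaluate(element, XPathConstants.NODESET);
            for (int j = 0; j < attrs.getLength(); j++) {
                Node a = attrs.item(j);
                line.append(' ').append(a.getNodeName()).append("=\"").append(a.getNodeValue()).append('"');
            }

            // text() instead of the full string value, so a parent does not repeat its descendants' text.
            NodeList texts = (NodeList) ownText.evaluate(element, XPathConstants.NODESET);
            StringBuilder value = new StringBuilder();
            for (int j = 0; j < texts.getLength(); j++) {
                value.append(texts.item(j).getNodeValue());
            }
            String trimmed = value.toString().trim(); // drops indentation-only text
            if (!trimmed.isEmpty()) {
                line.append(" : ").append(trimmed);
            }
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<information><person id=\"1\"><name>Tito George</name>"
                + "<dob><year>1983</year></dob></person></information>";
        for (String line : describe(xml)) {
            System.out.println(line);
        }
    }
}
```

For the sample in main this prints information, person id="1", name : Tito George, dob and year : 1983, one per line. Reading the same persons.xml file instead only requires parsing it with builder.parse(new File(...)) as in the first snippet.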

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. Any question please contact: yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM