
Java XML JDOM2 XPath - Read text value from XML attribute and element using XPath expression

The program should be able to read from an XML file using XPath expressions. I have already started the project using JDOM2, and switching to another API is not an option. The difficulty is that the program does not know beforehand whether it has to read an element or an attribute. Does the API provide any function that returns the content (a string) given only the XPath expression? From what I know about XPath in JDOM2, it uses objects of different types to evaluate XPath expressions that point to attributes or elements. I am only interested in the content of the attribute or element that the XPath expression points to.

Here is an example XML file:

<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="CHILDREN">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="WEB">
    <title lang="en">XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <author>Kurt Cagle</author>
    <author>James Linn</author>
    <author>Vaidyanathan Nagarajan</author>
    <year>2003</year>
    <price>49.99</price>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>

This is what my program looks like:

package exampleprojectgroup;

import java.io.IOException;
import java.util.LinkedList;
import java.util.List;
import org.jdom2.Attribute;
import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.filter.Filters;
import org.jdom2.input.SAXBuilder;
import org.jdom2.input.sax.XMLReaders;
import org.jdom2.xpath.XPathExpression;
import org.jdom2.xpath.XPathFactory;


public class ElementAttribute2String
{
    ElementAttribute2String()
    {
        run();
    }

    public void run()
    {
        final String PATH_TO_FILE = "c:\\readme.xml";
        /* It is essential that the program has to work with a variable amount of XPath expressions. */
        LinkedList<String> xPathExpressions = new LinkedList<>();
        /* Simulate user input.
         * First XPath expression points to attribute,
         * second one points to element.
         * Many more expressions follow in a real situation.
         */
        xPathExpressions.add( "/bookstore/book/@category" );
        xPathExpressions.add( "/bookstore/book/price" );

        /* One list should be sufficient to store the result. */
        List<Element> elementsResult = null;
        List<Attribute> attributesResult = null;
        List<Object> objectsResult = null;
        try
        {
            SAXBuilder saxBuilder = new SAXBuilder( XMLReaders.NONVALIDATING );
            Document document = saxBuilder.build( PATH_TO_FILE );
            XPathFactory xPathFactory = XPathFactory.instance();
            int i = 0;
            for ( String string : xPathExpressions )
            {
                /* Works only for elements, uncomment to give it a try. */
//                XPathExpression<Element> xPathToElement = xPathFactory.compile( xPathExpressions.get( i ), Filters.element() );
//                elementsResult = xPathToElement.evaluate( document );
//                for ( Element element : elementsResult )
//                {
//                    System.out.println( "Content of " + string + ": " + element.getText() );
//                }

                /* Works only for attributes, uncomment to give it a try. */
//                XPathExpression<Attribute> xPathToAttribute = xPathFactory.compile( xPathExpressions.get( i ), Filters.attribute() );
//                attributesResult = xPathToAttribute.evaluate( document );
//                for ( Attribute attribute : attributesResult )
//                {
//                    System.out.println( "Content of " + string + ": " + attribute.getValue() );
//                }

                /* I want to receive the content of the XPath expression as a string
                 * without having to know if it is an attribute or element beforehand.
                 */
                XPathExpression<Object> xPathExpression = xPathFactory.compile( xPathExpressions.get( i ) );
                objectsResult = xPathExpression.evaluate( document );
                for ( Object object : objectsResult )
                {
                    if ( object instanceof Attribute )
                    {
                        System.out.println( "Content of " + string + ": " + ((Attribute)object).getValue() );
                    }
                    else if ( object instanceof Element )
                    {
                        System.out.println( "Content of " + string + ": " + ((Element)object).getText() );
                    }
                }
                i++;
            }
        }
        catch ( IOException ioException )
        {
            ioException.printStackTrace();
        }
        catch ( JDOMException jdomException )
        {
            jdomException.printStackTrace();
        }
    }
}

Another idea is to search for the '@' character in the XPath expression to determine whether it points to an attribute or an element. This gives me the desired result, though I wish there were a more elegant solution. Does the JDOM2 API provide anything useful for this problem? Could the code be redesigned to meet my requirements?

Thank you in advance!

XPath expressions are hard to type (cast) because they are compiled by a system that is sensitive to the return types of the XPath functions and values in the expression. JDOM relies on third-party code for that (Jaxen, by default), and that code has no mechanism for correlating those types with your JDOM code at compile time. Note that an XPath expression can return several different types of content, including String, Boolean, Number, and node-set (list-like) content.

In most cases, the XPath expression return type is known before the expression is evaluated, and the programmer has the "right" casting/expectations for processing the results.

In your case, you don't, and the expression is more dynamic.
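To see the typing problem concretely, here is a minimal sketch, assuming JDOM 2.x with its default Jaxen-backed XPathFactory on the classpath. It compiles each expression without a filter and prints the Java class of the first result. The class name XPathResultTypes and the trimmed-down sample XML are illustrative, not part of either post.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.List;
import org.jdom2.Document;
import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;
import org.jdom2.xpath.XPathFactory;

public class XPathResultTypes {
    // Hypothetical two-book excerpt of the bookstore sample.
    static final String XML = "<bookstore>"
            + "<book category=\"COOKING\"><price>30.00</price></book>"
            + "<book category=\"WEB\"><price>39.95</price></book>"
            + "</bookstore>";

    // Parsed once; checked parser exceptions are wrapped for brevity.
    static final Document DOC = parse(XML);

    static Document parse(String xml) {
        try {
            return new SAXBuilder().build(new StringReader(xml));
        } catch (JDOMException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Compiles the expression without a filter and reports the Java type of its first result.
    static String firstResultType(String expression) {
        List<Object> results = XPathFactory.instance().compile(expression).evaluate(DOC);
        return results.isEmpty() ? "(no result)" : results.get(0).getClass().getSimpleName();
    }

    public static void main(String[] args) {
        String[] expressions = {
            "/bookstore/book/@category",         // node-set of attributes
            "/bookstore/book/price",             // node-set of elements
            "string(/bookstore/book/@category)", // XPath string
            "count(/bookstore/book)",            // XPath number
            "boolean(/bookstore/book)"           // XPath boolean
        };
        for (String expression : expressions) {
            System.out.println(expression + " -> " + firstResultType(expression));
        }
    }
}
```

With Jaxen, the first two results should be JDOM Attribute and Element objects, while the function calls come back as plain String, Double and Boolean values, which is why no single cast covers every case.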

I recommend that you declare a helper function to process the content:

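import org.jdom2.Attribute;
import org.jdom2.Content;

private static String extractValue(Object source) {
    if (source instanceof Attribute) {
        // Attribute does not extend Content in JDOM2, so it needs its own check
        return ((Attribute) source).getValue();
    }
    if (source instanceof Content) {
        // Element, Text, CDATA, ...: Content.getValue() is the XPath string-value
        return ((Content) source).getValue();
    }
    // Results of XPath functions such as string(), count() or boolean()
    return String.valueOf(source);
}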

This will at least neaten up your code, and with Java 8 streams it can be quite compact:

import java.util.stream.Collectors;

List<String> values = xPathExpression.evaluate( document )
                      .stream()
                      .map(o -> extractValue(o))
                      .collect(Collectors.toList());
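Putting the helper and the stream pipeline together, a self-contained sketch could look like the following. It assumes JDOM 2.x with the default Jaxen-based XPathFactory; the class XPathValues, its values method and the two-book XML string are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import org.jdom2.Attribute;
import org.jdom2.Content;
import org.jdom2.Document;
import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;
import org.jdom2.xpath.XPathExpression;
import org.jdom2.xpath.XPathFactory;

public class XPathValues {
    // Hypothetical two-book excerpt of the bookstore sample from the question.
    static final String XML = "<bookstore>"
            + "<book category=\"COOKING\"><price>30.00</price></book>"
            + "<book category=\"CHILDREN\"><price>29.99</price></book>"
            + "</bookstore>";

    // Checked parser exceptions are wrapped to keep the example short.
    static Document parse(String xml) {
        try {
            return new SAXBuilder().build(new StringReader(xml));
        } catch (JDOMException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // The helper from the answer: Attribute is not a Content subclass, so it is checked first.
    static String extractValue(Object source) {
        if (source instanceof Attribute) {
            return ((Attribute) source).getValue();
        }
        if (source instanceof Content) {
            return ((Content) source).getValue();
        }
        return String.valueOf(source);
    }

    // Evaluates any expression and maps every result, whatever its type, to a string.
    static List<String> values(Document document, String xpath) {
        XPathExpression<Object> expression = XPathFactory.instance().compile(xpath);
        return expression.evaluate(document)
                .stream()
                .map(XPathValues::extractValue)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Document document = parse(XML);
        // Same mix of attribute and element expressions as the question's simulated input.
        for (String xpath : Arrays.asList("/bookstore/book/@category", "/bookstore/book/price")) {
            System.out.println(xpath + ": " + values(document, xpath));
        }
    }
}
```

Because the fallback branch uses String.valueOf, numeric XPath results are formatted the Java way: Jaxen's count() returns a Double, so it prints as 2.0 rather than XPath's 2.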

Note that, per the XPath specification, the string-value of an Element node is the concatenation of all of its descendant text nodes, that is, the element's own text() content plus the text of all child elements. Thus, in the following XML snippet:

<a>bilbo <b>samwise</b> frodo</a>

calling getValue() on the a element returns "bilbo samwise frodo", but getText() returns "bilbo  frodo" (only the element's direct text, with both spaces kept). Choose the mechanism you use for value extraction carefully.
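The difference between these methods can be checked directly. This minimal sketch assumes JDOM 2.x; the class StringValueDemo and its parseRoot helper are illustrative names. The brackets in the output make the whitespace visible.

```java
import java.io.IOException;
import java.io.StringReader;
import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;

public class StringValueDemo {
    // Parses a snippet and returns its root element; checked exceptions are wrapped.
    static Element parseRoot(String xml) {
        try {
            return new SAXBuilder().build(new StringReader(xml)).getRootElement();
        } catch (JDOMException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element a = parseRoot("<a>bilbo <b>samwise</b> frodo</a>");
        // getValue(): XPath string-value, text of all descendants.
        System.out.println("[" + a.getValue() + "]");         // [bilbo samwise frodo]
        // getText(): only the direct text children, whitespace kept as-is.
        System.out.println("[" + a.getText() + "]");          // [bilbo  frodo]
        // getTextNormalize(): direct text children with whitespace collapsed.
        System.out.println("[" + a.getTextNormalize() + "]"); // [bilbo frodo]
    }
}
```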

I had the exact same problem and took the approach of recognizing when an attribute is the target of the XPath. I solved it with two functions. The first compiles the XPathExpression for later use:

    XPathExpression<?> xpExpression;
    if (xpath.matches(".*/@[\\w]++$")) {
        // must be an attribute value we're after..
        xpExpression = xpfac.compile(xpath, Filters.attribute(), null, myNSpace);
    } else {
        // otherwise treat the target as an element
        xpExpression = xpfac.compile(xpath, Filters.element(), null, myNSpace);
    }
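The regular expression is a heuristic, so it is worth checking which expressions it classifies as attribute paths. The following plain-JDK sketch reuses the pattern from the answer unchanged; the class AttributeXPathHeuristic and the sample expressions are hypothetical.

```java
import java.util.regex.Pattern;

public class AttributeXPathHeuristic {
    // Same pattern as the answer: the last step is "/@name" made only of word characters.
    private static final Pattern ATTRIBUTE_STEP = Pattern.compile(".*/@[\\w]++$");

    static boolean looksLikeAttribute(String xpath) {
        return ATTRIBUTE_STEP.matcher(xpath).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "/bookstore/book/@category",       // attribute: matched
            "/bookstore/book/price",           // element: not matched
            "//title[@lang='en']",             // element with a predicate: not matched
            "/bookstore/book/title/@xml:lang", // prefixed attribute: ':' is not \w
            "/bookstore/book/@*",              // attribute wildcard: '*' is not \w
            "@category"                        // relative attribute path: no '/' before '@'
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + looksLikeAttribute(sample));
        }
    }
}
```

Prefixed names such as @xml:lang, hyphenated names, the @* wildcard and a relative path like @category are all sent down the element branch, so you may want to widen the pattern if your users write such expressions.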

The second evaluates and returns a value:

Object target = xpExpression.evaluateFirst(baseEl);
String value = null;
if (target != null) {
    if (target instanceof Element) {
        Element targetEl = (Element) target;
        value = targetEl.getTextNormalize();
    } else if (target instanceof Attribute) {
        Attribute targetAt = (Attribute) target;
        value = targetAt.getValue();
    }
}
return value;
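For reference, the two fragments can be combined into a runnable sketch. It assumes JDOM 2.x with the default Jaxen-based factory and no namespaces, so the variables, namespace and null arguments of the four-argument compile call are dropped; the class FirstValueLookup and its method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import org.jdom2.Attribute;
import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.filter.Filters;
import org.jdom2.input.SAXBuilder;
import org.jdom2.xpath.XPathExpression;
import org.jdom2.xpath.XPathFactory;

public class FirstValueLookup {
    // Hypothetical one-book excerpt of the bookstore sample (no <price> on purpose).
    static final String XML = "<bookstore>"
            + "<book category=\"COOKING\"><title lang=\"en\">Everyday Italian</title></book>"
            + "</bookstore>";

    static Document parse(String xml) {
        try {
            return new SAXBuilder().build(new StringReader(xml));
        } catch (JDOMException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Step 1: pick the filter from the shape of the expression (the answer's heuristic).
    static XPathExpression<?> compileFor(String xpath) {
        XPathFactory factory = XPathFactory.instance();
        if (xpath.matches(".*/@[\\w]++$")) {
            return factory.compile(xpath, Filters.attribute());
        }
        return factory.compile(xpath, Filters.element());
    }

    // Step 2: evaluate and turn the first hit into a string; null when nothing matched.
    static String firstValue(Object context, String xpath) {
        Object target = compileFor(xpath).evaluateFirst(context);
        if (target instanceof Element) {
            return ((Element) target).getTextNormalize();
        }
        if (target instanceof Attribute) {
            return ((Attribute) target).getValue();
        }
        return null;
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println(firstValue(doc, "/bookstore/book/@category"));
        System.out.println(firstValue(doc, "/bookstore/book/title"));
        System.out.println(firstValue(doc, "/bookstore/book/title/@lang"));
        System.out.println(firstValue(doc, "/bookstore/book/price"));
    }
}
```

firstValue returns null when the expression matches nothing, mirroring the null check in the answer.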

I suspect it's a matter of coding style whether you prefer the helper function suggested in the previous answer or this approach. Either will work.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

© 2020-2024 STACKOOM.COM