XPath always returns the first child node of the XML in Java

I have an XML document like this:

<div class="row mt-5">
<div class="col-lg-cus col-6">
    <div class="product-box lazyload-wrap">
        <div class="remove-wishlist" data-id="7080">
            <i class="fa fa-times" aria-hidden="true"></i>
        </div>
        <div class="productAvatar">
            <a href="/vi/classic-fullface-royal-m18" title="Classic FULLFACE ROYAL M18">
                <div class="img img-background text-left origin product-img lazy-bg-img lazyload-item" style="background-image:url('/Uploads/default-image.jpg')" alt="Classic FULLFACE ROYAL M18" data-pil-src="https://fanfan.vn/Uploads/t/fa/fanfan0non-bao-hiem-cafe-racer-classic-fullface-royal-m18-5_0061169_235.jpg">
                </div>
            </a>
            <a class="btn btnQuickView "
               onclick="OpenCustomBootstrapModal('/vi/_Details?productId=7080', null, 1000, 'productPopup')">
                <span class="txtOutOfStock">Hết h&#224;ng</span>
                <span class="txtQuickView">Xem nhanh</span>
            </a>
        </div>
        <p class="mb-1 brand-name">
            <a href="/vi/royal-helmet" class="a-black" title="ROYAL HELMET">ROYAL HELMET</a>
        </p>
        <p class="name text-uppercase mb-0">
            <a href="/vi/classic-fullface-royal-m18" class="a-black">Classic FULLFACE ROYAL M18</a>
        </p>
        <div class="rating my-2">
            <span class="star-raty" data-score="0" data-readOnly="true"></span>
        </div>
        <p>
            <span>1.100.000 ₫</span>
        </p>
    </div>
</div>
<div class="col-lg-cus col-6">
    <div class="product-box lazyload-wrap">
        <div class="remove-wishlist" data-id="6855">
            <i class="fa fa-times" aria-hidden="true"></i>
        </div>
        <div class="productAvatar">
            <a href="/vi/non-bao-hiem-34-royal-m01-tem" title="N&#243;n bảo hiểm 3/4 Royal M01 Tem">
                <div class="img img-background text-left origin product-img lazy-bg-img lazyload-item" style="background-image:url('/Uploads/default-image.jpg')" alt="N&#243;n bảo hiểm 3/4 Royal M01 Tem" data-pil-src="https://fanfan.vn/Uploads/t/fa/fanfan0mu-non-bao-hiem-3-4-di-xe-may-royal-m01-tem-helmet-with-texture-4-do-xam-red-si_0060108_235.jpg">
                </div>
            </a>
            <a class="btn btnQuickView "
               onclick="OpenCustomBootstrapModal('/vi/_Details?productId=6855', null, 1000, 'productPopup')">
                <span class="txtOutOfStock">Hết h&#224;ng</span>
                <span class="txtQuickView">Xem nhanh</span>
            </a>
        </div>
        <p class="mb-1 brand-name">
            <a href="/vi/royal-helmet" class="a-black" title="ROYAL HELMET">ROYAL HELMET</a>
        </p>
        <p class="name text-uppercase mb-0">
            <a href="/vi/non-bao-hiem-34-royal-m01-tem" class="a-black">N&#243;n bảo hiểm 3/4 Royal M01 Tem</a>
        </p>
        <div class="rating my-2">
            <span class="star-raty" data-score="0" data-readOnly="true"></span>
        </div>
        <p>
            <span>400.000 ₫</span>
        </p>
    </div>
</div>
</div>

I'm trying to fetch some data from it and map that data to JAXB objects. Here is how I did it:

 public static void fetchFanFanData(String dataFilePath, String type) {
    try {
        Document doc = DocParser(dataFilePath);
        XPath xpath = getXPath();

        String query = "//div[@class=\"col-lg-cus col-6\"]";
        NodeList list = (NodeList) xpath.evaluate(query, doc, XPathConstants.NODESET);
        // Commented out: a second declaration of `list` in the same scope does not compile.
        // NodeList list = doc.getDocumentElement().getChildNodes();
        System.out.println(list.getLength());

        Products products = new Products();
        for (int i = 0; i < list.getLength(); i++) {
            Node node = list.item(i);
            String url = xpath.evaluate("//p[@class=\"name text-uppercase mb-0\"]/a/@href", node, XPathConstants.STRING).toString();
            String name = xpath.evaluate("//p[@class=\"name text-uppercase mb-0\"]/a", node, XPathConstants.STRING).toString();
            String producer = xpath.evaluate("//p[@class=\"mb-1 brand-name\"]", node, XPathConstants.STRING).toString();
            String image_url = xpath.evaluate("//div[@class=\"productAvatar\"]/a/div/@data-pil-src", node, XPathConstants.STRING).toString();
            String price = xpath.evaluate("//p/span", node, XPathConstants.STRING).toString();
            Product product = new Product();
            product.setName(name);
            product.setImage(image_url);
            product.setUrl(url);
            product.setPrice(price);
            product.setProducer(producer);
            product.setStore("FanFan");
            product.setType(type);

            products.getProduct().add(product);
        }
        marshallJAXB(products, dataFilePath);
    } catch (ParserConfigurationException | SAXException | IOException | XPathExpressionException | JAXBException ex) {
        Logger.getLogger(XMLUtilities.class.getName()).log(Level.SEVERE, null, ex);
    }
}

private static void marshallJAXB(Products products, String path) throws JAXBException, FileNotFoundException {
    JAXBContext context = JAXBContext.newInstance(Products.class);
    Marshaller m = context.createMarshaller();
    m.setProperty(Marshaller.JAXB_ENCODING, "UTF-8");
    m.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
    m.marshal(products, new File(ServletActionContext.getServletContext().getRealPath("/" + "WEB-INF\\result.xml")));
}

public static XPath getXPath() {
    XPathFactory factory = XPathFactory.newInstance();
    XPath xPath = factory.newXPath();
    return xPath;
}

public static Document DocParser(String filePath)
        throws ParserConfigurationException, SAXException, IOException {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document doc = builder.parse(filePath);
    return doc;
}

The marshaller is only there to verify that the JAXB objects are correct, but what I get is always the first node, like this:

<products xmlns="http://www.example.org/product">
<product type="helmet">
    <name>Classic FULLFACE ROYAL M18</name>
    <url>/vi/classic-fullface-royal-m18</url>
    <image>https://fanfan.vn/Uploads/t/fa/fanfan0non-bao-hiem-cafe-racer-classic-fullface-royal-m18-5_0061169_235.jpg</image>
    <price>1.100.000 ₫</price>
    <producer>ROYAL HELMET</producer>
    <store>FanFan</store>
</product>
<product type="helmet">
    <name>Classic FULLFACE ROYAL M18</name>
    <url>/vi/classic-fullface-royal-m18</url>
    <image>https://fanfan.vn/Uploads/t/fa/fanfan0non-bao-hiem-cafe-racer-classic-fullface-royal-m18-5_0061169_235.jpg</image>
    <price>1.100.000 ₫</price>
    <producer>ROYAL HELMET</producer>
    <store>FanFan</store>
</product>
<product type="helmet">
    <name>Classic FULLFACE ROYAL M18</name>
    <url>/vi/classic-fullface-royal-m18</url>
    <image>https://fanfan.vn/Uploads/t/fa/fanfan0non-bao-hiem-cafe-racer-classic-fullface-royal-m18-5_0061169_235.jpg</image>
    <price>1.100.000 ₫</price>
    <producer>ROYAL HELMET</producer>
    <store>FanFan</store>
</product>

I have tried many approaches without success. Does anyone know why? I can't figure out how XPath works in this situation, even though I'm evaluating each expression against a specific context node.

I suppose the problem occurs inside the for-loop of the fetchFanFanData() method, where the values for url, name, etc. are read. There you have to replace "//" with ".//" in every expression, e.g. replace

 String url = xpath.evaluate("//p[@class=\"name text-uppercase mb-0\"]/a/@href", node, XPathConstants.STRING).toString();

with

 String url = xpath.evaluate(".//p[@class=\"name text-uppercase mb-0\"]/a/@href", node, XPathConstants.STRING).toString();

The difference between "//" and ".//" is:

"//para" selects [...] all para elements in the same document as the context node

".//para" selects the para element descendants of the context node

Both quotes are from https://www.w3.org/TR/2017/REC-xpath-31-20170321/ (see in particular chapter 3.3.5, section "examples"). See also https://docs.oracle.com/javase/10/docs/api/javax/xml/xpath/package-summary.html .
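
To see the difference in isolation, here is a minimal sketch that evaluates the same expression once per item, with and without the leading dot. The XML string, the class name RelativeXPathDemo and the helper hrefPerItem are made up for illustration and are much simpler than the markup in the question; only the javax.xml.xpath calls are the real JDK API.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RelativeXPathDemo {
    // Hypothetical, simplified stand-in for the product markup in the question.
    static final String XML =
          "<row>"
        + "<item><p class=\"name\"><a href=\"/first\">First</a></p></item>"
        + "<item><p class=\"name\"><a href=\"/second\">Second</a></p></item>"
        + "</row>";

    // Evaluates `expression` once per <item>, passing that <item> as the context node.
    static String[] hrefPerItem(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList items = (NodeList) xpath.evaluate("//item", doc, XPathConstants.NODESET);
            String[] result = new String[items.getLength()];
            for (int i = 0; i < items.getLength(); i++) {
                Node item = items.item(i);
                result[i] = (String) xpath.evaluate(expression, item, XPathConstants.STRING);
            }
            return result;
        } catch (Exception e) {
            // Kept deliberately coarse for a demo; real code should handle the specific exceptions.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "//" starts at the document root, so the context node has no effect on the match.
        System.out.println(String.join(", ", hrefPerItem("//p[@class='name']/a/@href")));
        // ".//" starts at the context node, so each <item> yields its own link.
        System.out.println(String.join(", ", hrefPerItem(".//p[@class='name']/a/@href")));
    }
}
```

The first line printed is "/first, /first" and the second is "/first, /second", which mirrors the repeated product in the question's output.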

The reason why you always get the same values is that the expression

 "//p[@class=\"name text-uppercase mb-0\"]/a/@href" 

applied to a node of your list still matches every hit in the whole document, not just the single hit inside that node. The resulting node-set is therefore identical for each node. With the return type XPathConstants.STRING, that node-set is converted to a string by taking its first node in document order, so the same value is returned for each node.
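
The effect of the return type can be checked the same way. The sketch below evaluates an expression from the second item and compares how many nodes it matches (XPathConstants.NODESET) with what comes back as a string (XPathConstants.STRING). Again, the XML string, the class name StringResultDemo and the helpers are illustrative assumptions, not code from the question.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class StringResultDemo {
    // Hypothetical two-item document, used only for this demonstration.
    static final String XML =
          "<row>"
        + "<item><name>First</name></item>"
        + "<item><name>Second</name></item>"
        + "</row>";

    // Evaluates `expression` with the SECOND <item> as the context node.
    static Object evalFromSecondItem(String expression, QName returnType) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node second = (Node) xpath.evaluate("/row/item[2]", doc, XPathConstants.NODE);
            return xpath.evaluate(expression, second, returnType);
        } catch (Exception e) {
            // Coarse handling, acceptable for a demo only.
            throw new IllegalStateException(e);
        }
    }

    // Number of nodes the expression matches.
    static int matchCount(String expression) {
        return ((NodeList) evalFromSecondItem(expression, XPathConstants.NODESET)).getLength();
    }

    // String result: the string value of the first matched node in document order.
    static String asString(String expression) {
        return (String) evalFromSecondItem(expression, XPathConstants.STRING);
    }

    public static void main(String[] args) {
        System.out.println(matchCount("//name") + " -> " + asString("//name"));
        System.out.println(matchCount(".//name") + " -> " + asString(".//name"));
    }
}
```

This prints "2 -> First" and then "1 -> Second": although the second item is the context node, "//name" matches both name elements and the string conversion keeps only the first one.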
