How to get the value of the text node inside a <div> element using XPath and JTidy

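I have been racking my brain over this for two days straight. I have an XHTML web page that I want to scrape some data from. I am using JTidy to parse the page into a DOM, and then XPathFactory to find nodes with XPath.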

The XHTML snippet looks like this:

    <div style="line-height: 22px;" id="dvTitle" class="titlebtmbrdr01">BAJAJ AUTO LTD.</div>

Now I want to extract that "BAJAJ AUTO LTD." text.

The code I am using is:

    import java.io.IOException;
    import java.net.MalformedURLException;
    import java.net.URL;
    import java.util.Vector;

    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathConstants;
    import javax.xml.xpath.XPathExpression;
    import javax.xml.xpath.XPathExpressionException;
    import javax.xml.xpath.XPathFactory;

    import org.w3c.dom.Document;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;

    public class BSEQuotesExtractor implements valueExtractor {

        @Override
        public Vector<String> getName(Document d) throws XPathExpressionException {
            // TODO Auto-generated method stub
            XPathFactory factory = XPathFactory.newInstance();
            XPath xpath = factory.newXPath();
            XPathExpression expr = xpath.compile("//div[@id='dvTitle']/text()");
            Object result = expr.evaluate(d, XPathConstants.NODESET);
            NodeList nodes = (NodeList) result;
            for (int i = 0; i < nodes.getLength(); i++) {
                System.out.println(nodes.item(i).getNodeValue());
            }

            return null;
        }

        public static void main(String[] args) throws MalformedURLException, IOException, XPathExpressionException {
            BSEQuotesExtractor q = new BSEQuotesExtractor();
            DOMParser parser = new DOMParser(new URL("http://www.bseindia.com/bseplus/StockReach/StockQuote/Equity/BAJAJ%20AUTO%20LTD/BAJAJAUT/532977/Scrips").openStream());
            Document d = parser.getDocument();
            q.getName(d);
        }
    }

But I am getting empty output instead of "BAJAJ AUTO LTD.". Please help me.

You have to use XPathConstants.STRING instead of XPathConstants.NODESET. You want the value of a single element (the div), not a list of nodes. Write:

    XPathFactory factory = XPathFactory.newInstance();
    XPath xpath = factory.newXPath();
    // document is the parsed org.w3c.dom.Document (d in the question)
    String divContent = (String) xpath.evaluate("//div[@id='dvTitle']", document, XPathConstants.STRING);

In divContent you will get "BAJAJ AUTO LTD.".
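To try the STRING approach without fetching the live page, here is a minimal self-contained sketch. It assumes the markup is well-formed enough for the JDK's own DocumentBuilder, which stands in for the JTidy-built Document from the question. The class name DivStringExample, the SAMPLE constant, and the helper names parse and titleOf are made up for illustration.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DivStringExample {

    // Hypothetical stand-in for the real page; only the div comes from the question.
    static final String SAMPLE =
        "<html><body>"
        + "<div style=\"line-height: 22px;\" id=\"dvTitle\" class=\"titlebtmbrdr01\">BAJAJ AUTO LTD.</div>"
        + "</body></html>";

    // Parses well-formed markup into a DOM Document.
    // Checked exceptions are rethrown unchecked to keep the sketch short.
    static Document parse(String xhtml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample markup", e);
        }
    }

    // Returns the string value of the first div with id 'dvTitle',
    // or an empty string when no such div exists.
    static String titleOf(Document d) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (String) xpath.evaluate("//div[@id='dvTitle']", d, XPathConstants.STRING);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("invalid XPath expression", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titleOf(parse(SAMPLE)));
    }
}
```

With XPathConstants.STRING, a missing div yields an empty string rather than null. If the output is still empty against the real page, it is worth checking whether the parsed Document contains the div at all.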

Try this:

    XPathFactory factory = XPathFactory.newInstance();
    XPath xpath = factory.newXPath();
    XPathExpression expr = xpath.compile("//div[@id='dvTitle']");
    Object result = expr.evaluate(d, XPathConstants.NODE);
    Node node = (Node) result;
    System.out.println(node.getTextContent());
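One thing to watch with the NODE approach: the evaluation returns null when nothing matches, so calling getTextContent() directly can throw a NullPointerException. Below is a minimal sketch that adds a null check. It again uses the JDK's DocumentBuilder in place of JTidy, and the names DivNodeExample, parse, and titleOrNull are hypothetical.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DivNodeExample {

    // Parses well-formed markup into a DOM Document (stand-in for the JTidy step).
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample markup", e);
        }
    }

    // Returns the trimmed text content of the div with id 'dvTitle',
    // or null when the document has no such div.
    static String titleOrNull(Document d) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node node = (Node) xpath.evaluate("//div[@id='dvTitle']", d, XPathConstants.NODE);
            return node == null ? null : node.getTextContent().trim();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("invalid XPath expression", e);
        }
    }

    public static void main(String[] args) {
        Document d = parse("<html><body><div id=\"dvTitle\">BAJAJ AUTO LTD.</div></body></html>");
        System.out.println(titleOrNull(d));
    }
}
```

Returning null for a missing node is just one option; an Optional<String> or an exception would work equally well, depending on how the caller wants to handle a page that lacks the div.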
