How to get the value of the text node inside a <div> element using XPath and JTidy

Question

I have been banging my head over this for two days now. I have an XHTML web page from which I want to scrape some data. I am using JTidy to parse it into a DOM and then XPathFactory to find nodes using XPath.

The XHTML snippet looks something like this:

    <div style="line-height: 22px;" id="dvTitle" class="titlebtmbrdr01">BAJAJ AUTO LTD.</div>

Now I want that text, BAJAJ AUTO LTD.

The code that I am using is:

    import java.io.IOException;
    import java.net.MalformedURLException;
    import java.net.URL;
    import java.util.Vector;

    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathConstants;
    import javax.xml.xpath.XPathExpression;
    import javax.xml.xpath.XPathExpressionException;
    import javax.xml.xpath.XPathFactory;

    import org.w3c.dom.Document;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;

    public class BSEQuotesExtractor implements valueExtractor {

        @Override
        public Vector<String> getName(Document d) throws XPathExpressionException {
            XPathFactory factory = XPathFactory.newInstance();
            XPath xpath = factory.newXPath();
            XPathExpression expr = xpath.compile("//div[@id='dvTitle']/text()");
            Object result = expr.evaluate(d, XPathConstants.NODESET);
            NodeList nodes = (NodeList) result;
            for (int i = 0; i < nodes.getLength(); i++) {
                System.out.println(nodes.item(i).getNodeValue());
            }

            return null;
        }

        public static void main(String[] args) throws MalformedURLException, IOException, XPathExpressionException {
            BSEQuotesExtractor q = new BSEQuotesExtractor();
            DOMParser parser = new DOMParser(new URL("http://www.bseindia.com/bseplus/StockReach/StockQuote/Equity/BAJAJ%20AUTO%20LTD/BAJAJAUT/532977/Scrips").openStream());
            Document d = parser.getDocument();
            q.getName(d);
        }
    }

But I get a null output and not BAJAJ AUTO LTD. Please rescue me.

Answer 1

You must use XPathConstants.STRING instead of XPathConstants.NODESET. You want the value of a single element (the div), not a list of nodes. Write:

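    XPathFactory factory = XPathFactory.newInstance();
    XPath xpath = factory.newXPath();
    // d is the parsed Document from the question; STRING returns the string value of the first matching node
    String divContent = (String) xpath.evaluate("//div[@id='dvTitle']", d, XPathConstants.STRING);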

divContent will then contain "BAJAJ AUTO LTD.".
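To try the STRING approach without hitting the live page, here is a minimal sketch. It makes a few assumptions that are not in the question: the class name `XPathStringDemo` and the `SAMPLE` markup are made up for illustration, and the JDK's built-in `DocumentBuilder` stands in for JTidy, so it only shows how the XPath call behaves on a well-formed document.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;

class XPathStringDemo {

    // Hypothetical stand-in for the real page: the div from the question
    // wrapped in a minimal, well-formed document.
    static final String SAMPLE =
        "<html><body>"
        + "<div style=\"line-height: 22px;\" id=\"dvTitle\" class=\"titlebtmbrdr01\">BAJAJ AUTO LTD.</div>"
        + "</body></html>";

    // Parses markup with the JDK parser (assumption: used here instead of JTidy).
    static Document parse(String markup) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
    }

    // Evaluates the expression as a string: the result is the string value
    // of the first matching node, or "" when nothing matches.
    static String titleOf(Document d) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (String) xpath.evaluate("//div[@id='dvTitle']", d, XPathConstants.STRING);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(titleOf(parse(SAMPLE))); // prints "BAJAJ AUTO LTD."
    }
}
```

Note that a STRING evaluation gives back an empty string rather than null when no div matches, so an empty result usually means the expression did not find the element in the parsed document.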

Answer 2

Try this:

    XPathFactory factory = XPathFactory.newInstance();
    XPath xpath = factory.newXPath();
    XPathExpression expr = xpath.compile("//div[@id='dvTitle']");
    Object result = expr.evaluate(d, XPathConstants.NODE);
    Node node = (Node) result;
    System.out.println(node.getTextContent());
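One thing to watch with the NODE approach: the evaluation returns null when nothing matches, so calling `getTextContent()` directly can throw a NullPointerException. A minimal sketch with a guard follows. The class name `XPathNodeDemo`, the helper `textOf` and the `SAMPLE` markup are hypothetical, and the JDK's `DocumentBuilder` is used in place of JTidy; whether `getTextContent()` (a DOM Level 3 method) is supported depends on the DOM implementation behind your Document.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;

class XPathNodeDemo {

    // Hypothetical sample markup containing the div from the question.
    static final String SAMPLE =
        "<html><body>"
        + "<div id=\"dvTitle\" class=\"titlebtmbrdr01\">BAJAJ AUTO LTD.</div>"
        + "</body></html>";

    // Parses markup with the JDK parser (assumption: used here instead of JTidy).
    static Document parse(String markup) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
    }

    // Returns the text content of the first node matching the expression,
    // or null when no node matches.
    static String textOf(Document d, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Node node = (Node) xpath.evaluate(expression, d, XPathConstants.NODE);
        return node == null ? null : node.getTextContent();
    }

    public static void main(String[] args) throws Exception {
        Document d = parse(SAMPLE);
        System.out.println(textOf(d, "//div[@id='dvTitle']"));  // prints "BAJAJ AUTO LTD."
        System.out.println(textOf(d, "//div[@id='missing']"));  // prints "null"
    }
}
```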


2 solutions

Solution 1
1 2012-07-09 09:45:45

Solution 2
0 Accepted 2012-07-09 09:29:04
