How to get the value of the text node inside a <div> element using XPath and JTidy
I have been banging my head over this for two days now. I have an XHTML web page from which I want to scrape some data. I am using JTidy to parse it into a DOM and then XPathFactory to find nodes using XPath.
The XHTML snippet looks something like this:
<div style="line-height: 22px;" id="dvTitle" class="titlebtmbrdr01">BAJAJ AUTO LTD.</div>
I want to extract the text BAJAJ AUTO LTD.
The code I am using is:
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Vector;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class BSEQuotesExtractor implements valueExtractor {

    @Override
    public Vector<String> getName(Document d) throws XPathExpressionException {
        XPathFactory factory = XPathFactory.newInstance();
        XPath xpath = factory.newXPath();
        XPathExpression expr = xpath.compile("//div[@id='dvTitle']/text()");
        Object result = expr.evaluate(d, XPathConstants.NODESET);
        NodeList nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i).getNodeValue());
        }
        return null;
    }

    public static void main(String[] args) throws MalformedURLException, IOException, XPathExpressionException {
        BSEQuotesExtractor q = new BSEQuotesExtractor();
        DOMParser parser = new DOMParser(new URL("http://www.bseindia.com/bseplus/StockReach/StockQuote/Equity/BAJAJ%20AUTO%20LTD/BAJAJAUT/532977/Scrips").openStream());
        Document d = parser.getDocument();
        q.getName(d);
    }
}
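One way to narrow the problem down (a diagnostic sketch, not part of the original question): run the same expression against a DOM built by the JDK's own parser from the snippet above, instead of from JTidy. With that DOM the expression selects the text node and its value is the div's text, so if the program above still prints null, the Document handed to getName is the part worth inspecting. TextNodeCheck, SNIPPET, parse and firstTextNodeValue are hypothetical names; the snippet is wrapped in <html><body> so it parses as a standalone document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class: checks the question's XPath against the JDK's DOM, not JTidy's.
public class TextNodeCheck {

    // The <div> from the question, wrapped in html/body so it parses on its own.
    static final String SNIPPET = "<html><body>"
            + "<div style=\"line-height: 22px;\" id=\"dvTitle\" class=\"titlebtmbrdr01\">BAJAJ AUTO LTD.</div>"
            + "</body></html>";

    // Parses well-formed XML with the JDK's built-in (non-namespace-aware) DOM parser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Evaluates the question's expression unchanged and returns the value of the
    // first selected text node, or null when the expression selects nothing.
    static String firstTextNodeValue(Document d) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    "//div[@id='dvTitle']/text()", d, XPathConstants.NODESET);
            return nodes.getLength() > 0 ? nodes.item(0).getNodeValue() : null;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstTextNodeValue(parse(SNIPPET))); // prints BAJAJ AUTO LTD.
    }
}
```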
But I get null as the output instead of BAJAJ AUTO LTD. Please help!
You must use XPathConstants.STRING instead of XPathConstants.NODESET. You want the value of a single element (the div), not a list of nodes. Write:
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
// document is the org.w3c.dom.Document obtained from the parser
String divContent = (String) xpath.evaluate("//div[@id='dvTitle']", document, XPathConstants.STRING);
After this, divContent holds "BAJAJ AUTO LTD.".
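A self-contained sketch of the STRING approach, assuming the page is already available as a well-formed org.w3c.dom.Document (built here with the JDK's parser rather than JTidy so the example runs on its own). With XPathConstants.STRING, XPath converts the selected node-set to the string-value of its first node in document order, which for an element is all of its descendant text concatenated; an empty node-set becomes the empty string. DivStringValue, parse and titleOf are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class for the XPathConstants.STRING approach.
public class DivStringValue {

    // Parses well-formed XML with the JDK's built-in DOM parser (a stand-in for JTidy's output).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Returns the string-value of the first div with id 'dvTitle' (all of its
    // descendant text concatenated), or "" if no such div exists.
    static String titleOf(Document document) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (String) xpath.evaluate("//div[@id='dvTitle']", document, XPathConstants.STRING);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document document = parse(
                "<html><body><div id=\"dvTitle\">BAJAJ AUTO LTD.</div></body></html>");
        System.out.println(titleOf(document)); // prints BAJAJ AUTO LTD.
    }
}
```

Unlike the .../text() expression in the question, which selects only the div's direct text-node children, the string-value also picks up text inside nested elements such as <b>. If the real page surrounds the title with whitespace, calling trim() on the result may be needed.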
Try this:
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
XPathExpression expr = xpath.compile("//div[@id='dvTitle']");
Object result = expr.evaluate(d, XPathConstants.NODE);
Node node = (Node)result;
System.out.println(node.getTextContent());
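The same NODE-based approach as a self-contained sketch, with a null check added because evaluate(..., XPathConstants.NODE) returns null when no element matches, which would otherwise make node.getTextContent() throw a NullPointerException. It runs on the JDK's DOM, which implements the DOM Level 3 getTextContent(); whether the DOM produced by a given JTidy version does too is an assumption this sketch does not check. DivNodeText, parse and titleOf are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical example class for the XPathConstants.NODE + getTextContent() approach.
public class DivNodeText {

    // Parses well-formed XML with the JDK's built-in DOM parser (a stand-in for JTidy's output).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Selects the first div with id 'dvTitle' and returns its text content,
    // or null when no such div exists (evaluate returns null for an empty result).
    static String titleOf(Document d) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            XPathExpression expr = xpath.compile("//div[@id='dvTitle']");
            Node node = (Node) expr.evaluate(d, XPathConstants.NODE);
            return node == null ? null : node.getTextContent();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document d = parse("<html><body><div id=\"dvTitle\">BAJAJ AUTO LTD.</div></body></html>");
        System.out.println(titleOf(d)); // prints BAJAJ AUTO LTD.
    }
}
```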