Java HTML XPath selector
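I am trying to find a Java library, similar to HtmlAgilityPack for C#, that can parse HTML and select elements using XPath.
I have read about many libraries, but none of them is a standalone XPath selector for HTML; every library I found needs something like HtmlUnit to parse the HTML first.
I would appreciate it if someone could guide me with a simple example of XPath 2.0 or 3.0 and HTML parsing.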
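Java supports XPath out of the box through the javax.xml.xpath package. It is typically used to parse XML files, but it also works for HTML as long as the markup is well-formed XML, as in the sample below.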
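One caveat worth checking before relying on this approach: the JDK's DocumentBuilder is an XML parser, so it rejects HTML that is not well-formed XML (for example an unclosed <br>). A minimal sketch of such a check follows; the class name WellFormedCheck and the method isParseable are made up for illustration and are not part of any library.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, for illustration only.
class WellFormedCheck {

    // Returns true if the JDK's XML parser accepts the markup, false if it is not well-formed.
    static boolean isParseable(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // Raised for markup that is not well-formed XML, e.g. an unclosed tag
            return false;
        } catch (ParserConfigurationException | IOException e) {
            // Not expected for in-memory input with the default parser configuration
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isParseable("<div><br/><p>ok</p></div>"));       // true
        System.out.println(isParseable("<div><br><p>not closed</p></div>")); // false
    }
}
```

If the HTML you need to handle fails this check, it has to be cleaned up or parsed by an HTML-aware parser before XPath can be applied to it.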
HTML sample:
<html lang="en">
    <head>
        <title>Index page</title>
    </head>
    <body>
        <div>
            <br/>
            <h1>Hello <span id="my-demo">User!</span></h1>
            <br/>
            <img src="https://s3.amazonaws.com/acloudguru-opsworkslab/ACG_Austin.JPG" alt="photo"/>
        </div>
    </body>
</html>
Code snippet:
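import java.io.File;
import java.io.IOException;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class HtmlXpathParser {
    private final DocumentBuilder builder;
    private final XPath path;

    public HtmlXpathParser() throws ParserConfigurationException {
        DocumentBuilderFactory dbfactory = DocumentBuilderFactory.newInstance();
        builder = dbfactory.newDocumentBuilder();
        XPathFactory xpfactory = XPathFactory.newInstance();
        path = xpfactory.newXPath();
    }

    // Parses the file as XML and returns the src attribute of the first <img>, if there is one.
    public Optional<String> parse(String fileName) throws SAXException, IOException, XPathExpressionException {
        File file = new File(fileName);
        Document doc = builder.parse(file);
        // evaluate() returns an empty string (never null) when nothing matches
        String result = path.evaluate("//img/@src", doc);
        return result.isEmpty() ? Optional.empty() : Optional.of(result);
    }

    public static void main(String[] args) throws ParserConfigurationException, XPathExpressionException, SAXException, IOException {
        HtmlXpathParser parser = new HtmlXpathParser();
        Optional<String> srcResult = parser.parse("src/main/resources/index.html");
        srcResult.ifPresent(System.out::println);
    }
}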
Output:
https://s3.amazonaws.com/acloudguru-opsworkslab/ACG_Austin.JPG
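If more than one element can match, the same API can return every match instead of a single string. The following is a rough sketch, assuming the markup is held in a string rather than a file; the class XPathNodeSetDemo, the method selectAll, and the shortened sample markup (including the a.jpg and b.jpg values) are invented for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, for illustration only.
class XPathNodeSetDemo {

    // Made-up, well-formed sample markup with two images.
    static final String HTML =
        "<html><body><div>"
        + "<h1>Hello <span id=\"my-demo\">User!</span></h1>"
        + "<img src=\"a.jpg\" alt=\"first\"/>"
        + "<img src=\"b.jpg\" alt=\"second\"/>"
        + "</div></body></html>";

    // Evaluates the expression as a node set and returns the text value of each match.
    static List<String> selectAll(String markup, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // For attribute nodes this is the attribute value, for elements the nested text
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            // Wrap parser and XPath failures so callers need no checked exceptions
            throw new IllegalStateException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(selectAll(HTML, "//img/@src"));             // [a.jpg, b.jpg]
        System.out.println(selectAll(HTML, "//span[@id='my-demo']"));  // [User!]
    }
}
```

Requesting XPathConstants.NODESET rather than a string is what makes all matches available; the string form only reflects the first node in document order.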
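This works with XPath 1.0, which is the version implemented by the JDK's built-in javax.xml.xpath API. If you need a later version, you can use something like xpath2-parser.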
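To see the version limit in practice, you can probe whether the built-in engine accepts a given expression. This is only a sketch: XPathVersionProbe and isSupported are made-up names, and the result for the XPath 2.0 function assumes a stock JDK with no alternative XPath implementation on the classpath.

```java
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

// Hypothetical probe class, for illustration only.
class XPathVersionProbe {

    // Returns true if the default XPath engine can compile the expression.
    static boolean isSupported(String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            xpath.compile(expression);
            return true;
        } catch (XPathExpressionException e) {
            // Unknown functions and syntax errors are reported here
            return false;
        }
    }

    public static void main(String[] args) {
        // translate() is an XPath 1.0 function, so it compiles
        System.out.println(isSupported("translate('ABC', 'ABC', 'abc')")); // true
        // lower-case() was introduced in XPath 2.0; a stock JDK is expected to reject it
        System.out.println(isSupported("lower-case('ABC')"));
    }
}
```

A probe like this is a quick way to tell whether an expression taken from an XPath 2.0 or 3.0 tutorial needs an external library.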