Extract nested HTML tag in Java?
I have the following HTML fragment:
String source = "<p>dsdds</p>"
+ "<ul class=\"some-class-name\">"
+ "<li>data</li>"
+ "<li><div><ul><li>data</li></ul></div></li>"
+ "</ul>"
+ "<p>data</p>"
+ "<ul>data</ul><div>data</div>";
What I want to achieve is to get the result as:
<ul class="some-class-name">
<li>data</li>
<li><div><ul><li>data</li></ul></div></li>
</ul>
What I have tried so far:
String endTag = "</ul>";
int origin = source.indexOf("<ul class=\"some-class-name\">");
int currentFrom = origin;
int to = source.indexOf(endTag, currentFrom);
while (true) {
    int curIndex = source.indexOf("<ul", currentFrom + 1);
    if (curIndex > -1) {
        currentFrom = curIndex;
        to = source.indexOf(endTag, currentFrom);
    } else {
        to = source.indexOf(endTag, to);
        break;
    }
}
System.out.println(source.substring(origin, to + endTag.length()));
But I always get:
<ul class="some-class-name">
<li>data</li>
<li><div><ul><li>data</li></ul></div></li>
</ul>
<p>data</p>
<ul>data</ul>
Can anyone help me fix my code or suggest another approach?
Edit: Please do not suggest external libraries such as jsoup.
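The loop in the question keeps jumping to every later `<ul` in the whole string, including the unrelated `<ul>data</ul>` near the end, so `to` ends up at the last `</ul>` in the document rather than the one that closes the outer list. One library-free way around this is to count nesting depth: add one for each opening tag, subtract one for each closing tag, and stop when the depth returns to zero. The following is only a minimal sketch of that idea; the class name `NestedTagExtractor` and the helper names are made up for illustration, and it assumes balanced tags with no `<ul` text hidden inside comments, scripts, or attribute values.

```java
class NestedTagExtractor {

    /**
     * Returns the substring from openingTag through its matching closing tag,
     * or null if openingTag is not found or the tags are unbalanced.
     */
    static String extract(String source, String openingTag, String tagName) {
        String openPrefix = "<" + tagName;
        String closeTag = "</" + tagName + ">";

        int start = source.indexOf(openingTag);
        if (start < 0) {
            return null;
        }

        int depth = 1;                            // we are inside the outer tag
        int pos = start + openingTag.length();
        while (depth > 0) {
            int nextOpen = indexOfOpenTag(source, openPrefix, pos);
            int nextClose = source.indexOf(closeTag, pos);
            if (nextClose < 0) {
                return null;                      // ran out of closing tags
            }
            if (nextOpen >= 0 && nextOpen < nextClose) {
                depth++;                          // a nested tag opens first
                pos = nextOpen + openPrefix.length();
            } else {
                depth--;                          // a closing tag comes first
                pos = nextClose + closeTag.length();
            }
        }
        return source.substring(start, pos);
    }

    /** Finds "<tag" followed by '>' or whitespace, so "<ul" does not match "<ulx". */
    private static int indexOfOpenTag(String source, String openPrefix, int from) {
        int i = source.indexOf(openPrefix, from);
        while (i >= 0) {
            int after = i + openPrefix.length();
            if (after < source.length()) {
                char c = source.charAt(after);
                if (c == '>' || Character.isWhitespace(c)) {
                    return i;
                }
            }
            i = source.indexOf(openPrefix, i + 1);
        }
        return -1;
    }

    public static void main(String[] args) {
        String source = "<p>dsdds</p>"
                + "<ul class=\"some-class-name\">"
                + "<li>data</li>"
                + "<li><div><ul><li>data</li></ul></div></li>"
                + "</ul>"
                + "<p>data</p>"
                + "<ul>data</ul><div>data</div>";
        System.out.println(extract(source, "<ul class=\"some-class-name\">", "ul"));
    }
}
```

Because the source string contains no line breaks, the extracted list is printed on a single line. For real-world HTML that is not guaranteed to be well-formed, a proper parser is still the safer choice.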
Luckily, your fragment is well-formed XHTML, which means it can be parsed as XML.
XPath is specifically designed to extract nodes from XML:
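import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Must have a single root in order to parse.
String input = "<div>" + source + "</div>";
XPath xpath = XPathFactory.newInstance().newXPath();
Node node = (Node) xpath.evaluate(
        "//ul[@class='some-class-name']",
        new InputSource(new StringReader(input)),
        XPathConstants.NODE);

// Serialize the matched node back to a string, without the XML declaration.
StringWriter result = new StringWriter();
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.transform(new DOMSource(node), new StreamResult(result));
String fragment = result.toString();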
You should use jsoup, a Java HTML parser, like this:
Document doc = Jsoup.parse(source);
Element e = doc.select("ul.some-class-name").first();
System.out.println(e);
Result:
<ul class="some-class-name">
<li>data</li>
<li>
<div>
<ul>
<li>data</li>
</ul>
</div></li>
</ul>