I'm using htmlparser.Parser. I have the snippet of HTML below and I need to get the content of the h3 tag. There are a bunch of these container divs with unique ids in my file. I can get the divs and their inner HTML just fine, but I cannot figure out how to get what's between the h3 tags.
This snippet of code works for divs but not for the h3: it finds the h3 with the correct id, but I cannot figure out how to get the innerHTML, i.e. what's between the tags.
Thanks for any help.
parser = new Parser();
parser.setInputHTML(inHTML);
parser.setEncoding("UTF-8");
lstNodes = parser.extractAllNodesThatMatch( new AndFilter(new TagNameFilter("h3"),
new HasAttributeFilter("id", "h3_"+num)));
This finds the tag but does not return the text between the h3 tags.
<div class="container" id="container_2">
<h3 id="h3_2">Adding a few</h3>
<div class="maindiv" id="div_2">
...new articles in here just to flesh it out.
</div><!--end of div_2-->
</div>
I ended up creating my own tag:
class H3Tag extends CompositeTag
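The class body was not posted, so here is a minimal sketch of what such a custom tag could look like. It assumes the usual htmlparser pattern of overriding getIds() and registering a prototype with a PrototypicalNodeFactory. The wrapper class H3Extract and its method textOfH3 are hypothetical names made up for this illustration; they are not part of the library or of the original code.

```java
import org.htmlparser.Parser;
import org.htmlparser.PrototypicalNodeFactory;
import org.htmlparser.filters.AndFilter;
import org.htmlparser.filters.HasAttributeFilter;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.tags.CompositeTag;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

// Custom composite tag: tells the node factory to build <h3> elements
// as composite nodes, so their children and inner text are kept.
class H3Tag extends CompositeTag {
    private static final String[] IDS = {"H3"};

    @Override
    public String[] getIds() {
        return IDS;
    }
}

// Hypothetical helper class, for illustration only.
public class H3Extract {

    // Returns the text between <h3 id="h3_N"> and </h3>, or null if no such tag exists.
    static String textOfH3(String html, int num) throws ParserException {
        Parser parser = new Parser();
        parser.setInputHTML(html);

        // Register the custom tag so the parser uses H3Tag for h3 elements.
        PrototypicalNodeFactory factory = new PrototypicalNodeFactory();
        factory.registerTag(new H3Tag());
        parser.setNodeFactory(factory);

        NodeList nodes = parser.extractAllNodesThatMatch(
                new AndFilter(new TagNameFilter("h3"),
                              new HasAttributeFilter("id", "h3_" + num)));
        if (nodes.size() == 0) {
            return null;
        }
        // Safe here: every h3 is created from the H3Tag prototype registered above.
        return ((CompositeTag) nodes.elementAt(0)).getStringText();
    }
}
```

Calling H3Extract.textOfH3(inHTML, 2) on the snippet from the question should then give the heading text. Registering your own tag is mainly useful if your version of the library does not already map h3 to a composite tag class.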
You're almost there. You can cast it to HeadingTag manually and use getStringText() to get the text between the tags.
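// Imports assumed: org.htmlparser.Node, org.htmlparser.tags.HeadingTag,
// org.htmlparser.util.NodeList, org.htmlparser.util.SimpleNodeIterator
NodeList nodes = parser.extractAllNodesThatMatch(
        new AndFilter(new TagNameFilter("h3"),
                      new HasAttributeFilter("id", "h3_" + num)));
SimpleNodeIterator nodeIterator = nodes.elements();
while (nodeIterator.hasMoreNodes()) {
    Node node = nodeIterator.nextNode();
    // The filter only matches h3 tags, which this answer expects to be parsed as HeadingTag
    HeadingTag tag = (HeadingTag) node;
    // getStringText() returns the raw text between the start and end tags
    System.out.println(tag.getStringText());
}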