简体   繁体   中英

java - org.htmlparser.Parser , need to get whats between the h3's

htmlparser.Parser, I have the snippet of html(see below) and i need to get the content of the there a bunch of these container divs with unqiue id's in my file. I can get the divs and their inner html just fine. I can not figure out how to get the whats between the H3 tags

this snippet of code works for divs but not the h3: if finds the h3 with the correct ID, i just can not figure out how to get the innerHTML or whats between the tags.

thanks for any help

    parser = new Parser();
    parser.setInputHTML(inHTML);
    parser.setEncoding("UTF-8");
    lstNodes = parser.extractAllNodesThatMatch(  new AndFilter(new TagNameFilter("h3"),
                                                  new HasAttributeFilter("id", "h3_"+num)));

This finds it but does not return the data between the h3's

 <div class="container" id="container_2">
      <h3 id="h3_2">Adding a few</h3>       
      <div class="maindiv" id="div_2">
          ...new articles in here jus tto flesh it out.
      </div><!--end of div_2-->
  </div>

我最终创建了自己的标签

class H3Tag extends CompositeTag

You're almost there. You can cast it to HeadingTag manually, and use getStringText() to get text between tags.

NodeList nodes = parser.extractAllNodesThatMatch(new AndFilter(new TagNameFilter("h3"),
    new HasAttributeFilter("id", "h3_"+num)));
SimpleNodeIterator nodeIterator = nodes.elements();
while (nodeIterator.hasMoreNodes()) {
    Node node = nodeIterator.nextNode();
    HeadingTag tag = (HeadingTag)node;
    System.out.println(tag.getStringText());
}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM