Jsoup get comment before element

Say I have this html:

<!-- some comment -->
<div class="someDiv">
... other html
</div>
<!-- some comment 2 -->
<div class="someDiv">
... other html
</div>

I'm currently getting all divs where class == someDiv and scraping them for information. To do that I'm simply doing this:

Document doc = Jsoup.connect(url).get();
Elements elements = doc.select(".someDiv");
for (Element element : elements) {
    //scrape stuff
}

Within the for loop, is there any way to get the comment tag found before the particular div.someDiv element I'm on?

If this isn't possible, should I go about parsing this html structure differently with this requirement?

Thanks for any advice.

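Although this question is a few months old, here is my answer for completeness. You can use previousSibling() to get the Node that precedes the element. In real code you will want to check that the node you get back really is a Comment: if there is any whitespace between the comment and the div, previousSibling() returns that whitespace as a text node instead, and the cast below fails.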

String html = "<!-- some comment --><div class=\"someDiv\">... other html</div><!-- some comment 2 --><div class=\"someDiv\">... other html</div>";
Document doc = Jsoup.parseBodyFragment(html);
Elements elements = doc.select(".someDiv");
for (Element element : elements) {
    System.out.println(((Comment) element.previousSibling()).getData());
}

This produces:

some comment 
some comment 2 

(tested with jsoup 1.6.1 and 1.6.3)
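The check mentioned above could look roughly like the following. This is only a sketch: the class name PrecedingComment and the helper commentBefore are made up for illustration and are not part of jsoup. It walks backwards past whitespace-only text nodes (such as the line break between the comment and the div in the question's HTML) and returns null when the nearest non-blank sibling is not a comment.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Comment;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class PrecedingComment {

    // Hypothetical helper (not a jsoup API): returns the trimmed text of the
    // comment directly before the element, or null if there is none.
    static String commentBefore(Element element) {
        Node prev = element.previousSibling();
        // Skip whitespace-only text nodes, e.g. the newline between "-->" and "<div>".
        while (prev instanceof TextNode && ((TextNode) prev).isBlank()) {
            prev = prev.previousSibling();
        }
        return (prev instanceof Comment) ? ((Comment) prev).getData().trim() : null;
    }

    public static void main(String[] args) {
        // Sample input made up for this sketch: the first div has a comment
        // (followed by a newline) before it, the second one does not.
        String html = "<!-- some comment -->\n<div class=\"someDiv\">a</div>"
                + "<p>no comment here</p><div class=\"someDiv\">b</div>";
        Document doc = Jsoup.parseBodyFragment(html);
        for (Element element : doc.select(".someDiv")) {
            System.out.println(commentBefore(element));
        }
    }
}
```

Returning null rather than throwing keeps the scraping loop simple: an element without a leading comment can just be skipped or handled separately.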

Try something like this: iterate over the child nodes, pick out the comments, and check whether each comment's next sibling is the div you are after.

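for (int i = 0; i < doc.childNodes().size(); i++) {
    Node child = doc.childNode(i);
    if (child.nodeName().equals("#comment")) {
        // Inspect child.nextSibling() here (e.g. with hasAttr or attr) to
        // check whether it is the div you were expecting.
        // Note: this loop only visits the direct children of doc. If the comments
        // are nested deeper, loop over the child nodes of the divs' parent instead.
    }
}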

Take a look at the jsoup Node docs
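As a fuller sketch of the same idea, the snippet below visits every node in the document rather than only the top-level children, and records each comment that is followed by a div.someDiv. The class name CommentScan and the method commentsBeforeSomeDiv are hypothetical names chosen for illustration. It assumes a jsoup version that provides Node.traverse(NodeVisitor), and it skips whitespace-only text nodes between the comment and the div.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Comment;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.NodeVisitor;

public class CommentScan {

    // Hypothetical helper (not a jsoup API): maps the trimmed text of each
    // comment to the div.someDiv that follows it, in document order.
    static Map<String, Element> commentsBeforeSomeDiv(Document doc) {
        final Map<String, Element> result = new LinkedHashMap<>();
        doc.traverse(new NodeVisitor() {
            @Override
            public void head(Node node, int depth) {
                if (!(node instanceof Comment)) {
                    return;
                }
                Node next = node.nextSibling();
                // Skip whitespace-only text nodes between the comment and the next element.
                while (next instanceof TextNode && ((TextNode) next).isBlank()) {
                    next = next.nextSibling();
                }
                if (next instanceof Element) {
                    Element el = (Element) next;
                    if (el.tagName().equals("div") && el.hasClass("someDiv")) {
                        result.put(((Comment) node).getData().trim(), el);
                    }
                }
            }

            @Override
            public void tail(Node node, int depth) {
                // Nothing to do when leaving a node.
            }
        });
        return result;
    }

    public static void main(String[] args) {
        // Sample input made up for this sketch.
        String html = "<section><!-- some comment -->\n<div class=\"someDiv\">first</div>"
                + "<!-- unrelated --><p>text</p></section>";
        Document doc = Jsoup.parse(html);
        for (Map.Entry<String, Element> e : commentsBeforeSomeDiv(doc).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().text());
        }
    }
}
```

Note that using the comment text as the map key means two comments with identical text would overwrite each other; a list of pairs would avoid that if it matters for your pages.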

Elements elements = doc.select("div.someDiv");

http://jsoup.org/cookbook/
