
Extract text from only some divs with the same class using jsoup

I would like to extract the text from a specific <div> on a website using jsoup, but I'm not sure how.

The problem is that I want the text from a div that has class="name".

But there can be more <div>s with this class, and I don't want the text from those.

It looks like this in the HTML file:

    .
    .
    <div class="name">
    Some text I don't want
    <span class="a">Tree</span>
    </div>
    .
    .
    <div class="name">Some text I do want</div>
    .
    .

So the only difference is that the <div> I want the text from does not contain a <span>. But I have not found a way to use that as the key for extracting the text with jsoup.

Is it possible?

Use jsoup's selector syntax. For instance, to select all divs with class="name", use:

    Elements nameElements = doc.select("div.name");
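jsoup's selector syntax also has the `:has(selector)` and `:not(selector)` pseudo-selectors, so the "no <span> inside" condition can be folded into the query itself instead of being checked in a loop. This is a swapped-in alternative to the loop approach, shown as a minimal sketch: the class `NameDivSelector`, the method `textsWithoutSpan`, and the condensed sample HTML are hypothetical names and data chosen for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration
public class NameDivSelector {

    // Returns the text of every <div class="name"> that has no <span> descendant.
    // ":has(span)" matches elements containing a <span>; ":not(...)" inverts that.
    static List<String> textsWithoutSpan(String html) {
        Document doc = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        for (Element div : doc.select("div.name:not(:has(span))")) {
            texts.add(div.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        // Condensed version of the HTML from the question
        String html = "<div class=\"name\">Some text I don't want <span class=\"a\">Tree</span></div>"
                + "<div class=\"name\">Some text I do want</div>";
        System.out.println(textsWithoutSpan(html)); // prints [Some text I do want]
    }
}
```

Because the filtering happens inside `select()`, the result contains only the matching divs, which keeps the calling code short.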

Note that the text you "do" and "don't" want above is in the same relative HTML location, and in fact I can't tell why you want one and not the other. HTML and jsoup will treat them the same.

If you want to skip elements that contain span elements, one way is to iterate through the elements obtained above and use a selector to test whether each one contains a span:

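    // All <div class="name"> elements in the document
    Elements nameElements = doc.select("div.name");

    for (Element element : nameElements) {
        // element.select("span") searches this div's descendants for <span> tags
        if (element.select("span").isEmpty()) {
            // No <span> inside: this is the div whose text is wanted
            System.out.println("No span");
            System.out.println(element.text());
            System.out.println();
        } else {
            // Contains a <span>: this is a div to skip
            System.out.println("span");
            System.out.println(element.text());
            System.out.println();
        }
    }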
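The loop above can be wrapped into a complete program that keeps only the wanted text rather than printing both kinds of div. This is a sketch under stated assumptions: the class `SpanLoopFilter`, the method `textsWithoutSpan`, and the sample HTML are hypothetical, and the HTML string stands in for whatever document `doc` was parsed from.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration
public class SpanLoopFilter {

    // Keeps the text of each <div class="name"> with no <span> descendant,
    // using the same select("span").isEmpty() test as the loop above.
    static List<String> textsWithoutSpan(String html) {
        Document doc = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        for (Element element : doc.select("div.name")) {
            if (element.select("span").isEmpty()) {
                texts.add(element.text());
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        // Condensed version of the HTML from the question
        String html = "<div class=\"name\">Some text I don't want <span class=\"a\">Tree</span></div>"
                + "<div class=\"name\">Some text I do want</div>";
        for (String text : textsWithoutSpan(html)) {
            System.out.println(text);
        }
    }
}
```

Returning a list instead of printing makes it easy to decide afterwards whether you need the first match or all of them.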

Alternatively, you can select all div elements with class="name" and loop through them, checking whether each element has child elements; the one that has none is the div you want.
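The child-element check can be sketched as follows, assuming that "no child elements" is the distinguishing feature; the class `ChildlessNameDiv`, the method `firstChildlessText`, and the sample HTML are hypothetical. `Element.children()` returns only element children, so plain text inside the div does not count as a child.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.Optional;

// Hypothetical helper class for illustration
public class ChildlessNameDiv {

    // Returns the text of the first <div class="name"> that has no child elements,
    // or an empty Optional if every such div has at least one child element.
    static Optional<String> firstChildlessText(String html) {
        Document doc = Jsoup.parse(html);
        for (Element div : doc.select("div.name")) {
            // children() lists element children only; bare text nodes are ignored
            if (div.children().isEmpty()) {
                return Optional.of(div.text());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Condensed version of the HTML from the question
        String html = "<div class=\"name\">Some text I don't want <span class=\"a\">Tree</span></div>"
                + "<div class=\"name\">Some text I do want</div>";
        firstChildlessText(html).ifPresent(System.out::println);
    }
}
```

This test is stricter than the `<span>` check: a wanted div that happened to contain, say, a `<b>` tag would also be skipped, so it only fits if the wanted div never has any markup inside it.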
