
Extract text from only some divs in the same class with jsoup

I would like to extract text from a specific <div> of a website using jsoup, but I'm not sure how.

The problem is that I want to get the text from a div that has class="name".

But there can be more <div>s with this class, and I don't want the text from those.

It looks like this in the HTML file:

.  
.
<div class="name">
Some text I don't want
<span class="a">Tree</span>
</div>
.  
.
<div class="name">Some text I do want</div>
.  
.

So the only difference is that the <div> I want the text from does not have a <span> inside it. But I have not found a way to use that as a key to extract the text in jsoup.

Is it possible?

Use JSoup's selector syntax. For instance, to select all divs with class="name", use

Elements nameElements = doc.select("div.name");
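For context, here is a minimal self-contained sketch of that selection. The HTML string is only modelled on the snippet in the question, and the class and method names (SelectNameDivs, selectNameDivs) are made up for illustration. It shows that the plain div.name selector matches both divs:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class SelectNameDivs {
    // Sample markup modelled on the question (an assumption about the real page).
    static final String HTML =
        "<div class=\"name\">Some text I don't want <span class=\"a\">Tree</span></div>"
        + "<div class=\"name\">Some text I do want</div>";

    // Hypothetical helper: parse the HTML and return every div with class "name".
    static Elements selectNameDivs(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("div.name");
    }

    public static void main(String[] args) {
        // Both divs are printed, because both carry class="name".
        for (Element element : selectNameDivs(HTML)) {
            System.out.println(element.text());
        }
    }
}
```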

Note that the text you "do" and "don't" want above sits in the same relative HTML locations, and in fact I have no clue why you want one rather than the other. HTML and JSoup will see them the same.

If you want to skip elements that contain span elements, one way is to iterate through the elements obtained above and test with a selector whether each one has span elements or not:

    Elements nameElements = doc.select("div.name");

    for (Element element : nameElements) {
        if (element.select("span").isEmpty()) {
            System.out.println("No span");
            System.out.println(element.text());
            System.out.println();
        } else {
            System.out.println("span");
            System.out.println(element.text());
            System.out.println();
        }
    }
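As an alternative to the loop, the same filter can probably be pushed into the selector itself, since jsoup's selector syntax supports the :not(...) and :has(...) pseudo-selectors. The sketch below is one possible way to do it, not the only one. The class and method names (NameDivsWithoutSpan, textsWithoutSpan) and the sample HTML are illustrative assumptions:

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class NameDivsWithoutSpan {
    // Sample markup modelled on the question (an assumption about the real page).
    static final String HTML =
        "<div class=\"name\">Some text I don't want <span class=\"a\">Tree</span></div>"
        + "<div class=\"name\">Some text I do want</div>";

    // Hypothetical helper: text of every div.name that has no span descendant.
    static List<String> textsWithoutSpan(String html) {
        Document doc = Jsoup.parse(html);
        // :has(span) matches elements containing a span; :not(...) inverts that.
        return doc.select("div.name:not(:has(span))").eachText();
    }

    public static void main(String[] args) {
        System.out.println(textsWithoutSpan(HTML));
    }
}
```

This keeps the filtering in one place, at the cost of a selector that is a little harder to read than the explicit loop.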

You can select all div elements with class="name" and then loop through them. Check whether each element has child elements: if it has none, this is the div you want.
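A minimal sketch of that child-element check, assuming the page looks like the snippet in the question. The names ChildlessNameDivs and textsOfChildlessDivs are hypothetical. Note that Element.children() returns only child elements, not text nodes, so a div that contains nothing but text counts as having no children:

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ChildlessNameDivs {
    // Sample markup modelled on the question (an assumption about the real page).
    static final String HTML =
        "<div class=\"name\">Some text I don't want <span class=\"a\">Tree</span></div>"
        + "<div class=\"name\">Some text I do want</div>";

    // Hypothetical helper: text of every div.name that has no child elements at all.
    static List<String> textsOfChildlessDivs(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        for (Element element : doc.select("div.name")) {
            // children() lists child elements only; text nodes are not included.
            if (element.children().isEmpty()) {
                result.add(element.text());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(textsOfChildlessDivs(HTML));
    }
}
```

This is stricter than testing for a span: a div.name holding any other child element, such as a link or a bold tag, would be skipped as well.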
