
How to parse a specific element from a div with jsoup?

I have a website whose titles use the class ".entry-title" inside a div with the class "td_module_5".

I wrote code that uses jsoup to parse ".entry-title", but elements with the class "entry-title" also appear in other divs, for example "td_mega_menu", so those get picked up as well.

    try {
        doc = Jsoup.connect(blogUrl).get();
        // Matches every .entry-title on the page, including those in td_mega_menu
        title = doc.select(".entry-title");
        titleList.clear();
        for (Element titles : title) {
            titleList.add(titles.text());
        }
    } catch (IOException e) {
        e.printStackTrace();
    }

How can I use jsoup to parse ".entry-title" only from the div "td_module_5"?

Example HTML:

<div class="td_module_5 td_module_wrap td-animation-stack" >
            <div class="td-module-image td-module-image-float">
                <div class="td-module-thumb"><a class="td-admin-edit" href="https://unspecific.ru/wp-admin/post.php?post=7148&amp;action=edit">edit</a><a href="https://unspecific.ru/bakteriofagi-smogut-vylechit-nyak-i-bk/" rel="bookmark" title="Бактериофаги смогут вылечить НЯК и БК?"><img width="260" height="195" class="entry-thumb" src="https://unspecific.ru/wp-content/uploads/2018/07/bacf-260x195.jpg" srcset="https://unspecific.ru/wp-content/uploads/2018/07/bacf-260x195.jpg 260w, https://unspecific.ru/wp-content/uploads/2018/07/bacf-300x225.jpg 300w, https://unspecific.ru/wp-content/uploads/2018/07/bacf-80x60.jpg 80w, https://unspecific.ru/wp-content/uploads/2018/07/bacf-245x184.jpg 245w, https://unspecific.ru/wp-content/uploads/2018/07/bacf.jpg 640w" sizes="(max-width: 260px) 100vw, 260px" alt="Бактериофаг и бактерия" title="Бактериофаги смогут вылечить НЯК и БК?"/></a></div>            </div>

            <div class="td-item-details td-category-small">
                <a href="https://unspecific.ru/category/news/" class="td-post-category">Новости в лечении ВЗК</a>                
                <h3 class="entry-title td-module-title"><a href="https://unspecific.ru/bakteriofagi-smogut-vylechit-nyak-i-bk/" rel="bookmark" title="Бактериофаги смогут вылечить НЯК и БК?">Бактериофаги смогут вылечить НЯК и БК?</a></h3>

You can use the following CSS selector:

    Element title = doc.select("div > .entry-title").first();
    System.out.println(title.text());

Or, if you want to find all titles:

    Elements titles = doc.select("div > .entry-title");

    for (Element title: titles) {
        System.out.println(title.text());
    }

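In your case, because you want to select only within a div that has specific CSS classes, use the selector below. The ">" combinator matches direct children only, so this selector expects the title to sit exactly two levels below the outer div, as it does in your example HTML: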

    Elements titles = doc.select("div.td_module_5.td_module_wrap.td-animation-stack > div > .entry-title");

    for (Element title: titles) {
        System.out.println(title.text());
    }

The output is:

Бактериофаги смогут вылечить НЯК и БК?
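
If the nesting depth inside "td_module_5" is not always the same, a descendant selector (a space instead of ">") may be more forgiving. The following is a minimal sketch rather than a definitive solution: the class name ModuleTitles, the helper moduleTitles, and the inline HTML are made up for illustration, and the markup is assumed to resemble your example, with a second "entry-title" inside "td_mega_menu" that should be skipped.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class ModuleTitles {

    // Hypothetical helper: returns the text of every .entry-title nested
    // at any depth inside an element with the class td_module_5.
    static List<String> moduleTitles(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        // The space is the descendant combinator, so intermediate wrappers
        // such as td-item-details do not need to be spelled out.
        for (Element title : doc.select(".td_module_5 .entry-title")) {
            result.add(title.text());
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up markup: one title in the mega menu, one in the module.
        String html =
              "<div class=\"td_mega_menu\">"
            + "<h3 class=\"entry-title\">Menu title</h3>"
            + "</div>"
            + "<div class=\"td_module_5 td_module_wrap\">"
            + "<div class=\"td-item-details\">"
            + "<h3 class=\"entry-title td-module-title\">"
            + "<a href=\"#\">Module title</a></h3>"
            + "</div></div>";

        System.out.println(moduleTitles(html)); // prints [Module title]
    }
}
```

In your own code the same selector string can be passed to the doc.select(...) call you already have; only the selector changes.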
