
JSOUP Web scraping from support portal

I'm new to jsoup and I'm trying to scrape data from this portal.

https://supportforums.cisco.com/t5/lan-switching-and-routing/bd-p/6016-discussions-lan-switching-routing

From the thread list on this portal I want to extract only the solved problems, that is, the topics marked with the special "solved" image.

This is how a solved topic should look.

I connected to the page as shown below and checked its title to be sure I'm in the right place.

        // Fetch the forum page and print its title as a sanity check
        Document document = Jsoup.connect("https://supportforums.cisco.com/t5/lan-switching-and-routing/bd-p/6016-discussions-lan-switching-routing").get();
        String title = document.title();
        System.out.println("Title: " + title);

After that I looked at the HTML and concluded that these topics must be list elements inside the div with class messageList.MessageList.lia-component-forums-widget-message-list.lia-forum-message-list.lia-component-message-list, but I'm not sure about that. Then I found that each topic has a unique id, and that's where I'm stuck.

Could you please help me select all of these topic elements? And how do I filter the solved topics from the rest? To begin with, I just want to print the titles of these topics to the console in Java.

Sorry if this is a silly question.

Solved topics are represented by rows with the class lia-list-row-thread-solved. The main thread list is in the element with id grid.

        Document doc = Jsoup.connect(
                "https://supportforums.cisco.com/t5/lan-switching-and-routing/bd-p/6016-discussions-lan-switching-routing")
                .get();
        for (Element e : doc.select("#grid tr.lia-list-row-thread-solved")) {
            String text = e.text();
            System.out.println(text);
        }
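
The loop above prints the whole text of each solved row (title, author, reply counts and so on). If you only want the titles, you can narrow the selection inside each row. Below is a minimal sketch of that idea. It runs against an inline HTML string rather than the live page, and both the sample markup and the subject-link class name are assumptions made for illustration, so inspect the real page and replace that selector with whatever element actually holds the topic title.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

class SolvedTopics {
    // Hypothetical stand-in for the forum's thread table; NOT copied from the live page.
    static final String SAMPLE_HTML =
            "<table id=\"grid\">"
          + "<tr class=\"lia-list-row lia-list-row-thread-solved\">"
          + "<td><a class=\"subject-link\" href=\"/t/1\">VLAN trunk not passing traffic</a></td>"
          + "<td>12 replies</td></tr>"
          + "<tr class=\"lia-list-row\">"
          + "<td><a class=\"subject-link\" href=\"/t/2\">STP question</a></td>"
          + "<td>3 replies</td></tr>"
          + "<tr class=\"lia-list-row lia-list-row-thread-solved\">"
          + "<td><a class=\"subject-link\" href=\"/t/3\">OSPF neighbors stuck in EXSTART</a></td>"
          + "<td>7 replies</td></tr>"
          + "</table>";

    // Collects the title of every solved row inside the element with id "grid".
    static List<String> solvedTitles(Document doc) {
        List<String> titles = new ArrayList<>();
        for (Element row : doc.select("#grid tr.lia-list-row-thread-solved")) {
            // "a.subject-link" is an assumed selector for the title link.
            Element link = row.selectFirst("a.subject-link");
            // Fall back to the full row text if no such link is found.
            titles.add(link != null ? link.text() : row.text());
        }
        return titles;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE_HTML);
        for (String title : solvedTitles(doc)) {
            System.out.println(title);
        }
    }
}
```

To use it on the real forum, pass the Document returned by Jsoup.connect(...).get() to solvedTitles instead of the parsed sample string.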
