Java: How to extract Google News titles and links using Jsoup?

Question

I am very new to jsoup and HTML. I was wondering how to extract the titles and links (if possible) from the stories on the front page of Google News. Here is my code:

    org.jsoup.nodes.Document doc = null;
    try {
        doc = (org.jsoup.nodes.Document) Jsoup.connect("https://news.google.com/").get();
    } catch (IOException e1) {
        // TODO Auto-generated catch block
        e1.printStackTrace();
    }
    Elements titles = doc.select("titletext");

    System.out.println("Titles: " + titles.text());

    // non existent
    for (org.jsoup.nodes.Element e : titles) {
        System.out.println("Title: " + e.text());
        System.out.println("Link: " + e.attr("href"));
    }

For some reason, I think my program is unable to find titletext, since this is the output when the code runs:

    Titles:

I would really appreciate your help, thanks.

Answer 1

First, select all h2 elements (here html is the parsed Document, called doc in the question):

    Elements elem = html.select("h2");

Each h2 element contains child elements that carry attributes such as id, href, and originalhref. Retrieve the data you need from them:

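    for (Element e : elem) {
        // Text of the element(s) whose class attribute is "titletext"
        System.out.println(e.select("[class=titletext]").text());
        // href of the first matched link that has one
        System.out.println(e.select("a").attr("href"));
    }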
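
The reason the question's code prints nothing is that doc.select("titletext") is a tag selector: it looks for <titletext> elements, while titletext is used as a class name. The following is a minimal offline sketch of that difference. The class name TitleTextDemo, the helper methods, and the SAMPLE_HTML markup are hypothetical and only stand in for the page structure this answer assumes; the live Google News markup may differ.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class TitleTextDemo {
    // Hypothetical markup (an assumption, not the real Google News page):
    // each story is an <h2> wrapping a link whose text sits in class="titletext".
    static final String SAMPLE_HTML =
        "<html><body>"
        + "<h2><a href=\"https://example.com/a\"><span class=\"titletext\">First story</span></a></h2>"
        + "<h2><a href=\"https://example.com/b\"><span class=\"titletext\">Second story</span></a></h2>"
        + "</body></html>";

    // Counts how many elements in the given HTML match a CSS selector.
    static int countMatches(String html, String cssQuery) {
        return Jsoup.parse(html).select(cssQuery).size();
    }

    // Returns one "title -> link" string per <h2> element.
    static List<String> extractStories(String html) {
        List<String> stories = new ArrayList<>();
        for (Element heading : Jsoup.parse(html).select("h2")) {
            String title = heading.select(".titletext").text(); // class selector
            String link = heading.select("a").attr("href");     // first link's href
            stories.add(title + " -> " + link);
        }
        return stories;
    }

    public static void main(String[] args) {
        // "titletext" matches <titletext> tags, of which there are none.
        System.out.println("titletext:  " + countMatches(SAMPLE_HTML, "titletext"));
        // ".titletext" matches elements with class="titletext".
        System.out.println(".titletext: " + countMatches(SAMPLE_HTML, ".titletext"));
        for (String story : extractStories(SAMPLE_HTML)) {
            System.out.println(story);
        }
    }
}
```

Against this sample markup, the tag selector matches no elements and the class selector matches both spans. The shorter .titletext form is also a little more forgiving than [class=titletext], because it still matches when the element carries additional classes.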

Answer 1 (accepted) 2016-08-29 19:39:30
