Java - How do I extract Google News Titles and Links using Jsoup?
I am very new to using jsoup and HTML. I was wondering how to extract the titles and links (if possible) from the stories on the front page of Google News. Here is my code:
org.jsoup.nodes.Document doc = null;
try {
    doc = (org.jsoup.nodes.Document) Jsoup.connect("https://news.google.com/").get();
} catch (IOException e1) {
    // TODO Auto-generated catch block
    e1.printStackTrace();
}
Elements titles = doc.select("titletext");
System.out.println("Titles: " + titles.text());
//non existent
for (org.jsoup.nodes.Element e: titles) {
    System.out.println("Title: " + e.text());
    System.out.println("Link: " + e.attr("href"));
}
For some reason I think my program is unable to find titletext, since this is the only output when the code runs:
Titles:
I would really appreciate your help, thanks.
First, get all elements with the h2 HTML tag:
Elements elem = doc.select("h2");
Each of these h2 elements contains child elements carrying attributes such as id, href, originalhref, and so on. From these, retrieve the data you need:
for (Element e : elem) {
    System.out.println(e.select("[class=titletext]").text());
    System.out.println(e.select("a").attr("href"));
}
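A minimal offline sketch of the same approach, assuming jsoup is on the classpath and assuming the h2 > a > span.titletext layout the answer describes (the live Google News markup may differ). Instead of fetching the page, it parses a hypothetical HTML snippet; SAMPLE_HTML, GoogleNewsTitles and extract are illustrative names, not part of jsoup. It also shows why the question's selector finds nothing: a bare "titletext" matches elements whose tag name is titletext, whereas ".titletext" (or "[class=titletext]", as in the answer) matches the class attribute. The "abs:href" prefix resolves a relative link against the document's base URI.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

class GoogleNewsTitles {

    // Hypothetical markup modelled on the structure described in the answer:
    // each story is an h2 containing a link whose text sits in span.titletext.
    static final String SAMPLE_HTML =
        "<html><body>"
        + "<h2><a id=\"a1\" href=\"/story/1\" originalhref=\"https://example.com/1\">"
        + "<span class=\"titletext\">First headline</span></a></h2>"
        + "<h2><a id=\"a2\" href=\"/story/2\" originalhref=\"https://example.com/2\">"
        + "<span class=\"titletext\">Second headline</span></a></h2>"
        + "</body></html>";

    /** Returns {title, absolute link} pairs, one per h2 element. */
    static List<String[]> extract(Document doc) {
        List<String[]> results = new ArrayList<>();
        for (Element h2 : doc.select("h2")) {
            // ".titletext" is a class selector; "titletext" alone would look for a <titletext> tag.
            String title = h2.select(".titletext").text();
            // First link with an href inside this h2, or null if there is none.
            Element link = h2.select("a[href]").first();
            // "abs:href" resolves "/story/1" against the base URI given to the parser.
            String href = (link != null) ? link.attr("abs:href") : "";
            results.add(new String[] {title, href});
        }
        return results;
    }

    public static void main(String[] args) {
        // Parse a static string with an assumed base URI instead of hitting the network.
        Document doc = Jsoup.parse(SAMPLE_HTML, "https://news.google.com/");

        // The question's selector: there are no <titletext> elements, so this prints 0.
        System.out.println("Tag selector matches: " + doc.select("titletext").size());

        for (String[] pair : extract(doc)) {
            System.out.println("Title: " + pair[0]);
            System.out.println("Link: " + pair[1]);
        }
    }
}
```

To try it on the real page, the Jsoup.parse call can be swapped for Jsoup.connect("https://news.google.com/").get(), which fetches the page and uses its URL as the base URI; the selectors would then need to match whatever markup the page actually returns.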