Extracting the proper content from a URL with jsoup

Question

I'm looking for a way to extract the content of news articles from sites like CNN or the New York Times using Jsoup.

I tried the following code:

Document document = Jsoup.connect("http://edition.cnn.com/2013/11/10/world/asia/philippines-typhoon-haiyan/index.html").get();
Element contents = document.select("#content").first();
System.out.println(contents.html());
System.out.println(contents.text());

I received this error:

Exception in thread "main" java.lang.NullPointerException
at com.clearforest.Test.main(Test.java:36)

Does anyone have an idea how I can extract the proper text from such articles?

Answer 1

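Your contents Element is null after the select call: the selector you specified ("#content") doesn't match anything in the document downloaded from CNN, and Elements.first() returns null when the selection is empty, so contents.html() throws the NullPointerException. Try something like document.select("div.cnn_strycntntlft"), which returns the contents of the story div.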
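A minimal sketch of the answer's suggestion, made null-safe: instead of calling methods on the result of first() directly, it tries a list of selectors in order and checks each result for null before using it. It parses a small inline HTML string (hypothetical markup standing in for a downloaded page) so it runs without network access. The helper name firstMatchText and the fallback-selector approach are illustrative assumptions, not part of jsoup, and the jsoup library must be on the classpath. Selectors such as div.cnn_strycntntlft depend on a site's markup at a given time and may stop matching when the site changes.

```java
import java.util.Optional;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ArticleExtractor {

    // Hypothetical helper: returns the text of the first element matched by
    // the first selector that matches anything, or Optional.empty() if none do.
    // Elements.first() returns null on an empty selection, which is what
    // caused the NullPointerException in the question, so we check for it.
    static Optional<String> firstMatchText(Document doc, String... selectors) {
        for (String selector : selectors) {
            Element match = doc.select(selector).first();
            if (match != null) {
                return Optional.of(match.text());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical markup; with a live page this would come from
        // Jsoup.connect(url).get() instead of Jsoup.parse(html).
        String html = "<html><body>"
                + "<div class=\"cnn_strycntntlft\"><p>Story text.</p></div>"
                + "</body></html>";
        Document doc = Jsoup.parse(html);

        // "#content" matches nothing here, so the second selector is used.
        Optional<String> text = firstMatchText(doc, "#content", "div.cnn_strycntntlft");
        System.out.println(text.orElse("no article content found"));
    }
}
```

With a real article, replace Jsoup.parse(html) with Jsoup.connect(url).get() (which can throw IOException) and pick selectors by inspecting the page source; returning Optional makes the "no match" case explicit instead of surfacing later as a NullPointerException.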


Answer 1 (score 1, 2013-11-12 17:27:38)
