JSoup core web text extraction

Question

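I'm new to JSoup, so apologies if my question is too trivial. I am trying to extract article text from http://www.nytimes.com/, but when I print the parsed document I cannot see any articles in the output.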

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class App
{
    public static void main( String[] args )
    {
        String url = "http://www.nytimes.com/";
        try {
            // Fetch the page and parse it into a Document
            Document document = Jsoup.connect(url).get();

            System.out.println(document.html()); // Articles not getting printed
            //System.out.println(document.toString()); // Same here
            String title = document.title();
            System.out.println("title : " + title); // Title is fine
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

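OK, I also tried parsing "http://en.wikipedia.org/wiki/Big_data" to retrieve the wiki data. The same problem occurs there: the wiki data does not appear in the output. Any help or hints would be appreciated.

Thanks.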

Answer 1

Here is how to get the text of all <p class="summary"> elements:

final String url = "http://www.nytimes.com/";
Document doc = Jsoup.connect(url).get();

for( Element element : doc.select("p.summary") )
{
    if( element.hasText() ) // Skip those tags without text
    {
        System.out.println(element.text());
    }
}

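If you need all <p> tags without any filtering, you can use doc.select("p") instead. In most cases, though, it is better to select only the elements you actually need (see the Jsoup Selector documentation).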
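To make the difference between the two selectors concrete, here is a minimal sketch that runs them against a small inline HTML string instead of the live site. The class name SelectorDemo, the helper texts, and the sample markup are made up for illustration; the real nytimes.com markup may differ and may have changed since this answer was written.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SelectorDemo {

    // Hypothetical sample markup, not taken from any real page.
    static final String SAMPLE_HTML =
          "<html><body>"
        + "<p class=\"summary\">First summary</p>"
        + "<p>Plain paragraph</p>"
        + "<p class=\"summary\"></p>"
        + "<p class=\"summary\">Second summary</p>"
        + "</body></html>";

    // Returns the text of every element matching cssQuery, skipping empty ones.
    static List<String> texts(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        for (Element element : doc.select(cssQuery)) {
            if (element.hasText()) { // Skip tags without text
                result.add(element.text());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Only paragraphs carrying the "summary" class
        System.out.println(texts(SAMPLE_HTML, "p.summary"));
        // Every paragraph, regardless of class
        System.out.println(texts(SAMPLE_HTML, "p"));
    }
}
```

With this sample input, the first call returns the two non-empty summary paragraphs, while the second also includes the plain paragraph. Parsing a fixed string like this is a convenient way to try out a selector before pointing it at a real URL with Jsoup.connect(url).get().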

Answer 1: accepted, 2013-06-21 13:34:15
