How do I scrape data from a website using the Jaunt library?

Question

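I want to get the titles from this feed: http://feeds.foxnews.com/foxnews/latest

For example, given an element like this:

<title><![CDATA[SUCCESSFUL INTERCEPT Pentagon confirms it shot down ICBM-type target]]></title>

the program should print this text:

SUCCESSFUL INTERCEPT Pentagon confirms it shot down ICBM-type target

Here is my code. I am using the Jaunt library.

I don't know why it only prints the text "foxnew.com".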

import com.jaunt.JauntException;
import com.jaunt.UserAgent;

public class p8_1
{

    public static void main(String[] args)
    {
        try
        {
            UserAgent userAgent = new UserAgent();
            userAgent.visit("http://feeds.foxnews.com/foxnews/latest");
            String title = userAgent.doc.findFirst(
                "<title><![CDATA[SUCCESSFUL INTERCEPT Pentagon confirms it shot down ICBM-type target]]></title>").getText();
            System.out.println("\n " + title);

        } catch (JauntException e)
        {
            System.err.println(e);
        }

    }

}

Answer 1

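Search by element type, not by value. The query passed to findFirst or findEach should name the tag (for example "<title>"), not the complete element with its contents.

Try the following to get the title text of every item in the feed: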

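import com.jaunt.Element;
import com.jaunt.Elements;
import com.jaunt.JauntException;
import com.jaunt.UserAgent;

public class p8_1 {
    public static void main(String[] args) {
        try {
            UserAgent userAgent = new UserAgent();
            userAgent.visit("http://feeds.foxnews.com/foxnews/latest");

            // Select by tag name: every <item>, then every <title> inside those items.
            Elements items = userAgent.doc.findEach("<item>");
            Elements titles = items.findEach("<title>");

            for (Element title : titles) {
                // The headline is wrapped in a CDATA section; it is read here through getComment(0).
                String titleText = title.getComment(0).getText();
                System.out.println(titleText);
            }
        } catch (JauntException e) {
            System.err.println(e);
        }
    }
}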
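
For comparison, the same idea (select by tag name, then read the text) can be sketched with the XML parser that ships with the JDK, without Jaunt. This is only an illustration under stated assumptions: the class name FeedTitles, the helper itemTitles, and the inline sample feed are hypothetical and not part of the original question or of Jaunt. It parses a string rather than fetching the live feed, so it can be run offline.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FeedTitles {

    // Hypothetical helper: returns the <title> text of every <item> in an RSS string.
    static List<String> itemTitles(String rssXml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(rssXml)));

        List<String> titles = new ArrayList<>();
        // Look elements up by tag name, never by their full markup and contents.
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            NodeList titleNodes = item.getElementsByTagName("title");
            if (titleNodes.getLength() > 0) {
                // getTextContent() includes text held in CDATA sections.
                titles.add(titleNodes.item(0).getTextContent().trim());
            }
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample feed; the channel-level <title> is skipped because it is not inside an <item>.
        String sample = "<rss><channel><title>Sample channel</title>"
                + "<item><title><![CDATA[SUCCESSFUL INTERCEPT Pentagon confirms it shot down ICBM-type target]]></title></item>"
                + "</channel></rss>";
        for (String title : itemTitles(sample)) {
            System.out.println(title);
        }
    }
}
```

Restricting the lookup to titles inside each item is what keeps the channel-level title out of the output.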
