Parsing inner HTML tags using jSoup

Question

I want to find the important links on a site using the Jsoup library. Suppose we have the following code:

<h1><a href="http://example.com">This is important </a></h1>

While parsing, how can we detect that the a tag is inside the h1 tag?

Answer 1

You can do it this way:

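import java.io.File;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Parse a local HTML file; the base URI is used to resolve relative links.
File input = new File("/tmp/input.html");
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");

// For each h1, collect the links nested anywhere inside it.
Elements headlinesCat1 = doc.getElementsByTag("h1");
for (Element headline : headlinesCat1) {
    Elements importantLinks = headline.getElementsByTag("a");
    for (Element link : importantLinks) {
        String linkHref = link.attr("href");
        String linkText = link.text();
        System.out.println(linkHref + " " + linkText);
    }
}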

Taken from the JSoup Cookbook.
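
The loop above walks from each h1 down to its links. The same check can probably be made from the link side, by walking up through the link's ancestors. This is only a minimal sketch, assuming the HTML is available as a string; the class name `HeadingLinkCheck`, the helper `isInsideTag`, and the second sample link are made up for illustration:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class HeadingLinkCheck {

    // Hypothetical helper: true if any ancestor of the element has the given tag name.
    static boolean isInsideTag(Element element, String tagName) {
        for (Element ancestor : element.parents()) {
            if (ancestor.tagName().equals(tagName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Sample markup: the first link is from the question, the second is an assumed counterexample.
        String html = "<h1><a href=\"http://example.com\">This is important </a></h1>"
                + "<p><a href=\"http://example.com/other\">Other link</a></p>";
        Document doc = Jsoup.parse(html);

        // Report, for every link in the document, whether it sits inside an h1.
        for (Element link : doc.getElementsByTag("a")) {
            System.out.println(link.attr("href") + " inside h1: " + isInsideTag(link, "h1"));
        }
    }
}
```

This direction is handy when you already hold a link element and only need to know whether it counts as "important".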

Answer 2

Use a selector:

Elements elements = doc.select("h1 > a");
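
Note that `h1 > a` matches only links that are direct children of an h1; a link nested deeper (for example inside a span) would need the descendant form `h1 a`. A small sketch of the difference, assuming an inline HTML string; `SelectorDemo`, `hrefs`, and the nested sample link are illustrative names, not part of the original answer:

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SelectorDemo {

    // Hypothetical helper: collect the href of every element matched by the CSS query.
    static List<String> hrefs(Document doc, String cssQuery) {
        List<String> result = new ArrayList<>();
        for (Element link : doc.select(cssQuery)) {
            result.add(link.attr("href"));
        }
        return result;
    }

    public static void main(String[] args) {
        // One link directly under h1, one wrapped in a span (assumed sample markup).
        String html = "<h1><a href=\"http://example.com\">This is important </a></h1>"
                + "<h1><span><a href=\"http://example.com/nested\">Nested</a></span></h1>";
        Document doc = Jsoup.parse(html);

        System.out.println(hrefs(doc, "h1 > a")); // direct children of h1 only
        System.out.println(hrefs(doc, "h1 a"));   // links at any depth below h1
    }
}
```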

Answer 1
1 2015-06-10 11:25:55

Answer 2
0 2015-06-10 11:33:03