Parse the inner HTML tags using jsoup
I want to find the important links on a site using the jsoup library. Suppose we have the following HTML:
<h1><a href="http://example.com">This is important </a></h1>
While parsing, how can we find out that the a tag is inside the h1 tag?
You can do it this way:
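import java.io.File;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class ImportantLinks {
    public static void main(String[] args) throws IOException {
        File input = new File("/tmp/input.html");
        Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");
        Elements headlinesCat1 = doc.getElementsByTag("h1");
        for (Element headline : headlinesCat1) {
            // getElementsByTag searches all descendants of this <h1>, so nested <a> tags are found too
            Elements importantLinks = headline.getElementsByTag("a");
            for (Element link : importantLinks) {
                String linkHref = link.attr("href");
                String linkText = link.text();
                System.out.println(linkHref + " : " + linkText);
            }
        }
    }
}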
Taken from the JSoup Cookbook.
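The loop above works top-down, starting from each h1. To answer the question from the other direction (given an a element, is it inside an h1?), one option is to walk the element's ancestors with Element.parents() and check whether any of them is an h1. This is a minimal sketch that assumes the HTML is available as a string; the class HeadlineLinkCheck and the method isInsideH1 are hypothetical names used only for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class HeadlineLinkCheck {
    // Hypothetical helper: returns true if any ancestor of the element is an <h1>
    public static boolean isInsideH1(Element el) {
        for (Element ancestor : el.parents()) {
            if (ancestor.tagName().equals("h1")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String html = "<h1><a href=\"http://example.com\">This is important </a></h1>"
                + "<p><a href=\"http://example.org\">Not a headline link</a></p>";
        Document doc = Jsoup.parse(html);
        // Check every <a> in the document, not only those under an <h1>
        for (Element link : doc.select("a")) {
            System.out.println(link.attr("href") + " inside h1: " + isInsideH1(link));
        }
    }
}
```

Because parents() returns all ancestors, not just the immediate parent, this also catches links wrapped in intermediate elements such as a span inside the h1.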
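Alternatively, use a CSS selector. Note that h1 > a matches only a elements that are direct children of an h1; use h1 a to match links nested at any depth: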
Elements elements = doc.select("h1 > a");
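To make the difference between the child combinator (h1 > a) and the descendant combinator (h1 a) visible, the sketch below parses an inline HTML string instead of a file. The second link is wrapped in a span, so it is a descendant of its h1 but not a direct child. The class SelectorDemo and the helper linksUnderH1 are hypothetical names chosen for this example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class SelectorDemo {
    // Hypothetical helper: "h1 > a" matches direct children only, "h1 a" matches any descendant
    public static Elements linksUnderH1(String html, boolean directChildrenOnly) {
        String query = directChildrenOnly ? "h1 > a" : "h1 a";
        return Jsoup.parse(html).select(query);
    }

    public static void main(String[] args) {
        // The second <a> sits inside a <span>: a descendant of <h1>, but not a direct child
        String html = "<h1><a href=\"http://example.com\">This is important </a></h1>"
                + "<h1><span><a href=\"http://example.net\">Nested link</a></span></h1>";
        System.out.println("h1 > a matched " + linksUnderH1(html, true).size());
        System.out.println("h1 a matched " + linksUnderH1(html, false).size());
        for (Element link : linksUnderH1(html, false)) {
            System.out.println(link.attr("href") + " : " + link.text());
        }
    }
}
```

For the markup in the question either selector works, since the a is a direct child of the h1; the descendant form is the safer choice when the structure is not known in advance.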