Jsoup.parse() vs. Jsoup.parse() - or how does URL detection work in Jsoup?

Question

Jsoup has 2 HTML parse() methods:

parse(String html) - "As no base URI is specified, absolute URL detection relies on the HTML including a <base href> tag."
parse(String html, String baseUri) - "The URL where the HTML was retrieved from. Used to resolve relative URLs to absolute URLs, that occur before the HTML declares a <base href> tag."

I am having difficulty understanding the difference between the two:

In the 2nd parse() version, what does "resolve relative URLs to absolute URLs, that occur before the HTML declares a <base href> tag" mean? What if a <base href> tag never occurs in the page?
What is the purpose of absolute URL detection? Why does Jsoup need to find the absolute URL?
Lastly, but most importantly: Is baseUri the full URL of the HTML page (as phrased in the original documentation) or is it the base URL of the HTML page?

Answer 1

It is used by, among others, Element#absUrl(), so that you can retrieve the (intended) absolute URL of an <a href>, <img src>, <link href>, <script src>, etc. For example:

for (Element link : document.select("a")) {
    System.out.println(link.absUrl("href"));
}

This is very useful if you want to download and/or parse the linked resources as well.
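
For a self-contained version of that loop, here is a minimal sketch. The class name AbsUrlDemo, the helper absoluteLinks, and the example.com URLs are made up for illustration; only Jsoup.parse, select, attr and absUrl are Jsoup calls.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class AbsUrlDemo {

    // Hypothetical helper: collects the absolute form of every <a href> in the HTML.
    static List<String> absoluteLinks(String html, String baseUri) {
        Document document = Jsoup.parse(html, baseUri);
        List<String> urls = new ArrayList<>();
        for (Element link : document.select("a[href]")) {
            urls.add(link.absUrl("href"));
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<a href='/about'>About</a>"
                + "<a href='faq.html'>FAQ</a>"
                + "<a href='http://other.example.org/x'>Elsewhere</a>";
        // Assumed URL the HTML was fetched from (illustrative only).
        String pageUrl = "http://example.com/docs/index.html";

        for (String url : absoluteLinks(html, pageUrl)) {
            System.out.println(url);
        }

        // Same HTML parsed without a base URI: the raw attribute is still
        // available, but there is nothing to resolve it against.
        Element first = Jsoup.parse(html).select("a[href]").first();
        System.out.println("raw: " + first.attr("href"));
        System.out.println("abs: '" + first.absUrl("href") + "'");
    }
}
```

With the page URL supplied, the three links should come out as http://example.com/about, http://example.com/docs/faq.html and http://other.example.org/x. Without any base URI, attr("href") still returns /about, but absUrl("href") returns an empty string.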

In the 2nd parse() version, what does "resolve relative URLs to absolute URLs, that occur before the HTML declares a <base href> tag" mean? What if a <base href> tag never occurs in the page?

Some (poor) websites may declare a <link> or <script> with a relative URL before the <base> tag. And if there is no <base> tag at all, then the given baseUri is simply used to resolve the relative URLs of the entire document.
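
A rough sketch of the two cases, with and without a <base href> tag. The class BaseTagDemo, the helper firstImageUrl, and all URLs are invented for the example. How elements that appear before the <base> tag are resolved may differ between Jsoup versions, so this sketch only looks at an element that comes after it.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class BaseTagDemo {

    // Hypothetical helper: absolute URL of the first <img src> in the document.
    static String firstImageUrl(String html, String baseUri) {
        Document document = Jsoup.parse(html, baseUri);
        return document.select("img[src]").first().absUrl("src");
    }

    public static void main(String[] args) {
        // Assumed URL the page was retrieved from (illustrative only).
        String pageUrl = "http://example.com/dir/page.html";

        String withBaseTag = "<html><head>"
                + "<base href='http://cdn.example.org/assets/'>"
                + "</head><body><img src='logo.png'></body></html>";
        String withoutBaseTag = "<html><head></head>"
                + "<body><img src='logo.png'></body></html>";

        // The <base href> declared in the page takes over for elements after it.
        System.out.println(firstImageUrl(withBaseTag, pageUrl));

        // No <base> tag: the baseUri passed to parse() is used instead.
        System.out.println(firstImageUrl(withoutBaseTag, pageUrl));
    }
}
```

The first call should print http://cdn.example.org/assets/logo.png and the second http://example.com/dir/logo.png.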

What is the purpose of absolute URL detection? Why does Jsoup need to find the absolute URL?

In order to return the right URL from Element#absUrl(). This is purely for the end user's convenience. Jsoup doesn't need it in order to successfully parse the HTML on its own.

Lastly, but most importantly: Is baseUri the full URL of the HTML page (as phrased in the original documentation) or is it the base URL of the HTML page?

The former. If it were the latter, the documentation would be lying. The baseUri is not to be confused with <base href>.
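
To see why the full page URL matters, here is a small sketch of relative URL resolution. It uses java.net.URI from the JDK rather than Jsoup itself, on the assumption that Jsoup resolves simple paths like these the same way; the class name BaseUriResolution and the URLs are made up.

```java
import java.net.URI;

public class BaseUriResolution {

    // Resolves a relative reference against the URL the page was fetched from.
    static String resolve(String baseUri, String relative) {
        return URI.create(baseUri).resolve(relative).toString();
    }

    public static void main(String[] args) {
        String fullPageUrl = "http://example.com/articles/2011/page.html";
        String siteRoot = "http://example.com/";

        // Resolved relative to the directory of the page.
        System.out.println(resolve(fullPageUrl, "images/pic.png"));
        System.out.println(resolve(fullPageUrl, "../index.html"));

        // Passing only the site root loses the directory part of the path.
        System.out.println(resolve(siteRoot, "images/pic.png"));
    }
}
```

Against the full page URL, images/pic.png resolves to http://example.com/articles/2011/images/pic.png, whereas against the site root it resolves to http://example.com/images/pic.png. So passing anything other than the URL the HTML was actually retrieved from can yield wrong absolute URLs.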

Answer 1: 4, accepted, 2011-08-22 00:57:46