Jsoup.parse() vs. Jsoup.parse() - or How does URL detection work in Jsoup?
Jsoup has two HTML parse() methods: Jsoup.parse(String html) and Jsoup.parse(String html, String baseUri).
I am having difficulty understanding what the difference between the two means:
In the 2nd parse() version, what does "resolve relative URLs to absolute URLs, that occur before the HTML declares a <base href> tag" mean? What if a <base href> tag never occurs in the page?
Is baseUri the full URL of the HTML page (as phrased in the original documentation), or is it the base URL of the HTML page?
是HTML页面的完整URL(如原始文档中所述)还是HTML页面的基本 URL? It's used for among others Element#absUrl()
so that you can retrieve the (intended) absolute URL of an <a href>
, <img src>
, <link href>
, <script src>
, etc. Eg 它用于
Element#absUrl()
以便您可以检索<a href>
, <img src>
, <link href>
, <script src>
等的(预期)绝对URL。例如
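```java
import org.jsoup.nodes.Element;

// document is the Document returned by Jsoup.parse(html, baseUri)
for (Element link : document.select("a")) {
    System.out.println(link.absUrl("href")); // absolute URL, or "" if it can't be made
}
```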
This is very useful if you want to download and/or parse the linked resources as well.
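To make the resolution step concrete, here is a minimal sketch of what absUrl() computes for each kind of attribute. It uses only the JDK's java.net.URI as a stand-in (an assumption for illustration: jsoup does its own resolution internally and may differ in edge cases), and the page URL and attribute values are made-up examples.

```java
import java.net.URI;

class RelativeUrlDemo {
    // Resolve an href/src value against the page's base URI,
    // the same kind of calculation absUrl() does for you.
    static String resolve(String baseUri, String attrValue) {
        return URI.create(baseUri).resolve(attrValue).toString();
    }

    public static void main(String[] args) {
        String page = "https://example.com/docs/guide/page.html"; // hypothetical page URL
        String[] values = {
            "img/logo.png",                // <img src>: relative to the page's directory
            "../style.css",                // <link href>: one directory up
            "/about",                      // <a href>: relative to the host root
            "//cdn.example.org/lib.js",    // <script src>: scheme-relative
            "https://other.example.net/x"  // already absolute: returned unchanged
        };
        for (String v : values) {
            System.out.println(v + " -> " + resolve(page, v));
        }
    }
}
```

Running main prints each value next to its resolved form, for example `img/logo.png -> https://example.com/docs/guide/img/logo.png` and `../style.css -> https://example.com/docs/style.css`.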
In the 2nd parse() version, what does "resolve relative URLs to absolute URLs, that occur before the HTML declares a <base href> tag" mean? What if a <base href> tag never occurs in the page?
Some (poor) websites may declare a <link> or <script> with a relative URL before the <base> tag; those URLs are resolved against the given baseUri. And if there is no <base> tag at all, the given baseUri is used to resolve the relative URLs of the entire document.
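The rule in that documentation sentence can be modeled roughly as below. This is a toy, hypothetical model, not jsoup's parser: entries are URL-bearing attribute values in document order, and an entry prefixed with "base:" stands in for a <base href> tag. It assumes only the first <base href> counts and that its href is itself resolved against the given baseUri; jsoup's real handling may vary between versions (and browsers, per the HTML standard, apply the first <base href> to the whole document).

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

class BaseHrefModel {
    // Toy model: walk URL-bearing values in document order and resolve each one
    // against whichever base is in effect at that point.
    static List<String> resolveInOrder(String baseUri, List<String> entries) {
        URI current = URI.create(baseUri);
        boolean baseSeen = false;
        List<String> resolved = new ArrayList<>();
        for (String entry : entries) {
            if (entry.startsWith("base:")) {
                if (!baseSeen) {
                    // First <base href>: resolve it against baseUri, then use it from here on
                    current = current.resolve(entry.substring("base:".length()));
                    baseSeen = true;
                }
            } else {
                resolved.add(current.resolve(entry).toString());
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        String baseUri = "https://example.com/shop/index.html"; // hypothetical fetch URL
        // A <link> declared before <base>, then an <img> after it
        System.out.println(resolveInOrder(baseUri,
                List.of("css/site.css", "base:https://static.example.com/assets/", "img/a.png")));
        // No <base> tag at all: baseUri is used for the whole document
        System.out.println(resolveInOrder(baseUri, List.of("css/site.css", "img/a.png")));
    }
}
```

The first call prints `[https://example.com/shop/css/site.css, https://static.example.com/assets/img/a.png]`: the stylesheet declared before <base> falls back to baseUri. The second, with no <base> at all, resolves everything against baseUri.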
What is the purpose of absolute URL detection? Why does Jsoup need to find the absolute URL?
So that Element#absUrl() can return the right URL. This is purely for the end user's convenience; Jsoup doesn't need it to parse the HTML successfully on its own.
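To illustrate that this is a convenience rather than a parsing requirement, the sketch below approximates absUrl() with a hypothetical helper, absUrlLike(), again built on java.net.URI rather than jsoup. It assumes that Jsoup.parse(html) starts with an empty base URI, and mirrors absUrl() being documented to return an empty string (not null) when no absolute URL can be made.

```java
import java.net.URI;
import java.net.URISyntaxException;

class AbsUrlSketch {
    // Hypothetical approximation of absUrl(): an absolute URL if one can be made, otherwise "".
    static String absUrlLike(String baseUri, String attrValue) {
        try {
            URI ref = new URI(attrValue);
            if (ref.isAbsolute()) {
                return ref.toString();  // already absolute: no base needed
            }
            if (baseUri.isEmpty()) {
                return "";              // relative, and nothing to resolve against
            }
            return new URI(baseUri).resolve(ref).toString();
        } catch (URISyntaxException e) {
            return "";                  // value is not a parseable URI
        }
    }

    public static void main(String[] args) {
        // Like Jsoup.parse(html) on a page without <base href>: empty base URI
        System.out.println("[" + absUrlLike("", "/contact") + "]");
        System.out.println(absUrlLike("", "https://example.com/contact"));
        // Like Jsoup.parse(html, baseUri)
        System.out.println(absUrlLike("https://example.com/team/", "/contact"));
    }
}
```

With an empty base, the relative `/contact` yields an empty string (printed as `[]`), while an already absolute href still comes through unchanged; the markup itself is unaffected either way.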
Lastly, but most importantly: is baseUri the full URL of the HTML page (as phrased in the original documentation), or is it the base URL of the HTML page?
The former. If it were the latter, the documentation would be lying. The baseUri is the URL the HTML was retrieved from and must not be confused with <base href>.
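A quick way to see why the full page URL is the right value: the sketch below resolves the same references against the full page URL and against only its directory, once more using java.net.URI in place of jsoup's internal resolution (the URLs are placeholders).

```java
import java.net.URI;

class FullUrlVsDirectory {
    static String resolve(String baseUri, String ref) {
        return URI.create(baseUri).resolve(ref).toString();
    }

    public static void main(String[] args) {
        String fullPageUrl = "https://example.com/docs/guide/page.html"; // what baseUri should be
        String directory   = "https://example.com/docs/guide/";          // the page's directory only

        // Ordinary relative paths: the base's last path segment is dropped, so both agree
        System.out.println(resolve(fullPageUrl, "img/logo.png"));
        System.out.println(resolve(directory, "img/logo.png"));

        // Fragment-only links refer to the page itself, so only the full URL gets them right
        System.out.println(resolve(fullPageUrl, "#top"));
        System.out.println(resolve(directory, "#top"));
    }
}
```

Both bases give `https://example.com/docs/guide/img/logo.png` for the relative path, but `#top` becomes `https://example.com/docs/guide/page.html#top` only when the full page URL is used; against the directory it points at `https://example.com/docs/guide/#top`.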